-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix(python): Guard against dictionaries being passed to projection keywords #22928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #22928 +/- ##
==========================================
- Coverage 80.99% 80.64% -0.36%
==========================================
Files 1674 1677 +3
Lines 236899 222202 -14697
Branches 2787 2801 +14
==========================================
- Hits 191882 179196 -12686
+ Misses 44350 42338 -2012
- Partials 667 668 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ble is not a dictionary like object
py-polars/polars/lazyframe/frame.py
Outdated
@@ -5851,6 +5851,16 @@ def with_columns( | |||
│ 4 ┆ 13.0 ┆ {1,3.0} │ | |||
└─────┴──────┴─────────────┘ | |||
""" | |||
# Ensures that the outermost element cannot be a Dictionary (as an iterable) | |||
if len(exprs) == 1 and isinstance(exprs[0], Mapping): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guard position should be at the generic location. I believe that is in _parse_inputs_as_iterable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move it to the generic location _parse_inputs_as_iterable
- then we move the radius of the guard to impact more syntax than just with_columns
.
For example, moving it will cause a test failure of test_projection_join_names_9955
where a select
statement takes in a dict
equivalent:
def test_projection_join_names_9955() -> None:
batting = pl.LazyFrame(
{
"playerID": ["abercda01"],
"yearID": [1871],
"lgID": ["NA"],
}
)
awards_players = pl.LazyFrame(
{
"playerID": ["bondto01"],
"yearID": [1877],
"lgID": ["NL"],
}
)
right = awards_players.filter(pl.col("lgID") == "NL").select("playerID")
q = batting.join(
right,
left_on=[pl.col("playerID")],
right_on=[pl.col("playerID")],
how="inner",
)
q = q.select(batting.collect_schema())
assert q.collect().schema == {
"playerID": pl.String,
"yearID": pl.Int64,
"lgID": pl.String,
}
Which is why i'm hesitant to move it inwards to the generic, because that theoretically is a breaking change, instead of leaving it at with_columns
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritchie46 As an alternative, if I was to move the guard down to the generic function - I can do the following:
- Update the above failing test to just make it work in the new
_parse_inputs_as_iterable
-q = q.select(batting.collect_schema())
+q = q.select(*batting.collect_schema().keys())
- Update the message so it's relevant for all
projection
keywords, not justwith_columns
msg = (
"Cannot pass a dictionary as a single positional argument.\n"
"If you merely want the *keys*, use:\n"
" • df.method(*your_dict.keys())\n"
"If you need the key–value pairs, use one of:\n"
" • unpack as keywords: df.method(**your_dict)\n"
" • build expressions: df.method(expr.alias(k) for k, expr in your_dict.items())"
)
- Change the PR scope / title to point out it impacts all such column
projection
keywords, not justwith_columns
Let me know if that sounds reasonable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's good. In select
we also should not accept dictionaries.
Looks great. Thank you! |
with_columns
projection
keywords
projection
keywords
Fixes #17879
Rationale
This enables dictionary information (which is a raw python
iterable
type under the hood) to not be passed to columnprojection
operators likeselect
andwith_columns
.By guarding at the python level, this prevents it ever hitting the query planner and doing further work.
Information in the format of a dictionary should be explicitly stated to meet the API requirements.
Changes
_parse_inputs_as_iterable
method with an instructional error messageTests
test_df.py
(if there's a better place to put these let me know)make tests
test suite passesmake pre-commit
run before PR