Skip to content

with_columns(dict) silently ignores dictionary values #17879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 tasks done
gab23r opened this issue Jul 25, 2024 · 11 comments · May be fixed by #22928
Open
2 tasks done

with_columns(dict) silently ignores dictionary values #17879

gab23r opened this issue Jul 25, 2024 · 11 comments · May be fixed by #22928
Labels
A-input-parsing Area: parsing input arguments accepted Ready for implementation bug Something isn't working good first issue Good for newcomers P-low Priority: low python Related to Python Polars

Comments

@gab23r
Copy link
Contributor

gab23r commented Jul 25, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"a": 1})
# shape: (1, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘

cols_to_double = ["a"]

df.with_columns(
    {
        col: pl.col(col) * 2 for col in cols_to_double
    }
)

# shape: (1, 1)
# ┌─────┐
# │ a   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# └─────┘

Log output

No response

Issue description

The code I wrote is not correct I forgot the ** to unpack the dictionary. This exemple is just to show how this bug (?) can happen and can be hard to pinpoint.

Expected behavior

Nevertheless, I think it should raise here. it raises in this case for exemple df.with_columns({"not_exist": 1})

Installed versions

--------Version info---------
Polars:               1.2.1
Index type:           UInt32
Platform:             macOS-14.1.1-x86_64-i386-64bit
Python:               3.10.11 (v3.10.11:7d4cc5aa85, Apr  4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>```

</details>
@gab23r gab23r added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jul 25, 2024
@gab23r gab23r changed the title with_columns(dict) fail silently if keys exists as column with_columns(dict) fail silently if key exists as column Jul 25, 2024
@ritchie46
Copy link
Member

I don't understand this issue? Can you explain more elaborately?

@cmdlineluser
Copy link
Contributor

It seems the issue is that Polars will "iterate" a dict, so you get just the .keys()

So the the examples effectively become:

pl.DataFrame({"a": 1}).with_columns(["a"])

pl.DataFrame({"a": 1}).with_columns(["not_exist"])
# ColumnNotFoundError: not_exist

@gab23r
Copy link
Contributor Author

gab23r commented Jul 26, 2024

The issue is the behavior of with_columns if you give a dictionary as input .
As @cmdlineluser explain, it only reads the keys of the dictionary and ignore the values. It is not really intuitive and error prone. I do not really see the purpose of the behavior. I would prefer that a dict is not a valid input of with_columns.

@mcrumiller
Copy link
Contributor

This is just a side-effect of a dictionary being an Iterable, I don't believe using a dictionary is intended.

>>> list({1: 2, 3: 4})
[1, 3]

@mcrumiller
Copy link
Contributor

mcrumiller commented Jul 26, 2024

I think we can just fix this by requiring a Sequence instead of an Iterable, no?

Edit: I don't think Sequence would work, we need generators to work here.

@kcajf
Copy link

kcajf commented Nov 27, 2024

Just ran into this as well. It would be great for this to error rather than fail silently.

@mcrumiller
Copy link
Contributor

I no longer think that this should raise. This is just user error, and the fact that python yields the keys when a dictionary is iterated is simply something to watch out for.

@kcajf
Copy link

kcajf commented Nov 27, 2024

Oh, I hadn't realised that with_columns could accept a list / iterable at all, with names being taken from the expressions (so .with_columns(["x"]) is valid). I had thought that with_columns required a name -> value mapping.

@nameexhaustion nameexhaustion changed the title with_columns(dict) fail silently if key exists as column with_columns(dict) silently ignores dictionary values Nov 28, 2024
@nameexhaustion nameexhaustion added good first issue Good for newcomers accepted Ready for implementation P-low Priority: low A-input-parsing Area: parsing input arguments and removed needs triage Awaiting prioritization by a maintainer labels Nov 28, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Nov 28, 2024
@nameexhaustion
Copy link
Collaborator

We should just raise an error when a dictionary input is given

It's possible to correctly use a dictionary for with_columns() either of these syntaxes would work -

>>> data = {"a": pl.col("a") * 2}
>>> pl.select(a=1).with_columns(**data)
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i32 │
╞═════╡
│ 2   │
└─────┘
>>> pl.select(a=1).with_columns(expr.alias(name) for name, expr in data.items())
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ i32 │
╞═════╡
│ 2   │
└─────┘

@Biswas-N
Copy link
Contributor

Biswas-N commented Mar 2, 2025

Hi team, can I take up this issue and come up with a fix for it?

@nameexhaustion
Copy link
Collaborator

Hi team, can I take up this issue and come up with a fix for it?

Feel free to

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-input-parsing Area: parsing input arguments accepted Ready for implementation bug Something isn't working good first issue Good for newcomers P-low Priority: low python Related to Python Polars
Projects
Status: Ready
Development

Successfully merging a pull request may close this issue.

7 participants