
Reinterpret a binary column as a fixed shape array #22126


Open
adamreeve opened this issue Apr 4, 2025 · 9 comments · May be fixed by #22840
Labels
enhancement New feature or an improvement of an existing feature

Comments

@adamreeve
Contributor

Description

You can use bin.reinterpret to reinterpret a binary column as another type, but this is limited to scalar numeric types. It would be useful to extend this to fixed-size arrays (pl.Array) too.

For example, this currently fails:

import numpy as np
import polars as pl

df = pl.DataFrame({
    'x': [
        np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32).tobytes(),
        np.array([4.0, 5.0, 6.0, 7.0], dtype=np.float32).tobytes(),
        np.array([8.0, 9.0, 10.0, 11.0], dtype=np.float32).tobytes(),
        np.array([12.0, 13.0, 14.0, 15.0], dtype=np.float32).tobytes(),
    ],
})

reinterpreted = df.select(
        pl.col('x').bin.reinterpret(dtype=pl.Array(pl.Float32, width=4)))

The error raised is:

polars.exceptions.InvalidOperationError: unsupported data type in from_buffer. Only numerical types are allowed.
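A minimal pure-Python sketch of the requested behaviour, using only the standard `struct` module: each binary value is decoded into exactly `width` float32 values, and a value of the wrong length is rejected. The helper name `reinterpret_fixed`, the null handling, and the length-mismatch error are assumptions for illustration, not how Polars implements it. Little-endian is assumed because it is the default `endianness` of `bin.reinterpret` and the native byte order `tobytes()` produces on common platforms.

```python
import struct


def reinterpret_fixed(values, width, code="f", byteorder="<"):
    """Decode each bytes value into a tuple of exactly `width` numbers.

    Hypothetical helper (not a Polars API). `code` is a struct format
    character for one element ("f" = float32); `byteorder` is "<" for
    little-endian or ">" for big-endian.
    """
    itemsize = struct.calcsize(byteorder + code)
    row_format = f"{byteorder}{width}{code}"  # e.g. "<4f"
    out = []
    for raw in values:
        if raw is None:
            out.append(None)  # assumption: nulls stay null
            continue
        if len(raw) != itemsize * width:
            raise ValueError(f"expected {itemsize * width} bytes, got {len(raw)}")
        out.append(struct.unpack(row_format, raw))
    return out


# The same values as the example above, packed as little-endian float32.
rows = [struct.pack("<4f", *(4.0 * i + j for j in range(4))) for i in range(4)]
decoded = reinterpret_fixed(rows, width=4)
print(decoded[0])  # (0.0, 1.0, 2.0, 3.0)
```

Requiring every value to be exactly `itemsize * width` bytes is what makes a fixed-shape Array target well defined; a variable-length List target would only need the length to be a multiple of the element size.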
@adamreeve adamreeve added the enhancement New feature or an improvement of an existing feature label Apr 4, 2025
@adamreeve
Contributor Author

adamreeve commented Apr 4, 2025

I thought I might be able to work around this with some casting and reshaping, but although you can cast from Binary to List[u8], you can't cast back from a List[u8] to Binary:

df.select(
    pl.col('x')
        .cast(pl.List(pl.UInt8))
        .list.to_array(16)
        .reshape((-1, 4))
        .arr.to_list()
        .cast(pl.Binary())
        .bin.reinterpret(dtype=pl.Float32)
        .reshape((-1, 4))
)
polars.exceptions.InvalidOperationError: cannot cast List type (inner: 'UInt8', to: 'Binary')
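To make the intent of the cast/reshape chain above easier to follow, here is a hypothetical pure-Python mirror of the data movement at each step. It uses plain lists and the standard `struct` module, assumes little-endian float32 data, and says nothing about how Polars executes the expressions.

```python
import struct

# Four rows of 16 bytes each (4 little-endian float32 values per row).
rows = [struct.pack("<4f", *(4.0 * i + j for j in range(4))) for i in range(4)]

# cast(pl.List(pl.UInt8)) + list.to_array(16): one 16-byte row per value
as_u8 = [list(r) for r in rows]

# reshape((-1, 4)): flatten every byte in the column, then regroup by 4 bytes
flat = [b for row in as_u8 for b in row]
chunks = [flat[k:k + 4] for k in range(0, len(flat), 4)]

# arr.to_list().cast(pl.Binary()): each 4-byte group becomes one binary value
as_binary = [bytes(c) for c in chunks]

# bin.reinterpret(dtype=pl.Float32): each 4-byte value decodes to one scalar
scalars = [struct.unpack("<f", b)[0] for b in as_binary]

# final reshape((-1, 4)): regroup the scalars into rows of 4
result = [tuple(scalars[k:k + 4]) for k in range(0, len(scalars), 4)]
print(result[1])  # (4.0, 5.0, 6.0, 7.0)
```

The detour only works because every row has the same byte length; the column-wide reshape would misalign rows otherwise, which is one reason a direct reinterpret to an array type is attractive.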

@coastalwhite
Collaborator

The List(u8) -> Binary cast was mentioned before in #21549.

@adamreeve
Contributor Author

I guess it might be more natural to support reinterpreting as a list, since the binary type is variable-sized; you could then use list.to_array if you need an array and the binary values are all the same width.
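A rough sketch of that two-step idea in plain Python, assuming little-endian float32 data: decoding to a list only requires each value's length to be a multiple of the element size, and the fixed-width check is deferred to the conversion step. Both helper names are hypothetical; `to_fixed_array` is only meant to be analogous to `list.to_array`, not a reimplementation of it.

```python
import struct


def reinterpret_as_list(values, code="f", byteorder="<"):
    """Decode each bytes value into a variable-length list of numbers.

    Hypothetical helper: a value's length only has to be a multiple of the
    element size, so different rows may decode to different lengths.
    """
    itemsize = struct.calcsize(byteorder + code)
    out = []
    for raw in values:
        if raw is None:
            out.append(None)  # assumption: nulls stay null
            continue
        count, remainder = divmod(len(raw), itemsize)
        if remainder:
            raise ValueError(f"{len(raw)} bytes is not a multiple of {itemsize}")
        out.append(list(struct.unpack(f"{byteorder}{count}{code}", raw)))
    return out


def to_fixed_array(lists, width):
    """Analogue of list.to_array(width): every non-null list needs `width` items."""
    for lst in lists:
        if lst is not None and len(lst) != width:
            raise ValueError(f"expected {width} elements, got {len(lst)}")
    return [None if lst is None else tuple(lst) for lst in lists]


ragged = [struct.pack("<2f", 1.0, 2.0), struct.pack("<3f", 3.0, 4.0, 5.0)]
print(reinterpret_as_list(ragged))  # [[1.0, 2.0], [3.0, 4.0, 5.0]]

uniform = [struct.pack("<2f", 1.0, 2.0), struct.pack("<2f", 3.0, 4.0)]
print(to_fixed_array(reinterpret_as_list(uniform), width=2))  # [(1.0, 2.0), (3.0, 4.0)]
```

Splitting the steps this way keeps the reinterpret operation valid for ragged binary data, while the width check only happens when a fixed-shape result is actually requested.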

@ritchie46
Member

I think we should support List<u8> -> Binary. Let's start with that.
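For reference, the semantics of a List<u8> -> Binary cast can be sketched in plain Python as the inverse of the existing Binary -> List[u8] cast: each list of byte values becomes one binary value. The helper names and the null handling below are assumptions for illustration only.

```python
def list_u8_to_binary(lists):
    """Hypothetical mirror of a List(UInt8) -> Binary cast: one bytes value per list."""
    # bytes() raises ValueError for values outside 0..255, which a UInt8
    # column cannot contain anyway.
    return [None if lst is None else bytes(lst) for lst in lists]


def binary_to_list_u8(values):
    """Hypothetical mirror of the existing Binary -> List(UInt8) cast."""
    return [None if raw is None else list(raw) for raw in values]


# b"\x00\x00\x80?" is the little-endian float32 encoding of 1.0.
original = [b"\x00\x00\x80?", None, b""]
round_trip = list_u8_to_binary(binary_to_list_u8(original))
print(round_trip == original)  # True
```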

@adamreeve
Contributor Author

I missed that there was already a PR open to implement this (the bin.reinterpret part, not List<u8> to Binary casting): #20456

@ritchie46
Member

It seems stale.

@itamarst
Contributor

itamarst commented May 5, 2025

As mentioned in #21549 I'm working on List<u8> -> Binary.

@adamreeve
Contributor Author

The workaround in #22126 (comment) now works after #22611 was merged (thanks Itamar), although it's a bit convoluted, and supporting this directly with bin.reinterpret would still make sense to me.

@itamarst
Contributor

itamarst commented May 9, 2025

Next I will take a look at #20456 and see if I can finish it up.

@itamarst itamarst linked a pull request May 20, 2025 that will close this issue
4 participants