
fix: Handle file:// URIs in scans by adding PythonScanSourceInput::Uri variant #22767


Open: wants to merge 2 commits into base: main

Matt711 (Contributor) commented May 15, 2025

github-actions bot added the labels fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) on May 15, 2025
@@ -28,13 +28,20 @@ use crate::{PyDataFrame, PyExpr, PyLazyGroupBy};
 fn pyobject_to_first_path_and_scan_sources(
     obj: PyObject,
 ) -> PyResult<(Option<PathBuf>, ScanSources)> {
-    use crate::file::{PythonScanSourceInput, get_python_scan_source_input};
+    use crate::file::{PythonScanSourceInput, get_python_scan_source_input, parse_file_uri};
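The hunk only shows the import change. As a minimal sketch of the idea named in the PR title, a `file://` input can be kept verbatim in a dedicated `Uri` variant instead of being converted straight to a `PathBuf`. The enum `ScanSourceInput` and its constructor `from_input` below are hypothetical simplifications, not the actual `PythonScanSourceInput` definition:

```rust
use std::path::PathBuf;

/// Hypothetical, simplified stand-in for `PythonScanSourceInput`;
/// the real enum is not shown in this PR excerpt.
#[derive(Debug, PartialEq)]
enum ScanSourceInput {
    /// A plain local path.
    Path(PathBuf),
    /// A `file://` URI kept verbatim, so its `//` is never normalized away.
    Uri(String),
}

impl ScanSourceInput {
    /// Hypothetical constructor: inputs starting with `file://` become `Uri`,
    /// everything else becomes `Path`.
    fn from_input(input: &str) -> Self {
        if input.starts_with("file://") {
            ScanSourceInput::Uri(input.to_owned())
        } else {
            ScanSourceInput::Path(PathBuf::from(input))
        }
    }
}
```

Keeping the raw string means later code decides how to resolve the URI, at the cost of one more variant to match on everywhere the enum is consumed.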
A collaborator left a review comment on the changed import line:

Can we try a simpler fix where we strip the file: prefix in this function if we see it?
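A minimal sketch of the suggested simpler fix, assuming the prefix to strip is the full `file://` (so `file:///tmp/x.parquet` maps to the absolute path `/tmp/x.parquet`). The helper name `strip_file_scheme` is hypothetical:

```rust
use std::path::PathBuf;

/// Hypothetical helper for the "strip the prefix" approach: drop a leading
/// `file://` so that `file:///tmp/x.parquet` becomes `/tmp/x.parquet`.
/// Other inputs pass through unchanged. Host components such as
/// `file://localhost/...` and percent-encoding are not handled.
fn strip_file_scheme(input: &str) -> PathBuf {
    PathBuf::from(input.strip_prefix("file://").unwrap_or(input))
}
```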

nameexhaustion (Collaborator) commented:

I also just checked: we have similar issues with any cloud path:

import polars as pl

q = pl.scan_parquet("s3://bucket/file")
print(q._ldf.visit().view_current_node().paths)
# Output: [PosixPath('s3:/bucket/file')]
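In that output the `//` after `s3:` has collapsed to a single `/`. Python's `pathlib` normalizes repeated slashes this way, and Rust's `std::path` does the same when a path is rebuilt from its components. The snippet below only illustrates that normalization. It is not the Polars code path, and the function name `rebuild_from_components` is hypothetical:

```rust
use std::path::{Path, PathBuf};

/// Rebuild a path from its components and return it as a string.
/// `Path::components` skips repeated separators, so the `//` after a URI
/// scheme such as `s3:` collapses to a single `/` (on Unix-like platforms).
fn rebuild_from_components(input: &str) -> String {
    let rebuilt: PathBuf = Path::new(input).components().collect();
    rebuilt.to_string_lossy().into_owned()
}
```

`PathBuf::from` by itself keeps the original string intact. However, `==` on `Path` compares components, so `s3://bucket/file` and `s3:/bucket/file` compare equal as paths. Only a string comparison reveals the difference.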

A proper fix should probably handle any scheme:// URI generically. If you only need to handle file:// URIs for now, though, I would suggest the approach I mentioned above of stripping the prefix.
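A rough sketch of what a scheme-generic check could look like, assuming the RFC 3986 scheme syntax (a letter followed by letters, digits, `+`, `-` or `.`). `file://` inputs are turned into local paths, and every other `scheme://` URI is kept verbatim. The names `split_scheme`, `ScanTarget`, and `classify` are hypothetical:

```rust
/// Split `scheme://rest` into `(scheme, rest)` when the prefix is a valid
/// RFC 3986 scheme: an ASCII letter followed by letters, digits, `+`, `-` or `.`.
fn split_scheme(input: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = input.split_once("://")?;
    let mut chars = scheme.chars();
    let first = chars.next()?;
    let valid = first.is_ascii_alphabetic()
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    valid.then_some((scheme, rest))
}

/// Hypothetical classification of a scan input, for illustration only.
#[derive(Debug, PartialEq)]
enum ScanTarget<'a> {
    /// Local filesystem path (a `file://` URI with its scheme removed, or a plain path).
    Local(&'a str),
    /// Any other `scheme://` URI, kept verbatim so its `//` is not collapsed.
    Remote(&'a str),
}

fn classify(input: &str) -> ScanTarget<'_> {
    match split_scheme(input) {
        // Schemes are case-insensitive per RFC 3986.
        Some((scheme, rest)) if scheme.eq_ignore_ascii_case("file") => ScanTarget::Local(rest),
        Some(_) => ScanTarget::Remote(input),
        None => ScanTarget::Local(input),
    }
}
```

A fuller version would still need to handle Windows drive letters, `file://host/` forms, and percent-encoded characters. This sketch leaves those cases out.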
