
feat: bucketed scan for native_datafusion Parquet scan #1719

@mbutrovich

Description


What is the problem the feature request solves?

The native_datafusion Parquet scan does not support bucketed scans, and it fails most of the tests in Spark's BucketedReadSuite because there is no fallback to Spark. In a bucketed scan, each output partition corresponds to one bucket, so a bucket with no data files produces a partition with nothing to read: its list of PartitionedFiles is empty.
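To make the empty-partition case concrete, here is a minimal sketch, assuming one output partition per bucket and assuming each file's bucket id has already been derived (Spark takes it from the file name). `BucketedFile` and `group_by_bucket` are hypothetical names for illustration, not Comet or Spark APIs.

```rust
/// Hypothetical stand-in for a file in a bucketed table: its path plus the
/// bucket id Spark would derive from the file name.
#[derive(Debug, Clone)]
struct BucketedFile {
    path: String,
    bucket_id: usize,
}

/// Groups files into one partition per bucket. A bucket that has no files
/// yields an empty Vec, which is the case the native scan cannot handle today.
/// Files with an out-of-range bucket id are simply ignored in this sketch.
fn group_by_bucket(files: &[BucketedFile], num_buckets: usize) -> Vec<Vec<String>> {
    let mut partitions: Vec<Vec<String>> = vec![Vec::new(); num_buckets];
    for f in files {
        if f.bucket_id < num_buckets {
            partitions[f.bucket_id].push(f.path.clone());
        }
    }
    partitions
}

fn main() {
    // Four buckets but only two files, so buckets 1 and 3 end up empty.
    let files = vec![
        BucketedFile { path: "part-0_00000.parquet".into(), bucket_id: 0 },
        BucketedFile { path: "part-1_00002.parquet".into(), bucket_id: 2 },
    ];
    let partitions = group_by_bucket(&files, 4);
    for (i, p) in partitions.iter().enumerate() {
        println!("bucket {i}: {} file(s)", p.len());
    }
}
```

With fewer files than buckets, some partitions are necessarily empty, and the native plan still has to produce something for each of them.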

Describe the potential solution

I don't think DataSourceExec can be constructed with no files. For a bucketed scan where a partition has no corresponding file, we might need to replace that node with a different no-op node that produces an empty result with the correct schema.
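The per-partition choice could look roughly like the sketch below. It is written under stated assumptions: `Schema`, `PartitionPlan`, and `plan_partition` are hypothetical simplified stand-ins, not DataFusion or Comet types. In a real implementation the `Empty` arm might map to something like DataFusion's `EmptyExec`, which is built from a schema and returns no rows, but whether that fits Comet's planner is an open question.

```rust
/// Hypothetical simplified schema: just the column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema(Vec<String>);

#[derive(Debug, PartialEq)]
enum PartitionPlan {
    /// Stand-in for a DataSourceExec over at least one file.
    Scan { files: Vec<String>, schema: Schema },
    /// Stand-in for a no-op node that yields zero rows with the scan's schema.
    Empty { schema: Schema },
}

/// Picks a plan for one bucket partition: a real scan when files exist,
/// otherwise an empty node that still carries the expected schema.
fn plan_partition(files: Vec<String>, schema: &Schema) -> PartitionPlan {
    if files.is_empty() {
        PartitionPlan::Empty { schema: schema.clone() }
    } else {
        PartitionPlan::Scan { files, schema: schema.clone() }
    }
}

fn main() {
    let schema = Schema(vec!["id".into(), "value".into()]);
    // Partition 0 has a file; partition 1 has none.
    let partitions = vec![vec!["part-0_00000.parquet".to_string()], vec![]];
    for (i, files) in partitions.into_iter().enumerate() {
        println!("partition {i}: {:?}", plan_partition(files, &schema));
    }
}
```

Keeping the schema on the empty node matters because downstream operators (for example, a join relying on matching bucket partitioning) still expect every partition to expose the same columns.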

Additional context

No response

Labels: enhancement (New feature or request)
