feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` #1747

andygrove · 2025-05-17T17:31:37Z

Which issue does this PR close?

N/A

Follows on from #1746

Rationale for this change

Rather than requiring users to configure the scan implementation manually, let's add an "auto" option so that Comet can pick the best implementation based on whether the schema is supported.

What changes are included in this PR?

Add the new option "auto". The default value is not changed yet because that would involve rewriting many tests.
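As an illustration of what adding a new accepted value involves, here is a minimal Scala sketch that validates the setting against a fixed set of values. It is hypothetical: the `ScanImplValues` object and `parse` are illustrative names. The lower-case strings are assumed to correspond to the `SCAN_NATIVE_COMET`, `SCAN_NATIVE_DATAFUSION`, `SCAN_NATIVE_ICEBERG_COMPAT` and `SCAN_AUTO` constants, whose actual values are not shown in this PR.

```scala
import java.util.Locale

// Hypothetical validation of the scan implementation setting. The string
// values are assumed for illustration; the real constants may differ.
object ScanImplValues {
  val NativeComet = "native_comet"
  val NativeDataFusion = "native_datafusion"
  val NativeIcebergCompat = "native_iceberg_compat"
  val Auto = "auto" // the new option added by this PR

  val all: Set[String] = Set(NativeComet, NativeDataFusion, NativeIcebergCompat, Auto)

  // Trim and lower-case the raw value, then reject anything unknown.
  def parse(raw: String): Either[String, String] = {
    val normalized = raw.trim.toLowerCase(Locale.ROOT)
    if (all.contains(normalized)) Right(normalized)
    else Left(s"Unsupported scan implementation '$raw'; expected one of " +
      all.toSeq.sorted.mkString(", "))
  }
}
```

With this shape, `ScanImplValues.parse("AUTO")` yields `Right("auto")`. The validated value "auto" is not a scan implementation in itself. It is resolved per scan by the planner rule.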

How are these changes tested?

For now, there is a manual workflow where we can run the Spark SQL tests with auto enabled.

codecov-commenter · 2025-05-17T18:34:17Z

Codecov Report

Attention: Patch coverage is 16.66667% with 20 lines in your changes missing coverage. Please review.

Project coverage is 59.33%. Comparing base (f09f8af) to head (972d446).
Report is 235 commits behind head on main.

Files with missing lines	Patch %	Lines
...n/scala/org/apache/comet/rules/CometScanRule.scala	4.76%	17 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1747      +/-   ##
============================================
+ Coverage     56.12%   59.33%   +3.21%     
- Complexity      976     1151     +175     
============================================
  Files           119      130      +11     
  Lines         11743    12680     +937     
  Branches       2251     2380     +129     
============================================
+ Hits           6591     7524     +933     
+ Misses         4012     3945      -67     
- Partials       1140     1211      +71
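The percentages in this report can be checked by hand. Codecov computes coverage as hits divided by tracked lines, where tracked lines are hits plus misses plus partials. The project-level figures match this ratio truncated to two decimal places (56.12% and 59.33%), and their difference gives the +3.21% shown. The two patch rows assume one covered line in `CometScanRule.scala` and four covered lines overall. These counts are inferred from the percentages and are not stated in the report.

```latex
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{base} &= \frac{6591}{6591 + 4012 + 1140} = \frac{6591}{11743} = 0.56127\ldots \\
\text{head} &= \frac{7524}{7524 + 3945 + 1211} = \frac{7524}{12680} = 0.59337\ldots \\
\text{CometScanRule.scala patch} &= \frac{1}{1 + 17 + 3} = \frac{1}{21} = 0.04761\ldots \\
\text{whole patch} &= \frac{4}{4 + 20} = \frac{4}{24} = 0.16666\ldots
\end{aligned}
```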

parthchandra · 2025-05-19T17:21:58Z

Looking good so far. Will do a final review once it is ready.

andygrove · 2025-05-20T18:47:55Z

docs/source/user-guide/compatibility.md

-# Compatibility Guide
-
-Comet aims to provide consistent results with the version of Apache Spark that is being used.
-
-This guide offers information about areas of functionality where there are known differences.
-


This section appeared twice

andygrove · 2025-05-20T18:50:11Z

@parthchandra @mbutrovich This is ready for review now. I don't know whether we want to keep it in draft until it is more complete, or merge it and iterate. I also have not made auto the default yet.

parthchandra · 2025-05-21T23:45:19Z

spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala

+          if (COMET_EXEC_ENABLED
+              .get() && schemaSupported && partitionSchemaSupported &&
+            !scanExec.bucketedScan && !knownIssues) {
+            scanImpl = SCAN_NATIVE_ICEBERG_COMPAT


native_iceberg_compat should be able to handle bucketed scans

andygrove · 2025-06-03

Thanks, I will update this.
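The following Scala sketch makes the review point concrete. It models the eligibility condition from the diff hunk, both as written and with the `!scanExec.bucketedScan` check removed. The `ScanFacts` case class and its fields are hypothetical stand-ins for the values `CometScanRule` computes. This is not the code that was eventually merged.

```scala
// Hypothetical model of the native_iceberg_compat eligibility check.
object IcebergCompatEligibility {
  // Simplified stand-ins for the values used in the diff hunk.
  final case class ScanFacts(
      execEnabled: Boolean,
      schemaSupported: Boolean,
      partitionSchemaSupported: Boolean,
      bucketedScan: Boolean,
      knownIssues: Boolean)

  // Condition as written in the diff: bucketed scans are excluded.
  def eligibleBefore(f: ScanFacts): Boolean =
    f.execEnabled && f.schemaSupported && f.partitionSchemaSupported &&
      !f.bucketedScan && !f.knownIssues

  // Condition with the bucketed-scan exclusion dropped, following the review
  // note that native_iceberg_compat should be able to handle bucketed scans.
  def eligibleAfter(f: ScanFacts): Boolean =
    f.execEnabled && f.schemaSupported && f.partitionSchemaSupported &&
      !f.knownIssues
}
```

With the second form, a bucketed scan over a supported schema would no longer be routed away from native_iceberg_compat. Known issues would still block it.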

parthchandra · 2025-05-21T23:46:03Z

spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala

+        }
+
+        if (scanImpl == SCAN_AUTO) {
+          scanImpl = SCAN_NATIVE_COMET


We would never choose native_datafusion?

andygrove · 2025-06-03

I figured that we have more Spark SQL tests passing with native_iceberg_compat, so we should start with that.
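The fallback in this hunk can be modeled as a small resolution function. This is a hypothetical Scala sketch, not the `CometScanRule` code. The `ScanImpl` ADT and `resolve` are illustrative names, and `icebergCompatEligible` stands in for the combined exec, schema and known-issue checks. It shows the behavior discussed in this thread: auto prefers native_iceberg_compat, falls back to native_comet, and never selects native_datafusion.

```scala
// Hypothetical sketch of resolving an "auto" setting for a single scan.
object AutoScanResolution {
  sealed trait ScanImpl
  case object NativeComet extends ScanImpl
  case object NativeDataFusion extends ScanImpl
  case object NativeIcebergCompat extends ScanImpl
  case object Auto extends ScanImpl

  def resolve(configured: ScanImpl, icebergCompatEligible: Boolean): ScanImpl =
    configured match {
      // Prefer native_iceberg_compat when the scan passes the eligibility checks.
      case Auto if icebergCompatEligible => NativeIcebergCompat
      // Otherwise fall back to native_comet; auto never picks native_datafusion.
      case Auto => NativeComet
      // An explicitly configured implementation is passed through unchanged.
      case explicit => explicit
    }
}
```

Keeping the explicit values as a pass-through means "auto" only changes behavior for users who opt in, which matches the decision not to change the default yet.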

parthchandra · 2025-05-21T23:47:11Z

Not sure why this would cause the CI failures that we see here. Maybe defer this until some more of the known issues are fixed?

andygrove · 2025-05-29T19:26:45Z

> Not sure why this would cause the CI failures that we see here. Maybe defer this until some more of the known issues are fixed?

Some of the tests need updating because they make assumptions based on the configured default scan and have not been updated to handle the new "auto" option.
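Tests could avoid assuming that the configured value is the implementation that actually ran. One option is to compare against the set of implementations the setting can resolve to. The Scala helper below is hypothetical and makes two assumptions: the string values shown, and that auto only ever resolves to native_iceberg_compat or native_comet, as described in the review discussion.

```scala
// Hypothetical test helper: accept any implementation the configured
// setting could legitimately resolve to, instead of assuming a single one.
object ExpectedScanImpls {
  // String values are assumed for illustration.
  def acceptable(configured: String): Set[String] = configured match {
    case "auto" => Set("native_iceberg_compat", "native_comet")
    case other  => Set(other)
  }

  // True when the implementation observed in the plan is allowed.
  def check(configured: String, actual: String): Boolean =
    acceptable(configured).contains(actual)
}
```

A test would then inspect the physical plan for the scan that actually ran and assert `ExpectedScanImpls.check(configured, actual)`, rather than branching on the configured default.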

andygrove · 2025-06-03T15:19:13Z

@parthchandra @mbutrovich Could I get a review? I changed the scope to adding the "auto" option without changing the default. There is a manual workflow where we can run the Spark SQL tests using the new auto mode to see which tests fail (if any).

andygrove added 29 commits May 16, 2025 07:40

Move some logic into scan execs

e47fc0a

improve type checking for sinks

7a50855

move usingDataSourceExecWithIncompatTypes* to CometTestBase

2a78694

scalastyle

8592b8f

scalastyle

52875ff

scalastyle

f5fd69b

fix regression

0d11599

add shuffle fuzz test:

4d5c4b0

scalastyle

332be9b

scalastyle

ada181a

oops

e8b95d4

fix?

e3c5b09

remove some config uses and add TODO comments for others

0920360

fix?

ddcee36

scalastyle

1a106dc

fix?

3634fc1

Merge branch 'scan-refactor-2' into scan-refactor-3

13eea7a

improve

e5e89f3

improve

a6fd752

fix

6e32787

fix

773743e

refactor

ead2362

improve

d963d31

improve

0f49dc0

fix

f20f846

update diffs

ac3c99e

update diffs

deccbfe

add auto scan impl mode

64876cd

fix

9bf4aeb

andygrove added 6 commits May 17, 2025 12:35

Merge branch 'scan-refactor-3' into scan-auto-mode

728d8bd

update test

8868276

update test

82a930c

scalastyle

e6ae57b

fix miri

86351d1

Merge branch 'scan-refactor-3' into scan-auto-mode

5638dda

andygrove added 3 commits May 20, 2025 12:26

upmerge

28f13ea

update docs

be4bac2

update docs

580fdeb

andygrove commented May 20, 2025

andygrove marked this pull request as ready for review May 20, 2025 18:48

andygrove marked this pull request as draft May 20, 2025 18:50

andygrove added 3 commits May 20, 2025 14:41

add CI workflow

1dd4f08

experimenting

77e7e39

scalastyle

987608f

parthchandra reviewed May 21, 2025

upmerge

fbde112

andygrove added 3 commits June 2, 2025 10:41

revert change to default value

6e2ac70

upmerge

e765956

fix

5719a38

andygrove changed the title from "feat: Add auto mode for COMET_PARQUET_SCAN_IMPL" to "feat: Add experimental auto mode for COMET_PARQUET_SCAN_IMPL" Jun 2, 2025

andygrove marked this pull request as ready for review June 3, 2025 15:17

address feedback

972d446
