[SPARK-52335][CONNECT][SQL] Unify the 'invalid bucket count' error for both Connect and Classic #51039

Closed
wants to merge 1 commit

Conversation

@heyihong (Contributor) commented May 28, 2025

What changes were proposed in this pull request?

This PR unifies the error handling for invalid bucket count validation between Spark Connect and Classic Spark. The main changes are:

  1. Updated the error message in error-conditions.json for INVALID_BUCKET_COUNT to be more descriptive and consistent
  2. Removed the legacy error condition _LEGACY_ERROR_TEMP_1083 since its functionality is now merged into INVALID_BUCKET_COUNT
  3. Removed the InvalidCommandInput class and its usage in Connect since we're now using the standard AnalysisException with INVALID_BUCKET_COUNT error condition
  4. Updated the bucket count validation in SparkConnectPlanner to rely on the standard error handling path
  5. Updated the test case in SparkConnectProtoSuite to verify the new unified error handling

The key improvement is that both Connect and Classic now use the same error condition and message format for invalid bucket count errors, making the error handling more consistent across Spark's different interfaces. The error message now includes both the maximum allowed bucket count and the invalid value received, providing better guidance to users.

This change simplifies the error handling codebase by removing duplicate error definitions and standardizing on a single error condition for this validation case.
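As a rough illustration of the change described in point 1 (the PR description does not quote the actual diff, so the wording and any `sqlState` value below are assumptions, not the merged text), a unified entry in error-conditions.json could take a shape like:

```json
{
  "INVALID_BUCKET_COUNT": {
    "message": [
      "Number of buckets should be greater than 0 but less than or equal to bucketing.maxBuckets (`<maxBuckets>`). Got `<numBuckets>`."
    ]
  }
}
```

Spark's error-conditions.json keys each error condition by name and stores the message as a list of template lines with `<placeholder>` parameters, which is the convention the sketch follows.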

Why are the changes needed?

The changes are needed to:

  1. Provide consistent error messages across Spark Connect and Classic interfaces
  2. Simplify error handling by removing duplicate error definitions
  3. Improve error message clarity by including the maximum allowed bucket count in the error message
  4. Maintain better code maintainability by reducing code duplication in error handling

The unified error message now clearly states both the lower bound (bucket count > 0) and the upper limit (bucket count ≤ bucketing.maxBuckets), making it easier for users to understand and fix the issue.
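The shared validation path described above can be sketched as follows. This is a minimal illustration, not the merged code: the object and method names, the simplified exception class, and the exact message wording are all hypothetical; in Spark proper the check raises an `AnalysisException` carrying the `INVALID_BUCKET_COUNT` error condition.

```scala
// Hypothetical sketch of a single validation routine usable by both the
// Connect planner and the Classic analysis path. Names are illustrative.
object BucketSpecValidation {

  // Simplified stand-in for Spark's AnalysisException with an error condition.
  final case class AnalysisError(errorClass: String, message: String)
      extends Exception(message)

  // Rejects bucket counts outside (0, maxBuckets] with one shared error.
  def validateBucketCount(numBuckets: Int, maxBuckets: Int): Int = {
    if (numBuckets <= 0 || numBuckets > maxBuckets) {
      throw AnalysisError(
        errorClass = "INVALID_BUCKET_COUNT",
        message =
          s"Number of buckets should be greater than 0 but less than or " +
          s"equal to bucketing.maxBuckets (`$maxBuckets`). Got `$numBuckets`.")
    }
    numBuckets
  }
}
```

Because both front ends call the same routine, the error condition and the rendered message are identical regardless of whether the invalid bucket count arrives via a Connect proto or a Classic DataFrameWriter call.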

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/sbt "connect/testOnly *SparkConnectProtoSuite"

Was this patch authored or co-authored using generative AI tooling?

No

@heyihong force-pushed the SPARK-52335 branch 2 times, most recently from 49ee744 to d51b7dc on May 28, 2025 14:26
@xinrong-meng (Member)

LGTM thank you!

@HyukjinKwon (Member)

Merged to master.
