feat: pass ignore_nulls flag to first and last #1866
Conversation
Force-pushed from 2b122ed to 692332f
The linked PR did not add a test. Would you be able to?
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@             Coverage Diff              @@
##               main    #1866      +/-   ##
============================================
+ Coverage     56.12%   59.00%    +2.88%
- Complexity      976     1141      +165
============================================
  Files           119      130       +11
  Lines         11743    12827     +1084
  Branches       2251     2412      +161
============================================
+ Hits           6591     7569      +978
- Misses         4012     4035       +23
- Partials       1140     1223       +83

View full report in Codecov by Sentry.
All the tests for first and last are currently disabled:
Yes, we should figure out a test approach for these non-deterministic functions before we start making changes to the implementation.
@andygrove perhaps we can merge this while we wait for the tests to be made more accurate?
I will create a PR today to add correctness tests for first/last. We can then rebase this PR and make sure that there are no regressions.
@rluvaton Here is a test that fails in main and passes with your changes in this PR. Could you add this?

test("first/last with ignore null") {
  val data = Range(0, 8192).flatMap(n => Seq((n, 1), (n, 2))).toDF("a", "b")
  withTempDir { dir =>
    val filename = s"${dir.getAbsolutePath}/first_last_ignore_null.parquet"
    data.write.parquet(filename)
    withSQLConf(CometConf.COMET_BATCH_SIZE.key -> "100") {
      spark.read.parquet(filename).createOrReplaceTempView("t1")
      for (expr <- Seq("first", "last")) {
        // deterministic query that should return one non-null value per group
        val df = spark.sql(
          s"SELECT a, $expr(IF(b==1,null,b)) IGNORE NULLS FROM t1 GROUP BY a ORDER BY a")
        checkSparkAnswerAndOperator(df)
      }
    }
  }
}
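For context, here is a minimal standalone sketch (not part of this PR) of the Spark semantics the test above exercises: Spark's first/last aggregates accept an ignoreNulls flag, and with ignoreNulls = true they skip null values within each group. The object name and sample data below are made up purely for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, last}

// Hypothetical demo app; names and data are illustrative only.
object IgnoreNullsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ignore-nulls-demo")
      .getOrCreate()
    import spark.implicits._

    // One null and one non-null value of b per group of a.
    val df = Seq((1, None), (1, Some(10)), (2, None), (2, Some(20))).toDF("a", "b")

    // With ignoreNulls = true, first/last skip the nulls, so each group has exactly
    // one candidate value and the result is deterministic (10 for a=1, 20 for a=2).
    // With the default ignoreNulls = false, the result depends on encounter order.
    df.groupBy("a")
      .agg(
        first("b", ignoreNulls = true).as("first_b"),
        last("b", ignoreNulls = true).as("last_b"))
      .orderBy("a")
      .show()

    spark.stop()
  }
}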
Sorry, missed your comment, added.
parthchandra left a comment:
lgtm
andygrove left a comment:
Thanks @rluvaton
Which issue does this PR close?
N/A
Rationale for this change
Actually use the ignore_nulls flag that was added in: ignoreNulls flag in first_value and last_value #1626
What changes are included in this PR?
Forward ignore_nulls in first and last, and no longer mark them as unsupported.
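Conceptually, the change means that when the Scala serde translates Spark's First/Last aggregate expressions for the native engine, the expression's ignoreNulls flag is carried along rather than dropped or treated as unsupported. Below is a minimal sketch of that pattern under an assumed, hypothetical serialized message type; HypotheticalAggExpr and its fields are placeholders, not Comet's actual serde API.

import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateFunction, First, Last}

object FirstLastSerdeSketch {
  // Placeholder for the serialized aggregate sent to the native side (not Comet's real message).
  case class HypotheticalAggExpr(kind: String, childSql: String, ignoreNulls: Boolean)

  // Forward the ignoreNulls flag from Spark's First/Last instead of ignoring it
  // or marking the expression as unsupported.
  def serialize(agg: AggregateFunction): Option[HypotheticalAggExpr] = agg match {
    case First(child, ignoreNulls) =>
      Some(HypotheticalAggExpr("first", child.sql, ignoreNulls))
    case Last(child, ignoreNulls) =>
      Some(HypotheticalAggExpr("last", child.sql, ignoreNulls))
    case _ =>
      None
  }
}

The actual PR presumably does this within the existing serde rather than with a new type; the sketch only shows where the flag flows.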
How are these changes tested?
This has the same problem as #1626: all the tests for first and last are disabled.