Skip to content

Conversation

@Athsus
Copy link
Contributor

@Athsus Athsus commented May 23, 2025

What’s Changed

Improved cum_min / cum_max for floating-point numbers, and NaN handling.

This update refines the

  1. Value init when doing cum_max_numeric and cum_min_numeric.
    It now checks the type: if it's a float, the starting value for the calculation is NaN. Otherwise, it uses the standard min/max possible value for that type.

  2. Correct NaN handling in det_min and det_max
    Now it uses MinMax::min_ignore_nan and MinMax::max_ignore_nan to compare the values. For example, if a series starts with NaN, the result will also start with NaN, as expected.

  3. New cum_scan_numeric function.
    A new helper function to handle the common scanning logic.

Added Test

def test_cum_agg_with_infs() -> None:
    # confirm that inf values are handled correctly
    s = pl.Series([float("inf"), 0.0, 1.0])
    assert_series_equal(s.cum_min(), pl.Series([float("inf"), 0.0, 0.0]))

    s = pl.Series([float("-inf"), 0.0, 1.0])
    assert_series_equal(s.cum_max(), pl.Series([float("-inf"), 0.0, 1.0]))

Would close the issue #22855

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels May 23, 2025
@codecov
Copy link

codecov bot commented May 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.31%. Comparing base (f230f12) to head (cc01fb0).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #22896      +/-   ##
==========================================
- Coverage   80.61%   80.31%   -0.31%     
==========================================
  Files        1677     1682       +5     
  Lines      222297   223181     +884     
  Branches     2801     2804       +3     
==========================================
+ Hits       179205   179248      +43     
- Misses      42425    43264     +839     
- Partials      667      669       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Athsus Athsus marked this pull request as ready for review May 24, 2025 12:32
@Athsus
Copy link
Contributor Author

Athsus commented May 24, 2025

Hi @orlp , please review and suggest if needed anything.

@orlp orlp self-assigned this May 26, 2025
@orlp
Copy link
Member

orlp commented May 26, 2025

You can save a lot of repetition by defining

fn cum_scan_numeric<T, F>(ca: &ChunkedArray<T>, reverse: bool, init: T::Native, update: F) -> ChunkedArray<T>
where
    T: PolarsNumericType,
    ChunkedArray<T>: FromIterator<Option<T::Native>>,
    F: Fn(&mut T::Native, Option<T::Native>) -> Option<Option<T::Native>>
{
    let out: ChunkedArray<T> = match reverse {
        false => ca.iter().scan(init, update).collect_trusted(),
        true => ca.iter().rev().scan(init, update).collect_reversed(),
    };
    out.with_name(ca.name().clone())
}

and then using

let init = if <$T>::is_float() { <$T>::nan_value() } else { <$T>::min_value() };
cum_scan_numeric(ca, reverse, init, det_max)

and similarly for the minimum in the dispatch.

Note that your current init value for floats is incorrect, it needs to be NaN (as a test case try adding a series with NaN as the first element).

Copy link
Member

@orlp orlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@Athsus
Copy link
Contributor Author

Athsus commented May 27, 2025

You can save a lot of repetition by defining

fn cum_scan_numeric<T, F>(ca: &ChunkedArray<T>, reverse: bool, init: T::Native, update: F) -> ChunkedArray<T>
where
    T: PolarsNumericType,
    ChunkedArray<T>: FromIterator<Option<T::Native>>,
    F: Fn(&mut T::Native, Option<T::Native>) -> Option<Option<T::Native>>
{
    let out: ChunkedArray<T> = match reverse {
        false => ca.iter().scan(init, update).collect_trusted(),
        true => ca.iter().rev().scan(init, update).collect_reversed(),
    };
    out.with_name(ca.name().clone())
}

and then using

let init = if <$T>::is_float() { <$T>::nan_value() } else { <$T>::min_value() };
cum_scan_numeric(ca, reverse, init, det_max)

and similarly for the minimum in the dispatch.

Note that your current init value for floats is incorrect, it needs to be NaN (as a test case try adding a series with NaN as the first element).

Thanks! That's very helpful. I've made some new changes on it. @orlp

@Athsus
Copy link
Contributor Author

Athsus commented May 27, 2025

Wait, I missed implementing cum_scan_numeric to cum_sum_numeric and cum_prod_numeric

@Athsus
Copy link
Contributor Author

Athsus commented May 27, 2025

Hi @orlp , I've done new commits on it.
Though the CI fails, I think it's not because of my modifications. As I don't have the permission to re-run it, could you do the re-run if it's available and necessary?

@Athsus Athsus requested a review from orlp May 27, 2025 11:05
@orlp orlp merged commit 0e0ab52 into pola-rs:main May 27, 2025
28 checks passed
@orlp
Copy link
Member

orlp commented May 27, 2025

Thank you for helping out :)

@Athsus
Copy link
Contributor Author

Athsus commented May 28, 2025

Thank you for helping out :)

All good! Appreciations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Bug fix rust Related to Rust Polars

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants