@AKHIL-149
Contributor

Summary

Fixes #63314 - pivot_table creates duplicate index entries on Python 3.14 with numpy 1.26.

Tracked down the actual bug. It wasn't in compress_group_index like I thought; it looks like numpy's searchsorted misbehaves with this version combination.

What was happening

  • unstack uses searchsorted to build the compressor array (the first row position of each group in the sorted index)
  • with Python 3.14 + numpy 1.26, searchsorted returns duplicate values instead of unique positions
  • as a result, several distinct index values map to the same output row
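For reference, a minimal sketch of the two ways the compressor can be computed. The comp_index values are made up for illustration and are not taken from the issue. On a correctly working numpy both approaches return the first position of each group in a sorted array, so they should agree:

```python
import numpy as np

# Illustrative, already-sorted group ids (one entry per row of the index).
comp_index = np.array([0, 0, 1, 1, 1, 2, 3, 3], dtype=np.intp)
ngroups = 4

# Sorted path: leftmost insertion point of each group id, i.e. the first
# row at which that group appears.
via_searchsorted = comp_index.searchsorted(np.arange(ngroups))

# Non-sorted path: index of the first occurrence of each unique group id.
via_unique = np.sort(np.unique(comp_index, return_index=True)[1])

# Every group must get its own distinct row position.
assert len(set(via_searchsorted.tolist())) == ngroups
assert np.array_equal(via_searchsorted, via_unique)
```

If the first assertion fails on a given interpreter/numpy combination, the compressor contains duplicates, which would match the symptom described above.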

The fix

Fall back to the np.unique approach when running Python 3.14 with numpy < 2.0. This is the same method the non-sorted path already uses, so it is already covered by existing tests.
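A rough sketch of what such a version gate could look like. The helper names needs_unique_fallback and make_compressor are hypothetical and are not the names used in the actual diff; the version check is an assumption based on the description above:

```python
import sys

import numpy as np


def needs_unique_fallback(py_version=sys.version_info[:2], np_version=np.__version__):
    """Hypothetical gate: True only for Python 3.14 combined with numpy < 2.0."""
    np_major = int(np_version.split(".")[0])
    return tuple(py_version) == (3, 14) and np_major < 2


def make_compressor(comp_index, ngroups, sort=True, use_fallback=None):
    """Hypothetical helper returning the first row position of each group."""
    if use_fallback is None:
        use_fallback = needs_unique_fallback()
    if sort and not use_fallback:
        # Fast path: comp_index is sorted, so a binary search finds the
        # first occurrence of each group id.
        return comp_index.searchsorted(np.arange(ngroups))
    # Fallback, same idea as the non-sorted path: first occurrence of each
    # unique group id, returned in row order.
    return np.sort(np.unique(comp_index, return_index=True)[1])


comp_index = np.array([0, 0, 1, 2, 2, 2], dtype=np.intp)
fast = make_compressor(comp_index, 3, use_fallback=False)
safe = make_compressor(comp_index, 3, use_fallback=True)
```

Passing the versions in as arguments keeps the gate testable on any interpreter, not only on the affected combination.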

Testing

Tested with the reproduction case from the issue (100k rows, 3 metrics). The output is correct now.
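A check along these lines can be used to confirm the result. This is not the exact reproduction from the issue; the column names, key cardinalities and seed below are assumptions chosen only to get 100k rows and 3 metrics:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 100_000

# Hypothetical data: two grouping keys, three metric labels, random values.
df = pd.DataFrame(
    {
        "key_a": rng.integers(0, 1_000, n_rows),
        "key_b": rng.integers(0, 50, n_rows),
        "metric": rng.choice(["m1", "m2", "m3"], n_rows),
        "value": rng.random(n_rows),
    }
)

result = df.pivot_table(
    index=["key_a", "key_b"], columns="metric", values="value", aggfunc="sum"
)

# Each distinct (key_a, key_b) pair should appear exactly once in the output.
expected_rows = len(df[["key_a", "key_b"]].drop_duplicates())
assert result.index.is_unique
assert len(result) == expected_rows
```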

WillAyd and others added 30 commits October 9, 2024 20:09
…maybe_converts_object`) if requested (pandas-dev#59487)

* String dtype: maybe_converts_object give precedence to nullable dtype

* update datetimelike input validation

* update tests and remove xfails

* explicitly test pd.array() behaviour (remove xfail)

* fixup allow_2d

* undo changes related to datetimelike input validation

* fix test for str on current main

---------

Co-authored-by: Matthew Roeschke <[email protected]>
… Python :: 3.13 added to pyproject.toml) (pandas-dev#60012)

Backport PR pandas-dev#59985: Programming Language :: Python :: 3.13 added to pyproject.toml

Co-authored-by: LOCHAN PAUDEL <[email protected]>
* REF: avoid copy in StringArray factorize

* mypy fixup

* un-xfail
* DOC: Add whatsnew for 2.3.0

* fix duplicate label
* BUG (string): str.replace with negative n

* update GH ref
* TST (string) fix xfailed groupby tests (3)

* TST: non-pyarrow build
meeseeksmachine and others added 25 commits September 13, 2025 08:08
… JSON datetime serialization) (pandas-dev#62253)

Co-authored-by: Álvaro Kothe <[email protected]>
…lmatch for Arrow backend with optional groups) (pandas-dev#62401)

Co-authored-by: ptth222 <[email protected]>
…ct_dtypes(include=object) selecting string columns) (pandas-dev#62400)

Co-authored-by: Joris Van den Bossche <[email protected]>
…els workflow (pandas-dev#61669) (pandas-dev#61718) (pandas-dev#62395)

Co-authored-by: Evgenii Mosikhin <[email protected]>
Co-authored-by: Evgenii Mosikhin <[email protected]>
Co-authored-by: Laurie O <[email protected]>
…case to 3.0 string migration guide) (pandas-dev#62413)

Co-authored-by: Joris Van den Bossche <[email protected]>
…ch for Arrow backend with optional groups) (pandas-dev#62412)

Co-authored-by: Joris Van den Bossche <[email protected]>
…n 3.14 support in pyproject.toml and release notes) (pandas-dev#62415)

Co-authored-by: Joris Van den Bossche <[email protected]>
Found the real issue - searchsorted appears to be broken with Python 3.14 + numpy 1.26. It's not compress_group_index, it's the compressor calculation in unstack that uses searchsorted.

Just fall back to the unique/return_index approach for this combination, same as what the non-sorted path does.

Works with 100k rows now.
@AKHIL-149
Contributor Author

Closing - will reopen against the correct base branch.

@AKHIL-149 AKHIL-149 closed this Dec 11, 2025
Development

Successfully merging this pull request may close these issues.

BUG: large pivot_table has incorrect output with Python 3.14