Commit 08b234c

Merge branch 'main' into to_iceberg
2 parents 1290931 + c708e15 commit 08b234c

35 files changed (+590, -245 lines changed)

doc/source/development/contributing_environment.rst

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ This option allows you to configure where meson stores your built C extensions,
 Sometimes, it might be useful to compile pandas with debugging symbols, when debugging C extensions.
 Appending ``-Csetup-args="-Ddebug=true"`` will do the trick.
 
-With pip, it is possible to chain together multiple config settings (for example specifying both a build directory
+With pip, it is possible to chain together multiple config settings. For example, specifying both a build directory
 and building with debug symbols would look like
 ``-Cbuilddir="your builddir here" -Csetup-args="-Dbuildtype=debug"``.
 

doc/source/user_guide/indexing.rst

Lines changed: 1 addition & 1 deletion
@@ -325,7 +325,7 @@ The ``.loc`` attribute is the primary access method. The following are valid inp
 
 * A single label, e.g. ``5`` or ``'a'`` (Note that ``5`` is interpreted as a *label* of the index. This use is **not** an integer position along the index.).
 * A list or array of labels ``['a', 'b', 'c']``.
-* A slice object with labels ``'a':'f'`` (Note that contrary to usual Python
+* A slice object with labels ``'a':'f'``. Note that contrary to usual Python
 slices, **both** the start and the stop are included, when present in the
 index! See :ref:`Slicing with labels <indexing.slicing_with_labels>`.
 * A boolean array.
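The label-slicing rule in the hunk above (both endpoints included) and the "integer is a label" rule can be checked with a small sketch. The Series and its index values are illustrative assumptions, not taken from the pandas docs.

```python
import pandas as pd

# Illustrative Series with a string index.
s = pd.Series([10, 20, 30, 40, 50, 60], index=list("abcdef"))

# Label-based slicing with .loc includes BOTH the start and the stop label.
by_label = s.loc["a":"c"]

# Positional slicing with .iloc follows normal Python rules (stop excluded).
by_position = s.iloc[0:2]

# With .loc, an integer is looked up as a label, not as a position.
s_int = pd.Series(["x", "y", "z"], index=[5, 3, 1])
label_five = s_int.loc[5]

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']
print(label_five)                  # x
```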

doc/source/user_guide/io.rst

Lines changed: 1 addition & 1 deletion
@@ -1415,7 +1415,7 @@ of multi-columns indices.
 
 .. note::
 If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
-with ``df.to_csv(..., index=False)``, then any ``names`` on the columns index will
+with ``df.to_csv(..., index=False)``), then any ``names`` on the columns index will
 be *lost*.
 
 .. ipython:: python
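The note above says the columns-index name is lost on a CSV round trip when no index is written. A minimal sketch of that behavior follows, assuming an in-memory buffer in place of a file. The column name `"cols"` is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=pd.Index(["a", "b"], name="cols"))

# Write without the index, then read back without specifying index_col.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
roundtripped = pd.read_csv(buf)

print(df.columns.name)            # cols
print(roundtripped.columns.name)  # None: the name was not preserved
```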

doc/source/user_guide/timeseries.rst

Lines changed: 1 addition & 1 deletion
@@ -2458,7 +2458,7 @@ you can use the ``tz_convert`` method.
 
 For ``pytz`` time zones, it is incorrect to pass a time zone object directly into
 the ``datetime.datetime`` constructor
-(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``.
+(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``).
 Instead, the datetime needs to be localized using the ``localize`` method
 on the ``pytz`` time zone object.
 
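The difference between the incorrect `tzinfo=` form and `localize` can be seen in the resulting UTC offsets. This is a sketch that assumes the third-party `pytz` package is installed.

```python
import datetime

import pytz

eastern = pytz.timezone("US/Eastern")

# Incorrect: passing the pytz object as tzinfo uses the zone's default
# (local mean time) offset rather than the EST offset in effect in 2011.
wrong = datetime.datetime(2011, 1, 1, tzinfo=eastern)

# Correct: localize picks the offset valid for this date (EST, UTC-5).
right = eastern.localize(datetime.datetime(2011, 1, 1))

print(wrong.utcoffset())
print(right.utcoffset())  # -1 day, 19:00:00
```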

doc/source/user_guide/user_defined_functions.rst

Lines changed: 211 additions & 97 deletions
Large diffs are not rendered by default.

doc/source/whatsnew/v0.11.0.rst

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ See the section :ref:`Selection by Position <indexing.integer>` for substitutes.
 Dtypes
 ~~~~~~
 
-Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
+Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
 
 .. ipython:: python
 
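The dtype paragraph above describes columns that keep their own numeric dtypes. A minimal sketch of that idea follows. The column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": np.array([1.0, 2.0, 3.0], dtype="float32"),  # passed ndarray
        "B": pd.Series([1, 2, 3], dtype="int16"),          # passed Series
        "C": np.array([1.0, 2.0, 3.0]),                    # float64 by default
    }
)

# Each column keeps its own numeric dtype; they are not combined into one.
print(df.dtypes)
```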

doc/source/whatsnew/v0.12.0.rst

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@ IO enhancements
 format. (:issue:`3571`, :issue:`1651`, :issue:`3141`)
 
 - If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
-with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will
+with ``df.to_csv(..., index=False)``), then any ``names`` on the columns index will
 be *lost*.
 
 .. ipython:: python

doc/source/whatsnew/v0.16.1.rst

Lines changed: 1 addition & 1 deletion
@@ -353,7 +353,7 @@ Deprecations
 Index representation
 ~~~~~~~~~~~~~~~~~~~~
 
-The string representation of ``Index`` and its sub-classes have now been unified. These will show a single-line display if there are few values; a wrapped multi-line display for a lot of values (but less than ``display.max_seq_items``; if lots of items (> ``display.max_seq_items``) will show a truncated display (the head and tail of the data). The formatting for ``MultiIndex`` is unchanged (a multi-line wrapped display). The display width responds to the option ``display.max_seq_items``, which is defaulted to 100. (:issue:`6482`)
+The string representation of ``Index`` and its sub-classes have now been unified. These will show a single-line display if there are few values; a wrapped multi-line display for a lot of values (but less than ``display.max_seq_items``); if lots of items (> ``display.max_seq_items``) will show a truncated display (the head and tail of the data). The formatting for ``MultiIndex`` is unchanged (a multi-line wrapped display). The display width responds to the option ``display.max_seq_items``, which is defaulted to 100. (:issue:`6482`)
 
 Previous behavior
 
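The truncation rule described above can be triggered by temporarily lowering `display.max_seq_items`. The limit of 10 used here is an arbitrary choice for illustration.

```python
import pandas as pd

small = pd.Index([1, 2, 3])
large = pd.Index(list(range(100)))

# Few values: a single-line display without truncation.
print(repr(small))

# With max_seq_items lowered to 10, the 100-element Index exceeds the limit
# and is shown truncated: head and tail joined by "...".
with pd.option_context("display.max_seq_items", 10):
    truncated = repr(large)
print(truncated)
```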

doc/source/whatsnew/v0.19.0.rst

Lines changed: 1 addition & 1 deletion
@@ -1547,7 +1547,7 @@ Bug fixes
 - Bug in checking for any null objects in a ``TimedeltaIndex``, which always returned ``True`` (:issue:`13603`)
 - Bug in ``Series`` arithmetic raises ``TypeError`` if it contains datetime-like as ``object`` dtype (:issue:`13043`)
 - Bug ``Series.isnull()`` and ``Series.notnull()`` ignore ``Period('NaT')`` (:issue:`13737`)
-- Bug ``Series.fillna()`` and ``Series.dropna()`` don't affect to ``Period('NaT')`` (:issue:`13737`
+- Bug ``Series.fillna()`` and ``Series.dropna()`` don't affect to ``Period('NaT')`` (:issue:`13737`)
 - Bug in ``.fillna(value=np.nan)`` incorrectly raises ``KeyError`` on a ``category`` dtyped ``Series`` (:issue:`14021`)
 - Bug in extension dtype creation where the created types were not is/identical (:issue:`13285`)
 - Bug in ``.resample(..)`` where incorrect warnings were triggered by IPython introspection (:issue:`13618`)
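The `Period('NaT')` entries above concern missing-value handling in period data. This sketch shows the behavior those fixes target on a monthly period Series. The dates and fill value are illustrative.

```python
import pandas as pd

s = pd.Series(pd.PeriodIndex(["2011-01", "NaT", "2011-03"], freq="M"))

# The NaT entry is treated as missing by isnull, dropna and fillna.
missing = s.isnull().tolist()
kept = s.dropna()
filled = s.fillna(pd.Period("2011-02", freq="M"))

print(missing)          # [False, True, False]
print(len(kept))        # 2
print(filled.tolist())
```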

doc/source/whatsnew/v0.24.0.rst

Lines changed: 2 additions & 2 deletions
@@ -1019,7 +1019,7 @@ operations has been changed to match the arithmetic operations in these cases.
 The affected cases are:
 
 - operating against a 2-dimensional ``np.ndarray`` with either 1 row or 1 column will now broadcast the same way a ``np.ndarray`` would (:issue:`23000`).
-- a list or tuple with length matching the number of rows in the :class:`DataFrame` will now raise ``ValueError`` instead of operating column-by-column (:issue:`22880`.
+- a list or tuple with length matching the number of rows in the :class:`DataFrame` will now raise ``ValueError`` instead of operating column-by-column (:issue:`22880`).
 - a list or tuple with length matching the number of columns in the :class:`DataFrame` will now operate row-by-row instead of raising ``ValueError`` (:issue:`22880`).
 
 .. ipython:: python
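The three affected cases listed above can be illustrated on a 3-row, 2-column frame. The frame is deliberately non-square so that "matches rows" and "matches columns" cannot be confused.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])  # 3 rows, 2 columns

# Length matches the number of COLUMNS (2): compared row-by-row.
row_wise = df == [0, 1]

# Length matches the number of ROWS (3): raises ValueError.
try:
    df == [0, 1, 2]
    raised = False
except ValueError:
    raised = True

# A 2-D ndarray with a single column broadcasts like NumPy would.
broadcast = df + np.array([[10], [20], [30]])

print(row_wise)
print(raised)  # True
print(broadcast)
```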
@@ -1556,7 +1556,7 @@ Performance improvements
 (i.e. ``x in cat``-style checks are much faster). :meth:`CategoricalIndex.contains`
 is likewise much faster (:issue:`21369`, :issue:`21508`)
 - Improved performance of :meth:`HDFStore.groups` (and dependent functions like
-:meth:`HDFStore.keys`. (i.e. ``x in store`` checks are much faster)
+:meth:`HDFStore.keys` (i.e. ``x in store`` checks) are much faster)
 (:issue:`21372`)
 - Improved the performance of :func:`pandas.get_dummies` with ``sparse=True`` (:issue:`21997`)
 - Improved performance of :func:`IndexEngine.get_indexer_non_unique` for sorted, non-unique indexes (:issue:`9466`)

doc/source/whatsnew/v1.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -793,7 +793,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.resample` that would throw a ``ValueError`` when resampling from ``"D"`` to ``"24H"`` over a transition into daylight savings time (DST) (:issue:`35219`)
 - Bug when combining methods :meth:`DataFrame.groupby` with :meth:`DataFrame.resample` and :meth:`DataFrame.interpolate` raising a ``TypeError`` (:issue:`35325`)
 - Bug in :meth:`.DataFrameGroupBy.apply` where a non-nuisance grouping column would be dropped from the output columns if another groupby method was called before ``.apply`` (:issue:`34656`)
-- Bug when subsetting columns on a :class:`.DataFrameGroupBy` (e.g. ``df.groupby('a')[['b']])``) would reset the attributes ``axis``, ``dropna``, ``group_keys``, ``level``, ``mutated``, ``sort``, and ``squeeze`` to their default values (:issue:`9959`)
+- Bug when subsetting columns on a :class:`.DataFrameGroupBy` (e.g. ``df.groupby('a')[['b']]``) would reset the attributes ``axis``, ``dropna``, ``group_keys``, ``level``, ``mutated``, ``sort``, and ``squeeze`` to their default values (:issue:`9959`)
 - Bug in :meth:`.DataFrameGroupBy.tshift` failing to raise ``ValueError`` when a frequency cannot be inferred for the index of a group (:issue:`35937`)
 - Bug in :meth:`DataFrame.groupby` does not always maintain column index name for ``any``, ``all``, ``bfill``, ``ffill``, ``shift`` (:issue:`29764`)
 - Bug in :meth:`.DataFrameGroupBy.apply` raising error with ``np.nan`` group(s) when ``dropna=False`` (:issue:`35889`)
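The column-subsetting entry above concerns settings such as `dropna` surviving `df.groupby('a')[['b']]`. This sketch shows the expected result with `dropna=False`. The data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 1.0], "b": [1, 2, 3]})

# dropna=False keeps the NaN group; selecting [["b"]] should not reset
# that setting back to its default (dropna=True).
result = df.groupby("a", dropna=False)[["b"]].sum()
print(result)
```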

doc/source/whatsnew/v1.4.0.rst

Lines changed: 1 addition & 1 deletion
@@ -666,7 +666,7 @@ be removed in a future version. Use :func:`pandas.concat` instead (:issue:`35407
 
 .. code-block:: ipython
 
-In [1]: pd.Series([1, 2]).append(pd.Series([3, 4])
+In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
 Out [1]:
 <stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
 0 1
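The deprecation message above points to `pandas.concat` as the replacement for `Series.append`. A minimal sketch of the equivalent call follows, plus `ignore_index=True` for a fresh index.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])

# Replacement for the deprecated s1.append(s2): labels are kept as-is.
combined = pd.concat([s1, s2])

# ignore_index=True renumbers the result 0..n-1 instead.
renumbered = pd.concat([s1, s2], ignore_index=True)

print(combined.index.tolist())    # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```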

doc/source/whatsnew/v1.5.0.rst

Lines changed: 1 addition & 1 deletion
@@ -287,7 +287,7 @@ and attributes without holding entire tree in memory (:issue:`45442`).
 
 In [1]: df = pd.read_xml(
 ... "/path/to/downloaded/enwikisource-latest-pages-articles.xml",
-... iterparse = {"page": ["title", "ns", "id"]})
+... iterparse = {"page": ["title", "ns", "id"]}
 ... )
 df
 Out[2]:
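The `iterparse` argument shown above maps a repeating element to the child elements to keep. The following is a self-contained sketch. It assumes a small hypothetical XML file written to a temporary directory in place of the Wikipedia dump, and it uses `parser="etree"` so that `lxml` is not required.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature stand-in for a large XML dump.
xml = """<?xml version="1.0" encoding="utf-8"?>
<mediawiki>
  <page><title>First</title><ns>0</ns><id>1</id></page>
  <page><title>Second</title><ns>0</ns><id>2</id></page>
</mediawiki>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pages.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(xml)

    # iterparse requires a file on local disk; keep only title, ns and id
    # from each <page> element.
    df = pd.read_xml(path, parser="etree", iterparse={"page": ["title", "ns", "id"]})

print(df)
```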

doc/source/whatsnew/v2.2.0.rst

Lines changed: 1 addition & 1 deletion
@@ -826,7 +826,7 @@ Strings
 - Bug in :meth:`Index.str.cat` always casting result to object dtype (:issue:`56157`)
 - Bug in :meth:`Series.__mul__` for :class:`ArrowDtype` with ``pyarrow.string`` dtype and ``string[pyarrow]`` for the pyarrow backend (:issue:`51970`)
 - Bug in :meth:`Series.str.find` when ``start < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56411`)
-- Bug in :meth:`Series.str.fullmatch` when ``dtype=pandas.ArrowDtype(pyarrow.string()))`` allows partial matches when regex ends in literal //$ (:issue:`56652`)
+- Bug in :meth:`Series.str.fullmatch` when ``dtype=pandas.ArrowDtype(pyarrow.string())`` allows partial matches when regex ends in literal //$ (:issue:`56652`)
 - Bug in :meth:`Series.str.replace` when ``n < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56404`)
 - Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type ``tuple[str, ...]`` for :class:`ArrowDtype` with ``pyarrow.string`` dtype (:issue:`56579`)
 - Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type ``tuple[str, ...]`` for ``string[pyarrow]`` (:issue:`54942`)
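The `Series.str.fullmatch` fix above concerns partial matches slipping through. This sketch shows the intended semantics on the default object dtype, not the `ArrowDtype` case from the fix. `fullmatch` requires the whole string to match, whereas `match` only anchors at the start.

```python
import pandas as pd

s = pd.Series(["abc", "abcd", "xabc"])

# The whole string must match the pattern.
full = s.str.fullmatch(r"abc")

# Only the beginning of the string must match.
prefix = s.str.match(r"abc")

print(full.tolist())    # [True, False, False]
print(prefix.tolist())  # [True, True, False]
```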

doc/source/whatsnew/v3.0.0.rst

Lines changed: 6 additions & 1 deletion
@@ -52,6 +52,7 @@ Other enhancements
 - :class:`Rolling` and :class:`Expanding` now support ``pipe`` method (:issue:`57076`)
 - :class:`Series` now supports the Arrow PyCapsule Interface for export (:issue:`59518`)
 - :func:`DataFrame.to_excel` argument ``merge_cells`` now accepts a value of ``"columns"`` to only merge :class:`MultiIndex` column header header cells (:issue:`35384`)
+- :func:`set_option` now accepts a dictionary of options, simplifying configuration of multiple settings at once (:issue:`61093`)
 - :meth:`DataFrame.corrwith` now accepts ``min_periods`` as optional arguments, as in :meth:`DataFrame.corr` and :meth:`Series.corr` (:issue:`9490`)
 - :meth:`DataFrame.cummin`, :meth:`DataFrame.cummax`, :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods now have a ``numeric_only`` parameter (:issue:`53072`)
 - :meth:`DataFrame.ewm` now allows ``adjust=False`` when ``times`` is provided (:issue:`54328`)
@@ -73,6 +74,7 @@ Other enhancements
 - :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
 - :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
+- :meth:`Series.map` now accepts an ``engine`` parameter to allow execution with a third-party execution engine (:issue:`61125`)
 - :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
 - :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
 - :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
@@ -705,7 +707,7 @@ Timedelta
 
 Timezones
 ^^^^^^^^^
--
+- Bug in :meth:`DatetimeIndex.union`, :meth:`DatetimeIndex.intersection`, and :meth:`DatetimeIndex.symmetric_difference` changing timezone to UTC when merging two DatetimeIndex objects with the same timezone but different units (:issue:`60080`)
 -
 
 Numeric
@@ -789,6 +791,7 @@ I/O
 - Bug in :meth:`read_stata` where extreme value integers were incorrectly interpreted as missing for format versions 111 and prior (:issue:`58130`)
 - Bug in :meth:`read_stata` where the missing code for double was not recognised for format versions 105 and prior (:issue:`58149`)
 - Bug in :meth:`set_option` where setting the pandas option ``display.html.use_mathjax`` to ``False`` has no effect (:issue:`59884`)
+- Bug in :meth:`to_csv` where ``quotechar``` is not escaped when ``escapechar`` is not None (:issue:`61407`)
 - Bug in :meth:`to_excel` where :class:`MultiIndex` columns would be merged to a single row when ``merge_cells=False`` is passed (:issue:`60274`)
 
 Period
@@ -822,6 +825,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` were not keeping the index name when the index had :class:`ArrowDtype` timestamp dtype (:issue:`61222`)
 - Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
+- Bug in :meth:`DataFrameGroupBy.agg` where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (:issue:`61503`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
 - Bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
 - Bug in :meth:`DataFrameGroupBy.apply` with ``as_index=False`` that was returning :class:`MultiIndex` instead of returning :class:`Index`. (:issue:`58291`)
@@ -847,6 +851,7 @@ Reshaping
 - Bug in :meth:`DataFrame.stack` with the new implementation where ``ValueError`` is raised when ``level=[]`` (:issue:`60740`)
 - Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
 - Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)
+- Bug in :func:`melt` where calling with duplicate column names in ``id_vars`` raised a misleading ``AttributeError`` (:issue:`61475`)
 
 Sparse
 ^^^^^^

pandas/_config/config.py

Lines changed: 29 additions & 6 deletions
@@ -199,9 +199,9 @@ def set_option(*args) -> None:
 
 Parameters
 ----------
-*args : str | object
-Arguments provided in pairs, which will be interpreted as (pattern, value)
-pairs.
+*args : str | object | dict
+Arguments provided in pairs, which will be interpreted as (pattern, value),
+or as a single dictionary containing multiple option-value pairs.
 pattern: str
 Regexp which should match a single option
 value: object
@@ -239,6 +239,8 @@ def set_option(*args) -> None:
 
 Examples
 --------
+Option-Value Pair Input:
+
 >>> pd.set_option("display.max_columns", 4)
 >>> df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
 >>> df
@@ -247,8 +249,23 @@ def set_option(*args) -> None:
 1 6 7 ... 9 10
 [2 rows x 5 columns]
 >>> pd.reset_option("display.max_columns")
+
+Dictionary Input:
+
+>>> pd.set_option({"display.max_columns": 4, "display.precision": 1})
+>>> df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
+>>> df
+0 1 ... 3 4
+0 1 2 ... 4 5
+1 6 7 ... 9 10
+[2 rows x 5 columns]
+>>> pd.reset_option("display.max_columns")
+>>> pd.reset_option("display.precision")
 """
-# must at least 1 arg deal with constraints later
+# Handle dictionary input
+if len(args) == 1 and isinstance(args[0], dict):
+args = tuple(kv for item in args[0].items() for kv in item)
+
 nargs = len(args)
 if not nargs or nargs % 2 != 0:
 raise ValueError("Must provide an even number of non-keyword arguments")
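The dictionary branch added to `set_option` flattens `{pattern: value}` into the existing (pattern, value, ...) form before the even-argument check. This sketch reproduces that flattening as a hypothetical standalone helper. `flatten_option_dict` is not a pandas function. The flattened tuple is then passed to `pd.set_option` in the pair form, which released pandas versions already accept.

```python
import pandas as pd


def flatten_option_dict(options: dict) -> tuple:
    """Hypothetical helper: {p1: v1, p2: v2} -> (p1, v1, p2, v2).

    Mirrors the generator expression added to set_option in this commit.
    """
    return tuple(kv for item in options.items() for kv in item)


flat = flatten_option_dict({"display.max_columns": 4, "display.precision": 1})
print(flat)  # ('display.max_columns', 4, 'display.precision', 1)

# The flattened tuple is the classic pair form of set_option.
pd.set_option(*flat)
during = pd.get_option("display.max_columns")

pd.reset_option("display.max_columns")
pd.reset_option("display.precision")
```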
@@ -440,9 +457,10 @@ def option_context(*args) -> Generator[None]:
 
 Parameters
 ----------
-*args : str | object
+*args : str | object | dict
 An even amount of arguments provided in pairs which will be
-interpreted as (pattern, value) pairs.
+interpreted as (pattern, value) pairs. Alternatively, a single
+dictionary of {pattern: value} may be provided.
 
 Returns
 -------
@@ -471,7 +489,12 @@ def option_context(*args) -> Generator[None]:
 >>> from pandas import option_context
 >>> with option_context("display.max_rows", 10, "display.max_columns", 5):
 ... pass
+>>> with option_context({"display.max_rows": 10, "display.max_columns": 5}):
+... pass
 """
+if len(args) == 1 and isinstance(args[0], dict):
+args = tuple(kv for item in args[0].items() for kv in item)
+
 if len(args) % 2 != 0 or len(args) < 2:
 raise ValueError(
 "Provide an even amount of arguments as "
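`option_context` applies its settings only inside the `with` block and restores the previous values on exit. This sketch builds the pair form from a dict by hand, so it assumes nothing about whether the installed pandas already includes the dict support added here.

```python
import pandas as pd

settings = {"display.max_rows": 10, "display.max_columns": 5}
before = pd.get_option("display.max_rows")

# Flatten {pattern: value} into (pattern, value, ...) pairs.
pairs = [kv for item in settings.items() for kv in item]

with pd.option_context(*pairs):
    inside = pd.get_option("display.max_rows")

# On exit, the previous value is restored.
after = pd.get_option("display.max_rows")
print(inside, after == before)  # 10 True
```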

pandas/core/frame.py

Lines changed: 5 additions & 5 deletions
@@ -337,15 +337,15 @@
 to SQL left anti join; preserve key order.
 * right_anti: use only keys from right frame that are not in left frame, similar
 to SQL right anti join; preserve key order.
-on : label or list
+on : Hashable or a sequence of the previous
 Column or index level names to join on. These must be found in both
 DataFrames. If `on` is None and not merging on indexes then this defaults
 to the intersection of the columns in both DataFrames.
-left_on : label or list, or array-like
+left_on : Hashable or a sequence of the previous, or array-like
 Column or index level names to join on in the left DataFrame. Can also
 be an array or list of arrays of the length of the left DataFrame.
 These arrays are treated as if they are columns.
-right_on : label or list, or array-like
+right_on : Hashable or a sequence of the previous, or array-like
 Column or index level names to join on in the right DataFrame. Can also
 be an array or list of arrays of the length of the right DataFrame.
 These arrays are treated as if they are columns.
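The docstring wording "Hashable or a sequence of the previous" covers both a single key label and a list of labels. This sketch exercises `on`, `left_on` and `right_on` in each form. The frames and column names are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"k1": [1, 1, 2], "k2": ["a", "b", "a"], "lval": [10, 20, 30]})
right = pd.DataFrame({"k1": [1, 2], "k2": ["a", "a"], "rval": [100, 300]})

# on= with a single hashable label ...
by_one = left.merge(right[["k1", "rval"]], on="k1")

# ... or with a sequence of labels.
by_two = left.merge(right, on=["k1", "k2"])

# left_on / right_on name the keys separately when the column names differ.
renamed = right.rename(columns={"k1": "key1", "k2": "key2"})
by_names = left.merge(renamed, left_on=["k1", "k2"], right_on=["key1", "key2"])

print(by_two)
```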
@@ -7443,7 +7443,7 @@ def value_counts(
 
 Parameters
 ----------
-subset : label or list of labels, optional
+subset : Hashable or a sequence of the previous, optional
 Columns to use when counting unique combinations.
 normalize : bool, default False
 Return proportions rather than frequencies.
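The `subset` and `normalize` parameters documented above can be exercised as follows. The frame is an illustrative assumption, and the results are compared as plain lists because the index structure of the returned Series is not the point here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "y", "x", "x"]})

# Count unique values of the listed column only.
counts_a = df.value_counts(subset=["a"])

# Count unique (a, b) combinations.
counts_ab = df.value_counts(subset=["a", "b"])

# normalize=True returns proportions instead of frequencies.
shares = df.value_counts(subset=["a"], normalize=True)

print(counts_a.tolist())   # [3, 1]
print(counts_ab.tolist())  # [2, 1, 1]
print(shares.tolist())     # [0.75, 0.25]
```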
@@ -7594,7 +7594,7 @@ def nlargest(
 ----------
 n : int
 Number of rows to return.
-columns : label or list of labels
+columns : Hashable or a sequence of the previous
 Column label(s) to order by.
 keep : {'first', 'last', 'all'}, default 'first'
 Where there are duplicate values:
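`columns` in `nlargest` accepts a single label or a list. With a list, later columns break ties in earlier ones. The data in this sketch is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"score": [3, 7, 7, 1], "bonus": [5, 1, 2, 9]})

# A single column label.
top_score = df.nlargest(2, "score")

# A list of labels: ties in "score" are broken by "bonus".
top_both = df.nlargest(2, ["score", "bonus"])

print(top_both.index.tolist())  # [2, 1]
```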
