DOC: Improve documentation for DataFrame.__setitem__ and .loc assignment from Series #61804
Changes from 5 commits
c4e1c18
e1a893d
cfa767f
699a9db
be86001
f792b39
0d938a0
ed3b173
eb9db3c
626f0ae
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1732,3 +1732,53 @@ Why does assignment fail when using chained indexing? | |
This means that chained indexing will never work. | ||
See :ref:`this section <copy_on_write_chained_assignment>` | ||
for more context. | ||
|
||
.. _indexing.series_assignment: | ||
|
||
Series Assignment and Index Alignment | ||
------------------------------------- | ||
|
||
When assigning a Series to a DataFrame column, pandas performs automatic alignment | ||
based on index labels. This is a fundamental behavior that can be surprising to | ||
new users who might expect positional assignment. | ||
|
||
Key Points: | ||
~~~~~~~~~~~ | ||
|
||
* Series values are matched to DataFrame rows by index label | ||
* Position/order in the Series doesn't matter | ||
* Missing index labels result in NaN values | ||
* This behavior is consistent across df[col] = series and df.loc[:, col] = series | ||
|
||
Examples: | ||
|
||
.. ipython:: python | ||
|
||
import pandas as pd | ||
|
||
# Create a DataFrame | ||
df = pd.DataFrame({'values': [1, 2, 3]}, index=['x', 'y', 'z']) | ||
|
||
# Series with matching indices (different order) | ||
s1 = pd.Series([10, 20, 30], index=['z', 'x', 'y']) | ||
df['aligned'] = s1 # Aligns by index, not position | ||
print(df) | ||
|
||
# Series with partial index match | ||
s2 = pd.Series([100, 200], index=['x', 'z']) | ||
df['partial'] = s2 # Missing 'y' gets NaN | ||
print(df) | ||
|
||
# Series with non-matching indices | ||
s3 = pd.Series([1000, 2000], index=['a', 'b']) | ||
df['nomatch'] = s3 # All values become NaN | ||
print(df) | ||
|
||
|
||
# Avoiding confusion: | ||
# If you want positional assignment instead of index alignment, | ||
# convert the Series to an array or list first. | ||
|
||
df['positional'] = s1.values # or s1.tolist() | ||
|
||
# Or reorder the Series to the DataFrame index (still label-based, not positional) | ||
df['reset_index'] = s1.reindex(df.index) | ||
Review comment: Nit but I think naming the column |
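The user-guide example above can be checked with a small sketch like the one below. It assumes a recent pandas release; the column names `aligned`, `reindexed`, and `relabeled` are illustrative and not part of the PR. It suggests that `reindex` still matches by label (so it gives the same result as plain assignment), whereas replacing the Series index with `set_axis` keeps the original value order.

```python
import pandas as pd

df = pd.DataFrame({"values": [1, 2, 3]}, index=["x", "y", "z"])
s1 = pd.Series([10, 20, 30], index=["z", "x", "y"])

# Plain assignment aligns on index labels.
df["aligned"] = s1

# reindex reorders the Series by label first, so the result is identical.
df["reindexed"] = s1.reindex(df.index)

# set_axis swaps in the DataFrame's labels without moving any values,
# so the subsequent (label-based) assignment ends up positional.
df["relabeled"] = s1.set_axis(df.index)

aligned = df["aligned"].tolist()
reindexed = df["reindexed"].tolist()
relabeled = df["relabeled"].tolist()
print(df)
```

Here `aligned` and `reindexed` should both follow the labels (x, y, z), while `relabeled` keeps the Series' own order.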
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4214,6 +4214,78 @@ def isetitem(self, loc, value) -> None: | |
self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs) | ||
|
||
def __setitem__(self, key, value) -> None: | ||
""" | ||
Set item(s) in DataFrame by key. | ||
|
||
This method allows you to set the values of one or more columns in the | ||
DataFrame using a key. The key can be a single column label, a list of | ||
labels, or a boolean array. If the key does not exist, a new | ||
column will be created. | ||
|
||
Parameters | ||
---------- | ||
key : str, list of str, or tuple | ||
Column label(s) to set. Can be a single column name, list of column names, | ||
or tuple for MultiIndex columns. | ||
value : scalar, array-like, Series, or DataFrame | ||
Value(s) to set for the specified key(s). | ||
|
||
Returns | ||
------- | ||
None | ||
This method does not return a value. | ||
|
||
See Also | ||
-------- | ||
DataFrame.loc : Access and set values by label-based indexing. | ||
DataFrame.iloc : Access and set values by position-based indexing. | ||
DataFrame.assign : Assign new columns to a DataFrame. | ||
|
||
Notes | ||
----- | ||
When assigning a Series to a DataFrame column, pandas aligns the Series | ||
by index labels, not by position. This means: | ||
|
||
* Values from the Series are matched to DataFrame rows by index label | ||
* If a Series index label doesn't exist in the DataFrame index, it's ignored | ||
Review comment: I don't follow the difference between this and the line directly following it with the distinction of ignored versus NaN - can you help me understand?
Reply: Added the ignored example to documentation. |
||
* If a DataFrame index label doesn't exist in the Series index, NaN is assigned | ||
* The order of values in the Series doesn't matter; only the index labels matter | ||
|
||
Examples | ||
-------- | ||
Basic column assignment: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}) | ||
>>> df["B"] = [4, 5, 6] # Assigns by position | ||
>>> df | ||
A B | ||
0 1 4 | ||
1 2 5 | ||
2 3 6 | ||
|
||
Series assignment with index alignment: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2]) | ||
>>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df | ||
>>> df["B"] = s # Assigns by index label, not position | ||
>>> df | ||
A B | ||
0 1 NaN | ||
1 2 10.0 | ||
2 3 NaN | ||
|
||
Series assignment with partial index match: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"]) | ||
>>> s = pd.Series([100, 200], index=["b", "d"]) | ||
>>> df["B"] = s | ||
>>> df | ||
A B | ||
a 1 NaN | ||
b 2 100.0 | ||
c 3 NaN | ||
d 4 200.0 | ||
""" | ||
if not PYPY: | ||
if sys.getrefcount(self) <= 3: | ||
warnings.warn( | ||
|
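The "ignored" versus "NaN" distinction in the Notes of the `__setitem__` docstring above can be illustrated with a short sketch. This is only an illustration under the assumption of a recent pandas release; the variable names are hypothetical. A Series label that is absent from the DataFrame index is dropped, while a DataFrame label that is absent from the Series receives NaN, which also upcasts the new column to a float dtype.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
s = pd.Series([10, 20], index=[1, 3])  # label 3 is not in df.index

df["B"] = s

# Label 3 from the Series is ignored: no new row is created.
row_count = len(df)
extra_label_present = 3 in df.index

# Labels 0 and 2 from the DataFrame have no match in the Series: NaN.
missing_mask = df["B"].isna().tolist()

# NaN forces a float column even though the Series held integers.
b_dtype = str(df["B"].dtype)
print(df)
```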
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -610,6 +610,23 @@ def loc(self) -> _LocIndexer: | |
|
||
Please see the :ref:`user guide<advanced.advanced_hierarchical>` | ||
for more details and explanations of advanced indexing. | ||
|
||
**Assignment with Series** | ||
|
||
When assigning a Series to .loc[row_indexer, col_indexer], pandas aligns | ||
the Series by index labels, not by order or position. This is consistent | ||
Review comment: I think you can remove the line |
||
with pandas' general alignment behavior. | ||
|
||
Series assignment with .loc and index alignment: | ||
|
||
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2]) | ||
>>> s = pd.Series([10, 20], index=[1, 0]) # Note reversed order | ||
>>> df.loc[:, "B"] = s # Aligns by index, not order | ||
>>> df | ||
A B | ||
0 1 20.0 | ||
1 2 10.0 | ||
2 3 NaN | ||
""" | ||
return _LocIndexer("loc", self) | ||
|
||
|
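The `.loc` docstring addition above states that Series assignment through `.loc` aligns by label, the same way `df[col] = series` does. A minimal sketch of that claim follows, assuming a recent pandas release; `via_setitem` and `via_loc` are hypothetical names used only for this comparison.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
s = pd.Series([10, 20], index=[1, 0])  # labels in reversed order

# Assign the same Series through both code paths on separate copies.
via_setitem = df.copy()
via_setitem["B"] = s

via_loc = df.copy()
via_loc.loc[:, "B"] = s

# DataFrame.equals treats NaNs in the same location as equal.
same_result = via_setitem.equals(via_loc)
print(via_loc)
```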
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -378,3 +378,22 @@ def test_inspect_getmembers(self): | |
# GH38740 | ||
df = DataFrame() | ||
inspect.getmembers(df) | ||
|
||
def test_setitem_series_alignment_documentation(self): | ||
Review comment: I definitely appreciate you adding tests, but since this is a documentation change you shouldn't need to add anything here. Feel free to remove.
Reply: OK! I will open another PR to add these tests. They would be good to have.
Reply: #61822 is the follow-up for adding tests.
Reply: Was there an original discussion on tests being required? From a glance at them I would think our existing test base already covers those use cases, but maybe I am overlooking something.
Reply: No, the issue does not mention anything about missing test cases, but I haven't seen the two cases I mentioned covered, hence the addition. I addressed your comments; please let me know how that looks. |
||
# Test that Series assignment aligns by index as documented. | ||
df = DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2]) | ||
s = Series([10, 20], index=[1, 3]) | ||
df["B"] = s | ||
expected = DataFrame({"A": [1, 2, 3], "B": [np.nan, 10, np.nan]}) | ||
tm.assert_frame_equal(df, expected) | ||
|
||
def test_setitem_series_partial_alignment(self): | ||
# Test Series assignment with partial index match. | ||
df = DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"]) | ||
s = Series([100, 200], index=["b", "d"]) | ||
df["B"] = s | ||
expected = DataFrame( | ||
{"A": [1, 2, 3, 4], "B": [np.nan, 100, np.nan, 200]}, | ||
index=["a", "b", "c", "d"], | ||
) | ||
tm.assert_frame_equal(df, expected) |
Review comment: Using .values is typically discouraged, so I think we should remove this.
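Following the review note about `.values`, one possible alternative for positional assignment is `Series.to_numpy()`. The sketch below is an illustration rather than what the PR adopted; it assumes a recent pandas release, and the column names are hypothetical. It also shows that positional assignment requires the lengths to match, since there are no labels left to align on.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
s = pd.Series([10, 20, 30], index=["z", "x", "y"])

# to_numpy() drops the index, so values are assigned in Series order.
df["positional"] = s.to_numpy()
positional = df["positional"].tolist()

# Without labels there is nothing to align, so a shorter array is an error.
try:
    df["too_short"] = s.iloc[:2].to_numpy()
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(df)
print(length_error)
```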