API / COW: ensure every new Series/DataFrame also has new (shallow copy) index #53699
Note about the status of the PR: I fixed it for the indexing operations and tested this, and for shallow It does maybe raise the question of where this is best handled. Currently I handle this in the many places where it is needed. In theory, if the view is very cheap, we could also "just" always take a view of the passed axes in the BlockManager(..) constructor, and then it would need far fewer code changes (and there is less chance of forgetting this in some new code). But that would of course do this too many times (I should check the performance overhead of EDIT: The shallow
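To get a rough feel for the overhead question raised in this comment, one could time a view of an Index directly. This is only a sketch: it assumes `Index.view()` is the shallow-copy operation under discussion, and the index size and repetition count are arbitrary choices, not numbers from the PR.

```python
import timeit

import numpy as np
import pandas as pd

# An integer index backed by a NumPy array (size chosen arbitrarily).
idx = pd.Index(np.arange(1_000_000))

# A view is a new Index object wrapping the same underlying data.
viewed = idx.view()

# Average wall-clock cost of a single view() call, in seconds.
n_calls = 10_000
per_call_seconds = timeit.timeit(idx.view, number=n_calls) / n_calls
print(f"Index.view(): about {per_call_seconds * 1e6:.2f} microseconds per call")
```

The measured time will vary by machine and pandas version, so it is only useful as a relative comparison against the cost of the operations that would trigger the view.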
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
The issue at #53529 was about the index still sharing mutable state (because it is the same object) when getting a Series out of a DataFrame, even with CoW turned on:
and thus if you modify, for example, the index name of the Series, the DataFrame gets updated as well (this PR ensures the above return value is False).
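The situation described above can be sketched roughly as follows. The column names and values are made up for illustration, and whether the two objects really share one Index depends on the pandas version in use, so the snippet records the result instead of assuming it.

```python
import pandas as pd

# Illustrative data; not taken from the original issue.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
ser = df["a"]

# True if the Series reuses the very same Index object as the DataFrame.
shares_index = ser.index is df.index

# Mutate an attribute of the Series' index ...
ser.index.name = "renamed"

# ... and check whether the change leaked into the parent DataFrame.
propagated = df.index.name == "renamed"

print("same Index object:", shares_index)
print("name change propagated:", propagated)
```

The rename can only leak into the DataFrame when both objects hold the same Index instance, which is why the two printed values always agree.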
I think under the CoW rules, it makes sense to also ensure that mutation of such index attributes doesn't propagate, similarly to mutating values, by using a shallow copy of the Index whenever a new DataFrame/Series object is returned from some operation or method.
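As a minimal sketch of what a shallow copy of an Index means here, the snippet below uses `Index.view()`, assuming that is a reasonable stand-in for the shallow copy the PR takes. The new object shares the underlying data but carries its own attributes such as the name.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30], name="original")

# New Index object that wraps the same underlying array.
shallow = idx.view()

# Attribute mutation on the shallow copy stays local to it.
shallow.name = "renamed"

print(shallow is idx)                                # distinct objects
print(idx.name, shallow.name)                        # names are independent
print(np.shares_memory(idx.values, shallow.values))  # data is not duplicated
```

Because no data is copied, this is much cheaper than a deep copy while still isolating attribute changes.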
Now, this goes quite a bit further than just the typical indexing operation above. To start, methods that return a shallow copy under CoW should also do this, beginning with the `copy()` method itself.
But even new objects that actually don't share data, even with CoW, do share the index / columns.
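Both cases can be inspected with plain identity checks. The sketch below uses `df.copy(deep=False)` as the shallow-copy case and `df + 1` as an example of a result that holds new data; both are illustrative choices, not necessarily the examples from the original description. The results depend on the pandas version, so they are collected and printed without assuming an outcome.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# A shallow copy: shares the data with df.
shallow = df.copy(deep=False)

# A computed result: holds newly allocated data.
computed = df + 1

# For each result, record whether it reuses df's index / columns objects.
report = {
    "copy(deep=False)": (shallow.index is df.index, shallow.columns is df.columns),
    "df + 1": (computed.index is df.index, computed.columns is df.columns),
}

for operation, (same_index, same_columns) in report.items():
    print(f"{operation}: same index={same_index}, same columns={same_columns}")
```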
I think that, in the CoW spirit that every new DataFrame/Series object should be independent, none of those cases should ever share the index; they should always use shallow copies (so essentially `df1.index is df2.index` can only be true if `df1 is df2`, i.e. if we have identical objects).
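The proposed invariant can be written down as a small check. Both helpers below, `with_fresh_axes` and `axes_are_independent`, are hypothetical names introduced only for this sketch and are not part of pandas or of the PR; the first emulates the desired behaviour in user code by assigning views of the axes to a shallow copy.

```python
import pandas as pd


def with_fresh_axes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shallow copy of df whose axes are new Index objects."""
    out = df.copy(deep=False)
    out.index = df.index.view()
    out.columns = df.columns.view()
    return out


def axes_are_independent(df1: pd.DataFrame, df2: pd.DataFrame) -> bool:
    """Hypothetical check: True if the two frames share neither index nor columns."""
    return df1.index is not df2.index and df1.columns is not df2.columns


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
new = with_fresh_axes(df)

# Renaming the new frame's index must leave the original untouched.
new.index.name = "rows"

print(axes_are_independent(df, new))  # distinct Index objects on both axes
print(axes_are_independent(df, df))   # identical object, so axes are shared
print(df.index.name, new.index.name)
```

A check like `axes_are_independent` could serve as a test assertion for any operation that is expected to return a new object.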