ENH: Add coalesce_keys option to join #61033

Open
1 of 3 tasks
tylerriccio33 opened this issue Mar 3, 2025 · 1 comment
Labels: Enhancement, Needs Triage

Comments

@tylerriccio33

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

It would be useful to retain keys used in a join instead of automatically coalescing them. This is most useful in full outer joins. I am happy to implement myself :)
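
For context, here is a minimal sketch of the current behavior being described. The frame names `left` and `right` are illustrative only, and the data mirrors the sample in the Feature Description. A full outer merge on a shared key name returns a single coalesced `id` column, and `indicator=True` is the closest built-in hint about which side each key came from.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "value1": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "value2": ["X", "Y", "Z"]})

# Full outer merge on a shared key name: the result has one coalesced "id" column.
coalesced = pd.merge(left, right, on="id", how="outer")
print(coalesced)

# indicator=True adds a "_merge" column (left_only / right_only / both),
# but the original per-side key columns are still not kept separately.
flagged = pd.merge(left, right, on="id", how="outer", indicator=True)
print(flagged)
```

The `_merge` column says where a row matched, but it does not give back separate left and right key columns, which is what this request asks for.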

Feature Description

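A test for this would pass with the data below.

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "value1": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "value2": ["X", "Y", "Z"]})

# coalesce_keys is the proposed keyword; it does not exist in pandas today.
# how="outer" matches the full outer result expected below.
res = df1.join(df2, on="id", how="outer", coalesce_keys=False)

Note the preservation of both id columns:

expected_no_coalesce = {
    "id": [None, 1, 2, 3],
    "value1": [None, "A", "B", "C"],
    "id_right": [4, None, 2, 3],
    "value2": ["Z", None, "X", "Y"],
}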
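
Until something like this exists, a similar result can be approximated with current pandas. The sketch below is a workaround, not the proposed implementation, and assumes it is acceptable to rename the right-hand key to `id_right` before merging. The name `right` is illustrative. When `left_on` and `right_on` name different columns, `merge` keeps both key columns.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "value1": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "value2": ["X", "Y", "Z"]})

# Rename the right-hand key so the two key columns no longer share a name.
right = df2.rename(columns={"id": "id_right"})

# With distinct left_on/right_on names, an outer merge keeps both key columns
# and fills the unmatched side with NaN.
res = pd.merge(df1, right, left_on="id", right_on="id_right", how="outer")
print(res)
```

Because unmatched rows are filled with NaN, the integer keys come back as floats in this sketch, and the row order may differ from the expected dictionary above.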

Alternative Solutions

Arrow and Polars have this option. I bring this up because I'm implementing a common full join where keys are preserved in the Narwhals package and noticed that pandas does not allow this out of the box. See https://github.com/narwhals-dev/narwhals/pull/2126/files#diff-ff8314856956318d0da461d7cc2710a6b18d3c052581be7990ae0023a9e689ee

Additional Context

No response

tylerriccio33 added the Enhancement and Needs Triage labels Mar 3, 2025
@rit4rosa

rit4rosa commented Jun 4, 2025

take

rit4rosa added a commit to rit4rosa/pandas that referenced this issue Jun 15, 2025
This adds a coalesce_keys keyword to DataFrame.join to allow
preservation of both join key columns (id and id_right),
instead of automatically coalescing them into a single column.

This is especially useful in full outer joins, where retaining
information about unmatched keys from both sides is important.

Example:
    df1.join(df2, on="id", coalesce_keys=False)

This will result in both id and id_right columns being preserved,
rather than merged into a single id.

Includes:
- Modifications to join internals (core/reshape/merge.py)
- A dedicated test file (test_merge_coalesce.py) covering:
    - Preservation of join keys when coalesce_keys=False
    - Comparison with default behavior (coalesce_keys=True)
    - Full outer joins with asymmetric key presence

Co-authored-by: Maria Pereira <[email protected]>
2 participants