[BUG]: DataFrame.join inconsistent behavior, accepts overlapping columns provided suffixes is specified #13659

Open
@dragonator4

Here is sample code to reproduce the error:

In [1]: import numpy as np
        import pandas as pd
        df1 = pd.DataFrame(np.random.rand(5, 2))
        df2 = pd.DataFrame(np.random.rand(5, 2))

In [2]: df2.join(df1)
Out[2]: ---------------------------------------------------------------------------
        ValueError: columns overlap but no suffix specified: RangeIndex(start=0, stop=2, step=1)

In [3]: df2.join(df1, lsuffix='_x', rsuffix='_x')
Out[3]:     0_x         1_x         0_x         1_x
        0   0.904888    0.491802    0.509346    0.367847
        1   0.282420    0.092652    0.672786    0.358450
        2   0.339018    0.318990    0.359977    0.640366
        3   0.775293    0.767872    0.820965    0.018728
        4   0.543648    0.412799    0.650457    0.712789

So when the same suffix is passed for both sides, join succeeds and returns a merged DataFrame whose column names still overlap. Why, then, raise an error in the first place when no suffix is given?
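
For contrast, here is a minimal sketch of the two cases whose behavior is not in question: the join without suffixes raises, and a join with distinct suffixes yields unique column names. The suffix values `_left` and `_right` and the variable names `raised` and `joined` are illustrative choices only, not anything pandas prescribes.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(5, 2))
df2 = pd.DataFrame(np.random.rand(5, 2))

# Without suffixes, the overlapping column labels 0 and 1 trigger the error.
try:
    df2.join(df1)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Distinct suffixes (illustrative values) disambiguate the two sides.
joined = df2.join(df1, lsuffix="_left", rsuffix="_right")
print(list(joined.columns))
print(joined.columns.is_unique)
```

Checking `joined.columns.is_unique` after a join is one way to detect the duplicate labels shown in the output above, where both suffixes were `_x`.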

Note: I am using the latest pandas, Python, and NumPy.


Labels: Enhancement; Error Reporting (incorrect or improved errors from pandas); Reshaping (concat, merge/join, stack/unstack, explode)
