Indexing into a dataframe re-casts / "forgets" dtypes #16508

@makmanalp

Code Sample, a copy-pastable example if possible

In [62]: df = pd.DataFrame([(float(x) for x in range(0, 10)), (float(x) for x in range(10,20))])

In [63]: df
Out[63]:
      0     1     2     3     4     5     6     7     8     9
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0
1  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0

In [64]: df[0]
Out[64]:
0     0.0
1    10.0
Name: 0, dtype: float64

In [65]: df[0].astype(int)
Out[65]:
0     0
1    10
Name: 0, dtype: int64

In [66]: df[0] = df[0].astype(int)

In [67]: df
Out[67]:
    0     1     2     3     4     5     6     7     8     9
0   0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0
1  10  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0

In [68]: df.iloc[0]
Out[68]:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: 0, dtype: float64

Problem description

After I reassign column 0 as int, I expect it to stay int, and the dataframe repr shows it that way. But when I select a row with .iloc, the value in that column comes back as a float!
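A minimal sketch of what seems to be happening, assuming current pandas behaviour: the column keeps its integer dtype inside the frame, but a row is returned as a single Series, and a Series carries exactly one dtype, so the int64 and float64 values are upcast to a common float64. The explicit `"int64"` below is an assumption made so the result does not depend on the platform's default integer width; the original report used `astype(int)`.

```python
import pandas as pd

df = pd.DataFrame([[float(x) for x in range(0, 10)],
                   [float(x) for x in range(10, 20)]])
df[0] = df[0].astype("int64")

# The frame itself still stores column 0 as an integer column.
column_dtype = df[0].dtype

# Selecting a row builds one Series across all columns; mixing int64 and
# float64 forces a single common dtype for the whole row.
row = df.iloc[0]
row_dtype = row.dtype

print("column 0 dtype:", column_dtype)
print("row 0 dtype:", row_dtype)
```

So the stored data is not modified by `.iloc`; only the row view that is handed back is upcast.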

This is a narrowed-down version of a more insidious problem. Instead of calling iloc[], I was running .apply(f) on a dataframe, and the dtypes of the resulting dataframe were all changed even though the function f wasn't doing anything discernible with the types.
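The report does not say which axis was used, so the following is only a guess at the larger case: a row-wise `apply` (`axis=1`) with a hypothetical function `identity` that returns each row untouched. Because each row is passed in as a single-dtype Series, the integer column would be expected to come back as float.

```python
import pandas as pd

df = pd.DataFrame([[float(x) for x in range(0, 10)],
                   [float(x) for x in range(10, 20)]])
df[0] = df[0].astype("int64")


def identity(row):
    # Hypothetical stand-in for f: it does nothing with the types.
    return row


# axis=1 is an assumption; it hands each row to the function as one Series.
result = df.apply(identity, axis=1)

print(df.dtypes.iloc[0], "->", result.dtypes.iloc[0])
```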

My current workaround is to re-cast all the types inside f, but that gets frustrating quickly as the number of columns grows.
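One possible alternative to re-casting inside f, sketched here under the assumption that the function does not add, drop, or rename columns: restore the original dtypes in a single step after the `apply`, by passing the original frame's dtypes to `DataFrame.astype` as a column-to-dtype mapping. The function name `f` and `axis=1` are placeholders, not taken from the original code.

```python
import pandas as pd

df = pd.DataFrame([[float(x) for x in range(0, 10)],
                   [float(x) for x in range(10, 20)]])
df[0] = df[0].astype("int64")


def f(row):
    # Placeholder for the real row-wise function.
    return row


result = df.apply(f, axis=1)

# Re-cast every column at once using the dtypes recorded on the input frame.
restored = result.astype(df.dtypes.to_dict())

print(restored.dtypes)
```

This only works when every value can be converted back losslessly; if the function introduced NaN or fractional values into an integer column, the cast would raise or truncate.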

Expected Output

I expect the row to be a mixed dtype object, with the dtype of each cell matching that of the column:

In [68]: df.iloc[0]
Out[68]:
0    0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: 0, dtype: object
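For comparison, a row with `dtype: object` that keeps an integer in the first cell can be obtained by casting the frame to `object` before selecting the row. This is a sketch of a possible workaround, not the behaviour the report asks pandas to adopt by default, and it gives up the speed of numeric dtypes.

```python
import pandas as pd

df = pd.DataFrame([[float(x) for x in range(0, 10)],
                   [float(x) for x in range(10, 20)]])
df[0] = df[0].astype("int64")

# With every column stored as object, no numeric upcasting happens when the
# row Series is built, so each cell keeps the kind of value its column held.
row = df.astype(object).iloc[0]

print(row.dtype)
print(type(row.loc[0]).__name__, type(row.loc[1]).__name__)
```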

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.0.9
pandas_gbq: None
pandas_datareader: None

Labels

Dtype Conversions: Unexpected or buggy dtype conversions
Duplicate Report: Duplicate issue or pull request
Indexing: Related to indexing on series/frames, not to indexes themselves
