Description
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: dates_range = pd.date_range('2016/10/1', periods=3).repeat(4)
In [4]: statuses = np.tile(['A', 'B', None, 'C'], 3)
In [5]: values = np.random.randn(12)
In [6]: df = pd.DataFrame({'date': dates_range, 'status': statuses, 'value': values})
In [7]: df
Out[7]:
          date status     value
0   2016-10-01      A -1.946876
1   2016-10-01      B  1.080243
2   2016-10-01   None  0.165715
3   2016-10-01      C -0.615913
4   2016-10-02      A  0.662645
5   2016-10-02      B  1.448593
6   2016-10-02   None -1.392233
7   2016-10-02      C  1.534083
8   2016-10-03      A  0.801988
9   2016-10-03      B -0.689987
10  2016-10-03   None -0.150036
11  2016-10-03      C -0.197410
In [8]: df.set_index(['date', 'status']).sort_index()
Out[8]:
                      value
date       status
2016-10-01 A      -1.946876
           B       1.080243
           C      -0.615913
           NaN     0.165715
2016-10-02 A       0.662645
           B       1.448593
           C       1.534083
           NaN    -1.392233
2016-10-03 A       0.801988
           B      -0.689987
           C      -0.197410
           NaN    -0.150036
In [9]: df.set_index(['date', 'status']).sort_index().index.is_lexsorted()
Out[9]: False
In [10]: df.set_index(['date', 'status']).sort_index(level=['date', 'status']).index.is_lexsorted()
Out[10]: True
Problem description
When I call sort_index() without the level argument, I expect the DataFrame to be sorted on all index levels, and indeed it is (Out[8]). However, calling is_lexsorted() on the resulting index reports that it is not lexsorted (Out[9]). The problem goes away if I explicitly pass the level argument with all index levels (Out[10]).
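To see how an index can look sorted yet fail the check, here is a minimal pure-Python sketch. It assumes, as in pandas, that each MultiIndex level is stored as integer codes pointing into that level's unique values, that missing values are stored with code -1, and that is_lexsorted() compares those codes rather than the displayed labels. The helper codes_lexsorted and the specific code assignments are hypothetical, chosen to mirror the example frame.

```python
# Hypothetical illustration: each index row is a tuple of per-level integer codes.
# Assumption: missing values (NaN) are encoded as -1, as pandas does for MultiIndex.


def codes_lexsorted(rows):
    """Return True if the code tuples are in non-decreasing lexicographic order."""
    # Python compares tuples lexicographically, level by level.
    return all(a <= b for a, b in zip(rows, rows[1:]))


# Level 0 ('date'):   2016-10-01 -> 0, 2016-10-02 -> 1, 2016-10-03 -> 2
# Level 1 ('status'): 'A' -> 0, 'B' -> 1, 'C' -> 2, NaN -> -1
dates = [0, 1, 2]

# Order as displayed by sort_index() without level: NaN placed last in each date.
nan_last = [(d, s) for d in dates for s in (0, 1, 2, -1)]

# Sorting the code tuples directly instead puts NaN (code -1) first in each date.
by_codes = sorted(nan_last)

print(codes_lexsorted(nan_last))  # False: (0, 2) is followed by (0, -1)
print(codes_lexsorted(by_codes))  # True
```

Under these assumptions, placing NaN after 'C' leaves a -1 code right after a 2 within each date, which is enough for the check to fail even though every label appears where one would expect it.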
The behavior is particularly problematic when the MultiIndex has many levels, say n, and the NaNs appear at level m < n. Slicing the MultiIndex on levels up to m then works, while slicing on level m and above raises an error.
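The level-dependent failure can be sketched with the notion of lexsort depth: the largest k such that the index rows, truncated to their first k level codes, are still lexicographically sorted. pandas versions of this era expose this as MultiIndex.lexsort_depth, but the function below is a hypothetical pure-Python stand-in. It assumes NaN is encoded as code -1 and that a slice keyed on the first k levels needs a lexsort depth of at least k.

```python
def lexsort_depth(rows, nlevels):
    """Largest k such that rows truncated to their first k codes are sorted."""
    for k in range(nlevels, 0, -1):
        prefixes = [row[:k] for row in rows]
        if all(a <= b for a, b in zip(prefixes, prefixes[1:])):
            return k
    return 0


# Hypothetical 3-level index whose middle level (position 1) holds a NaN,
# coded -1 and placed last within its group, as in the sort_index() output.
rows = [
    (0, 0, 0),
    (0, 0, 1),
    (0, 1, 0),
    (0, -1, 0),
    (1, 0, 0),
    (1, 1, 0),
    (1, -1, 0),
]

print(lexsort_depth(rows, nlevels=3))          # 1: only level 0 is sorted
print(lexsort_depth(sorted(rows), nlevels=3))  # 3: sorting by codes fixes it
```

Under that assumption, a slice keyed only on the levels before the one holding NaN stays within the sorted depth, while a key that reaches that level exceeds it, which matches the pattern described above.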
Note that this did not happen in earlier versions. Unfortunately, I am not sure which ones: the code worked fine in spring 2017, and I came across the issue when re-running code from that time.
Expected Output
The expected behavior is the one I get when passing the level argument with all index levels, as in In [10] above, where is_lexsorted() returns True.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.5.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
xarray: None
IPython: 6.2.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None