BUG: to_latex() output broken when the index has a name #10660

Closed
@jakbaum

Description

Hey folks,

I posted this on SO and was asked to file a report here as well.

I'm trying to export the output of pandas.DataFrame.describe() to LaTeX using the to_latex() method. This works fine as long as I don't apply the groupby() method beforehand. With a grouped DataFrame, the first row has no values, even though its label is count. Note that in the IPython notebook, the first row of a grouped DataFrame shows the name of the variable used for grouping.

I'm using pandas 0.16.2 with Python 3.
Is this a bug or am I doing something wrong?

Cheers,
Jakob

Here are some examples:

Without groupby:

\begin{tabular}{lr}
\toprule
{} &    IS\_FEMALE \\
\midrule
count &  2267.000000 \\
mean  &     0.384649 \\
...
...
75\%   &     1.000000 \\
max   &     1.000000 \\
\bottomrule
\end{tabular}

With groupby:

\begin{tabular}{llr}
\toprule
  &       &    IS\_FEMALE \\
\midrule
0 & count &              \\     % <-- note missing value here
  & mean &  1134.000000 \\
  & std &     0.554674 \\
...
...
  & 75\% &     0.000000 \\
  & max &     0.000000 \\
\bottomrule
\end{tabular}

[Screenshot: output in the notebook]

jorisvandenbossche commented on Jul 23, 2015 (Member)

Thanks for the report! Can you:

  • try to provide a small reproducible example? (i.e. some code we can run that builds a dummy DataFrame and reproduces the error)
  • check whether it is an issue with groupby or just with to_latex? For example, if you build by hand a DataFrame comparable to the output of the groupby and then export it to LaTeX, do you see the same error?

jakbaum commented on Jul 23, 2015 (Author)

Sure. This snippet re-creates the issue. Sorry for the messy DataFrame construction; it's the first time I've created one with NumPy.

import pandas as pd
import numpy as np

cols = ['Group','Value']
group = np.random.randint(2, size=10)
values = np.random.random_sample(10)
df = pd.DataFrame([group, values]).T
df.columns = cols

print(df.groupby('Group').describe().to_latex())

I don't really know how to test your second point, to be honest. The first 'blank' row of a groupby is just visualization, I reckon?
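One way to follow the second suggestion is to build, by hand, a DataFrame with the same two-level index layout that groupby('Group').describe() produced in pandas 0.16 (as seen in the outputs above: an outer level with the group values and an inner level with the statistic names). This is a minimal sketch under those assumptions; the level names ('Group' outer, unnamed inner) and the placeholder numbers are illustrative, not taken from the reporter's data.

```python
import pandas as pd

# The summary statistics that describe() reports for a numeric column.
stats = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Assumed layout: outer level named 'Group' holding the group values,
# inner level (unnamed) holding the statistic names.
index = pd.MultiIndex.from_product([[0, 1], stats], names=['Group', None])

# Placeholder numbers only; the index layout, not the values, matters here.
df = pd.DataFrame({'Value': [float(i) for i in range(len(index))]}, index=index)

# print(df.to_latex())  # compare with the groupby-based output above
```

If the hand-built frame shows the same blank first row, the problem lies in to_latex rather than in groupby.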

jorisvandenbossche commented on Jul 23, 2015 (Member)

Thanks for the reproducible example! That indeed triggers the error for me as well.

Here is a small DataFrame that also shows the error. It has nothing to do with groupby as such; groupby just creates a MultiIndex, and to_latex handles that MultiIndex incorrectly:


In [22]: df = pd.DataFrame({'a':[0,0,1,1], 'b':list('abab'), 'c':[1,2,3,4]})

In [23]: df = df.set_index(['a', 'b'])

In [24]: df
Out[24]:
     c
a b
0 a  1
  b  2
1 a  3
  b  4

In [25]: print(df.to_latex())
\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &    \\
  & b &  1 \\
1 & a &  2 \\
  & b &  3 \\
\bottomrule
\end{tabular}

It seems that all values are shifted down by one row, and the last value (4) is dropped.
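A toy model of how a one-row shift like this can arise. This is not pandas' formatter code, and the thread has not pinned down the actual cause: it only assumes that the value column reserves a blank cell for an index-names row that the index column does not have. Pairing the two columns row by row then pushes every value down one row, and zip() silently drops the last value. pair_rows is a hypothetical helper.

```python
def pair_rows(index_labels, value_cells):
    """Pair each tuple of index labels with one value cell, row by row.

    zip() stops at the shorter input, so any surplus cells are dropped.
    """
    return [labels + (value,) for labels, value in zip(index_labels, value_cells)]


# Index labels as printed in the tabular body (outer level sparsified).
index_labels = [('0', 'a'), ('', 'b'), ('1', 'a'), ('', 'b')]
values = ['1', '2', '3', '4']

# Mismatch: the value column carries an extra blank cell for a names row
# that the index column lacks, reproducing the shifted output above.
buggy = pair_rows(index_labels, [''] + values)

# Consistent: neither column reserves a names row in the body.
fixed = pair_rows(index_labels, values)

print(buggy)
print(fixed)
```

buggy reproduces the In [25] output (blank first value, 4 missing), while fixed matches the unnamed-index output.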

jorisvandenbossche commented on Jul 23, 2015 (Member)

It seems this has something to do with the index level names:

In [35]: df.index.names = [None, None]

In [36]: df
Out[36]:
     c
0 a  1
  b  2
1 a  3
  b  4

In [37]: print(df.to_latex())
\begin{tabular}{llr}
\toprule
  &   &  c \\
\midrule
0 & a &  1 \\
  & b &  2 \\
1 & a &  3 \\
  & b &  4 \\
\bottomrule
\end{tabular}
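Based on the observation above, a possible workaround on affected versions is to drop the index names on a copy before exporting. The helper name without_index_names is hypothetical; the sketch relies on Index.set_names, which returns a new Index, so the original DataFrame keeps its names.

```python
import pandas as pd


def without_index_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index levels are all unnamed.

    Hypothetical helper for the workaround discussed in this thread.
    """
    out = df.copy()
    # set_names returns a new Index (inplace=False), so df.index is untouched.
    out.index = out.index.set_names([None] * out.index.nlevels)
    return out


df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': list('abab'), 'c': [1, 2, 3, 4]})
df = df.set_index(['a', 'b'])

clean = without_index_names(df)
# On an affected version, print(clean.to_latex()) should render like In [37].
```

The same helper works for a single, named index, since set_names accepts a one-element list there.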

And possibly related: #9908

jreback commented on Jul 23, 2015 (Contributor)

Dupe of #2942?

Title changed from ".groupby().to_latex() output broken" to "BUG: to_latex() output broken when the index has a name" on Jul 23, 2015

jorisvandenbossche commented on Jul 23, 2015 (Member)

No, I don't think so, as this one does not only apply to a MultiIndex:

In [45]: df = pd.DataFrame({'a':list('abc'), 'b':[1,2,3]})

In [46]: df = df.set_index(['a'])

In [47]: df
Out[47]:
   b
a
a  1
b  2
c  3

In [49]: print(df.to_latex())
\begin{tabular}{lr}
\toprule
{} &  b \\
\midrule
a &    \\
a &  1 \\
b &  2 \\
c &  3 \\
\bottomrule
\end{tabular}

So the problem is related to the index name.
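One plausible layout that avoids the problem puts the index names on their own header line, between the column header and \midrule, so they never occupy a body row. The sketch below renders a booktabs tabular that way; render_tabular is a hypothetical helper written for illustration, not pandas' API, and the layout pandas eventually adopted may differ.

```python
def render_tabular(columns, index_names, rows):
    """Render a booktabs-style LaTeX tabular (illustrative sketch only).

    columns:     value-column headers
    index_names: one entry per index level; None means the level is unnamed
    rows:        list of (index_labels_tuple, values_tuple)
    """
    nidx = len(index_names)
    spec = "l" * nidx + "r" * len(columns)
    lines = [f"\\begin{{tabular}}{{{spec}}}", "\\toprule"]
    # Column header: blank cells above the index columns.
    lines.append(" & ".join([""] * nidx + list(columns)) + " \\\\")
    # Index names, if any, get a header line of their own above \midrule,
    # so every body row keeps its own values.
    if any(name is not None for name in index_names):
        names = ["" if n is None else str(n) for n in index_names]
        lines.append(" & ".join(names + [""] * len(columns)) + " \\\\")
    lines.append("\\midrule")
    for labels, values in rows:
        cells = [str(x) for x in labels] + [str(v) for v in values]
        lines.append(" & ".join(cells) + " \\\\")
    lines += ["\\bottomrule", "\\end{tabular}"]
    return "\n".join(lines)


# The single-index example from In [45]-[47]: index named 'a', column 'b'.
print(render_tabular(['b'], ['a'], [(('a',), (1,)), (('b',), (2,)), (('c',), (3,))]))
```

Keeping the names row in the header separates "what the levels are called" from the data, which is the part the broken output mixes up.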

jakbaum commented on Jul 23, 2015 (Author)

Is the proposed fix of #9908 implemented in 0.16.2?

jorisvandenbossche commented on Jul 23, 2015 (Member)

@jakbaum yes, it is already in 0.16.1. But it does not fix this one: it possibly fixed a related issue, but I should look into that in more detail.

And you're very welcome to look into the problem if you want! I don't think it should be that hard.

jreback commented on Jul 23, 2015 (Contributor)

also #8336

jakbaum commented on Jul 23, 2015 (Author)

@jorisvandenbossche Your belief in my coding skills honors me, but quite honestly: I don't think I'm capable of fixing this. I wouldn't even know how to start, and I don't want to mess things up. Actually, I'm more of a copy-paste coder than anything else. :)

jorisvandenbossche commented on Jul 23, 2015 (Member)

@jakbaum no problem, thanks for reporting it anyway!
