Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
The float_format parameter in pd.DataFrame.to_csv doesn't support modern (i.e. since Python 2.6) format strings yet.
The documentation says about the float_format parameter:
float_format : str, Callable, default None
Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
The term "format string" apparently means an old-style %-format string like "%.6f" here. However, the Python docs tend to use that term for a modern format string like "{:.6f}".
import pandas as pd
df = pd.DataFrame([0.1, 0.2])
for float_format in ["%.6f", "{:.6f}".format, "{:.6f}"]:
    print(float_format, ":\n", df.to_csv(float_format=float_format))
Out:
%.6f :
,0
0,0.100000
1,0.200000
<built-in method format of str object at 0x7f62be4e01f0> :
,0
0,0.100000
1,0.200000
{:.6f} :
,0
0,{:.6f}
1,{:.6f}
Feature Description
pseudo code:
class DataFrame:
    ...
    def to_csv(..., float_format, ...):
        if isinstance(float_format, str):
            if "%" in float_format:
                # bind the string to its own name so the lambda does not call itself
                template = float_format
                float_format = lambda x: template % x
            else:
                float_format = float_format.format
        ...
        out = float_format(value)
        ...
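The dispatch sketched in the pseudo code can be tried outside pandas. Below is a minimal sketch, assuming a hypothetical helper named normalize_float_format (it is not part of pandas). It keeps the simple "%" check from the pseudo code and binds the %-string to a separate name, because a lambda that refers to the reassigned float_format would end up calling itself.

```python
from typing import Callable, Optional, Union

FloatFormat = Union[str, Callable[[float], str], None]


def normalize_float_format(float_format: FloatFormat) -> Optional[Callable[[float], str]]:
    """Hypothetical helper (not a pandas API): turn float_format into a callable."""
    if float_format is None or callable(float_format):
        # Nothing to convert: no formatting requested, or already a callable.
        return float_format
    if "%" in float_format:
        # Old-style string: keep it under its own name for the closure.
        template = float_format
        return lambda value: template % value
    # Otherwise treat the string as a new-style format string.
    return float_format.format


for spec in ["%.6f", "{:.6f}", "{:.6f}".format]:
    formatter = normalize_float_format(spec)
    print(formatter(0.1))  # prints 0.100000 for each of the three specs
```

One caveat with this simple check: a new-style spec such as "{:.1%}" also contains "%" and would be routed to the old-style branch, so a real implementation would probably need a stricter test.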
Alternative Solutions
Workaround
df.to_csv(float_format="{:.6f}".format)
Documentation change
Document that:
- only old-style % format strings are supported.
- one can pass in a string's format method (e.g. float_format = "{:.6f}".format) if one wants to use a modern format string.
Additional Context
If this feature is implemented and used, to_csv could get a minor speed-up in Python 3.10 compared to using %-strings (the same speed-up is already available through the workaround described above):
from timeit import timeit
num = 0.1
print("%:", timeit('"%.6f" % 0.1'))
print("format:", timeit('"{:.6f}".format(0.1)'))
setup = '''
from numpy.random import rand
import pandas as pd
df = pd.DataFrame(rand(1000, 1000))
'''
print("% to_csv:", timeit(
    'df.to_csv(float_format="%.6f")', setup=setup, number=10))
print("format to_csv:", timeit(
    'df.to_csv(float_format="{:.6f}".format)', setup=setup, number=10))
Output:
%: 0.10213060600017343
format: 0.10648653099997318
% to_csv: 7.168424273999335
format to_csv: 5.367143424999995
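Single timeit runs like the ones above can be noisy, and the two bare operators land very close to each other. A possible way to make the micro-benchmark a bit more stable is to repeat it and keep the fastest round. This is only a sketch: the helper name best_time and the round counts are assumptions, and it covers the bare formatting operators only, not to_csv.

```python
from timeit import repeat


def best_time(stmt: str, number: int = 100_000, rounds: int = 5) -> float:
    """Return the fastest of several timing rounds, in seconds."""
    # The value is passed in through setup so both statements format a variable.
    return min(repeat(stmt, setup="value = 0.1", number=number, repeat=rounds))


timings = {
    "%": best_time('"%.6f" % value'),
    "format": best_time('"{:.6f}".format(value)'),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Taking the minimum rather than the mean follows the advice in the timeit documentation, since slower rounds mostly reflect interference from other processes.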