Closed
Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
s = pd.Series(
    [
        pd.Timestamp(2024, 6, 5, 16, 35, 1),
        pd.Timestamp(2024, 6, 5, 16, 35, 4),
        pd.Timestamp(2024, 6, 5, 16, 35, 0),  # earlier than previous entry
        pd.Timestamp(2024, 6, 5, 16, 35, 3),
    ]
)
s.diff()
Issue Description
Actual output:
0 NaT
1 0 days 00:00:03
2 -1 days +23:59:56
3 0 days 00:00:03
dtype: timedelta64[ns]
After submitting this report, I realized that the output is technically correct (−1 day + (1 day − 4 s)…), but:
- it's confusing;
- it becomes wrong when you apply `dt.seconds` afterwards:
s.diff().dt.seconds
0 NaN
1 3.0
2 86396.0
3 3.0
dtype: float64
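The `-1 days +23:59:56` form follows the same normalization as Python's `datetime.timedelta`, which `pandas.Timedelta` subclasses: the day count carries the sign and the seconds component is always non-negative. A minimal sketch using only the standard library (the variable names are illustrative, not from the report):

```python
from datetime import timedelta

# A negative duration of four seconds, like row 2 of the diff above.
delta = timedelta(seconds=-4)

# Normalized storage: days may be negative, seconds is always in [0, 86400).
print(delta.days)             # -1
print(delta.seconds)          # 86396
print(delta.total_seconds())  # -4.0

# Recombining the components gives back the signed duration.
recombined = delta.days * 86400 + delta.seconds
print(recombined)             # -4
```

Reading only the seconds component therefore drops the negative day and yields 86396 rather than -4.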
Expected Behavior
Expected output:
0 NaT
1 0 days 00:00:03
2 0 days -00:00:04
3 0 days 00:00:03
dtype: timedelta64[ns]
Installed Versions
INSTALLED VERSIONS
------------------
commit : d9cdd2e
python : 3.12.2.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : fr_FR.cp1252
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.4
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
gcsfs : None
matplotlib : 3.9.0
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : 2.0.30
tables : None
tabulate : None
xarray : 2024.5.0
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Activity
Title changed from "BUG: `diff()` returns wrong values with negative timestamp deltas" to "BUG: `diff()` returns confusing output when dealing with negative timestamp deltas"
G-Guillard commented on Jun 5, 2024
OK, I figured out that I should use `dt.total_seconds()`, and on reflection, although confusing, the reported behaviour does make sense.
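For reference, a minimal sketch of the `dt.total_seconds()` approach applied to the series from the report. The names `deltas` and `signed_seconds` are illustrative, and the printed values assume the pandas 2.2.2 setup listed above:

```python
import pandas as pd

s = pd.Series(
    [
        pd.Timestamp(2024, 6, 5, 16, 35, 1),
        pd.Timestamp(2024, 6, 5, 16, 35, 4),
        pd.Timestamp(2024, 6, 5, 16, 35, 0),  # earlier than previous entry
        pd.Timestamp(2024, 6, 5, 16, 35, 3),
    ]
)

deltas = s.diff()

# Signed duration in seconds; the first row is NaN because it has no predecessor.
signed_seconds = deltas.dt.total_seconds()
print(signed_seconds.tolist())  # [nan, 3.0, -4.0, 3.0]

# The seconds component alone wraps the negative delta into [0, 86400).
print(deltas.dt.seconds.tolist())  # [nan, 3.0, 86396.0, 3.0]
```

`dt.total_seconds()` returns the whole signed duration, whereas `dt.seconds` returns only one component of the normalized days/seconds/microseconds representation.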