BUG: Error when trying to use pd.date_range #57215

Open
3 tasks done
khider opened this issue Feb 2, 2024 · 8 comments
Labels
Bug · datetime.date (stdlib datetime.date support) · Regression (Functionality that used to work in a prior pandas version)

Comments

@khider

khider commented Feb 2, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Edit[rhshadrach]: On pandas >= 3.0, you need to use freq='1000000000YS-JAN' below.

import pandas as pd
import numpy as np

pd.date_range(start = np.datetime64('500000000-01-01', 's'), end = np.datetime64('1500000000-01-01', 's'), freq='1000000000AS-JAN', unit='s')

Issue Description

Error:
raise ValueError(f"Offset {offset} did not increment date")

ValueError: Offset <YearBegin: month=1> did not increment date

Expected Behavior

Create my date range
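
For reference, a minimal sketch of the two timestamps this call would be expected to yield, built with NumPy only. The year arithmetic and the variable names are illustrative assumptions, not pandas output:

```python
import numpy as np

# Years requested by the date_range call: the start, then start + 1e9 years.
years = np.arange(500_000_000, 1_500_000_001, 1_000_000_000, dtype=np.int64)

# datetime64[Y] counts whole years since 1970, so subtract the epoch year
# before casting, then widen to second resolution to match unit='s'.
expected = (years - 1970).astype("datetime64[Y]").astype("datetime64[s]")

print(expected)
```

Both values fit comfortably in the int64 that backs second-resolution datetime64, so the range itself is representable.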

Installed Versions

INSTALLED VERSIONS

commit : fd3f571
python : 3.11.7.final.0
python-bits : 64
OS : Darwin
OS-release : 23.1.0
Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:32:11 PDT 2023; root:xnu-10002.41.9~7/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.4
qtpy : 2.4.1
pyqt5 : None

@khider added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Feb 2, 2024
@rhshadrach
Member

Thanks for the report; it's not clear to me if date_range should be able to handle values at this scale. Further investigations are welcome.

cc @MarcoGorelli

@rhshadrach added the datetime.date (stdlib datetime.date support) label Feb 4, 2024
@khider
Author

khider commented Feb 5, 2024

This error came up as part of CI testing of a package. See here: https://github.com/LinkedEarth/Pyleoclim_util/blob/bf9716c22329b6bdefc2c9c2a39dc4982289ad5a/pyleoclim/tests/test_core_Series.py#L1274

The tests used to work, so this was supported in previous versions of pandas.

Just to confirm, it works with dates around the year 1500.

@rhshadrach
Member

Thanks for adding that, best to always indicate it's a regression! That said, I'm seeing this fail in pandas 2.1.0 and pandas 2.0.0. Prior to that, pd.date_range did not take the unit argument.

Can you produce an environment where the example is successful, and report the output of pd.show_versions()?

@rhshadrach added the Regression (Functionality that used to work in a prior pandas version) and Needs Info (Clarification about behavior needed to assess issue) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Feb 5, 2024
@rhshadrach added this to the 2.2.1 milestone Feb 5, 2024
@khider
Author

khider commented Feb 6, 2024

Tried it on our JupyterHub and it worked (although note that this was before the AS deprecation warning).

Output of pd.show_versions():

------------------
commit           : 965ceca9fd796940050d6fc817707bba1c4f9bff
python           : 3.10.11.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.120+
Version          : #1 SMP Fri Jul 21 03:39:30 UTC 2023
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.2
numpy            : 1.23.5
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 68.0.0
pip              : 23.1.2
Cython           : 0.29.35
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.2
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.14.0
pandas_datareader: None
bs4              : 4.12.2
bottleneck       : None
brotli           : 
fastparquet      : None
fsspec           : 2023.6.0
gcsfs            : None
matplotlib       : 3.7.1
numba            : 0.57.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 12.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : 2023.6.0
scipy            : 1.10.1
snappy           : None
sqlalchemy       : 2.0.15
tables           : None
tabulate         : 0.9.0
xarray           : 2023.5.0
xlrd             : 2.0.1
zstandard        : None
tzdata           : 2023.3
qtpy             : 2.3.1
pyqt5            : None
None

The container is available here: https://quay.io/repository/linkedearth/linkedearthhub?tab=tags

Latest tag should work. The problem with the GitHub tests is very new. It was passing as of a week ago. Unfortunately, we don't pin versions for CI so I have no idea what was actually installed then.

@rhshadrach
Member

rhshadrach commented Feb 6, 2024

Thanks - I've edited the OP to be AS for now. I was able to confirm on 2.0.2 but could not run a git-bisect due to the change in build systems. @phofl - I recall you had a solve for this?

Further investigations and PRs to fix are welcome!

@rhshadrach removed the Needs Info (Clarification about behavior needed to assess issue) label Feb 6, 2024
@quangngd
Contributor

quangngd commented Feb 7, 2024

Inside pandas/core/arrays/datetimes.py:_generate_range, when applying the offset to the current date:

next_date = offset._apply(cur)

When the Timestamp cur is converted to the Cython object _Timestamp, its year is set to 1972. See pandas/_libs/tslibs/timestamps.pyx:create_timestamp_from_ts

The change originates from #47720. Don't really know where to go next tho. Maybe depends on how we want to handle out-of-bound pydatetime
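
To illustrate the out-of-bounds concern without touching pandas internals, here is a minimal sketch. It only shows that the standard library `datetime` cannot hold these years while NumPy's `datetime64[s]` can; the helper name `fits_in_pydatetime` is hypothetical, and this does not reproduce the 1972 value mentioned above:

```python
import datetime

import numpy as np

# The standard library caps years at datetime.MAXYEAR.
print(datetime.MAXYEAR)


def fits_in_pydatetime(year: int) -> bool:
    """Return True if the stdlib datetime can represent January 1 of `year`."""
    try:
        datetime.datetime(year, 1, 1)
    except (ValueError, OverflowError):
        return False
    return True


# NumPy stores datetime64[s] as an int64 offset from 1970, so the year
# from the report is representable there.
start = np.datetime64("500000000-01-01", "s")
numpy_year = int(start.astype("datetime64[Y]").astype(np.int64)) + 1970

print(fits_in_pydatetime(1500), fits_in_pydatetime(500_000_000), numpy_year)
```

Any code path that reads the year through the stdlib `datetime` layout is therefore a candidate for silently losing the real value.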

@rhshadrach
Member

cc @jbrockmendel

@jbrockmendel
Member

Looks like the issue is in shift_month. Changing year = stamp.year + dy to year = (<object>stamp).year + dy seems to fix it. That isn't a great solution though. It might be preferable to change the declaration on shift_month from datetime to _Timestamp, though I expect that will take more work.
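
As a rough illustration of why the year lookup matters, here is a hypothetical pure-Python sketch of month-shifting arithmetic. It is not the Cython shift_month from pandas; the function name and the assumption that a 1000000000-year offset corresponds to 12 * 1000000000 months are mine:

```python
def shift_year_month(year: int, month: int, months: int) -> tuple[int, int]:
    """Shift (year, month) by `months` months using plain integer arithmetic."""
    # Work with a zero-based month so divmod yields the year carry directly.
    total = (month - 1) + months
    dy, month0 = divmod(total, 12)
    return year + dy, month0 + 1


# With the true year as input, the arithmetic handles the scale in the report.
print(shift_year_month(500_000_000, 1, 12 * 1_000_000_000))

# Ordinary dates behave as expected, including negative shifts.
print(shift_year_month(2024, 1, -1))
```

On plain Python integers the arithmetic is fine at this scale, which is consistent with the suggestion that the problem lies in how the year is read from the stamp rather than in the shift itself.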

@lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024
@lithomas1 modified the milestones: 2.2.2, 2.2.3 Apr 10, 2024
@lithomas1 modified the milestones: 2.2.3, 2.3 Sep 21, 2024
@mroeschke removed this from the 2.3 milestone Jun 2, 2025