Error with downsampling intraday data where end.time() < start.time() #1772

@dalejung

Simple example:

import datetime

import pandas as pd

start = datetime.datetime(1999, 3, 1, 5)
# end hour is less than start hour
end = datetime.datetime(2012, 7, 31, 4)
bad_ind = pd.date_range(start, end, freq="30min")
df = pd.DataFrame({'close': 1}, index=bad_ind)
try:
    # downsample the 30-minute data to annual (year-start) bins
    df.resample('AS', 'sum')
except ValueError as e:
    print(e)

Long example:
http://nbviewer.maxdrawdown.com/3344040/intraday%20binning%20error.ipynb

Tracking it down, the problem appears to be that _get_range_edges carries the time of day over when downsampling intraday data. So when generate_range is called during the DatetimeIndex creation, the final bin edge fails the while cur <= end check and is never emitted, which leaves the last data points without a bin.
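
To make the failure mode concrete, here is a minimal sketch in plain Python rather than pandas itself. The yearly_range helper is a hypothetical stand-in for generate_range, and the two edge values are an assumption about what _get_range_edges produces for the example above (start rolled back to the year start, end rolled forward to the next year start, each keeping its own time of day).

```python
import datetime


def yearly_range(start, end):
    """Hypothetical stand-in for generate_range: yearly steps while cur <= end."""
    cur = start
    while cur <= end:
        yield cur
        # Safe because the edges fall on Jan 1 (no Feb 29 problem).
        cur = cur.replace(year=cur.year + 1)


# Assumed edges: the time of day (05:00 and 04:00) is carried over from the data.
first = datetime.datetime(1999, 1, 1, 5)
last = datetime.datetime(2013, 1, 1, 4)

edges = list(yearly_range(first, last))
last_data_point = datetime.datetime(2012, 7, 31, 4)

# The candidate edge 2013-01-01 05:00 is later than `last` (04:00), so it is
# dropped and the data after the final edge has no closing bin edge.
print(edges[-1])
print(edges[-1] < last_data_point)
```

Under these assumptions the final edge is 2012-01-01 05:00, which is earlier than the last row of the data.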

Thinking about it, there are two issues.

  1. generate_range should never output an index that doesn't include end. Maybe something like:

    while True:
        yield cur

        # last
        if cur >= end:
            break
        # advance to the next step
        cur = cur + offset

  2. _get_range_edges should generate a range that is evenly divisible by the freq. For downsampling, we'd have to change the time, either by adjusting the end time or by just zeroing both out. I don't know how many users rely on the current behavior, though.
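
Both ideas can be sketched in plain Python, outside pandas. Everything here is illustrative: yearly_range, yearly_range_covering_end and normalize are hypothetical helpers, and the edge values are assumed to be the ones that carry the time of day over from the example data.

```python
import datetime


def yearly_range(start, end):
    """Current behaviour (simplified): stop as soon as cur passes end."""
    cur = start
    while cur <= end:
        yield cur
        cur = cur.replace(year=cur.year + 1)


def yearly_range_covering_end(start, end):
    """Idea 1: always emit one edge at or past end before stopping."""
    cur = start
    while True:
        yield cur
        if cur >= end:
            break
        cur = cur.replace(year=cur.year + 1)


def normalize(ts):
    """Idea 2: zero out the time of day so edges line up with the frequency."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Assumed edges with the time of day carried over from the data.
first = datetime.datetime(1999, 1, 1, 5)
last = datetime.datetime(2013, 1, 1, 4)

covering = list(yearly_range_covering_end(first, last))
zeroed = list(yearly_range(normalize(first), normalize(last)))

print(covering[-1])  # last edge lands past `last`
print(zeroed[-1])    # last edge lands exactly on the normalized `last`
```

With the first approach the edges keep the 05:00 offset but still cover the data; with the second the edges sit at midnight, which changes bin labels for anyone relying on the carried-over time.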
