Description
covid_hosp state daily has been failing since June 17 with the following error:
Traceback (most recent call last):
File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 42, in <module>
Utils.launch_if_main(Update.run, __name__)
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 38, in launch_if_main
entrypoint()
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 38, in run
return Utils.update_dataset(Database, network)
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 220, in update_dataset
dataset = Utils.merge_by_key_cols([network.fetch_dataset(url, logger=logger) for url, _ in revisions],
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in merge_by_key_cols
dfs = [df.set_index(key_cols) for df in dfs
File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in <listcomp>
dfs = [df.set_index(key_cols) for df in dfs
File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pandas/core/frame.py", line 4727, in set_index
raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['reporting_cutoff_start'] are in the columns"
This suggests the file format changed for state daily. Indeed, there's a line on the state-daily healthdata.gov site that says the name of this column is now `date`:
We should:
- Figure out exactly which date the format change was made (the screenshot above claims June 26 but the traceback above occurred on June 17)
- Update the code to use `date` for files posted on or after that date, and `reporting_cutoff_start` for files posted before that date
  - or maybe check to see if `date` is present and if not use `reporting_cutoff_start` instead?
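For the second option (check for `date` and fall back to `reporting_cutoff_start`), here is a minimal sketch of what the check could look like. The helper names `resolve_date_column` and `normalize_date_column` are hypothetical and not part of the existing `covid_hosp` code, and renaming to `date` as the common name is an assumption; the real fix may need to go the other way depending on what the database layer expects.

```python
import pandas as pd

# Known names for the date key column, newest name first.
# Both names come from this issue; the ordering is an assumption.
DATE_COLUMN_CANDIDATES = ("date", "reporting_cutoff_start")


def resolve_date_column(df, candidates=DATE_COLUMN_CANDIDATES):
    """Return the first candidate column name that is present in df."""
    for name in candidates:
        if name in df.columns:
            return name
    raise KeyError(f"None of {list(candidates)} are in the columns")


def normalize_date_column(df, canonical="date"):
    """Return a copy of df whose date key column is named `canonical`.

    Hypothetical helper: it lets files posted before and after the
    format change share one key column name before they are merged.
    """
    present = resolve_date_column(df)
    return df.rename(columns={present: canonical})


if __name__ == "__main__":
    # Toy frames standing in for an old-format and a new-format file.
    old_file = pd.DataFrame(
        {"state": ["PA"], "reporting_cutoff_start": ["2021-06-01"], "value": [1]}
    )
    new_file = pd.DataFrame(
        {"state": ["PA"], "date": ["2021-06-20"], "value": [2]}
    )
    frames = [normalize_date_column(df) for df in (old_file, new_file)]
    # After normalizing, set_index no longer raises KeyError for either file.
    indexed = [df.set_index(["state", "date"]) for df in frames]
    print([list(df.index.names) for df in indexed])
```

If something like this were applied to each fetched frame before `merge_by_key_cols`, the key columns would be consistent across revisions regardless of when the rename happened, so we would not need to pin down the exact changeover date in code.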