Fix covid_hosp state_daily #1225

Open
@krivard

covid_hosp state_daily has been failing since June 17 with the following error:

Traceback (most recent call last):
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 42, in <module>
    Utils.launch_if_main(Update.run, __name__)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 38, in launch_if_main
    entrypoint()
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/state_daily/update.py", line 38, in run
    return Utils.update_dataset(Database, network)
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 220, in update_dataset
    dataset = Utils.merge_by_key_cols([network.fetch_dataset(url, logger=logger) for url, _ in revisions],
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in merge_by_key_cols
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/driver/delphi/epidata/acquisition/covid_hosp/common/utils.py", line 162, in <listcomp>
    dfs = [df.set_index(key_cols) for df in dfs
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pandas/core/frame.py", line 4727, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: "None of ['reporting_cutoff_start'] are in the columns"

This suggests that the state daily file format changed. Indeed, a note on the state-daily healthdata.gov page says this column is now named date:

[screenshot of the healthdata.gov notice about the column rename]

We should:

  • Determine the exact date of the format change (the screenshot above claims June 26, but the traceback above occurred on June 17)
  • Update the code to use date for files posted on or after that date, and reporting_cutoff_start for files posted before it
    • or, alternatively, check whether date is present and fall back to reporting_cutoff_start if it is not
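
The presence-check option in the last bullet could look roughly like the sketch below. It is only an illustration: the helper names (resolve_date_key, date_key_rename_map) are hypothetical and do not exist in the covid_hosp code, and it assumes the date column is the only key column that was renamed. It works on a plain list of column names so that it does not depend on how the dataset is loaded.

```python
from typing import Dict, Iterable, Sequence

# Candidate names for the date key column, newest name first.
# Both names come from the issue text; everything else here is hypothetical.
DATE_KEY_CANDIDATES = ("date", "reporting_cutoff_start")


def resolve_date_key(columns: Iterable[str],
                     candidates: Sequence[str] = DATE_KEY_CANDIDATES) -> str:
    """Return the first candidate name that appears in `columns`."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    # Mirror the wording of the pandas error so failures stay recognizable.
    raise KeyError(f"None of {list(candidates)} are in the columns")


def date_key_rename_map(columns: Iterable[str],
                        canonical: str = "date") -> Dict[str, str]:
    """Build a mapping that renames whichever date key is present to `canonical`."""
    found = resolve_date_key(columns)
    return {} if found == canonical else {found: canonical}
```

With pandas, one possible use is to apply df.rename(columns=date_key_rename_map(df.columns)) to each fetched file before the set_index call in merge_by_key_cols, so that files from before and after the change share one key name. Whether the canonical name should be date or reporting_cutoff_start depends on what the database schema expects, which this issue does not say.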
