Open
Description
If the function is run more than once on the same day, these lines can create duplicate or conflicting rows for the same datapoint with the same as-of date, because every run appends to the existing table:
hubhelpr/R/update_hub_target_data.R
Lines 143 to 154 in b637363
output_file <- fs::path(output_dirpath, "time-series", ext = "parquet")
if (fs::file_exists(output_file)) {
  existing_data <- forecasttools::read_tabular_file(output_file)
} else {
  existing_data <- NULL
}
dplyr::bind_rows(
  existing_data,
  hubverse_format_nhsn_data,
  hubverse_format_nssp_data
) |>
  forecasttools::write_tabular_file(output_file)
The function should instead:
- Take an `overwrite_existing` keyword argument, default `FALSE`.
- When `overwrite_existing` is `TRUE`, overwrite all and only the rows with the given `as_of` date that are shared between the new data and the old. That is, if there are datapoints with the current `as_of` date that are not contained within the update (say the update only contains NSSP, and we've previously updated NHSN), they are not deleted. In the case of conflicts, we defer to the update and overwrite the old as-of-today entries.
- When `overwrite_existing` is `FALSE`, error if there are any conflicts.
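One possible shape for this logic, as a minimal sketch rather than a proposed implementation: the helper name `merge_target_data` and the key columns (`as_of`, `target`, `location`, `date`) are hypothetical and would need to match the actual columns of the time-series table. The sketch also assumes that any row in the update sharing a key with an existing row counts as a conflict, even if the values are identical; the issue leaves that choice open.

```r
library(dplyr)

# Hypothetical helper, not part of hubhelpr. `key_cols` is an assumed
# set of columns that together identify one datapoint as of one date.
merge_target_data <- function(
  existing_data,
  new_data,
  key_cols = c("as_of", "target", "location", "date"),
  overwrite_existing = FALSE
) {
  # Nothing on disk yet: the update is the whole table.
  if (is.null(existing_data)) {
    return(new_data)
  }

  # Existing rows whose keys also appear in the update.
  shared <- semi_join(existing_data, new_data, by = key_cols)

  if (nrow(shared) > 0 && !overwrite_existing) {
    stop(
      nrow(shared),
      " existing row(s) share keys with the update; ",
      "set overwrite_existing = TRUE to replace them."
    )
  }

  # Keep every existing row the update does not touch,
  # then append the update so its values win on shared keys.
  existing_data |>
    anti_join(new_data, by = key_cols) |>
    bind_rows(new_data)
}
```

Under these assumptions, the append step in the function would become a call to this helper followed by the write, so rows for the current `as_of` date that the update does not mention (for example previously written NHSN rows during an NSSP-only update) survive unchanged.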
A complete PR should include unit tests verifying each of the behaviors above. This would be a good opportunity for test-driven development (i.e. write the tests first and confirm they fail).