Handle repeated target data updates with the same as-of date #66

@dylanhmorris

These lines allow multiple updates on the same day to create duplicate or conflicting rows that represent the same datapoint with the same as-of date. Every run of the function appends to the existing table, so running it twice writes the same datapoints twice:

output_file <- fs::path(output_dirpath, "time-series", ext = "parquet")
if (fs::file_exists(output_file)) {
  existing_data <- forecasttools::read_tabular_file(output_file)
} else {
  existing_data <- NULL
}
dplyr::bind_rows(
  existing_data,
  hubverse_format_nhsn_data,
  hubverse_format_nssp_data
) |>
  forecasttools::write_tabular_file(output_file)

The function should instead:

  • Take an overwrite_existing keyword argument, defaulting to FALSE.
  • When overwrite_existing is TRUE, overwrite all and only the shared rows between the new data and the old with the given as_of date. That is, if there are datapoints with the current as_of date that are not contained within the update (say the update only contains NSSP, and we've previously updated NHSN), they are not deleted. In the case of conflicts, we defer to the update and overwrite the old as-of-today entries.
  • When overwrite_existing is FALSE, error if there are any conflicts.
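
The behavior requested above could be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the helper name `merge_target_data`, the default `key_cols`, the `observation` column, and the example target names are all assumptions made for the sketch. It also assumes that any row in the update sharing a key with an existing row counts as a conflict, even if the values match.

```r
library(dplyr)

# Hypothetical helper (name, key columns, and schema are assumptions).
# Combines previously written target data with an update.
merge_target_data <- function(existing_data,
                              new_data,
                              overwrite_existing = FALSE,
                              key_cols = c("as_of", "target", "location", "date")) {
  # Nothing on disk yet: the update is the whole table.
  if (is.null(existing_data)) {
    return(new_data)
  }

  # Existing rows whose key (including as_of) also appears in the update.
  shared <- dplyr::semi_join(existing_data, new_data, by = key_cols)

  if (nrow(shared) > 0 && !overwrite_existing) {
    stop(
      nrow(shared), " existing row(s) share a key with the update; ",
      "set overwrite_existing = TRUE to replace them."
    )
  }

  # Drop only the shared rows, keep everything else, then add the update.
  existing_data |>
    dplyr::anti_join(new_data, by = key_cols) |>
    dplyr::bind_rows(new_data)
}

# Made-up example: NHSN and NSSP rows were written earlier today,
# and a second run updates only the NSSP row.
old <- dplyr::tibble(
  as_of = as.Date("2025-01-08"),
  target = c("example nhsn target", "example nssp target"),
  location = "US",
  date = as.Date("2025-01-04"),
  observation = c(100, 0.02)
)
update <- dplyr::tibble(
  as_of = as.Date("2025-01-08"),
  target = "example nssp target",
  location = "US",
  date = as.Date("2025-01-04"),
  observation = 0.03
)

merged <- merge_target_data(old, update, overwrite_existing = TRUE)
```

In this sketch the NHSN row survives untouched because its key is absent from the update, while the NSSP row takes the update's value. Calling `merge_target_data(old, update)` with the default `overwrite_existing = FALSE` raises an error instead. Matching on key columns rather than on whole rows is a design choice here: it lets a changed value be detected as a conflict rather than silently appended as a second row.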

A complete PR should include unit tests confirming that all of the above occurs. This would be a good opportunity for test-driven development (i.e., write the tests first and confirm that they fail).
