Skip to content

Conversation

@O957
Copy link
Collaborator

@O957 O957 commented Dec 5, 2025

This PR:

  • Adds two new functions (in files) write_ref_date_summary.R and summarize_ref_date_forecasts.R, which DRYify processes across get_forecast_data.R and get_map_data.R.

@O957 O957 self-assigned this Dec 5, 2025
@O957 O957 linked an issue Dec 5, 2025 that may be closed by this pull request
@O957
Copy link
Collaborator Author

O957 commented Dec 5, 2025

lintr produced:

[object_length_linter] Variable and function names should not be longer than 30 characters.
write_ref_date_summary_ensemble <- function(
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Error: File R/write_ref_date_summary.R is not lint free
Execution halted

I am going to update the lintr behavior.

@codecov-commenter
Copy link

codecov-commenter commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 0.47847% with 208 lines in your changes missing coverage. Please review.
✅ Project coverage is 8.74%. Comparing base (cfad627) to head (56aa356).

Files with missing lines Patch % Lines
R/write_ref_date_summary.R 0.00% 105 Missing ⚠️
R/summarize_ref_date_forecasts.R 0.00% 97 Missing ⚠️
R/check_authorized_users.R 0.00% 1 Missing ⚠️
R/check_changes_for_autoapproval.R 0.00% 1 Missing ⚠️
R/generate_hub_baselines.R 0.00% 1 Missing ⚠️
R/generate_hub_ensemble.R 0.00% 1 Missing ⚠️
R/generate_oracle_output.R 0.00% 1 Missing ⚠️
R/update_authorized_users.R 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main     #71      +/-   ##
========================================
+ Coverage   8.32%   8.74%   +0.42%     
========================================
  Files         10      10              
  Lines        769     743      -26     
========================================
+ Hits          64      65       +1     
+ Misses       705     678      -27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines 42 to 45
checkmate::assert_character(excluded_locations)
checkmate::assert_character(targets, null.ok = TRUE)
checkmate::assert_character(model_ids, null.ok = TRUE)
checkmate::assert_data_frame(population_data, null.ok = TRUE)
Copy link
Contributor

@dylanhmorris dylanhmorris Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these type checks can all be omitted. AFAICT, everything we do with them below is can rely on duck typing principles

forecasts_data <- forecasts_data |>
dplyr::arrange(.data$location_sort_order, .data$location_name)

if (include_metadata && !is.null(model_metadata)) {
Copy link
Contributor

@dylanhmorris dylanhmorris Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you filter to desired columns when producing the output tables, you can always fetch and include the metadata, which will avoid extra switches

checkmate::assert_data_frame(population_data, null.ok = TRUE)
checkmate::assert_logical(include_metadata, len = 1)

if (!is.null(population_data)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to https://github.com/CDCgov/hubhelpr/pull/71/files#r2593041996, let's just make the population data table required for this function (even if some downstream tables we may choose to produce don't end up including the per 100k columns(). We'll always have the population data available and that way the output data frame for this function has a predictable set of columns

Comment on lines 74 to 77
if (!is.null(model_ids)) {
current_forecasts <- current_forecasts |>
dplyr::filter(.data$model_id %in% !!model_ids)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this down to alongside the target filter and use nullable_comparison

#' model metadata (team_name, model_name). Default: TRUE.
#' @param column_selection named character vector
#' specifying which columns to select and rename. If NULL,
#' includes all columns. Default: NULL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly doesn't have to be named; if it isn't you just wont get renamings. You can even mix and match a bit, though there are quirks. See last example

Compare

df <- tibble::tibble(x=1:5,y=10:14)
dplyr::select(df, tidyselect::all_of(c(a="x", b="y", c="x")))
dplyr::select(df, tidyselect::all_of(c("x", "y", "x")))
dplyr::select(df, tidyselect::all_of(c("y", "x", c="x")))


if (!is.null(column_selection)) {
summary_data <- summary_data |>
dplyr::select(!!!column_selection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dplyr::select(!!!column_selection)
dplyr::select(tidyselect::all_of(column_selection))

model_ids = NULL,
population_data = NULL,
include_metadata = TRUE,
column_selection = NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have this default to tidyselect::everything() you can avoid the if is NULL check

Suggested change
column_selection = NULL
column_selection = tidyselect::everything()

Comment on lines 53 to 54
checkmate::assert_string(file_suffix)
checkmate::assert_character(column_selection, null.ok = TRUE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Suggested change
checkmate::assert_string(file_suffix)
checkmate::assert_character(column_selection, null.ok = TRUE)

Comment on lines 129 to 133
checkmate::assert_data_frame(population_data)
checkmate::assert_names(
colnames(population_data),
must.include = c("location", "population")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer checking the population data to summarize_ref_date_forecasts

Suggested change
checkmate::assert_data_frame(population_data)
checkmate::assert_names(
colnames(population_data),
must.include = c("location", "population")
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the thought to retain this for backward compatibility? I think it is not used enough yet to make this worthwhile. I would just delete and make sure we port over the (currently one) hub that uses it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wrote a deprecation comment on both get_*.R files, but I left them with minimal comment, in hopes that you might afford some remarks. I think deleting them makes the most sense (given there is not really a user base who've become accustomed to them) but didn't want to make this decision myself.

R/get_map_data.R Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as get_forecast_data.R

Copy link
Contributor

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. Thanks, @O957. A few changes needed.

@O957 O957 requested a review from dylanhmorris December 5, 2025 19:11
@@ -0,0 +1,141 @@
#' Summarize forecast hub data for a specific reference
#' date. This function generates a tibble of forecast data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line becomes the header in R function descriptions, so it is worth having a line break.

Suggested change
#' date. This function generates a tibble of forecast data
#' date.
#'
#' This function generates a tibble of forecast data

excluded_locations = character(0),
targets = NULL,
model_ids = NULL,
population_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this before the keyword args


model_metadata <- hubData::load_model_metadata(
base_hub_path,
model_ids = NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the user-passed model_ids here?

forecast_due_date_formatted = format(.data$forecast_due_date, "%B %d, %Y")
)

forecasts_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer explicit returns. Some R linters disagree on the default setting, but this can be adjusted.

Suggested change
forecasts_data
return(forecasts_data)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added explicit returns elsewhere across the codebase where they didn't occur.

Comment on lines 28 to 30
#' @param column_selection character vector specifying
#' which columns to select. Uses tidyselect semantics.
#' Default: tidyselect::everything().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @param column_selection character vector specifying
#' which columns to select. Uses tidyselect semantics.
#' Default: tidyselect::everything().
#' @param column_selection Columns to include in the output table.
#' Uses [tidy selection](https://dplyr.tidyverse.org/articles/programming.html).
#' Default: [tidyselect::everything()].

Copy link
Contributor

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. A few remaining questions.

@O957 O957 requested a review from dylanhmorris December 8, 2025 17:48
@O957
Copy link
Collaborator Author

O957 commented Dec 8, 2025

This is looking good. A few remaining questions.

I think I got everything, including the renamings. Thank you thus far for the review comments. There were also a few more lines breaks that I failed to add for other functions; I forgot about the fact that these first lines become the R function descriptions.

Copy link
Contributor

@dylanhmorris dylanhmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @O957!

@dylanhmorris
Copy link
Contributor

@O957 I think you can merge. The CI failure looks to be a runner issue, not an issue with the codebase

@O957
Copy link
Collaborator Author

O957 commented Dec 9, 2025

@O957 I think you can merge. The CI failure looks to be a runner issue, not an issue with the codebase

Thank you, I thought so, but got distracted after I re-ran the failed job.

@O957 O957 merged commit 02fe0c9 into main Dec 9, 2025
8 of 10 checks passed
@O957 O957 deleted the 70-dryify-get_map_data-and-get_forecast_data branch December 9, 2025 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DRYify get_map_data() and get_forecast_data().

4 participants