Consider what `epi_slide(.window_size = Inf)` should output when min `time_value` differs by epikey

``` r
suppressPackageStartupMessages({
  library(dplyr)
  library(epiprocess)
})
vctrs::vec_rbind(
  tibble::tibble(geo_value = 1, time_value = 1:4 + 0, value = 1:4),
  tibble::tibble(geo_value = 2, time_value = 3:5 + 0, value = 11:13)
) %>%
as_epi_df() %>%
epi_slide(~ sum(.x$value), .window_size = Inf)
#> An `epi_df` object, 7 x 4 with metadata:
#> * geo_type  = hhs
#> * time_type = integer
#> * as_of     = 2025-04-08 16:57:38.919515
#> 
#> # A tibble: 7 × 4
#>   geo_value time_value value slide_value
#>       <dbl>      <dbl> <int>       <int>
#> 1         1          1     1           1
#> 2         1          2     2           3
#> 3         1          3     3           6
#> 4         1          4     4          10
#> 5         2          3    11          NA
#> 6         2          4    12          NA
#> 7         2          5    13          NA

# (We get the same result with epi_slide_sum; something like this is in our test suite.)
vctrs::vec_rbind(
  tibble::tibble(geo_value = 1, time_value = 1:4 + 0, value = 1:4),
  tibble::tibble(geo_value = 2, time_value = 3:5 + 0, value = 11:13)
) %>%
as_epi_df() %>%
epi_slide_sum(value, .window_size = Inf)
#> An `epi_df` object, 7 x 4 with metadata:
#> * geo_type  = hhs
#> * time_type = integer
#> * as_of     = 2025-04-08 16:57:39.021215
#> 
#> # A tibble: 7 × 4
#>   geo_value time_value value value_running_sum
#>       <dbl>      <dbl> <int>             <dbl>
#> 1         1          1     1                 1
#> 2         1          2     2                 3
#> 3         1          3     3                 6
#> 4         1          4     4                10
#> 5         2          3    11                NA
#> 6         2          4    12                NA
#> 7         2          5    13                NA
```

<sup>Created on 2025-04-08 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

The NAs in the second group presumably come from completing time values 1 and 2 with NAs. Is this what we want? On one hand, it makes the set of input `time_value`s contributing to each output `time_value` the same for every `geo_value`. On the other hand, it makes the result inconsistent with what one might expect from explicitly spelling out `edf %>% group_by(geo_value) %>% epi_slide(....) %>% ungroup()`, i.e., that it would match group-splitting, applying the same operation to each group, and recombining. (We might have some other, lesser violations of this expectation from period inference somewhere, maybe in `epix_slide`, but in general I think we've been following it. Another violation is in the handling of explicit `.ref_time_values`: if we split out into geos with partial ref time availability, then we would raise an error.)
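To make the two candidate behaviors concrete, here is a minimal sketch that mimics them with plain `dplyr`/`tidyr` rather than `epiprocess`. The names `toy`, `split_apply`, and `padded` are made up for illustration, and the padding step is only an assumption about where the NAs come from, not a description of what `epi_slide` does internally.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Same toy data as in the reprex above, as a plain tibble.
toy <- bind_rows(
  tibble(geo_value = 1, time_value = 1:4 + 0, value = 1:4),
  tibble(geo_value = 2, time_value = 3:5 + 0, value = 11:13)
)

# Expectation A: split by geo_value, take a running sum within each group,
# and recombine. Each geo only sees its own time_values.
split_apply <- toy %>%
  group_by(geo_value) %>%
  arrange(time_value, .by_group = TRUE) %>%
  mutate(running_sum = cumsum(value)) %>%
  ungroup()

# Expectation B (assumed mechanism): first pad every geo to the shared
# time_value range with NA values, then take the running sum, then keep
# only the rows that were present in the input.
padded <- toy %>%
  complete(geo_value, time_value = full_seq(time_value, 1)) %>%
  group_by(geo_value) %>%
  arrange(time_value, .by_group = TRUE) %>%
  mutate(running_sum = cumsum(value)) %>%
  ungroup() %>%
  semi_join(toy, by = c("geo_value", "time_value"))

split_apply$running_sum
#> [1]  1  3  6 10 11 23 36
padded$running_sum
#> [1]  1  3  6 10 NA NA NA
```

Under these assumptions, B reproduces the NAs reported above for `geo_value` 2, while A gives the running sums 11, 23, 36 that the explicit `group_by()` spelling would suggest.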

Consider what `epi_slide(.window_size = Inf)` should output when min `time_value` differs by epikey #660