detect_outlr_stl() requires gap filling while detect_outlr_rm() does not? Document this. #253

@rachlobay

It seems like detect_outlr_stl() requires gap filling. For example, if we manually remove two rows from the epi_df below and pass the result to detect_outlr_stl(), we get an error about the data containing implicit gaps in time. In contrast, detect_outlr_rm() accepts the same data without complaint (likely due to epi_slide). So there should be clear documentation explaining how each function handles missing rows. We should probably also update the epi_slide vignette to explain how it handles missing rows of data, which users of this package are likely to run into.
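
To make "implicit gaps in time" concrete, here is a minimal base R sketch that checks a daily series for missing dates before it is handed to either function. The toy data and the variable names (`toy`, `has_implicit_gaps`, `missing_dates`) are made up for illustration and are not part of epiprocess.

```r
# Hypothetical toy data: 14 daily values with rows 2 and 10 removed,
# mirroring the row removal in the example from this issue
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 14)
cases <- c(10, 12, 11, 15, 14, 13, 17, 16, 18, 20, 19, 21, 22, 24)
toy <- data.frame(time_value = dates, cases = cases)[-c(2, 10), ]

# Any step larger than one day between consecutive dates is an implicit gap
steps <- as.integer(diff(toy$time_value))
has_implicit_gaps <- any(steps > 1)

# Dates on the regular daily grid that have no row in the data
full_grid <- seq(min(toy$time_value), max(toy$time_value), by = "day")
missing_dates <- full_grid[!full_grid %in% toy$time_value]

print(has_implicit_gaps)
print(missing_dates)
```

A check along these lines could sit in user code ahead of detect_outlr_stl() so the gap is reported in terms of the user's own dates.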

Example showing the error in detect_outlr_stl() and no problem in detect_outlr_rm():

library(epidatr)
library(epiprocess)
library(dplyr)
library(tidyr)

# Load #s of new confirmed COVID-19 cases, daily, for FL
# over a fairly large time window
x <- covidcast(
  data_source = "jhu-csse",
  signals = "confirmed_incidence_num",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220601),
  geo_values = "fl",
  as_of = 20220606
) %>%
  fetch_tbl() %>%
  select(geo_value, time_value, cases = value) %>%
  as_epi_df()

x <- x[-c(2, 10), ] # Remove rows 2 and 10 from x

y = x$cases
x = x$time_value
# The below should all be the default values from detect_outlr_stl()
n_trend = 21
n_seasonal = 21
n_threshold = 21
seasonal_period = NULL
log_transform = FALSE
detect_negatives = FALSE
detection_multiplier = 2.5
min_radius = 0
replacement_multiplier = 0

# Below is the first part of the detect_outlr_stl() function 
  # Transform if requested
  if (log_transform) {
    # Replace all negative values with 0
    y = pmax(0, y)
    offset = as.integer(any(y == 0))
    y = log(y + offset)
  }
  
  # Make a tsibble for fabletools, setup and run STL
  z_tsibble = tsibble::tsibble(x = x, y = y, index = x)
  
  stl_formula = y ~ trend(window = n_trend) +
    season(period = seasonal_period, window = n_seasonal)
  
  stl_components = z_tsibble %>%
    fabletools::model(feasts::STL(stl_formula, robust = TRUE)) %>%
    generics::components() %>%
    tibble::as_tibble() %>%
    dplyr::select(trend:remainder) %>%
    dplyr::rename_with(~ "seasonal", tidyselect::starts_with("season")) %>% 
    dplyr::rename(resid = remainder)
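
One possible workaround, sketched on toy data rather than the series above, is to make the gaps explicit before running STL. The sketch below uses `tsibble::has_gaps()` and `tsibble::fill_gaps()`; the toy values and the names `toy`, `toy_filled`, and `n_missing` are assumptions for illustration only. Note that the filled rows carry `NA`, and STL may still need those values imputed; how to impute them is a modeling choice this issue does not settle.

```r
library(tsibble)

# Hypothetical toy series: 14 daily dates with the 2nd and 10th removed
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 14)
toy <- tsibble(
  x = dates[-c(2, 10)],
  y = c(10, 11, 15, 14, 13, 17, 16, 18, 19, 21, 22, 24),
  index = x
)

# has_gaps() reports whether the index has implicit gaps
gaps_before <- has_gaps(toy)$.gaps

# fill_gaps() adds the missing dates as explicit rows, with y set to NA
toy_filled <- fill_gaps(toy)
gaps_after <- has_gaps(toy_filled)$.gaps
n_missing <- sum(is.na(toy_filled$y))

print(c(before = gaps_before, after = gaps_after))
print(n_missing)
```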


# The same data, passed to detect_outlr_rm(), runs without any apparent problem
# (x is re-fetched here because it was overwritten above)

  x <- covidcast(
    data_source = "jhu-csse",
    signals = "confirmed_incidence_num",
    time_type = "day",
    geo_type = "state",
    time_values = epirange(20200601, 20220601),
    geo_values = "fl",
    as_of = 20220606
  ) %>%
    fetch_tbl() %>%
    select(geo_value, time_value, cases = value) %>%
    as_epi_df()
  
  x <- x[-c(2, 10), ] # Remove rows 2 and 10 from x
  
  x <- x %>%
    group_by(geo_value) %>%
    mutate(outlier_info = detect_outlr_rm(
      x = time_value, y = cases,
      detection_multiplier = 2.5 # Possibly change this to something larger
    )) %>%
    unnest(outlier_info)
  
  x
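
As a rough illustration of why a rolling-median approach can tolerate missing rows, the sketch below computes a median over a window defined by dates rather than by row position. This is not the epi_slide or detect_outlr_rm() implementation; `rolling_median_by_date()` and its `half_width` argument are hypothetical, and the data are made up. With a date-based window, a missing row just means fewer observations fall in the window.

```r
# Hypothetical helper, not part of epiprocess: for each time value, take the
# median of y over all rows whose date is within `half_width` days of it
rolling_median_by_date <- function(time_value, y, half_width = 3) {
  vapply(seq_along(time_value), function(i) {
    in_window <- abs(as.integer(time_value - time_value[i])) <= half_width
    median(y[in_window])
  }, numeric(1))
}

# Toy daily series with rows 2 and 10 removed, as in the example above
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 14)
cases <- c(10, 12, 11, 15, 14, 13, 17, 16, 18, 20, 19, 21, 22, 24)
keep <- -c(2, 10)

# Runs without error; windows near the removed dates simply hold fewer values
rm_gappy <- rolling_median_by_date(dates[keep], cases[keep])
print(rm_gappy)
```

Whether silently shrinking the window is the right behavior is exactly the kind of detail the requested documentation would need to spell out.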

Labels: P3 (very low priority), documentation (improvements or additions to documentation)
