Forecast Submission Evaluation #184

AFg6K7h4fhy2 · 2024-12-17T21:00:09Z

This issue covers evaluating evaluating and scoring forecast submissions by submitting teams, akin to the following vignette.

AFg6K7h4fhy2 · 2024-12-19T15:10:12Z

This task can likely reuse some code or patterns from cfa-flu-eval, which is not public. Additionally, the repository pyrenew-hew likely has elements of scoring and evaluation that can be used here.

AFg6K7h4fhy2 · 2025-01-06T16:36:46Z

Following a conversation with DHM regarding this issue, the author now intends to first follow the FluSight vignette on scoring. Moreover, the following resources are relevant to STF's use of scoringutils, for both this repository and pyrenew-hew:

scoring.R: Simple wrapper of [scoringutils::transform_forecasts()] and [scoringutils::score()] with reasonable defaults for pyrenew-hew production.
create a scoringutils-ready table (forecasttools vignette)
create a table for scoring hub model quantile forecasts from a local copy of a hub (forecasttools vignette)
plot median predictions and prediction intervals against observed data by forecast date
hubverse_score.R (pyrenew-hew specific usage of scoringutils, DHM remarked that this should not be followed too closely).

AFg6K7h4fhy2 · 2025-01-06T17:38:00Z

From personal communication with DHM, the author remarked:

At present, hub_to_scorable_quantiles considers a hub_path then has built-in target-data dir. finder (this is good) but the observations file name and its location relative to target-data might benefit from being flexibly entered.

For example, SBi has covid-hospital-admissions as the observations file in target-data while target-hospital-admissions is what gets searched for, without compromise, by hub_to_scorable_quantiles. I need to ask SBi when and why the decision was made to have the file name the way it is, since I understand the reasoning behind having target-hospital-admissions.csv as the observation file name within target-data.

While each hub is pathogen specific and thus pertains to a single pathogen such that target-hospital-admissions will always refer to the pathogen in the repository name, there are instances where we might have analysis repos or wrappers onto of Hubs such that target-data could contain multiple directories and or files e.g. ./flu/<flu-admissions-file-name>.csv or ./influenza/<flu-admissions-file-name>.csv and or ./rsv/rsv-admissions-file-name>.csv etc... OR n-files in the form <pathogen>-target-data.csv.

AFg6K7h4fhy2 · 2025-01-06T21:18:46Z

The following file should also prove valuable to the completion of this PR:

summarize_visualize_scores.R (see here)

AFg6K7h4fhy2 · 2025-01-07T16:23:29Z

As a quick update, the current code in the PR associated with this issue does not yet produce (model, state) coverage plot pairs, but does produce plots, those being coverage and relative WIS, for the COVIDHub-ensemble.

get_date_specific_exclusions <- function(base_hub_path) {
  exclusions <- RcppTOML::parseTOML(fs::path("auxiliary-data",
    "excluded_locations",
    ext = "toml"
  )) |>
    tibble::enframe(
      name = "reference_date",
      value = "location"
    ) |>
    dplyr::mutate(
      reference_date = lubridate::ymd(.data$reference_date)
    ) |>
    tidyr::unnest_longer("location")

  return(exclusions)
}

get_excluded_locs <- function(base_hub_path) {
  ex_locs <- RcppTOML::parseTOML(fs::path("auxiliary-data",
    "excluded_territories",
    ext = "toml"
  ))$locations

  return(ex_locs)
}


get_hub_table <- function(base_hub_path) {
  exclusions <- get_date_specific_exclusions(base_hub_path)
  always_excluded_locs <- get_excluded_locs(base_hub_path)
  target_data_rel_path <- fs::path(
    "target-data",
    "covid-hospital-admissions.csv"
  )
  hub_table <- forecasttools::hub_to_scorable_quantiles(
    hub_path = base_hub_path,
    target_data_rel_path =
      target_data_rel_path
  ) |>
    dplyr::filter(
      .data$horizon >= 0,
      !.data$location %in% !!always_excluded_locs
    ) |>
    dplyr::anti_join(exclusions,
      by = c("reference_date", "location")
    ) |>
    dplyr::rename(model = "model_id") |>
    dplyr::mutate(location = forecasttools::us_loc_code_to_abbr(
      .data$location
    ))

  return(hub_table)
}


with_horizons <- function(df) {
  return(
    dplyr::mutate(df, horizon = floor(
      as.numeric(.data$target_end_date - .data$reference_date) / 7
    ))
  )
}


summarised_scoring_table <- function(quantile_scores,
                                     scale = "natural",
                                     baseline = "CovidHub-baseline",
                                     by = NULL) {
  filtered_scores <- quantile_scores |>
    dplyr::filter(scale == !!scale)

  summarised_rel <- filtered_scores |>
    scoringutils::get_pairwise_comparisons(
      baseline = baseline,
      by = by
    ) |>
    dplyr::filter(.data$compare_against == !!baseline) |>
    dplyr::select("model",
      dplyr::all_of(by),
      relative_wis =
        "wis_scaled_relative_skill"
    )

  summarised <- filtered_scores |>
    scoringutils::summarise_scores(by = c("model", by)) |>
    dplyr::select("model",
      dplyr::all_of(by),
      abs_wis = "wis",
      mae = "ae_median",
      "interval_coverage_50",
      "interval_coverage_80",
      "interval_coverage_90",
      "interval_coverage_95"
    ) |>
    dplyr::inner_join(summarised_rel,
      by = c("model", by)
    )
  return(summarised)
}


plot_scores_by_date <- function(scores_by_date,
                                date_column = "reference_date",
                                score_column = "relative_wis",
                                model_column = "model",
                                plot_title = "Scores by model over time",
                                xlabel = "Date",
                                ylabel = "Relative WIS") {
  min_score <- base::min(scores_by_date[[score_column]])
  max_score <- base::max(scores_by_date[[score_column]])
  max_overall <- base::max(c(1 / min_score, max_score))
  sym_ylim <- c(1 / max_overall, max_overall)

  score_fig <- scores_by_date |>
    ggplot2::ggplot(ggplot2::aes(
      x = .data[[date_column]],
      y = .data[[score_column]],
      color = .data[[model_column]],
      fill = .data[[model_column]]
    )) +
    ggplot2::geom_line(linewidth = 2) +
    ggplot2::geom_point(
      shape = 21,
      size = 3,
      color = "black"
    ) +
    ggplot2::labs(
      title = plot_title,
      x = xlabel,
      y = ylabel
    ) +
    ggplot2::scale_y_continuous(trans = "log10") +
    ggplot2::theme_minimal() +
    ggplot2::coord_cartesian(ylim = sym_ylim) +
    ggplot2::facet_wrap(~horizon)

  return(score_fig)
}

relative_wis_by_location <- function(summarised_scores,
                                     model = "CovidHub-ensemble") {
  summarised_scores <- summarised_scores |>
    dplyr::filter(.data$model == !!model)

  min_wis <- base::min(summarised_scores$relative_wis)
  max_wis <- base::max(summarised_scores$relative_wis)
  max_overall <- base::max(c(1 / min_wis, max_wis))
  sym_xlim <- c(1 / max_overall, max_overall)

  ordered_locs <- summarised_scores |>
    dplyr::filter(.data$horizon == base::min(.data$horizon)) |>
    dplyr::arrange(.data$relative_wis) |>
    dplyr::pull("location")

  fig <- summarised_scores |>
    dplyr::mutate(location = base::factor(.data$location,
      ordered = TRUE,
      levels = !!ordered_locs
    )) |>
    ggplot2::ggplot(
      ggplot2::aes(
        y = .data$location,
        x = .data$relative_wis,
        group = .data$model
      )
    ) +
    ggplot2::geom_point(
      shape = 21,
      size = 3,
      fill = "darkblue",
      color = "black"
    ) +
    ggplot2::geom_vline(
      xintercept = 1,
      linetype = "dashed"
    ) +
    ggplot2::scale_x_continuous(trans = "log10") +
    ggplot2::coord_cartesian(xlim = sym_xlim) +
    ggplot2::theme_minimal() +
    ggplot2::facet_wrap(~horizon,
      nrow = 1
    )

  return(fig)
}

coverage_plot <- function(data,
                          coverage_level,
                          date_column = "date") {
  coverage_column <- glue::glue("interval_coverage_{100 * coverage_level}")

  return(
    ggplot2::ggplot(
      data = data,
      mapping = ggplot2::aes(
        x = .data[[date_column]],
        y = .data[[coverage_column]]
      )
    ) +
      ggplot2::geom_line(linewidth = 2) +
      ggplot2::geom_point(shape = 21, size = 3, fill = "darkgreen") +
      ggplot2::geom_hline(
        yintercept = coverage_level,
        linewidth = 1.5,
        linetype = "dashed"
      ) +
      ggplot2::facet_wrap(c("horizon")) +
      ggplot2::scale_y_continuous(label = scales::label_percent()) +
      ggplot2::scale_x_date() +
      ggplot2::coord_cartesian(ylim = c(0, 1)) +
      ggplot2::theme_minimal()
  )
}

interval_coverage_80 <- purrr::partial(scoringutils::interval_coverage,
  interval_range = 80
)
interval_coverage_95 <- purrr::partial(scoringutils::interval_coverage,
  interval_range = 95
)


evaluate_and_save <- function(base_hub_path,
                              scores_as_of_date) {
  base_hub_path <- fs::path(base_hub_path)
  hub_table <- get_hub_table(base_hub_path)
  scored_results <- hub_table |>
    scoringutils::score(metrics = c(
      scoringutils::get_metrics(hub_table),
      interval_coverage_80 = interval_coverage_80,
      interval_coverage_95 = interval_coverage_95
    ))

  output_path <- fs::path(base_hub_path, "eval-output")
  fs::dir_create(output_path)
  scoring_output_file <- fs::path(
    output_path,
    glue::glue("{scores_as_of_date}.csv")
  )

  readr::write_csv(scored_results, scoring_output_file)
  message("Raw scores written to ", scoring_output_file)

  summarised_scores <- summarised_scoring_table(
    scored_results,
    scale = "log",
    baseline = "CovidHub-baseline"
  )

  summarised_by_ref_date_horizon <- summarised_scoring_table(
    scored_results,
    scale = "log",
    baseline = "CovidHub-baseline",
    by = c("horizon", "reference_date", "target_end_date")
  )

  summarised_by_location_horizon <- summarised_scoring_table(
    scored_results,
    scale = "log",
    baseline = "CovidHub-baseline",
    by = c("horizon", "location")
  )

  summary_save_path <- fs::path(output_path, "summary_scores.tsv")
  readr::write_tsv(summarised_scores, summary_save_path)

  coverage_plots <- purrr::map(
    c(0.5, 0.8, 0.9, 0.95),
    \(level) {
      coverage_plot(
        summarised_by_ref_date_horizon |>
          dplyr::filter(model == "CovidHub-ensemble"),
        coverage_level = level,
        date_column = "target_end_date"
      )
    }
  )
  forecasttools::plots_to_pdf(
    coverage_plots,
    fs::path(output_path, "coverage_by_date_and_horizon.pdf"),
    width = 8,
    height = 4
  )

  rel_wis_by_date <- plot_scores_by_date(
    summarised_by_ref_date_horizon,
    date_column = "reference_date",
    score_column = "relative_wis",
    model_column = "model"
  )
  ggplot2::ggsave(
    fs::path(output_path, "relative_wis_by_date.pdf"),
    rel_wis_by_date,
    width = 10,
    height = 8
  )



  rel_wis_by_location_horizon <- relative_wis_by_location(
    summarised_by_location_horizon,
    model = "CovidHub-ensemble"
  )
  ggplot2::ggsave(
    fs::path(output_path, "relative_wis_by_location_horizon.pdf"),
    rel_wis_by_location_horizon,
    height = 10,
    width = 8
  )
  message(paste0(
    "Scoring and plotting complete. ",
    "Outputs saved to "
  ), output_path)
}


parser <-
  argparser::arg_parser(paste0(
    "Evaluate forecasts submitted ",
    "to the COVID-19 Forecast Hub"
  )) |>
  argparser::add_argument(
    "--scores-as-of",
    type = "character",
    default = lubridate::today(),
    help = "Date of the scoring run in YYYY-MM-DD format."
  ) |>
  argparser::add_argument(
    "--base-hub-path",
    type = "character",
    default = ".",
    help = "Path to the Hub root directory"
  )

args <- argparser::parse_args(parser)
base_hub_path <- args$base_hub_path
scores_as_of_date <- args$scores_as_of
evaluate_and_save(
  args$base_hub_path,
  args$scores_as_of
)

AFg6K7h4fhy2 · 2025-01-07T17:03:15Z

At present, eval-output is git-ignored.

In the future, we may want to have the outputs produced by the evaluation pipeline stored in the repository and or be regularly updated, perhaps via a GitHub action.

AFg6K7h4fhy2 · 2025-01-08T15:46:19Z

At present, the following plots have been created:

Coverage of each model across states and dates.
Relative (to baseline) WIS of each model by horizon across states.
Relative (to baseline) WIS of each model by dates for each state.

Options remaining to be plotted:

Under/over-prediction
Dispersion
Bias
Relative (to ensemble) WIS of each model by dates for each state.
Relative (to each model) WIS of each model by dates for each state.

dylanhmorris · 2025-01-08T17:58:43Z

Under/over-prediction

Dispersion

This would be nice to have. These two can/should be on the same plot. You can use scoringutils::plot_wis_components() or DIY it.

Bias

This is also nice to have. For each model, would be good to see:

overall bias across locations and timepoint
overall bias across timepoints, by location
overall bias across locations, by timepoint.

Relative (to ensemble) WIS of each model by dates for each state.

This might be nice to have, but it is lower priority.

Relative (to each model) WIS of each model by dates for each state.

I do not think we need this.

AFg6K7h4fhy2 added the enhancement New feature or request label Dec 17, 2024

AFg6K7h4fhy2 self-assigned this Dec 17, 2024

AFg6K7h4fhy2 linked a pull request Dec 18, 2024 that will close this issue

Forecast Submission Evaluation #189

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Forecast Submission Evaluation #184

Forecast Submission Evaluation #184

AFg6K7h4fhy2 commented Dec 17, 2024

AFg6K7h4fhy2 commented Dec 19, 2024 •

edited

Loading

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 7, 2025 •

edited

Loading

AFg6K7h4fhy2 commented Jan 7, 2025

AFg6K7h4fhy2 commented Jan 8, 2025

dylanhmorris commented Jan 8, 2025 •

edited

Loading

Forecast Submission Evaluation #184

Forecast Submission Evaluation #184

Comments

AFg6K7h4fhy2 commented Dec 17, 2024

AFg6K7h4fhy2 commented Dec 19, 2024 • edited Loading

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 6, 2025

AFg6K7h4fhy2 commented Jan 7, 2025 • edited Loading

AFg6K7h4fhy2 commented Jan 7, 2025

AFg6K7h4fhy2 commented Jan 8, 2025

dylanhmorris commented Jan 8, 2025 • edited Loading

AFg6K7h4fhy2 commented Dec 19, 2024 •

edited

Loading

AFg6K7h4fhy2 commented Jan 7, 2025 •

edited

Loading

dylanhmorris commented Jan 8, 2025 •

edited

Loading