Evaluation, Reproducibility, Benchmarks Meeting 42

Minutes of Meeting 42

Date: 1st April, 2026

Present

Olivier
Nick
Rucha
Carole
Anne
Michela

Updates

Idea from Rucha
- Can we look at evaluation from the deployment side? For example, is this model suitable for a given site/population of images?
- Could do a sort of user study, where we talk to people doing deployments and try to understand what the needs are
- Important to use tools that are already there -- metrics, etc.
- Next steps -- Rucha will reach out to a member of the deployment working group to evaluate needs and make a plan
From Olivier
- Working to identify datasets to show the utility of the CI project across different modalities -- pathology, radiology, surgical videos, etc.
  - There are common pitfalls associated with each (failing to evaluate metrics at the patient level, for example)
- Ideally we would find benchmarks with lots of trained models available -- hopefully models will have respected official splits, but this is sometimes dubious
- Given the abundance of foundation models for 2D data, we are seeing more and more papers that sample/aggregate over 3D and expose users to these pitfalls
- Focus is on hierarchical data
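
A minimal sketch of the patient-level pitfall noted above, assuming a toy classification setting where each patient contributes several slices. The records, field names, and function names are hypothetical and are not taken from the CI project or any benchmark; the point is only that pooling slices lets patients with more slices dominate the metric.

```python
from collections import defaultdict
from statistics import mean


def slice_level_accuracy(records):
    """Accuracy pooled over all slices, ignoring which patient each came from."""
    return mean(1.0 if r["pred"] == r["label"] else 0.0 for r in records)


def patient_level_accuracy(records):
    """Mean of per-patient accuracies, so every patient counts once."""
    per_patient = defaultdict(list)
    for r in records:
        per_patient[r["patient"]].append(1.0 if r["pred"] == r["label"] else 0.0)
    return mean(mean(hits) for hits in per_patient.values())


# Hypothetical toy data: patient A contributes 8 slices (all correct),
# patient B contributes 2 slices (all wrong).
records = (
    [{"patient": "A", "label": 1, "pred": 1} for _ in range(8)]
    + [{"patient": "B", "label": 0, "pred": 1} for _ in range(2)]
)

print("slice-level accuracy:  ", slice_level_accuracy(records))
print("patient-level accuracy:", patient_level_accuracy(records))
```

In this toy case the pooled figure looks considerably better than the per-patient figure, because the patient with more slices happens to be the one the model gets right.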
From Carole
- Got good feedback on implementation
- Still working on software paper, will plan to send draft shortly
- Targeting IEEE MLMI
From Michela
- Becoming more available now -- will put something more concrete together for next meeting
- Rough ideas
  - Looking at different datasets and how interobserver/intraobserver variance plays a role
  - Questions of generalizability/idiosyncrasy of datasets
- DKFZ is building a data curation unit (per Annika) -- could be an interesting resource to connect with once they become more established
- Interesting questions around metrics used for quality
- Questions about semi-automatic ground truth labeling and biases that it introduces
- Questions about prevalence differences and representativeness of the set used for QC
- Carole going to put papers in GDrive

Administrative Item

Nick stepping down as secretary, Michela elected to fill the role
- Nick will share instructions with Michela offline
Anne also very time limited moving forward -- offering to step down if someone else in her spot might be able to contribute more
- Perhaps could recruit someone from her team to ensure that we keep representation from platform side
Next meeting to be moved to the 29th of April

Copyright (c) MONAI Consortium
