Accepted at Medical Image Understanding and Analysis (MIUA) 2025
- 🧠 MVLM Benchmarking: Evaluates 5 recent Medical Vision-Language Models (MVLMs) across 5 modalities, 7 corruption types, and 5 severity levels
- 📉 Corruption Evaluation: Analyzes degradation under corruptions such as Gaussian noise, motion blur, and pixelation
- 🔬 MediMeta-C: A new benchmark simulating real-world OOD shifts in high-res medical images
- 🧪 Few-shot Robustness: RobustMedCLIP uses just 1-10% of clean data for adaptation
- 🧠 LoRA Efficient Tuning: Low-rank fine-tuning in transformer attention layers
Overview of the RobustMedCLIP pipeline: A) Few-shot Sampling of Clean Samples from MediMeta and MedMNIST across 5 modalities; B) Fine-tuning LoRA adapters using Few-shot samples; C) Distribution Shifts of MediMeta-C compared to Clean samples; D) Evaluation Results across Top-1 Accuracy and Corruption Error for 4 baselines and RobustMedCLIP.
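For intuition on step A, here is a minimal sketch of class-stratified few-shot sampling; the `fraction` parameter and per-class stratification are assumptions about the sampling scheme, not the repo's exact implementation:

```python
import numpy as np

def few_shot_sample(labels: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Return indices of a class-stratified subset covering `fraction` of the data."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(round(fraction * idx.size)))  # At least one sample per class
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.concatenate(picked)

# e.g., a 1% few-shot budget:
# subset = few_shot_sample(train_labels, fraction=0.01)
```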
```bash
git clone https://github.com/BioMedIA-MBZUAI/RobustMedCLIP.git
cd RobustMedCLIP

conda create -n robustmedclip python=3.12.7
conda activate robustmedclip

pip install -r requirements.txt
pip install huggingface_hub
```
You will also need to replace `<YOUR-HUGGINGFACE-TOKEN>` with your personal Hugging Face access token in order to download the datasets and model weights directly. To create an access token, go to your Hugging Face Settings, open the Access Tokens tab, and click the New token button to create a User Access Token.
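Once you have a token, you can authenticate from Python; a minimal sketch using the `huggingface_hub` login helper (reading the token from an environment variable is our convention here, not the repo's):

```python
import os
from huggingface_hub import login

# Authenticate this machine with your personal access token.
# Keeping the token in an environment variable avoids hard-coding secrets.
login(token=os.environ["HF_TOKEN"])
```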
All baseline and RobustMedCLIP model checkpoints are available for direct download via Hugging Face at RobustMedCLIP:
```bash
huggingface-cli download razaimam45/RobustMedCLIP \
    --local-dir ./outputs \
    --repo-type model \
    --token <YOUR-HUGGINGFACE-TOKEN>
```
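Equivalently, from Python — a sketch using `huggingface_hub.snapshot_download` with the same repo and destination as the CLI call above:

```python
from huggingface_hub import snapshot_download

# Download all baseline and RobustMedCLIP checkpoints into ./outputs
snapshot_download(
    repo_id="razaimam45/RobustMedCLIP",
    repo_type="model",
    local_dir="./outputs",
    token="<YOUR-HUGGINGFACE-TOKEN>",  # or rely on a prior `login()`
)
```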
📁 Outputs
Folder Structure: The `outputs/` folder (placed in the project root) contains all trained model weights and evaluation results:
```
outputs/
├── checkpoints/   # Baseline MVLMs (MedCLIP, UniMedCLIP)
├── exp-rank-8/    # RobustMedCLIP (LoRA rank 8) for ViT and ResNet across few-shot budgets (1/3/7/10%)
├── exp-rank-16/   # RobustMedCLIP (LoRA rank 16) for ViT and ResNet across few-shot budgets (1/3/7/10%)
└── results/       # Evaluation logs across mCE/Accuracy metrics
```
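As a rough sketch of how a downloaded checkpoint can be inspected (assuming standard PyTorch `state_dict` files; the file name below is hypothetical):

```python
import torch

# Hypothetical path: inspect the tensors stored in a LoRA checkpoint.
state = torch.load("outputs/exp-rank-8/checkpoint.pt", map_location="cpu")
state = state.get("state_dict", state)  # Unwrap if saved inside a training dict
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```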
This project proposes MediMeta-C as a new corruption benchmark and evaluates MVLMs on both MediMeta-C and MedMNIST-C.
| Dataset | Modality | Clean Samples | Corruption Sets | Resolution |
|---|---|---|---|---|
| MediMeta-C | Multi-modality | 5 modalities | 7 corruptions × 5 levels | High-res |
| MedMNIST-C | Public benchmark | 5 modalities | 7 corruptions × 5 levels | Low-res |
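For intuition, each corruption follows the ImageNet-C recipe: one fixed transform applied at five increasing severities. A minimal sketch for Gaussian noise (the severity-to-sigma mapping is an illustrative assumption, not the benchmark's exact parameters):

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int) -> np.ndarray:
    """Apply Gaussian noise at one of five severity levels (image in [0, 1])."""
    # Illustrative sigmas; the actual benchmark parameters may differ.
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)
```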
The MediMeta-C dataset is hosted on HuggingFace and organized as follows:
```
MediMeta-C/
├── pbc/                              # Blood Cell modality
│   ├── test/                         # Test set
│   │   ├── clean.npz                 # Clean samples
│   │   ├── brightness_severity_1.npz
│   │   ├── brightness_severity_2.npz
│   │   ├── ...                       # Other severity levels
│   │   └── brightness_severity_5.npz
│   └── val/                          # Validation set
│       ├── clean.npz
│       ├── contrast_severity_1.npz
│       ├── contrast_severity_2.npz
│       ├── ...                       # Other severity levels
│       └── contrast_severity_5.npz
├── fundus/                           # Fundus modality
│   ├── test/
│   ├── val/
│   └── ...                           # Similar structure as above
├── ...                               # Other modalities
└── README.md                         # Dataset description
```
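Each split is stored as NumPy `.npz` archives; here is a minimal loading sketch (the array key names `images` and `labels` are assumptions — inspect the archive's keys first):

```python
import numpy as np

archive = np.load("data/MediMeta-C/pbc/test/brightness_severity_3.npz")
print(archive.files)  # Inspect the stored array names before assuming keys

# Assumed key names, for illustration:
images = archive["images"]
labels = archive["labels"]
print(images.shape, labels.shape)
```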
You can download the datasets from MediMeta-C and MedMNIST-C. The downloaded folder `data/MediMeta-C` should be placed in the root of the project folder:
```bash
huggingface-cli download razaimam45/MediMeta-C \
    --local-dir ./data/MediMeta-C \
    --repo-type dataset \
    --token <YOUR-HUGGINGFACE-TOKEN>
```
You can fine-tune RobustMedCLIP with either ViT or ResNet backbones:
```bash
# Fine-tune with ViT backbone (e.g., BioMedCLIP)
bash scripts/run_finetune_vit.sh

# Fine-tune with ResNet backbone (e.g., MedCLIP)
bash scripts/run_finetune_resnet.sh
```
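For intuition, LoRA adds a trainable low-rank update to frozen attention projections. A minimal sketch of the idea in PyTorch (illustrative only — not the repo's actual implementation; the rank and scaling defaults are hypothetical):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update (W + B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # Freeze the pretrained projection
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank residual
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```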
Evaluate a fine-tuned or pretrained MVLM (including RMedCLIP):
```bash
# Evaluation for RobustMedCLIP (RMC)
bash scripts/run_eval_rmed.sh

# Custom evaluation on other models (rmedclip, biomedclip, unimedclip, medclip, clip)
python evaluate.py --model rmedclip \
    --backbone vit \
    --gpu 0 --corruptions all --collection medimeta
```
RobustMedCLIP consistently outperforms prior MVLMs under corruptions across all modalities:
| Model | Clean Error ↓ | mCE ↓ (avg) |
|---|---|---|
| CLIP | 100.0 | 100.0 |
| MedCLIP | 106.4 | 112.5 |
| BioMedCLIP | 116.3 | 126.8 |
| UniMedCLIP | 111.8 | 98.87 |
| RMedCLIP | 62.8 | 81.0 |
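Errors here are normalized so that CLIP scores 100. Following the ImageNet-C convention, each corruption's error is the model's error summed over severities divided by the baseline's, and mCE averages that ratio over corruptions. A minimal sketch (the array layout is an assumption):

```python
import numpy as np

def mean_corruption_error(err: np.ndarray, baseline_err: np.ndarray) -> float:
    """err, baseline_err: error rates with shape (num_corruptions, num_severities)."""
    # Per-corruption CE: model error over severities, normalized by the baseline.
    ce = err.sum(axis=1) / baseline_err.sum(axis=1)
    return 100.0 * ce.mean()  # Average over corruptions, scaled so baseline = 100
```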
Detailed benchmarks are available in Results and Discussions.
If you find this repository helpful, please cite our paper:
```bibtex
@misc{imam2025robustnessmedicalvisionlanguagemodels,
  title={On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?},
  author={Raza Imam and Rufael Marew and Mohammad Yaqub},
  year={2025},
  eprint={2505.15425},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.15425},
}
```
- Built on top of BioMedCLIP and MedCLIP
- MediMeta-C corruption designs are inspired by ImageNet-C and MedMNIST-C
For questions, contact: [email protected]