Accepted at Medical Image Understanding and Analysis (MIUA) 2025
- 🧠 MVLM Benchmarking: Evaluates 5 recent Medical Vision-Language Models (MVLMs) across 5 modalities, 7 corruption types, and 5 severity levels
- 📉 Corruption Evaluation: Analyzes degradation under corruptions such as Gaussian noise, motion blur, and pixelation
- 🔬 MediMeta-C: A new benchmark simulating real-world OOD shifts in high-res medical images
- 🧪 Few-shot Robustness: RobustMedCLIP uses just 1-10% of clean data for adaptation
- 🧠 LoRA Efficient Tuning: Low-rank fine-tuning in transformer attention layers
Overview of the RobustMedCLIP pipeline: A) Few-shot Sampling of Clean Samples from MediMeta and MedMNIST across 5 modalities; B) Fine-tuning LoRA adapters using Few-shot samples; C) Distribution Shifts of MediMeta-C compared to Clean samples; D) Evaluation Results across Top-1 Accuracy and Corruption Error for 4 baselines and RobustMedCLIP.
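For intuition on step A, here is a minimal sketch of class-stratified few-shot sampling; the `fraction` parameter and per-class stratification are assumptions about the sampling scheme, not the repo's exact implementation:

```python
import numpy as np

def few_shot_sample(labels: np.ndarray, fraction: float, seed: int = 0) -> np.ndarray:
    """Return indices of a class-stratified subset covering `fraction` of the data."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(round(fraction * idx.size)))  # At least one sample per class
        picked.append(rng.choice(idx, size=k, replace=False))
    return np.concatenate(picked)

# e.g., a 1% few-shot budget:
# subset = few_shot_sample(train_labels, fraction=0.01)
```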
```bash
git clone https://github.com/BioMedIA-MBZUAI/RobustMedCLIP.git
cd RobustMedCLIP

conda create -n robustmedclip python=3.12.7
conda activate robustmedclip

pip install -r requirements.txt
pip install huggingface_hub
```
You will also need to replace `<YOUR-HUGGINGFACE-TOKEN>` with your personal Hugging Face access token in order to download the datasets and model weights directly. To create an access token, go to your Hugging Face Settings, open the Access Tokens tab, and click the New token button to create a User Access Token.
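Once you have a token, you can authenticate from Python; a minimal sketch using the `huggingface_hub` login helper (reading the token from an environment variable is our convention here, not the repo's):

```python
import os
from huggingface_hub import login

# Authenticate this machine with your personal access token.
# Keeping the token in an environment variable avoids hard-coding secrets.
login(token=os.environ["HF_TOKEN"])
```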
All baseline and RobustMedCLIP model checkpoints are available for direct download via Hugging Face at RobustMedCLIP:
```bash
huggingface-cli download razaimam45/RobustMedCLIP \
    --local-dir ./outputs \
    --repo-type model \
    --token <YOUR-HUGGINGFACE-TOKEN>
```
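Equivalently, from Python — a sketch using `huggingface_hub.snapshot_download` with the same repo and destination as the CLI call above:

```python
from huggingface_hub import snapshot_download

# Download all baseline and RobustMedCLIP checkpoints into ./outputs
snapshot_download(
    repo_id="razaimam45/RobustMedCLIP",
    repo_type="model",
    local_dir="./outputs",
    token="<YOUR-HUGGINGFACE-TOKEN>",  # or rely on a prior `login()`
)
```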
📁 Outputs
Folder Structure: The `outputs/` folder (placed in the project root) contains all trained model weights and evaluation results:
```
outputs/
├── checkpoints/   # Baseline MVLMs (MedCLIP, UniMedCLIP)
├── exp-rank-8/    # RobustMedCLIP (LoRA rank 8) for ViT and ResNet across few-shot budgets (1/3/7/10%)
├── exp-rank-16/   # RobustMedCLIP (LoRA rank 16) for ViT and ResNet across few-shot budgets (1/3/7/10%)
└── results/       # Evaluation logs across mCE/Accuracy metrics
```
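As a rough sketch of how a downloaded checkpoint can be inspected (assuming standard PyTorch `state_dict` files; the file name below is hypothetical):

```python
import torch

# Hypothetical path: inspect the tensors stored in a LoRA checkpoint.
state = torch.load("outputs/exp-rank-8/checkpoint.pt", map_location="cpu")
state = state.get("state_dict", state)  # Unwrap if saved inside a training dict
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```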
This project proposes MediMeta-C as a new corruption benchmark and evaluates MVLMs on both MediMeta-C and MedMNIST-C.
| Dataset | Modality | Clean Samples | Corruption Sets | Resolution |
|---|---|---|---|---|
| MediMeta-C | Multi-modality | 5 modalities | 7 corruptions × 5 levels | High-res |
| MedMNIST-C | Public benchmark | 5 modalities | 7 corruptions × 5 levels | Low-res |
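For intuition, each corruption follows the ImageNet-C recipe: one fixed transform applied at five increasing severities. A minimal sketch for Gaussian noise (the severity-to-sigma mapping is an illustrative assumption, not the benchmark's exact parameters):

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int) -> np.ndarray:
    """Apply Gaussian noise at one of five severity levels (image in [0, 1])."""
    # Illustrative sigmas; the actual benchmark parameters may differ.
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1]
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)
```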
The MediMeta-C dataset is hosted on HuggingFace and organized as follows:
```
MediMeta-C/
├── pbc/                              # Blood Cell modality
│   ├── test/                         # Test set
│   │   ├── clean.npz                 # Clean samples
│   │   ├── brightness_severity_1.npz
│   │   ├── brightness_severity_2.npz
│   │   ├── ...                       # Other severity levels
│   │   └── brightness_severity_5.npz
│   └── val/                          # Validation set
│       ├── clean.npz
│       ├── contrast_severity_1.npz
│       ├── contrast_severity_2.npz
│       ├── ...                       # Other severity levels
│       └── contrast_severity_5.npz
├── fundus/                           # Fundus modality
│   ├── test/
│   ├── val/
│   └── ...                           # Similar structure as above
├── ...                               # Other modalities
└── README.md                         # Dataset description
```
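Each split is stored as NumPy `.npz` archives; here is a minimal loading sketch (the array key names `images` and `labels` are assumptions — inspect the archive's keys first):

```python
import numpy as np

archive = np.load("data/MediMeta-C/pbc/test/brightness_severity_3.npz")
print(archive.files)  # Inspect the stored array names before assuming keys

# Assumed key names, for illustration:
images = archive["images"]
labels = archive["labels"]
print(images.shape, labels.shape)
```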
You can download the datasets from MediMeta-C and MedMNIST-C. The downloaded folder `data/MediMeta-C` should be placed in the root of the project folder:
```bash
huggingface-cli download razaimam45/MediMeta-C \
    --local-dir ./data/MediMeta-C \
    --repo-type dataset \
    --token <YOUR-HUGGINGFACE-TOKEN>
```
You can fine-tune RobustMedCLIP with either ViT or ResNet backbones:
```bash
# Fine-tune with ViT backbone (e.g., BioMedCLIP)
bash scripts/run_finetune_vit.sh

# Fine-tune with ResNet backbone (e.g., MedCLIP)
bash scripts/run_finetune_resnet.sh
```
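For intuition, LoRA adds a trainable low-rank update to frozen attention projections. A minimal sketch of the idea in PyTorch (illustrative only — not the repo's actual implementation; the rank and scaling defaults are hypothetical):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update (W + B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # Freeze the pretrained projection
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank residual
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```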
Evaluate a fine-tuned or pretrained MVLM (including RMedCLIP):
```bash
# Evaluation for RobustMedCLIP (RMC)
bash scripts/run_eval_rmed.sh

# Custom evaluation on other models (rmedclip, biomedclip, unimedclip, medclip, clip)
python evaluate.py --model rmedclip \
    --backbone vit \
    --gpu 0 --corruptions all --collection medimeta
```
RobustMedCLIP consistently outperforms prior MVLMs under corruptions across all modalities:
| Model | Clean Error ↓ | mCE ↓ (avg) |
|---|---|---|
| CLIP | 100.0 | 100.0 |
| MedCLIP | 106.4 | 112.5 |
| BioMedCLIP | 116.3 | 126.8 |
| UniMedCLIP | 111.8 | 98.87 |
| RMedCLIP | 62.8 | 81.0 |
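Errors here are normalized so that CLIP scores 100. Following the ImageNet-C convention, each corruption's error is the model's error summed over severities divided by the baseline's, and mCE averages that ratio over corruptions. A minimal sketch (the array layout is an assumption):

```python
import numpy as np

def mean_corruption_error(err: np.ndarray, baseline_err: np.ndarray) -> float:
    """err, baseline_err: error rates with shape (num_corruptions, num_severities)."""
    # Per-corruption CE: model error over severities, normalized by the baseline.
    ce = err.sum(axis=1) / baseline_err.sum(axis=1)
    return 100.0 * ce.mean()  # Average over corruptions, scaled so baseline = 100
```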
Detailed benchmarks are available in Results and Discussions.
If you find this repository helpful, please cite our paper:
```bibtex
@misc{imam2025robustnessmedicalvisionlanguagemodels,
  title={On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?},
  author={Raza Imam and Rufael Marew and Mohammad Yaqub},
  year={2025},
  eprint={2505.15425},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.15425},
}
```
- Built on top of BioMedCLIP and MedCLIP
- MediMeta-C corruption designs are inspired by ImageNet-C and MedMNIST-C
For questions, contact: [email protected]