Mai A. Shaaban, Adnan Khan, Mohammad Yaqub
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
School of Computer Science, Carleton University, Ottawa, Canada
- A novel multimodal diagnostic model for chest X-ray images that harnesses multimodal LLMs (MLLMs), few-shot prompting (FP) and visual grounding (VG), enabling more accurate prediction of abnormalities.
- Mitigating the incompleteness of EHR data by transforming inputs into textual form and adopting pre-trained MLLMs.
- Efficiently extracting logical patterns from the few-shot data with a new dynamic proximity selection (DPS) technique that captures the underlying semantics (see the sketch below).
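For intuition, the sketch below shows one way similarity-based dynamic proximity selection could work: serialize an EHR record to text, embed the query and the candidate pool, and keep the `num_shots` nearest neighbors. The encoder choice, the helper names (`embed`, `select_shots`), and the record fields are illustrative assumptions, not the repository's actual implementation.

```python
# A minimal sketch of similarity-based dynamic proximity selection (DPS).
# The encoder, helper names, and record fields are illustrative, not the
# repository's actual implementation.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any text encoder works

def embed(texts):
    """Encode texts into unit-normalized vectors."""
    vecs = encoder.encode(texts, convert_to_numpy=True)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def select_shots(query_text, candidate_texts, num_shots=6):
    """Return indices of the num_shots candidates most similar to the query."""
    q = embed([query_text])[0]
    c = embed(candidate_texts)
    scores = c @ q  # cosine similarity, since vectors are unit-normalized
    return np.argsort(-scores)[:num_shots]

# Serialize a (possibly incomplete) EHR record into text, then rank the pool.
ehr_record = {"age": 67, "temperature": 38.2, "wbc": 13.4}
query = ", ".join(f"{k}: {v}" for k, v in ehr_record.items())
pool = [
    "age: 70, temperature: 38.5, wbc: 14.1",
    "age: 25, temperature: 36.8, wbc: 6.0",
]
print(select_shots(query, pool, num_shots=1))  # -> [0]
```

The `--dps_modality both` flag in the run command below suggests that ranking can also incorporate image embeddings alongside text.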
- 2024/03/26: Code is released!
- 2024/05/12: The MedPromptX-VQA dataset is released!
Create the environment:

```bash
conda create -n MedPromptX python=3.8
```

Install dependencies (a GPU device with CUDA is assumed):

```bash
cd env
source install.sh
```

Now, you should be all set.
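Optionally, verify that PyTorch can see a CUDA device (this assumes install.sh installed PyTorch; adjust if your setup differs):

```python
# Optional sanity check: confirm a CUDA device is visible to PyTorch.
import torch
print(torch.cuda.is_available())  # expect True on a CUDA-enabled machine
```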
- Go to `scripts/`
- Run:

```bash
python main.py --model Med-Flamingo --prompt_type few-shot --modality multimodal --lang_encoder huggyllama/llama-7b --num_shots 6 --data_path prompts_6_shot --dps_type similarity --dps_modality both --vg True
```
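When `--vg True` is set, a visual grounding step localizes findings in the image. As a rough illustration only, GroundingDINO's public inference API can be driven as sketched below; the config/checkpoint paths, image, caption, and thresholds are placeholders, not values wired into main.py.

```python
# Illustrative GroundingDINO inference; all paths, the caption, and the
# thresholds are placeholders, not MedPromptX defaults.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("example_cxr.jpg")
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="lung opacity . pleural effusion .",
    box_threshold=0.35,
    text_threshold=0.25,
)
print(phrases, boxes.shape)  # detected phrases and their (N, 4) boxes
```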
If you find our work helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@inproceedings{shaaban2025medpromptx,
  title     = {MedPromptX: Grounded Multimodal Prompting for Chest X-Ray Diagnosis},
  author    = {Shaaban, Mai A. and Khan, Adnan and Yaqub, Mohammad},
  year      = {2025},
  booktitle = {Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024 Workshops},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {211--222},
  isbn      = {978-3-031-84525-3}
}
```
Our code builds on the following codebases: Med-Flamingo and GroundingDINO. We are grateful to the authors for sharing their code and kindly ask that you consider citing these works if you use ours.