M-SpecGene

M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision (ICCV 2025)

Brief Introduction

RGB-Thermal (RGBT) multispectral vision is essential for robust perception in complex environments. Most RGBT tasks follow a case-by-case research paradigm, relying on manually customized models to learn task-oriented representations. This paradigm is inherently constrained by artificial inductive bias, modality bias, and data bottlenecks. To address these limitations, we make a first attempt to build a Generalized RGBT MultiSpectral foundation model (M-SpecGene), which aims to learn modality-invariant representations from large-scale broad data in a self-supervised manner. M-SpecGene provides new insights into multispectral fusion and integrates prior case-by-case studies into a unified paradigm. Because information is unevenly distributed between the two modalities in RGBT data, we introduce the Cross-Modality Structural Sparsity (CMSS) metric to quantify the information density across the two modalities. We then develop the GMM-CMSS progressive masking strategy to facilitate a flexible, easy-to-hard, and object-centric pre-training process. Comprehensive experiments validate M-SpecGene’s generalizability across eleven datasets for four RGBT downstream tasks.
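
To give a feel for what an easy-to-hard masking schedule can look like, here is a minimal sketch in plain Python. It is not the GMM-CMSS strategy from the paper: this README does not define CMSS or the schedule, so the per-patch `scores` are placeholders, and the names `progressive_mask`, `gaussian_weight`, `progress`, and `std` are hypothetical. The sketch also assumes that low scores mean "easy" patches and high scores mean "hard" ones, and it uses a single Gaussian preference window where the paper uses a GMM.

```python
import math
import random


def gaussian_weight(score, center, std):
    """Unnormalized Gaussian weight of a score around a center."""
    return math.exp(-0.5 * ((score - center) / std) ** 2)


def progressive_mask(scores, progress, mask_ratio=0.75, std=0.2, seed=None):
    """Choose which patch indices to mask at a given training progress.

    scores   : per-patch values in [0, 1] (placeholder for a CMSS-like score)
    progress : training progress in [0, 1]; 0 = start, 1 = end
    Assumption: low-score patches are "easy", high-score patches are "hard".
    """
    rng = random.Random(seed)
    n_mask = round(mask_ratio * len(scores))
    # The preferred score drifts from 0 (easy) to 1 (hard) as training advances.
    center = progress
    weights = [max(gaussian_weight(s, center, std), 1e-12) for s in scores]
    # Weighted sampling without replacement (Efraimidis-Spirakis keys in log
    # form): draw u in (0, 1], rank by log(u) / weight, keep the largest keys.
    keys = [(math.log(1.0 - rng.random()) / w, i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:n_mask])
```

Calling `progressive_mask(scores, progress=0.0)` early in training and `progressive_mask(scores, progress=1.0)` near the end shifts the masked set from low-score to high-score patches while keeping the number of masked patches fixed.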

RGBT550K Dataset

To pretrain a multispectral foundation model with robust generalization capabilities, we collected as many of the available RGBT datasets as we could. The source datasets are listed in A Summary of Multispectral (RGBT) Image Datasets. After collection and preprocessing, the result is RGBT550K, a dataset comprising 548,238 high-quality samples. It covers diverse scenarios, tasks, lighting conditions, resolutions, and object categories, providing a solid foundation for the self-supervised pre-training of the multispectral foundation model. You can download the RGBT550K dataset from Baidu Cloud (code: rwf7) or OneDrive (to be updated).

`RGBT550K Usage`
sudo apt install p7zip-full
7z x RGBT550K_archive.7z.partaa
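
A multi-part download is easy to leave incomplete, so it can help to check the parts before extracting. The sketch below is an optional helper, not part of this repository. It assumes the parts sit in one directory and follow `split`-style two-letter suffixes (`partaa`, `partab`, ...); only `partaa` is named above, so the rest of the naming is an assumption, as is the function name `missing_parts`.

```python
from itertools import product
from pathlib import Path
from string import ascii_lowercase


def missing_parts(directory, prefix="RGBT550K_archive.7z.part"):
    """Return the two-letter suffixes (aa, ab, ...) absent before the last part found."""
    found = set()
    for path in Path(directory).glob(prefix + "*"):
        suffix = path.name[len(prefix):]
        # Keep only names that match the assumed split-style suffix pattern.
        if len(suffix) == 2 and all(c in ascii_lowercase for c in suffix):
            found.add(suffix)
    if not found:
        raise FileNotFoundError(f"no files named {prefix}?? in {directory}")
    last = max(found)
    missing = []
    # product() yields aa, ab, ..., zz in the same order split assigns them.
    for a, b in product(ascii_lowercase, repeat=2):
        suffix = a + b
        if suffix > last:
            break
        if suffix not in found:
            missing.append(suffix)
    return missing
```

An empty list from `missing_parts(".")` means there are no gaps up to the last part present. The check cannot tell whether the final part itself is missing, so compare the part count against the download page as well.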

Pretrained Models

Pretrain	Backbone	Model Weights
M-SpecGene	ViT-S
M-SpecGene	ViT-B

Usage

Pretraining

Finetuning

RGBT Multispectral Object Detection
RGBT Multispectral Semantic Segmentation
RGBT Cross-modality Feature Matching
RGBT Multispectral Salient Object Detection

Citation

@article{zhou2025m,
  title={M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision},
  author={Zhou, Kailai and Yang, Fuqiang and Wang, Shixian and Wen, Bihan and Zi, Chongde and Chen, Linsen and Shen, Qiu and Cao, Xun},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
