XingruiWang/Spatial457


Spatial457 Logo

¹Johns Hopkins University    ²DEVCOM Army Research Laboratory

Project Page / Paper / Huggingface Data Card 🤗 / Code

Spatial457 Teaser

Official implementation of the CVPR 2025 (Highlight) paper:
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

🧠 Introduction

Spatial457 is a diagnostic benchmark designed to evaluate the 6D spatial reasoning capabilities of large multimodal models (LMMs). It systematically introduces four key capabilities—multi-object understanding, 2D and 3D localization, and 3D orientation—across five difficulty levels and seven question types, progressing from basic recognition to complex physical interaction.

📦 Download

You can access the full dataset and evaluation toolkit:

🔥 Run the benchmark with VLMEvalKit

Spatial457 is also supported by VLMEvalKit! Please try it there for quick evaluation on most VLMs. Evaluation can be done by running run.py in VLMEvalKit:

python run.py --data Spatial457 --model <model_name>

Customized objects

We use Blender to render the scenes, so you can also add custom objects to the dataset. We also support customizing your own question types / templates for your studies. The source code for dataset generation will be available soon.

Generate images

See image_generation/README.md

The result will contain a folder of images and a JSON file with scene annotations.
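As a minimal sketch of how the scene-annotation JSON might be consumed downstream (the field names "scenes", "objects", and "image_filename" are assumptions in the style of CLEVR-like annotations, not the toolkit's confirmed schema):

```python
import json

def load_scene_annotations(path):
    """Load a scene-annotation JSON file and return its list of scenes.

    Assumes a top-level "scenes" list; the exact schema produced by
    image_generation/ may differ.
    """
    with open(path) as f:
        data = json.load(f)
    return data.get("scenes", [])

def summarize(scenes):
    """Return (image_filename, object_count) pairs, one per scene."""
    return [(s.get("image_filename"), len(s.get("objects", [])))
            for s in scenes]
```

A quick sanity check like this can confirm the annotation file pairs up with the rendered image folder before running question generation.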

Generate questions

Run the bash script to generate all levels of questions. Set input_scene_file to the scene-annotation JSON file.

bash scripts/generate_questions.sh

Citation

@inproceedings{wang2025spatial457,
  title     = {Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models},
  author    = {Wang, Xingrui and Ma, Wufei and Zhang, Tiezheng and de Melo, Celso M and Chen, Jieneng and Yuille, Alan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  url       = {https://arxiv.org/abs/2502.08636}
}

Content and toolkit are actively being updated. Stay tuned!

About

[CVPR'25] A vision question answering (VQA) benchmark for 6D spatial reasoning.
