¹Johns Hopkins University
²DEVCOM Army Research Laboratory
Project Page / Paper / Huggingface Data Card 🤗 / Code
Official implementation of the CVPR 2025 (Highlight) paper:
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
Spatial457 is a diagnostic benchmark designed to evaluate the 6D spatial reasoning capabilities of large multimodal models (LMMs). It systematically introduces four key capabilities—multi-object understanding, 2D and 3D localization, and 3D orientation—across five difficulty levels and seven question types, progressing from basic recognition to complex physical interaction.
You can access the full dataset and evaluation toolkit:
- Dataset: Hugging Face
- Code: GitHub Repository
- Paper: arXiv 2502.08636
🔥 Run the benchmark with VLMEvalKit.
Spatial457 is also supported by VLMEvalKit! You can use it for quick evaluation on most VLMs. Evaluation can be done by running run.py
in VLMEvalKit:
python run.py --data Spatial457 --model <model_name>
We use Blender to render the scenes, so you can also add custom objects to the dataset. You can also define your own question types / templates for your studies. The source code for dataset generation will be available soon.
See image_generation/README.md
The result will contain a folder of images and a JSON file with scene annotations.
Run the bash script to generate questions at all levels. Set input_scene_file
to the scene annotation JSON file.
bash scripts/generate_questions.sh
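As a minimal sketch of the pipeline above, the script below writes and reads back a scene annotation JSON of the kind the question generator consumes. The field names (`image_filename`, `objects`, `3d_coords`, `rotation`) are illustrative assumptions, not the official Spatial457 schema:

```python
import json
from pathlib import Path

# Illustrative scene annotation -- field names are assumptions,
# not the official Spatial457 schema.
scene = {
    "image_filename": "scene_000000.png",
    "objects": [
        {"name": "car", "3d_coords": [1.2, -0.5, 0.35], "rotation": 45.0},
        {"name": "bus", "3d_coords": [-0.8, 1.1, 0.40], "rotation": 180.0},
    ],
}

# Write the annotation file that input_scene_file would point to.
path = Path("sample_scene.json")
path.write_text(json.dumps(scene, indent=2))

# Load it back, as a question-generation script would.
loaded = json.loads(path.read_text())
names = [obj["name"] for obj in loaded["objects"]]
print(names)  # ['car', 'bus']
```

In the real pipeline, `input_scene_file` in scripts/generate_questions.sh would point at the annotation JSON produced by the rendering step rather than a hand-written file.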
@inproceedings{wang2025spatial457,
title = {Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models},
author = {Wang, Xingrui and Ma, Wufei and Zhang, Tiezheng and de Melo, Celso M and Chen, Jieneng and Yuille, Alan},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
url = {https://arxiv.org/abs/2502.08636}
}
Content and toolkit are actively being updated. Stay tuned!