[Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling #32969
base: main
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
The pull request effectively addresses the RuntimeError encountered during Qwen2.5-VL profiling by correctly handling the dimensions of vision_embeddings before splitting. The changes to flatten the embeddings to 2D, manage size mismatches during profiling, and remove redundant operations are well-implemented and directly resolve the reported issue. This improves the robustness and correctness of the multimodal embedding process.
cc @tjtanaa @DarkLight1337 Let me know if you can help review this 😃
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
if actual_size < total_expected:
    repeat_factor = (
        total_expected + actual_size - 1
    ) // actual_size
```
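For reference, the ceiling division above computes the smallest repeat count whose product covers the expected number of rows. Below is a minimal sketch of how such a factor is typically applied during profiling; the repeat-and-truncate step, the concrete sizes, and the hidden dimension are illustrative, not the exact PR code:

```python
import torch

# Illustrative profiling scenario: the dummy encoder output has a single row,
# but the split sizes derived from num_image_patches expect 9800 rows in total.
vision_embeddings = torch.zeros(1, 1280)  # hypothetical [actual_size, hidden_dim]
total_expected = 4900 + 4900

actual_size = vision_embeddings.shape[0]
if actual_size < total_expected:
    # Ceiling division: smallest k such that k * actual_size >= total_expected.
    repeat_factor = (total_expected + actual_size - 1) // actual_size
    # Repeat along dim 0, then truncate to exactly the expected row count.
    vision_embeddings = vision_embeddings.repeat(repeat_factor, 1)[:total_expected]

assert vision_embeddings.shape[0] == total_expected  # 9800 rows, splittable as [4900, 4900]
```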
Division by zero if vision embeddings tensor is empty
Low Severity
The repeat_factor calculation divides by actual_size without checking if it's zero. If vision_embeddings.shape[0] is 0 (empty tensor) while total_expected > 0, this causes a ZeroDivisionError. The code enters the if actual_size < total_expected block since 0 < positive is true, then attempts (total_expected + 0 - 1) // 0.
I think failing fast is the better approach here - an empty tensor from the vision encoder indicates an upstream bug that shouldn't be silently papered over. Added an explicit check with a clear error message.
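A minimal sketch of such a fail-fast guard, reusing the variable names from the diff above; the exact error message in the PR may differ:

```python
if actual_size == 0:
    # An empty vision embeddings tensor means the encoder produced no output
    # rows while patches were expected; surface this upstream bug immediately
    # instead of dividing by zero below.
    raise ValueError(
        f"Got an empty vision embeddings tensor, but {total_expected} "
        "embedding rows were expected."
    )
```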
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Fixes

`RuntimeError: split_with_sizes expects split_sizes to sum exactly to 1 (input tensor's size at dimension 0), but got split_sizes=[4900, 4900]` when running Qwen2.5-VL with the transformers backend.

Root Cause
During memory profiling, vLLM creates dummy encoder outputs with minimal size (shape `[1, hidden_dim]`). The original `embed_multimodal` code called `unsqueeze(0)` on 2D tensors, then attempted `torch.split()` with sizes derived from `num_image_patches` (e.g., `[4900, 4900]`), causing a dimension mismatch.
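A minimal sketch that reproduces the mismatch and shows the 2D flattening described in the fix below; the hidden size, patch counts, and variable names are illustrative rather than the actual `embed_multimodal` code:

```python
import torch

hidden_dim = 1280
num_image_patches = [4900, 4900]

# During profiling the dummy encoder output is tiny: shape [1, hidden_dim].
dummy_embeds = torch.zeros(1, hidden_dim)

# Old behavior: unsqueeze(0) turns the 2D tensor into [1, 1, hidden_dim], and
# splitting dim 0 into [4900, 4900] raises, because dim 0 has size 1:
#   torch.split(dummy_embeds.unsqueeze(0), num_image_patches, dim=0)

# New behavior: flatten any leading dimensions down to [num_rows, hidden_dim]
# first, and only split once the row count matches the expected patch counts.
embeds_2d = dummy_embeds.view(-1, hidden_dim)
if embeds_2d.shape[0] == sum(num_image_patches):
    per_image = torch.split(embeds_2d, num_image_patches, dim=0)
```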
Fix

- Replaced `unsqueeze(0)` with a proper `view(-1, hidden_dim)` for 3D-to-2D flattening.
- Removed the redundant `flatten(start_dim=0, end_dim=-2)` loop (a no-op on 2D tensors).

Testing