@AndreasKaratzas (Collaborator) commented Jan 23, 2026

Fixes RuntimeError: split_with_sizes expects split_sizes to sum exactly to 1 (input tensor's size at dimension 0), but got split_sizes=[4900, 4900] when running Qwen2.5-VL with the transformers backend.

Root Cause

During memory profiling, vLLM creates dummy encoder outputs of minimal size (shape [1, hidden_dim]). The original embed_multimodal code called unsqueeze(0) on these 2D tensors, producing shape [1, 1, hidden_dim], and then attempted torch.split() along dim 0 with sizes derived from num_image_patches (e.g., [4900, 4900]). Since the split sizes must sum to the size of dim 0 (here 1), the split fails with the dimension mismatch above.
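A minimal repro of the mismatch; the hidden_dim value and variable names are illustrative, not the actual vLLM internals:

```python
import torch

hidden_dim = 8
dummy = torch.zeros(1, hidden_dim)        # profiling-time dummy encoder output
num_image_patches = [4900, 4900]

x = dummy.unsqueeze(0)                    # shape [1, 1, hidden_dim]
# split sizes must sum to x.size(0) == 1, but they sum to 9800:
torch.split(x, num_image_patches, dim=0)  # RuntimeError: split_with_sizes ...
```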

Fix

1. Replace unsqueeze(0) with view(-1, hidden_dim) to flatten 3D outputs to 2D
2. Handle size mismatches during profiling by expanding/trimming the embeddings to the expected size (see the sketch below)
3. Remove the redundant .flatten(start_dim=0, end_dim=-2) loop, which is a no-op on 2D tensors
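A hedged sketch of changes 1 and 2; the function name, signature, and placement are assumptions for illustration, not the exact vLLM code:

```python
import torch

def _align_vision_embeddings(vision_embeddings: torch.Tensor,
                             num_image_patches: list[int]) -> torch.Tensor:
    hidden_dim = vision_embeddings.shape[-1]
    flat = vision_embeddings.view(-1, hidden_dim)  # 3D -> 2D; no-op on 2D

    total_expected = sum(num_image_patches)
    actual_size = flat.shape[0]
    if actual_size < total_expected:
        # Profiling-time dummy outputs are tiny: tile the rows up to the
        # expected total, then trim any excess below.
        repeat_factor = (total_expected + actual_size - 1) // actual_size
        flat = flat.repeat(repeat_factor, 1)
    return flat[:total_expected]
```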

Testing

pytest tests/models/multimodal/generation/test_common.py::test_single_image_models[qwen2_5_vl-transformers-test_case53] -s -v

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
mergify bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels on Jan 23, 2026

@gemini-code-assist (bot) left a comment


Code Review

The pull request effectively addresses the RuntimeError encountered during Qwen2.5-VL profiling by correctly handling the dimensions of vision_embeddings before splitting. The changes to flatten the embeddings to 2D, manage size mismatches during profiling, and remove redundant operations are well-implemented and directly resolve the reported issue. This improves the robustness and correctness of the multimodal embedding process.

@AndreasKaratzas (Collaborator, Author) commented

cc @tjtanaa @DarkLight1337 Let me know if you can help review this 😃

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


    if actual_size < total_expected:
        repeat_factor = (
            total_expected + actual_size - 1
        ) // actual_size

Division by zero if vision embeddings tensor is empty

Low Severity

The repeat_factor calculation divides by actual_size without checking if it's zero. If vision_embeddings.shape[0] is 0 (empty tensor) while total_expected > 0, this causes a ZeroDivisionError. The code enters the if actual_size < total_expected block since 0 < positive is true, then attempts (total_expected + 0 - 1) // 0.
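A short repro of that path (values illustrative):

```python
total_expected, actual_size = 9800, 0   # empty vision embeddings tensor
if actual_size < total_expected:        # 0 < 9800, so the branch is taken
    repeat_factor = (total_expected + actual_size - 1) // actual_size
    # ZeroDivisionError: integer division or modulo by zero
```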


@AndreasKaratzas (Collaborator, Author) replied

I think failing fast is the better approach here - an empty tensor from the vision encoder indicates an upstream bug that shouldn't be silently papered over. Added an explicit check with a clear error message.
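A minimal sketch of that fail-fast check, reusing the illustrative names from above; the exact message and placement in the PR may differ:

```python
if actual_size == 0:
    raise ValueError(
        "Vision encoder produced an empty embeddings tensor while "
        f"{total_expected} patch embeddings were expected; this indicates "
        "an upstream bug in the encoder outputs."
    )
```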

Signed-off-by: Andreas Karatzas <akaratza@amd.com>