Description
Your current environment
vllm==0.9.2, using the model weights from https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking/tree/main
🐛 Describe the bug
Bug: during multimodal inference, EngineCore hits a fatal error while merging image embeddings into the input embeddings:
ERROR 07-29 06:54:54 [core.py:588] EngineCore encountered a fatal error.
ERROR 07-29 06:54:54 [core.py:588] Traceback (most recent call last):
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 579, in run_engine_core
ERROR 07-29 06:54:54 [core.py:588] engine_core.run_busy_loop()
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 606, in run_busy_loop
ERROR 07-29 06:54:54 [core.py:588] self._process_engine_step()
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 631, in _process_engine_step
ERROR 07-29 06:54:54 [core.py:588] outputs, model_executed = self.step_fn()
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 235, in step
ERROR 07-29 06:54:54 [core.py:588] model_output = self.execute_model(scheduler_output)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 221, in execute_model
ERROR 07-29 06:54:54 [core.py:588] raise err
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 212, in execute_model
ERROR 07-29 06:54:54 [core.py:588] return self.model_executor.execute_model(scheduler_output)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model
ERROR 07-29 06:54:54 [core.py:588] output = self.collective_rpc("execute_model",
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-29 06:54:54 [core.py:588] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-29 06:54:54 [core.py:588] return func(*args, **kwargs)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-29 06:54:54 [core.py:588] return func(*args, **kwargs)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 308, in execute_model
ERROR 07-29 06:54:54 [core.py:588] output = self.model_runner.execute_model(scheduler_output,
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-29 06:54:54 [core.py:588] return func(*args, **kwargs)
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1332, in execute_model
ERROR 07-29 06:54:54 [core.py:588] inputs_embeds = self.model.get_input_embeddings(
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/glm4_1v.py", line 1474, in get_input_embeddings
ERROR 07-29 06:54:54 [core.py:588] inputs_embeds = merge_multimodal_embeddings(
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 505, in merge_multimodal_embeddings
ERROR 07-29 06:54:54 [core.py:588] return _merge_multimodal_embeddings(
ERROR 07-29 06:54:54 [core.py:588] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 427, in _merge_multimodal_embeddings
ERROR 07-29 06:54:54 [core.py:588] raise ValueError(
ERROR 07-29 06:54:54 [core.py:588] ValueError: Attempted to assign 646 = 646 multimodal tokens to 1292 placeholders
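For context, below is a minimal plain-Python sketch (not vLLM's actual implementation) of the consistency check that raises this ValueError in `_merge_multimodal_embeddings` (vllm/model_executor/models/utils.py): the number of image placeholder tokens in the prompt must equal the number of embedding vectors produced by the vision encoder. The function name and sentinel token id here are hypothetical. Notably, 1292 is exactly 2 × 646, which is consistent with the image placeholder tokens being counted twice somewhere during prompt processing while the vision tower emits embeddings for only one copy.

```python
# Minimal sketch of the placeholder/embedding consistency check.
# IMAGE_PLACEHOLDER is a hypothetical sentinel token id, not vLLM's.
IMAGE_PLACEHOLDER = -1

def merge_multimodal_embeddings_sketch(input_ids, text_embeds, mm_embeds):
    """Replace each placeholder position's embedding with the next image embedding."""
    placeholder_positions = [i for i, t in enumerate(input_ids)
                             if t == IMAGE_PLACEHOLDER]
    # The check that fails in the traceback: counts must match exactly.
    if len(placeholder_positions) != len(mm_embeds):
        raise ValueError(
            f"Attempted to assign {len(mm_embeds)} multimodal tokens "
            f"to {len(placeholder_positions)} placeholders")
    merged = list(text_embeds)
    for pos, emb in zip(placeholder_positions, mm_embeds):
        merged[pos] = emb
    return merged

# Reproducing the reported mismatch: 1292 placeholders (exactly 2 x 646)
# but only 646 image embeddings.
ids = [IMAGE_PLACEHOLDER] * 1292 + [0] * 10
embeds = [[0.0]] * len(ids)
try:
    merge_multimodal_embeddings_sketch(ids, embeds, [[1.0]] * 646)
except ValueError as e:
    print(e)  # Attempted to assign 646 multimodal tokens to 1292 placeholders
```

The 2× ratio points at duplicated placeholder expansion rather than a truncated vision output, which may help narrow down where in the GLM-4.1V processing path the counts diverge.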