[Bug]: 0.9.rc2 with ngram spec decoding report assert attn_metadata.attn_mask is not None error #2071

@didongli182

Your current environment

The output of `python collect_env.py`
vllm                              0.9.2+empty             /vllm-workspace/vllm
vllm_ascend                       0.9.2rc1                /vllm-workspace/vllm-ascend

🐛 Describe the bug

With vllm 0.9.2 and vllm-ascend 0.9.2rc1, start the server with this command:

vllm serve $MODEL \
    --served-model-name qwen \
    --tensor-parallel-size $TP \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --compilation_config '{"cudagraph_capture_sizes":[1,4,8,16,24,32]}' \
    --speculative_config '{"method":"ngram","num_speculative_tokens":2,"prompt_lookup_max":4}' \
    --quantization "ascend" \
    --disable-log-requests
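
The two JSON-valued flags are easy to misquote in a shell. As a minimal sketch, the same arguments can be assembled in Python so that `json.dumps` produces the JSON strings. The flag names and values are copied from the command above. `build_serve_args` is a hypothetical helper, the model path is borrowed from the benchmark command, and `tp=4` is inferred from the four worker ranks in the log; both of those values are assumptions.

```python
import json
import shlex

# Keys and values copied from the serve command in this report.
compilation_config = {"cudagraph_capture_sizes": [1, 4, 8, 16, 24, 32]}
speculative_config = {
    "method": "ngram",
    "num_speculative_tokens": 2,
    "prompt_lookup_max": 4,
}


def build_serve_args(model: str, tp: int) -> list[str]:
    """Hypothetical helper: assemble the argv for the `vllm serve` call."""
    return [
        "vllm", "serve", model,
        "--served-model-name", "qwen",
        "--tensor-parallel-size", str(tp),
        "--enable-auto-tool-choice",
        "--tool-call-parser", "hermes",
        "--compilation_config", json.dumps(compilation_config),
        "--speculative_config", json.dumps(speculative_config),
        "--quantization", "ascend",
        "--disable-log-requests",
    ]


# Assumed values: model path from the benchmark command, TP inferred from the log.
args = build_serve_args("/mnt/models/Qwen3-32B", 4)
print(shlex.join(args))  # shell-quoted command line, ready to paste
```

`shlex.join` quotes the JSON strings so the printed line can be pasted into a shell unchanged.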

Then run the vLLM benchmark script against it:

python benchmarks/benchmark_serving.py \
        --backend vllm \
        --model /mnt/models/Qwen3-32B \
        --dataset-name random \
        --random-input-len 2048  \
        --random-output-len 2048 \
        --num-prompts 16 \
        --request-rate 4 \
        --max-concurrency 4 \
        --port 8000 \
        --host 10.50.95.185 \
        --served-model-name qwen
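
The error log in this report interleaves tracebacks from several worker processes line by line, which makes a single worker's stack hard to follow. A minimal sketch for regrouping the lines by rank is given here. It assumes the `(VllmWorker rank=N pid=P) ERROR <date> <time> [file:line]` prefix seen in this log. `group_by_rank` is a hypothetical helper, not part of vLLM.

```python
import re
from collections import defaultdict

# Prefix format assumed from the worker log lines in this report.
PREFIX = re.compile(
    r"^\(VllmWorker rank=(?P<rank>\d+) pid=\d+\) "
    r"ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[[^\]]+\] ?(?P<body>.*)$"
)


def group_by_rank(lines):
    """Return {rank: [lines]} with the per-line log prefix removed."""
    grouped = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:  # lines without the worker prefix (e.g. INFO) are skipped
            grouped[int(match["rank"])].append(match["body"])
    return dict(grouped)


# Three lines taken from the log in this report.
sample = [
    "(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     assert attn_metadata.attn_mask is not None",
    "(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)",
    "(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError",
]

for rank, body in sorted(group_by_rank(sample).items()):
    print(f"--- rank {rank} ---")
    print("\n".join(body))
```

To process a saved log, pass an open file object instead of `sample`; each rank's list then holds that worker's traceback in its original order.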

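Shortly after the concurrent requests arrive, the workers fail with an AssertionError at `assert attn_metadata.attn_mask is not None`. Error log: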

INFO:     Started server process [15221]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     10.50.95.185:55526 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 07-28 12:46:59 [loggers.py:118] Engine 000: Avg prompt throughput: 204.8 tokens/s, Avg generation throughput: 30.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:09 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:19 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:29 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 0.0%
INFO:     10.50.95.185:56174 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     10.50.95.185:56182 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     10.50.95.185:56188 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     10.50.95.185:56194 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 07-28 12:47:39 [loggers.py:118] Engine 000: Avg prompt throughput: 409.6 tokens/s, Avg generation throughput: 33.9 tokens/s, Running: 3 reqs, Waiting: 1 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 23.4%
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     model_output = self.forward(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     def forward(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     model_output = self.forward(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     def forward(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = func(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.130", line 1296, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     model_output = self.forward(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     def forward(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     model_output = self.forward(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn');  query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     def forward(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._op(*args, **(kwargs or {}))
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     self.impl.forward(self,
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     assert attn_metadata.attn_mask is not None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return fn(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn');  query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._op(*args, **(kwargs or {}))
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn');  query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._op(*args, **(kwargs or {}))
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn');  query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     return self._op(*args, **(kwargs or {}))
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]   File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522]     assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
