Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`
vllm 0.9.2+empty /vllm-workspace/vllm
vllm_ascend 0.9.2rc1 /vllm-workspace/vllm-ascend
🐛 Describe the bug
Start the server with vllm 0.9.2 and vllm-ascend 0.9.2rc1 using this command:
vllm serve $MODEL \
--served-model-name qwen \
--tensor-parallel-size $TP \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--compilation_config '{"cudagraph_capture_sizes":[1,4,8,16,24,32]}' \
--speculative_config '{"method":"ngram","num_speculative_tokens":2,"prompt_lookup_max":4}' \
--quantization "ascend" \
--disable-log-requests
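For readers unfamiliar with the `--speculative_config` above, here is a minimal sketch of what n-gram (prompt lookup) speculation does in general: it looks for an earlier occurrence of the most recent tokens and proposes the tokens that followed it as drafts. This is not vLLM's or vllm-ascend's implementation. The function name `propose_ngram_draft` and its parameters are hypothetical, and the mapping of `max_ngram` to `prompt_lookup_max` and `num_draft` to `num_speculative_tokens` is an assumption made for illustration.

```python
from typing import List, Sequence


def propose_ngram_draft(
    tokens: Sequence[int],
    max_ngram: int = 4,   # assumed to play the role of "prompt_lookup_max"
    num_draft: int = 2,   # assumed to play the role of "num_speculative_tokens"
) -> List[int]:
    """Propose up to `num_draft` tokens by matching the trailing n-gram
    against an earlier position in `tokens` (hypothetical helper)."""
    # Try the longest suffix first, then fall back to shorter ones.
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan earlier start positions, most recent first. The upper bound
        # guarantees at least one token follows the matched window.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + num_draft])
    # No earlier occurrence of any suffix: nothing to propose.
    return []
```

With drafts like these, a decode step can carry more than one token per request, which may be relevant when checking which attention path (and which mask) the failing step takes.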
Then run the vLLM benchmark script:
python benchmarks/benchmark_serving.py \
--backend vllm \
--model /mnt/models/Qwen3-32B \
--dataset-name random \
--random-input-len 2048 \
--random-output-len 2048 \
--num-prompts 16 \
--request-rate 4 \
--max-concurrency 4 \
--port 8000 \
--host 10.50.95.185 \
--served-model-name qwen
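The error output below comes from four workers (ranks 0 to 3) writing to the same stream, so their tracebacks are interleaved line by line. A small helper like the following can split the log back into one traceback per rank. It is only a sketch: the function name `group_by_rank` is hypothetical, and the regular expression assumes the exact `(VllmWorker rank=N pid=P) LEVEL date time [file:line]` prefix seen in this log.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Prefix as it appears in this log, for example:
# "(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] ..."
_PREFIX = re.compile(
    r"^\(VllmWorker rank=(\d+) pid=\d+\) \w+ \S+ \S+ \[[^\]]+\] ?(.*)$"
)


def group_by_rank(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group worker log lines by rank, keeping each rank's lines in order.

    Lines without the worker prefix (e.g. API server INFO lines) are skipped.
    """
    grouped: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        match = _PREFIX.match(line.rstrip("\n"))
        if match:
            grouped[int(match.group(1))].append(match.group(2))
    return dict(grouped)
```

For example, calling `group_by_rank(open("server.log"))` (where `server.log` is a placeholder file name) and printing the list for rank 0 yields a contiguous traceback that ends in the `assert attn_metadata.attn_mask is not None` failure.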
The server then fails with the following error log:
INFO: Started server process [15221]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: 10.50.95.185:55526 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 07-28 12:46:59 [loggers.py:118] Engine 000: Avg prompt throughput: 204.8 tokens/s, Avg generation throughput: 30.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:09 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:19 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 0.0%
INFO 07-28 12:47:29 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.5%, Prefix cache hit rate: 0.0%
INFO: 10.50.95.185:56174 - "POST /v1/completions HTTP/1.1" 200 OK
INFO: 10.50.95.185:56182 - "POST /v1/completions HTTP/1.1" 200 OK
INFO: 10.50.95.185:56188 - "POST /v1/completions HTTP/1.1" 200 OK
INFO: 10.50.95.185:56194 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 07-28 12:47:39 [loggers.py:118] Engine 000: Avg prompt throughput: 409.6 tokens/s, Avg generation throughput: 33.9 tokens/s, Running: 3 reqs, Waiting: 1 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 23.4%
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] model_output = self.forward(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] def forward(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return fn(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] model_output = self.forward(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] def forward(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] WorkerProc hit an exception.
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] Traceback (most recent call last):
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 517, in worker_busy_loop
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return fn(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 217, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.130", line 1296, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return func(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1406, in execute_model
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] num_scheduled_tokens_np) = (self._process_reqs(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1132, in _process_reqs
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] model_output = self.forward(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen3.py", line 302, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] def forward(
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] model_output = self.forward(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn'); query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 337, in forward
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] def forward(
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._op(*args, **(kwargs or {}))
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] self.impl.forward(self,
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return fn(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] assert attn_metadata.attn_mask is not None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=0 pid=15500) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return fn(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn'); query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._op(*args, **(kwargs or {}))
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.130", line 1232, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] raise e
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn'); query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return forward_call(*args, **kwargs)
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._op(*args, **(kwargs or {}))
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "<eval_with_key>.2", line 5, in forward
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] unified_ascend_attention_with_output = torch.ops.vllm.unified_ascend_attention_with_output(query = query_1, key = key_1, value = value, output = output_2, layer_name = 'model.layers.0.self_attn.attn'); query_1 = key_1 = value = output_2 = unified_ascend_attention_with_output = None
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._op(*args, **(kwargs or {}))
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 451, in unified_ascend_attention_with_output
(VllmWorker rank=3 pid=15765) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] self.impl.forward(self,
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] File "/vllm-workspace/vllm-ascend/vllm_ascend/attention/attention_v1.py", line 413, in forward
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] assert attn_metadata.attn_mask is not None
(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError
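The tracebacks from ranks 1, 2 and 3 are interleaved line by line above, which makes them hard to follow. In the portion shown here, each one ends at the same place: `assert attn_metadata.attn_mask is not None` in `vllm_ascend/attention/attention_v1.py`, line 413. Below is a minimal sketch for splitting a saved copy of the log into one traceback per rank. It is not part of vLLM or vllm-ascend. The helper name `split_by_rank` and the regex are assumptions based only on the line prefix visible in this log.

```python
import re
from collections import defaultdict

# Prefix format as it appears in the log above, e.g.
# "(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] <payload>"
# (assumed from this log only; other vLLM versions may format it differently).
LINE_RE = re.compile(
    r"^\(VllmWorker rank=(?P<rank>\d+) pid=\d+\) \w+ \S+ \S+ \[[^\]]+\] ?(?P<payload>.*)$"
)


def split_by_rank(lines):
    """Group interleaved worker log lines by rank, keeping per-rank order."""
    by_rank = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines without a worker prefix are skipped
            by_rank[int(match.group("rank"))].append(match.group("payload"))
    return dict(by_rank)


# Three lines copied from the log above.
sample = [
    "(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] assert attn_metadata.attn_mask is not None",
    "(VllmWorker rank=2 pid=15528) ERROR 07-28 12:47:39 [multiproc_executor.py:522] return self._call_impl(*args, **kwargs)",
    "(VllmWorker rank=1 pid=15506) ERROR 07-28 12:47:39 [multiproc_executor.py:522] AssertionError",
]
grouped = split_by_rank(sample)
for rank in sorted(grouped):
    print(f"--- rank {rank} ---")
    print("\n".join(grouped[rank]))
```

To run it on the full server log, pass an open file handle to `split_by_rank` instead of `sample`.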