Description
Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.3
Libc version: glibc-2.35
Python version: 3.10.17 (main, May 8 2025, 07:18:04) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-4.19.90-vhulk2211.3.0.h1912.eulerosv2r10.aarch64-aarch64-with-glibc2.35
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.post1.dev20250619
[pip3] torchvision==0.20.1
[pip3] transformers==4.53.3
[conda] Could not collect
vLLM Version: 0.9.2
vLLM Ascend Version: 0.1.dev1+g3aa3b46 (git sha: 3aa3b46)
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
VLLM_USE_MODELSCOPE=true
PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
VLLM_USE_V1=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B4 | OK | 84.1 38 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 2881 / 32768 |
+===========================+===============+====================================================+
| 1 910B4 | OK | 87.9 41 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 2853 / 32768 |
+===========================+===============+====================================================+
| 2 910B4 | OK | 86.5 37 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 2850 / 32768 |
+===========================+===============+====================================================+
| 3 910B4 | OK | 91.4 40 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 2851 / 32768 |
+===========================+===============+====================================================+
| 4 910B4 | OK | 90.4 37 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 2853 / 32768 |
+===========================+===============+====================================================+
| 5 910B4 | OK | 86.3 40 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 2853 / 32768 |
+===========================+===============+====================================================+
| 6 910B4 | OK | 96.0 39 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 2852 / 32768 |
+===========================+===============+====================================================+
| 7 910B4 | OK | 89.2 39 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 2850 / 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| No running processes found in NPU 4 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
Docker image used: url
Docker container creation command
docker run --privileged -itd -u root --rm --name <name> --ipc=host \
--privileged=true --net=host \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-e "ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7" \
-e "MAX_MEMORY_GB=55" \
-e "VLLM_USE_MODELSCOPE=true" \
-e "VLLM_USE_V1=1" \
-e "PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256" \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
-v /var/log/npu/slog/:/var/log/npu/slog \
-v /var/log/npu/profiling/:/var/log/npu/profiling \
-v /var/log/npu/dump/:/var/log/npu/dump \
-v /var/log/npu/:/usr/slog \
-v /lib/modules:/lib/modules \
vllm-ascend:v0.9.2rc1 \
/bin/bash
The model was quantized with msmodelslim as described in the documentation, using version modelslim-VLLM-8.1.RC1.b020_001:
msit# git branch
* (HEAD detached at modelslim-VLLM-8.1.RC1.b020_001)
Quantization script:
python3 msit/msmodelslim/example/Qwen/quant_qwen.py \
--model_path <path>/Qwen2.5-32B-Instruct/ \
--save_directory <path> \
--calib_file <path>msit/msmodelslim/example/Qwen/calib_data/calib_prompt.jsonl \
--anti_calib_file <path>/msit/msmodelslim/example/Qwen/calib_data/anti_prompt.jsonl \
--w_bit 8 \
--a_bit 8 \
--anti_method m2 \
--is_lowbit False \
--act_method 1 \
--w_sym True \
--do_smooth False \
--use_kvcache_quant True \
--use_sigma False \
--open_outlier True \
--is_dynamic False \
--group_size 128 \
--device_type npu \
--disable_last_linear True \
--model_type qwen2.5
vLLM launch command:
vllm serve <path> \
--host 127.0.0.1 \
--port 8010 \
--served-model-name qwen2 \
--gpu-memory-utilization 0.85 \
--quantization ascend \
--tensor-parallel-size 4
vllm serve output log
(VllmWorker rank=3 pid=22679) DEBUG 07-31 10:45:55 [utils.py:183] Loaded weight lm_head.weight with shape torch.Size([38016, 5120])
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] WorkerProc failed to start.
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 358, in __init__
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] self.worker.load_model()
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 213, in load_model
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] self.model_runner.load_model()
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1814, in load_model
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] self.model = get_model(vllm_config=self.vllm_config)
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] return loader.load_model(vllm_config=vllm_config,
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 41, in load_model
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] self.load_weights(model, model_config)
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/default_loader.py", line 269, in load_weights
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] loaded_weights = model.load_weights(
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 498, in load_weights
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] return loader.load_weights(weights)
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 291, in load_weights
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] autoloaded_weights = set(self._load_module("", self.module, weights))
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 249, in _load_module
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] yield from self._load_module(prefix,
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 222, in _load_module
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] loaded_params = module_load_weights(weights)
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 403, in load_weights
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] param = params_dict[name]
(VllmWorker rank=2 pid=22583) ERROR 07-31 10:45:56 [multiproc_executor.py:487] KeyError: 'layers.0.self_attn.qkv_proj.kv_cache_offset'
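The KeyError above is raised when the loader meets a tensor name (after any renaming it applies) for which the model has no parameter. Below is a minimal sketch of a diagnostic for that situation, not the vLLM loader itself. `find_unmatched_weights` is a hypothetical helper, and apart from `layers.0.self_attn.qkv_proj.kv_cache_offset` the example names are made up for illustration.

```python
from typing import Dict, Iterable, List


def find_unmatched_weights(checkpoint_names: Iterable[str],
                           params_dict: Dict[str, object]) -> List[str]:
    """Return tensor names that have no entry in params_dict.

    Hypothetical helper: it mimics the `params_dict[name]` lookup from the
    traceback, but collects the misses instead of raising KeyError.
    """
    return [name for name in checkpoint_names if name not in params_dict]


# Illustrative data only; the real names come from the quantized checkpoint
# and from the parameters the model actually registers.
params_dict = {
    "layers.0.self_attn.qkv_proj.weight": object(),
    "layers.0.self_attn.qkv_proj.bias": object(),
}
checkpoint_names = [
    "layers.0.self_attn.qkv_proj.weight",
    "layers.0.self_attn.qkv_proj.bias",
    "layers.0.self_attn.qkv_proj.kv_cache_offset",
]

unmatched = find_unmatched_weights(checkpoint_names, params_dict)
print(unmatched)  # ['layers.0.self_attn.qkv_proj.kv_cache_offset']
```

Running such a check against the real checkpoint would show whether only the `kv_cache_offset` tensors are unmatched or whether other KV cache quantization tensors are affected too.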
Steps to reproduce:
To work with a quantized KV cache (see the attached quant model description), vllm-ascend/vllm_ascend/quantization/quantizer.py has to be changed.
With the default code, if the quant model description contains the fields "fa_quant_type": null or "kv_quant_type": "C8", startup eventually fails with
NotImplementedError: Currently, vLLM Ascend only supports following quant types:['W8A8', 'W8A8_DYNAMIC', 'C8']
Printing the quant type that the quantizer receives shows None.
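One way the reported None could arise, sketched under an assumption: if the quantizer checks for `fa_quant_type` before `kv_quant_type` on attention layers, a present-but-null `fa_quant_type` would shadow `"kv_quant_type": "C8"`. The function names and the lookup order below are hypothetical and are not taken from quantizer.py. Only the two field names and the supported-type list come from this report.

```python
from typing import Any, Dict, Optional

# Copied from the NotImplementedError message in this report.
SUPPORTED_QUANT_TYPES = ["W8A8", "W8A8_DYNAMIC", "C8"]


def lookup_presence_based(desc: Dict[str, Any]) -> Optional[str]:
    """Assumed failing behaviour: the first key that is present wins."""
    if "fa_quant_type" in desc:
        return desc["fa_quant_type"]  # None when the JSON value is null
    if "kv_quant_type" in desc:
        return desc["kv_quant_type"]
    return None


def lookup_skip_null(desc: Dict[str, Any]) -> Optional[str]:
    """Possible alternative: ignore keys whose value is null."""
    for key in ("fa_quant_type", "kv_quant_type"):
        value = desc.get(key)
        if value is not None:
            return value
    return None


# The two fields quoted in the report; JSON null becomes Python None.
quant_description = {"fa_quant_type": None, "kv_quant_type": "C8"}

print(lookup_presence_based(quant_description))  # None
print(lookup_skip_null(quant_description))       # C8
```

If the real lookup behaves like the first function, skipping null values as in the second would let the C8 type be selected. Whether that matches the actual code path needs to be confirmed against quantizer.py.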
Error with the default code:
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] WorkerProc failed to start.
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 51, in get_quantizer
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] module = importlib.import_module("mindie_turbo")
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/usr/local/python3.10.17/lib/python3.10/importlib/__init__.py", line 126, in import_module
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] return _bootstrap._gcd_import(name[level:], package, level)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] ModuleNotFoundError: No module named 'mindie_turbo'
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487]
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] During handling of the above exception, another exception occurred:
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487]
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 358, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.worker.load_model()
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 213, in load_model
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.model_runner.load_model()
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1814, in load_model
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.model = get_model(vllm_config=self.vllm_config)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] return loader.load_model(vllm_config=vllm_config,
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] model = initialize_model(vllm_config=vllm_config,
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 448, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.model = Qwen2Model(vllm_config=vllm_config,
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 152, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 317, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.start_layer, self.end_layer, self.layers = make_layers(
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 639, in make_layers
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] [PPMissingLayer() for _ in range(start_layer)] + [
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 640, in <listcomp>
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 319, in <lambda>
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] lambda prefix: decoder_layer_type(config=config,
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 216, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.self_attn = Qwen2Attention(
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 162, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.attn = Attention(
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/attention/layer.py", line 112, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] quant_method = quant_config.get_quant_method(
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 103, in get_quant_method
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] return AscendKVCacheMethod(self, prefix)
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 227, in __init__
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] self.quantizer = AscendQuantizer.get_quantizer(
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 56, in get_quantizer
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 268, in get_quantizer
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] raise NotImplementedError("Currently, vLLM Ascend only supports following quant types:" \
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] NotImplementedError: Currently, vLLM Ascend only supports following quant types:['W8A8', 'W8A8_DYNAMIC', 'C8']
(VllmWorker rank=3 pid=24267) ERROR 07-31 10:51:32 [multiproc_executor.py:487] Your quant type is: None
(VllmWorker rank=0 pid=24146) INFO 07-31 10:51:32 [quantizer.py:89] Using the vLLM Ascend Quantizer version now!
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] WorkerProc failed to start.
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 51, in get_quantizer
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] module = importlib.import_module("mindie_turbo")
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/usr/local/python3.10.17/lib/python3.10/importlib/__init__.py", line 126, in import_module
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] return _bootstrap._gcd_import(name[level:], package, level)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] ModuleNotFoundError: No module named 'mindie_turbo'
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487]
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] During handling of the above exception, another exception occurred:
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487]
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 358, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.worker.load_model()
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 213, in load_model
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.model_runner.load_model()
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1814, in load_model
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.model = get_model(vllm_config=self.vllm_config)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] return loader.load_model(vllm_config=vllm_config,
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 38, in load_model
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] model = initialize_model(vllm_config=vllm_config,
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 448, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.model = Qwen2Model(vllm_config=vllm_config,
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 152, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 317, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.start_layer, self.end_layer, self.layers = make_layers(
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 639, in make_layers
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] [PPMissingLayer() for _ in range(start_layer)] + [
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 640, in <listcomp>
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 319, in <lambda>
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] lambda prefix: decoder_layer_type(config=config,
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 216, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.self_attn = Qwen2Attention(
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 162, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.attn = Attention(
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/attention/layer.py", line 112, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] quant_method = quant_config.get_quant_method(
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 103, in get_quant_method
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] return AscendKVCacheMethod(self, prefix)
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 227, in __init__
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] self.quantizer = AscendQuantizer.get_quantizer(
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 56, in get_quantizer
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] return VLLMAscendQuantizer.get_quantizer(quant_config, prefix,
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quantizer.py", line 268, in get_quantizer
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] raise NotImplementedError("Currently, vLLM Ascend only supports following quant types:" \
(VllmWorker rank=0 pid=24146) ERROR 07-31 10:51:33 [multiproc_executor.py:487] NotImplementedError: Currently, vLLM Ascend only supports following quant types:['W8A8', 'W8A8_DYNAMIC', 'C8']