Motivation
Currently, only AWQ models quantized with the 'AutoAWQ' framework [1] are supported by 'lmdeploy'.
The AutoAWQ framework is deprecated, so many recent AWQ models on the HF model hub are quantized with its successor, 'llm-compressor' [2].
For example, the Qwen3-VL models by 'cyankiwi' (e.g. the 4B model [3]) are all quantized with 'llm-compressor'.
One can see which framework was used in the 'config.json' of the downloaded model, via the key 'quant_method' inside the 'quantization_config' section.
For the newer 'llm-compressor' framework, its value is 'compressed-tensors'.
So it would be great if 'lmdeploy' could also support AWQ models quantized with the 'llm-compressor' framework.
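As a quick illustration, the framework used for quantization can be detected with a small script like the one below (a sketch; the helper name and the local model path are hypothetical, and it assumes the standard HF layout where quantization details live under 'quantization_config' in 'config.json'):

```python
import json

def detect_quant_framework(config_path: str) -> str:
    """Return the 'quant_method' declared in a model's config.json."""
    with open(config_path) as f:
        config = json.load(f)
    # HF quantized models store quantization details under 'quantization_config'.
    quant_cfg = config.get("quantization_config", {})
    return quant_cfg.get("quant_method", "none")

# Example (path is illustrative):
#   detect_quant_framework("Qwen3-VL-4B-Instruct-AWQ-4bit/config.json")
#   'compressed-tensors'  -> quantized with llm-compressor
#   'awq'                 -> quantized with AutoAWQ
```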
References:
[1] https://docs.vllm.ai/en/latest/features/quantization/auto_awq/
[2] https://github.com/vllm-project/llm-compressor
[3] https://huggingface.co/cyankiwi/Qwen3-VL-4B-Instruct-AWQ-4bit/
Related resources
No response
Additional context
No response