
[Feature] Support AWQ models quantized with 'llm-compressor' framework #4539

@hfassold

Description


Motivation

Currently, only AWQ models quantized with the 'AutoAWQ' framework [1] are supported by lmdeploy.
The AutoAWQ framework is deprecated, so many recent AWQ models on the HF model hub are quantized with its successor, 'llm-compressor' [2].
For example, the Qwen3-VL models by 'cyankiwi' (e.g. the 4B model [3]) are all quantized with 'llm-compressor'.
One can see which framework was used in the 'config.json' of the downloaded model, via the 'quant_method' key of its quantization config.
For models quantized with the new 'llm-compressor' framework, its value is 'compressed-tensors'.
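
As a minimal sketch of such a check (the key layout is assumed from typical HF checkpoint configs, where AutoAWQ writes 'awq' as the quant_method under 'quantization_config'):

```python
import json
from pathlib import Path

def detect_awq_framework(model_dir: str) -> str:
    """Guess which AWQ toolchain quantized a HF checkpoint by inspecting
    config.json (a sketch; key layout assumed from common HF configs)."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant_cfg = config.get("quantization_config", {})
    method = quant_cfg.get("quant_method", "")
    if method == "awq":
        return "AutoAWQ"          # legacy AutoAWQ checkpoints
    if method == "compressed-tensors":
        return "llm-compressor"   # successor framework
    return "unknown"
```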

So it would be great if lmdeploy could also support AWQ models quantized with the 'llm-compressor' framework.

References:
[1] https://docs.vllm.ai/en/latest/features/quantization/auto_awq/
[2] https://github.com/vllm-project/llm-compressor
[3] https://huggingface.co/cyankiwi/Qwen3-VL-4B-Instruct-AWQ-4bit/

