
feature: dynamic quantized model support#1155

Draft
dsrenesanse wants to merge 1 commit into ml-explore:main from dsrenesanse:main

Conversation

@dsrenesanse

Proposal: developers would like to run bigger and more intelligent models locally.

Problem: We need more flexibility in quantization to optimize the memory footprint, but the current state of mlx-lm only allows running a model if all layers share the same quantization.

Solution: allow running models with aggressive quantization on, for example, knowledge layers, but less aggressive quantization on reasoning and attention layers.

Edge cases: a change to the config file is required to support this feature.

Regards, Daniil.
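To make the config change concrete, here is a minimal sketch of how per-layer quantization parameters could be resolved from a config that carries a global default plus per-layer overrides. The config shape, the layer paths, and the `quant_params` helper are all illustrative assumptions for this proposal, not the actual mlx-lm format or API.

```python
# Hypothetical config sketch: a global quantization default plus
# per-layer overrides keyed by the layer's parameter path.
# (Shape is an assumption for illustration, not mlx-lm's real format.)
config = {
    "quantization": {
        "group_size": 64,
        "bits": 4,  # aggressive default, e.g. for knowledge (MLP) layers
        # less aggressive overrides for attention layers
        "model.layers.0.self_attn.q_proj": {"group_size": 64, "bits": 8},
        "model.layers.0.self_attn.k_proj": {"group_size": 64, "bits": 8},
    }
}


def quant_params(config: dict, layer_path: str) -> dict:
    """Return the quantization params for a layer: per-layer override
    if present, otherwise the global default."""
    q = config["quantization"]
    override = q.get(layer_path)
    if isinstance(override, dict):
        return override
    return {"group_size": q["group_size"], "bits": q["bits"]}
```

With this scheme, a loader can walk the model's layers and quantize each one with `quant_params(config, path)`, so mixed-precision models remain describable by a single config file and layers without an override fall back to the default.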

@dsrenesanse dsrenesanse marked this pull request as draft April 25, 2026 02:55
