Support for Activated LoRA (https://github.com/ggml-org/llama.cpp/issues/15212) #15213
kgreenewald started this conversation in Ideas
Apologies if this is slightly out of order - I have created issue #15212 requesting support for Activated LoRA (aLoRA) adapters (see the issue for details and motivation). These adapters are triggered by including an invocation sequence in the prompt, and they modify the weights only for tokens that appear after that sequence. As a result, the adapter can reuse the base model's KV cache for everything before the invocation, leading to large improvements in time-to-first-token (TTFT) compared to hot-swapping standard LoRA adapters, especially when the adapter is applied deep into a multi-turn interaction. Appreciate any feedback or thoughts on this!
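To make the activation mechanism concrete, here is a minimal sketch of the per-token gating logic under my understanding of the design. The function names and the per-token scale representation are purely illustrative and are not part of llama.cpp's existing LoRA code:

```python
# Illustrative sketch only: find where the invocation sequence ends and
# gate the LoRA delta per token. Everything before that point uses the
# base weights, so the base model's KV cache entries stay valid.

from typing import List, Optional

def find_activation_index(tokens: List[int], invocation: List[int]) -> Optional[int]:
    """Return the index of the first token *after* the last occurrence
    of the invocation sequence, or None if it never appears."""
    n, m = len(tokens), len(invocation)
    for start in range(n - m, -1, -1):  # scan from the end of the prompt
        if tokens[start:start + m] == invocation:
            return start + m            # adapter active from here onward
    return None

def lora_scale_per_token(tokens: List[int], invocation: List[int]) -> List[float]:
    """0.0 = base weights (KV cache reusable), 1.0 = adapter applied."""
    idx = find_activation_index(tokens, invocation)
    if idx is None:
        return [0.0] * len(tokens)
    return [0.0] * idx + [1.0] * (len(tokens) - idx)

if __name__ == "__main__":
    prompt = [1, 5, 9, 42, 7, 7, 3]  # toy token ids
    invocation = [42, 7]             # toy invocation sequence
    print(lora_scale_per_token(prompt, invocation))
    # -> [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
    # Only the last two tokens see the adapted weights; the KV cache
    # computed by the base model for the earlier tokens can be reused.
```

The key point is that the gate depends only on token position relative to the invocation sequence, which is why prior turns never need to be re-prefilled when the adapter activates.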
Our plan would be to start this integration work ourselves and submit a PR for this feature in the near future, building on the existing support for hot-swapping LoRA adapters.
This complements existing PRs to both Hugging Face PEFT (huggingface/peft#2609) and vLLM (vllm-project/vllm#19710).
cc @gabe-l-hart