Does prompt caching support LoRA? #15098

tranlm · 2025-08-05T18:41:44Z


Hi all,

I know that llama.cpp "automatically" supports prompt caching by reusing the longest prefix that each follow-up query shares with the previously processed prompt. I'm wondering whether this also works with LoRA adapters. For example, say I have adapters A and B, and I do the following:

1. Call the base model with prompt t_1, ..., t_k.
2. Call the base model with adapter A with t_1, ..., t_k, t_{k+1}, ..., t_l.
3. Call the base model with adapter B with t_1, ..., t_k, t_{k+1}, ..., t_l, t_{l+1}, ..., t_m.

where 0 < k < l < m.

Would the shared prefix still be cached for the underlying base model (i.e., would only the LoRA adapter operations need to be applied on each follow-up query)?
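
To make the scenario concrete, here is a minimal toy sketch in Python of prefix-based cache reuse. It is not llama.cpp's implementation, and whether llama.cpp behaves this way is exactly what I'm asking. The function names (`common_prefix_len`, `reusable_tokens`) and the policy of discarding the cache when the adapter changes are assumptions made for illustration. The motivation for that policy is that cached key/value entries are computed from the model weights, and a LoRA adapter changes the effective weights, so entries computed under one adapter configuration will in general not match what another configuration would produce.

```python
from typing import Optional, Sequence


def common_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Number of leading tokens shared by the cached prompt and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def reusable_tokens(
    cached_tokens: Sequence[int],
    cached_adapter: Optional[str],
    new_tokens: Sequence[int],
    new_adapter: Optional[str],
) -> int:
    """Hypothetical policy: reuse the shared prefix only if the adapter is unchanged.

    None stands for "base model, no adapter". This is an assumed policy for
    illustration, not a description of any real server's behavior.
    """
    if cached_adapter != new_adapter:
        return 0
    return common_prefix_len(cached_tokens, new_tokens)


# The three calls from the question, with k=3, l=5, m=7 (arbitrary token IDs).
t = [101, 102, 103, 104, 105, 106, 107]
calls = [(None, t[:3]), ("A", t[:5]), ("B", t[:7])]

cached_tokens: Sequence[int] = []
cached_adapter: Optional[str] = None
reused = []
for adapter, tokens in calls:
    reused.append(reusable_tokens(cached_tokens, cached_adapter, tokens, adapter))
    cached_tokens, cached_adapter = tokens, adapter

print(reused)  # [0, 0, 0] under this assumed policy
```

Under this assumed policy nothing is reused across the three calls, because the adapter changes every time. If the adapter stayed the same between calls, the full shared prefix would be reused, e.g. `reusable_tokens(t[:3], None, t[:5], None)` returns 3.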


Replies: 0 comments
