
Add prompt cache support and implement for Anthropic #716

Open
arunkumarry wants to merge 4 commits into crmne:main from
arunkumarry:add-prompt-cache-anthropic

Conversation


@arunkumarry arunkumarry commented Apr 2, 2026

What this does

Adds prompt caching support for Anthropic Claude models via a cache_point: keyword on Chat#with_instructions and Chat#ask. When a message is marked as a cache point, the gem injects Anthropic's cache_control: { type: 'ephemeral' } field on the last content block of that message and automatically adds the required anthropic-beta: prompt-caching-2024-07-31 request header. The static portion of the prompt is cached server-side by Anthropic for 5 minutes, reducing input token costs on repeated calls.
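For reference, a sketch of the request body this produces on the wire. Field names follow Anthropic's Messages API; the exact payload the gem assembles internally may differ, and the prompt text here is a placeholder:

```ruby
# Illustrative only: where cache_control lands in the request body for
# a cache-marked system prompt. The accompanying request header is
#   anthropic-beta: prompt-caching-2024-07-31
payload = {
  model: 'claude-haiku-4-5',
  system: [
    {
      type: 'text',
      text: 'LARGE STATIC SYSTEM PROMPT ...',
      cache_control: { type: 'ephemeral' } # marks the cache point
    }
  ],
  messages: [
    { role: 'user', content: 'dynamic question' }
  ]
}
```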

Fixes #706

Usage

chat = RubyLLM.chat(model: 'claude-haiku-4-5')
chat.with_instructions(large_static_system_prompt, cache_point: true)

response = chat.ask(user_message)

puts response.input_tokens        # total tokens processed
puts response.cached_tokens       # tokens served from cache (billed at 10x lower rate)
puts response.cache_creation_tokens # tokens written to cache on first call

Multiple cache points are supported (up to Anthropic's limit of 4 per request):

chat = RubyLLM.chat(model: 'claude-sonnet-4-5')
  .with_instructions(static_prefix, cache_point: true)
  .with_instructions(session_config, append: true, cache_point: true)

chat.ask(dynamic_user_query)

Cache points on ask are also supported for caching user messages:

chat.ask(large_static_user_context, cache_point: true)

Future extensibility

The cache_point attribute on Message is provider-agnostic. Adding support for other providers requires only provider-specific formatting logic:

For providers with inline cache markers (like Anthropic):

  • Override the complete method to add any required headers/beta flags when messages.any?(&:cache_point?)
  • Add an inject_cache_* helper in the provider's Chat module that modifies content blocks
  • Call the helper in the message formatting methods when msg.cache_point?
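As a minimal sketch of what such an inject helper could look like (the name and shape are hypothetical, not the gem's actual internals):

```ruby
# Hypothetical inject_cache_control helper: marks the last content
# block of a message as an ephemeral cache point, without mutating
# the caller's array. Illustrative only.
def inject_cache_control(content_blocks)
  return content_blocks if content_blocks.empty?

  blocks = content_blocks.map(&:dup)
  blocks[-1][:cache_control] = { type: 'ephemeral' }
  blocks
end

blocks = [
  { type: 'text', text: 'You are a helpful assistant.' },
  { type: 'text', text: 'Large reference document ...' }
]
marked = inject_cache_control(blocks)
# Only the final block carries the cache marker; the input is untouched.
```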
For providers with separate cache APIs (like Gemini's Context Caching):

  • Override complete to manage the cache lifecycle (create → reuse → retry on expiry)
  • Modify render_payload to accept a cached_content_name: parameter
  • When the name is present, split messages at the last cache point and only send the dynamic suffix inline
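The split step can be sketched as follows. Message here is a stand-in struct for illustration; the gem's real Message class differs:

```ruby
# Hypothetical sketch of splitting a conversation at the last cache
# point, as a separate-cache-API provider (Gemini-style) would need.
Message = Struct.new(:content, :cache_point) do
  def cache_point?
    !!cache_point
  end
end

# Returns [cacheable_prefix, dynamic_suffix]. The prefix goes to the
# provider's cache API once; the suffix is sent inline on every request.
def split_at_last_cache_point(messages)
  idx = messages.rindex(&:cache_point?)
  return [[], messages] if idx.nil?

  [messages[0..idx], messages[(idx + 1)..]]
end

history = [
  Message.new('static system prompt', true),
  Message.new('session config', true),
  Message.new('dynamic user query', false)
]
cacheable, dynamic = split_at_last_cache_point(history)
```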
For providers without caching support:

  • No changes needed; cache_point flags are silently ignored, preserving existing behavior

The core Message and Chat changes are already in place. Future PRs for Gemini, OpenAI (when they add caching), or other providers only need to touch their respective provider modules.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

Quality check

  • I ran overcommit --install and all hooks pass
    There are existing RuboCop offenses in spec/ruby_llm/generators/chat_ui_generator_spec.rb
    There are existing Flay offenses in the following files: spec/ruby_llm/generators/chat_ui_generator_spec.rb and lib/ruby_llm/error.rb
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code (required if above is checked)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes


arunkumarry commented Apr 2, 2026

There are existing RuboCop offenses in spec/ruby_llm/generators/chat_ui_generator_spec.rb
There are existing Flay offenses in the following files: spec/ruby_llm/generators/chat_ui_generator_spec.rb and lib/ruby_llm/error.rb

# This is the 1st commit message:

Add prompt caching support for Anthropic

# The commit message crmne#2 will be skipped:

# uncommit unnecessary file

# The commit message crmne#3 will be skipped:

# remove rubocop and flay fixes as they are unrelated to this issue

# The commit message crmne#4 will be skipped:

# remove rubocop ignore for anthropic complete method
arunkumarry force-pushed the add-prompt-cache-anthropic branch from d474ab2 to 1779bf1 (April 2, 2026, 14:01)


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add prompt caching support for providers (currently for Anthropic)

1 participant