
Add prompt cache support and implement for Anthropic #716

Open
arunkumarry wants to merge 4 commits into crmne:main from
arunkumarry:add-prompt-cache-anthropic

Conversation


@arunkumarry arunkumarry commented Apr 2, 2026

What this does

Adds prompt caching support for Anthropic Claude models via a cache_point: keyword on Chat#with_instructions and Chat#ask. When a message is marked as a cache point, the gem injects Anthropic's cache_control: { type: 'ephemeral' } field on the last content block of that message and automatically adds the required anthropic-beta: prompt-caching-2024-07-31 request header. The static portion of the prompt is cached server-side by Anthropic for 5 minutes, reducing input token costs on repeated calls.
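For reference, a sketch of the request body this produces on the wire. Field names follow Anthropic's Messages API; the exact payload the gem assembles internally may differ, and the prompt text here is a placeholder:

```ruby
# Illustrative only: where cache_control lands in the request body for
# a cache-marked system prompt. The accompanying request header is
#   anthropic-beta: prompt-caching-2024-07-31
payload = {
  model: 'claude-haiku-4-5',
  system: [
    {
      type: 'text',
      text: 'LARGE STATIC SYSTEM PROMPT ...',
      cache_control: { type: 'ephemeral' } # marks the cache point
    }
  ],
  messages: [
    { role: 'user', content: 'dynamic question' }
  ]
}
```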

Fixes #706

Usage

chat = RubyLLM.chat(model: 'claude-haiku-4-5')
chat.with_instructions(large_static_system_prompt, cache_point: true)

response = chat.ask(user_message)

puts response.input_tokens        # total tokens processed
puts response.cached_tokens       # tokens served from cache (billed at 10x lower rate)
puts response.cache_creation_tokens # tokens written to cache on first call

Multiple cache points are supported (up to Anthropic's limit of 4 per request):

chat = RubyLLM.chat(model: 'claude-sonnet-4-5')
  .with_instructions(static_prefix, cache_point: true)
  .with_instructions(session_config, append: true, cache_point: true)

chat.ask(dynamic_user_query)

Cache points on ask are also supported for caching user messages:

chat.ask(large_static_user_context, cache_point: true)

Future extensibility

The cache_point attribute on Message is provider-agnostic. Adding support for other providers requires only provider-specific formatting logic:

For providers with inline cache markers (like Anthropic):

  • Override the complete method to add any required headers/beta flags when messages.any?(&:cache_point?)
  • Add an inject_cache_* helper in the provider's Chat module that modifies content blocks
  • Call the helper in the message formatting methods when msg.cache_point?
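As a minimal sketch of what such an inject helper could look like (the name and shape are hypothetical, not the gem's actual internals):

```ruby
# Hypothetical inject_cache_control helper: marks the last content
# block of a message as an ephemeral cache point, without mutating
# the caller's array. Illustrative only.
def inject_cache_control(content_blocks)
  return content_blocks if content_blocks.empty?

  blocks = content_blocks.map(&:dup)
  blocks[-1][:cache_control] = { type: 'ephemeral' }
  blocks
end

blocks = [
  { type: 'text', text: 'You are a helpful assistant.' },
  { type: 'text', text: 'Large reference document ...' }
]
marked = inject_cache_control(blocks)
# Only the final block carries the cache marker; the input is untouched.
```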
For providers with separate cache APIs (like Gemini's Context Caching):

  • Override complete to manage the cache lifecycle (create → reuse → retry on expiry)
  • Modify render_payload to accept a cached_content_name: parameter
  • When the name is present, split messages at the last cache point and only send the dynamic suffix inline
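The split step can be sketched as follows. Message here is a stand-in struct for illustration; the gem's real Message class differs:

```ruby
# Hypothetical sketch of splitting a conversation at the last cache
# point, as a separate-cache-API provider (Gemini-style) would need.
Message = Struct.new(:content, :cache_point) do
  def cache_point?
    !!cache_point
  end
end

# Returns [cacheable_prefix, dynamic_suffix]. The prefix goes to the
# provider's cache API once; the suffix is sent inline on every request.
def split_at_last_cache_point(messages)
  idx = messages.rindex(&:cache_point?)
  return [[], messages] if idx.nil?

  [messages[0..idx], messages[(idx + 1)..]]
end

history = [
  Message.new('static system prompt', true),
  Message.new('session config', true),
  Message.new('dynamic user query', false)
]
cacheable, dynamic = split_at_last_cache_point(history)
```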
For providers without caching support:

  • No changes needed; cache_point flags are silently ignored, preserving existing behavior

The core Message and Chat changes are already in place. Future PRs for Gemini, OpenAI (when they add caching), or other providers only need to touch their respective provider modules.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Required for new features

Quality check

  • I ran overcommit --install and all hooks pass
    There are existing RuboCop offenses in spec/ruby_llm/generators/chat_ui_generator_spec.rb
    There are existing Flay offenses in the following files: spec/ruby_llm/generators/chat_ui_generator_spec.rb and lib/ruby_llm/error.rb
  • I tested my changes thoroughly
    • For provider changes: Re-recorded VCR cassettes with bundle exec rake vcr:record[provider_name]
    • All tests pass: bundle exec rspec
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

AI-generated code

  • I used AI tools to help write this code
  • I have reviewed and understand all generated code (required if above is checked)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes


arunkumarry commented Apr 2, 2026

There are existing RuboCop offenses in spec/ruby_llm/generators/chat_ui_generator_spec.rb
There are existing Flay offenses in the following files: spec/ruby_llm/generators/chat_ui_generator_spec.rb and lib/ruby_llm/error.rb

# This is the 1st commit message:

Add prompt caching support for Anthropic

# The commit message crmne#2 will be skipped:

# uncommit unnecessary file

# The commit message crmne#3 will be skipped:

# remove rubocop and flay fixes as they are unrelated to this issue

# The commit message crmne#4 will be skipped:

# remove rubocop ignore for anthropic complete method
arunkumarry force-pushed the add-prompt-cache-anthropic branch from d474ab2 to 1779bf1 (April 2, 2026, 14:01)


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add prompt caching support for providers (currently for Anthropic)

1 participant