
Assistant: Basic Anthropic prompt caching #8246


Merged: 4 commits merged into main on Jun 25, 2025

Conversation

@seeM (Contributor) commented Jun 23, 2025

This PR uses Anthropic's prompt caching API to reduce costs and alleviate organization rate limit pressure – particularly in Databot.

By default, we add a cache control point after the system prompt. Callers (e.g. Databot) can disable that if needed.
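For context, a minimal sketch of what a cache control point after the system prompt looks like with the Anthropic TypeScript SDK (an illustration of the API shape only, not the PR's actual code; the prompt text and helper name are placeholders):

```typescript
import Anthropic from '@anthropic-ai/sdk';

// Placeholder for the Assistant system prompt.
const LONG_SYSTEM_PROMPT = '...';

async function sendWithCachedSystemPrompt(client: Anthropic, userText: string) {
  // The system prompt is sent as a content-block array so a cache_control
  // breakpoint can be attached to it; later requests that share this prefix
  // can read it from the cache instead of re-processing it.
  return client.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: [
      {
        type: 'text',
        text: LONG_SYSTEM_PROMPT,
        cache_control: { type: 'ephemeral' },
      },
    ],
    messages: [{ role: 'user', content: userText }],
  });
}
```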

This PR also adds more Assistant logging:

  • The entire Anthropic API request/response at trace level, and summaries (including token usage) at debug level.
  • Elements added to the user context message.

Release Notes

New Features

  • Anthropic prompt caching: a cache control point is added after the system prompt by default, reducing costs and rate limit pressure.

Bug Fixes

  • N/A

QA Notes

Try out Assistant and Databot and check the cache read/write debug logs in the "Assistant" output channel.

@seeM seeM requested review from wch, jmcphers and sharon-wang June 23, 2025 17:59
github-actions bot commented Jun 23, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical

readme · valid tags

@seeM (Contributor, Author) commented Jun 23, 2025

I seem to have broken the context stuff a bit, will fix.

EDIT: Fixed.

@seeM seeM force-pushed the feature/anthropic-prompt-caching branch from 3699e0d to cf0548e on June 24, 2025 17:27
@wch (Contributor) commented Jun 24, 2025

If I'm understanding the code right, I think that the cache point on the last user message will not actually be read from, for a couple of reasons:

  • Suppose that for user message 1, the request writes to the cache. Then when Positron sends the next request (with user message 1, assistant message 2, and user message 2), the cache point is only set on user message 2. But I believe that for this request there must be a cache point on user message 1 if you want to restore from the cache. So I think it makes sense to put cache points on the two most recent user messages: the cache point on the most recent message will cause a cache write for the next turn, and the cache point on the previous user message will let the current request read what the last request wrote.

  • The cache point is put on the last part of the last user message. However, when the request is made, the structure of the messages is roughly this (relevant code here):

    • User and Assistant messages from context.history. This includes only the content shown in the UI.
    • User message: Positron context like whether the active session is R or Python, and the session history.
    • User message: The user's text entered by the user in the chat box.

    Note that the Positron context is ephemeral -- it is not persisted across requests. So this means that if you put the cache breakpoint on the last message, there will never be another request that has that content as the prefix, and so there will never be a cache hit. In order for there to be a cache hit, I think you would have to swap the order of the last two messages: the user's actual text would have to come before the Positron context, and the cache breakpoint would have to be on the message with the user's text. And even then, you'd have to be sure that the exact same text is available in the next turn when you get it out of context.history -- I could imagine that there might be some small changes, like to whitespace, when the previous text is pulled from context.history.
    Also, I am not 100% sure about this, but I think that if a message is sent with content: [{ type: "text", text: "Hello", cache_control: {type: "ephemeral"}}], as opposed to content: "Hello", then it should always be sent with the more complex JSON structure no matter where it appears in the history. Probably safest to always convert it (see the sketch after this list).
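A minimal sketch of what that conversion, together with cache points on the two most recent user messages, could look like, assuming the @anthropic-ai/sdk TypeScript types and text-only message content (the helper names here are hypothetical, not code from this PR):

```typescript
import Anthropic from '@anthropic-ai/sdk';

type MessageParam = Anthropic.Messages.MessageParam;
type TextBlockParam = Anthropic.Messages.TextBlockParam;

// Hypothetical helper: always convert string content to the block form so a
// message keeps the same JSON shape whether or not it carries cache_control.
// Assumes text-only content for simplicity.
function toBlocks(message: MessageParam): TextBlockParam[] {
  return typeof message.content === 'string'
    ? [{ type: 'text', text: message.content }]
    : (message.content as TextBlockParam[]);
}

// Hypothetical helper: place cache breakpoints on the last block of the two
// most recent user messages, leaving all other messages unchanged.
function addUserCacheBreakpoints(messages: MessageParam[]): MessageParam[] {
  const userIndexes = messages
    .map((m, i) => (m.role === 'user' ? i : -1))
    .filter((i) => i >= 0)
    .slice(-2);
  return messages.map((message, i) => {
    const blocks = toBlocks(message);
    if (userIndexes.includes(i) && blocks.length > 0) {
      const last = blocks.length - 1;
      blocks[last] = { ...blocks[last], cache_control: { type: 'ephemeral' } };
    }
    return { ...message, content: blocks };
  });
}
```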

I don't think it's necessary to cache the tools separately from the system prompt. I think it makes sense to just put a cache point at the end of the system prompt.

I believe that the cached content always contains the entire request up to the breakpoint. Even if there are multiple breakpoints, each one will store in the cache everything from the beginning to that breakpoint, not just the content between breakpoints. The Anthropic documentation isn't super clear about this, so I asked Claude about this and pointed it to the docs, and it says the same thing: https://claude.ai/share/a85ae7e7-c3df-46be-9308-9a35a4a96705

Finally, for chat participants that follow the VS Code API, they can't change the real system prompt. Instead, they are supposed to insert a first user message with the same content that would go in a system prompt. It would be nice if the extension author could designate that message as being a cache breakpoint -- without breaking compatibility with the VS Code API. I understand that it may be outside the scope of this PR, but I just want to mention that.

@seeM (Contributor, Author) commented Jun 25, 2025

Thanks for the detailed feedback @wch!

  1. Oops, you're right re user message caching; I've removed that for now. Given the complexity required there, I agree that may be best handled by updating the extension API and letting callers define cache controls directly.
  2. You're right that a cache control is not needed on the last tool; I've removed that too.

At this point, this PR only adds a cache control point after the system prompt, which should be a big win for Databot users, as you initially suggested. I'd like to get that merged ASAP in time for RC. Will sync up with you about further improvements.

@seeM seeM changed the title Assistant: Anthropic prompt caching Assistant: Basic Anthropic prompt caching Jun 25, 2025
@jmcphers (Collaborator) left a comment


LGTM!

2025-06-25 10:01:41.634 [debug] [anthropic] Adding cache control point to system prompt
2025-06-25 10:01:45.975 [debug] [anthropic] SEND messages.stream [req_011CQVJGyw7CudpFoPpSQXEg]: model: claude-sonnet-4-20250514; cache options: default; tools: executeCode, getAttachedPythonPackages, getAttachedRPackages, getInstalledPythonPackageVersion, getInstalledRPackageVersion, getPlot, getProjectTree, inspectVariables, notebook_install_packages, notebook_list_packages, positron_editFile_internal, positron_findTextInProject_internal, positron_getFileContents_internal, vscode_fetchWebPage_internal, vscode_searchExtensions_internal; tool choice: default; system chars: 26495; user messages: 2; user message characters: 6724; assistant messages: 0; assistant message characters: 2

@seeM seeM merged commit 3315562 into main Jun 25, 2025
10 checks passed
@seeM seeM deleted the feature/anthropic-prompt-caching branch June 25, 2025 17:22
@github-actions github-actions bot locked and limited conversation to collaborators Jun 25, 2025