
Assistant: Basic Anthropic prompt caching #8246


Merged: 4 commits merged into main on Jun 25, 2025

Conversation

@seeM (Contributor) commented Jun 23, 2025

This PR uses Anthropic's prompt caching API to reduce costs and alleviate organization rate limit pressure – particularly in Databot.

By default, we add a cache control point after the system prompt. Callers (e.g. Databot) can disable that if needed.
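For context, a minimal sketch of what a cache control point after the system prompt looks like with the Anthropic TypeScript SDK (an illustration of the API shape only, not the PR's actual code; the prompt text and helper name are placeholders):

```typescript
import Anthropic from '@anthropic-ai/sdk';

// Placeholder for the Assistant system prompt.
const LONG_SYSTEM_PROMPT = '...';

async function sendWithCachedSystemPrompt(client: Anthropic, userText: string) {
  // The system prompt is sent as a content-block array so a cache_control
  // breakpoint can be attached to it; later requests that share this prefix
  // can read it from the cache instead of re-processing it.
  return client.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: [
      {
        type: 'text',
        text: LONG_SYSTEM_PROMPT,
        cache_control: { type: 'ephemeral' },
      },
    ],
    messages: [{ role: 'user', content: userText }],
  });
}
```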

This PR also adds more Assistant logging:

  • The entire Anthropic API request/response at trace level, and summaries (including token usage) at debug level.
  • Elements added to the user context message.

Release Notes

New Features

  • Anthropic prompt caching: a cache control point is added after the system prompt by default, reducing costs and rate limit pressure.

Bug Fixes

  • N/A

QA Notes

Try out Assistant and Databot and check the cache read/write debug logs in the "Assistant" output channel.

@seeM seeM requested review from wch, jmcphers and sharon-wang June 23, 2025 17:59
github-actions bot commented Jun 23, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical

readme · valid tags

@seeM (Contributor, Author) commented Jun 23, 2025

I seem to have broken the context stuff a bit, will fix.

EDIT: Fixed.

@seeM seeM force-pushed the feature/anthropic-prompt-caching branch from 3699e0d to cf0548e on June 24, 2025 17:27
@wch (Contributor) commented Jun 24, 2025

If I'm understanding the code right, I think that the cache point on the last user message will not actually be read from, for a couple of reasons:

  • Suppose that for user message 1, the request writes to the cache. Then when Positron sends the next request (with user message 1, assistant message 2, and user message 2), the cache point is only set on user message 2. But I believe that for this request there must be a cache point on user message 1 if you want to restore from the cache. So I think it makes sense to put cache points on the two most recent user messages: the cache point on the most recent message will cause a cache write for the next turn, and the cache point on the previous user message will let the current request read what the last request wrote.

  • The cache point is put on the last part of the last user message. However, when the request is made, the structure of the messages is roughly this (relevant code here):

    • User and Assistant messages from context.history. This includes only the content shown in the UI.
    • User message: Positron context like whether the active session is R or Python, and the session history.
    • User message: The user's text entered by the user in the chat box.

    Note that the Positron context is ephemeral -- it is not persisted across requests. So this means that if you put the cache breakpoint on the last message, there will never be another request that has that content as the prefix, and so there will never be a cache hit. In order for there to be a cache hit, I think you would have to swap the order of the last two messages: the user's actual text would have to come before the Positron context, and the cache breakpoint would have to be on the message with the user's text. And even then, you'd have to be sure that the exact same text is available in the next turn when you get it out of context.history -- I could imagine that there might be some small changes, like to whitespace, when the previous text is pulled from context.history.
    Also, I am not 100% sure about this, but I think that if a message is sent with content: [{ type: "text", text: "Hello", cache_control: {type: "ephemeral"}}], as opposed to content: "Hello", then it should always be sent with the more complex JSON structure no matter where it appears in the history. Probably safest to always convert it (see the sketch after this list).
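A minimal sketch of what that conversion, together with cache points on the two most recent user messages, could look like, assuming the @anthropic-ai/sdk TypeScript types and text-only message content (the helper names here are hypothetical, not code from this PR):

```typescript
import Anthropic from '@anthropic-ai/sdk';

type MessageParam = Anthropic.Messages.MessageParam;
type TextBlockParam = Anthropic.Messages.TextBlockParam;

// Hypothetical helper: always convert string content to the block form so a
// message keeps the same JSON shape whether or not it carries cache_control.
// Assumes text-only content for simplicity.
function toBlocks(message: MessageParam): TextBlockParam[] {
  return typeof message.content === 'string'
    ? [{ type: 'text', text: message.content }]
    : (message.content as TextBlockParam[]);
}

// Hypothetical helper: place cache breakpoints on the last block of the two
// most recent user messages, leaving all other messages unchanged.
function addUserCacheBreakpoints(messages: MessageParam[]): MessageParam[] {
  const userIndexes = messages
    .map((m, i) => (m.role === 'user' ? i : -1))
    .filter((i) => i >= 0)
    .slice(-2);
  return messages.map((message, i) => {
    const blocks = toBlocks(message);
    if (userIndexes.includes(i) && blocks.length > 0) {
      const last = blocks.length - 1;
      blocks[last] = { ...blocks[last], cache_control: { type: 'ephemeral' } };
    }
    return { ...message, content: blocks };
  });
}
```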

I don't think it's necessary to cache the tools separately from the system prompt. I think it makes sense to just put a cache point at the end of the system prompt.

I believe that the cached content always contains the entire request up to the breakpoint. Even if there are multiple breakpoints, each one will store in the cache everything from the beginning to that breakpoint, not just the content between breakpoints. The Anthropic documentation isn't super clear about this, so I asked Claude about this and pointed it to the docs, and it says the same thing: https://claude.ai/share/a85ae7e7-c3df-46be-9308-9a35a4a96705

Finally, for chat participants that follow the VS Code API, they can't change the real system prompt. Instead, they are supposed to insert a first user message with the same content that would go in a system prompt. It would be nice if the extension author could designate that message as being a cache breakpoint -- without breaking compatibility with the VS Code API. I understand that it may be outside the scope of this PR, but I just want to mention that.

@seeM (Contributor, Author) commented Jun 25, 2025

Thanks for the detailed feedback @wch!

  1. Oops, you're right re user message caching; I've removed that for now. Given the complexity required there, I agree that may be best handled by updating the extension API and letting callers define cache controls directly.
  2. You're right that a cache control is not needed on the last tool; I've removed that too.

At this point, this PR only adds a cache control point after the system prompt, which should be a big win for Databot users, as you initially suggested. I'd like to get that merged ASAP in time for RC. Will sync up with you about further improvements.

@seeM seeM changed the title Assistant: Anthropic prompt caching Assistant: Basic Anthropic prompt caching Jun 25, 2025
@jmcphers (Collaborator) left a comment


LGTM!

2025-06-25 10:01:41.634 [debug] [anthropic] Adding cache control point to system prompt
2025-06-25 10:01:45.975 [debug] [anthropic] SEND messages.stream [req_011CQVJGyw7CudpFoPpSQXEg]: model: claude-sonnet-4-20250514; cache options: default; tools: executeCode, getAttachedPythonPackages, getAttachedRPackages, getInstalledPythonPackageVersion, getInstalledRPackageVersion, getPlot, getProjectTree, inspectVariables, notebook_install_packages, notebook_list_packages, positron_editFile_internal, positron_findTextInProject_internal, positron_getFileContents_internal, vscode_fetchWebPage_internal, vscode_searchExtensions_internal; tool choice: default; system chars: 26495; user messages: 2; user message characters: 6724; assistant messages: 0; assistant message characters: 2

@seeM seeM merged commit 3315562 into main Jun 25, 2025
10 checks passed
@seeM seeM deleted the feature/anthropic-prompt-caching branch June 25, 2025 17:22
@github-actions github-actions bot locked and limited conversation to collaborators Jun 25, 2025