tightened up language to make the responses cookbooks more readable. #1847

Merged · 7 commits · May 16, 2025
17 changes: 10 additions & 7 deletions examples/responses_api/reasoning_items.ipynb
@@ -287,7 +287,8 @@
"metadata": {},
"source": [
"## Caching\n",
"As illustrated above, reasoning models produce both reasoning tokens and completion tokens that are treated differently in the API today. This also has implications for cache utilization and latency. To illustrate the point, we include this helpful sketch.\n",
"\n",
"As shown above, reasoning models generate both reasoning tokens and completion tokens, which the API handles differently. This distinction affects how caching works and impacts both performance and latency. The following diagram illustrates these concepts:\n",
"\n",
"![reasoning-context](../../images/responses-diagram.png)"
]
@@ -296,7 +297,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In turn 2, reasoning items from turn 1 are ignored and stripped, since the model doesn't reuse reasoning items from previous turns. This makes it impossible to get a full cache hit on the fourth API call in the diagram above, as the prompt now omits those reasoning items. However, including them does no harm—the API will automatically remove any reasoning items that aren't relevant for the current turn. Note that caching only matters for prompts longer than 1024 tokens. In our tests, switching from Completions to the Responses API increased cache utilization from 40% to 80%. Better cache utilization means better economics, since cached tokens are billed much less: for `o4-mini`, cached input tokens are 75% cheaper than uncached ones. Latency also improves."
"In turn 2, any reasoning items from turn 1 are ignored and removed, since the model does not reuse reasoning items from previous turns. As a result, the fourth API call in the diagram cannot achieve a full cache hit, because those reasoning items are missing from the prompt. However, including them is harmless—the API will simply discard any reasoning items that aren't relevant for the current turn. Keep in mind that caching only impacts prompts longer than 1024 tokens. In our tests, switching from the Completions API to the Responses API boosted cache utilization from 40% to 80%. Higher cache utilization leads to lower costs (for example, cached input tokens for `o4-mini` are 75% cheaper than uncached ones) and improved latency."
]
},
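The carry-forward behavior described above can be sketched with a hypothetical helper (`build_next_turn_input` is not part of the SDK; the item dicts are simplified stand-ins for Responses API output items): pass everything from the previous turn back in, and let the API drop the reasoning items it no longer needs.

```python
# Hypothetical helper: carry all output items (including reasoning items)
# from the previous turn into the next turn's input. The API discards any
# stale reasoning items itself, so including them is safe and keeps the
# prompt prefix stable for better cache utilization.
def build_next_turn_input(history, previous_output, user_message):
    """history: input items from earlier turns (may be empty)."""
    next_input = list(history)
    next_input.extend(previous_output)  # reasoning + message items from the last turn
    next_input.append({"role": "user", "content": user_message})
    return next_input
```

In a real loop you would pass the returned list as `input` to the next `client.responses.create(...)` call.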
{
@@ -305,13 +306,15 @@
"source": [
"## Encrypted Reasoning Items\n",
"\n",
"For organizations that can't use the Responses API statefully due to compliance or data retention requirements (such as [Zero Data Retention](https://openai.com/enterprise-privacy/)), we've introduced [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items). This lets you get all the benefits of reasoning items while keeping your workflow stateless.\n",
"Some organizations—such as those with [Zero Data Retention (ZDR)](https://openai.com/enterprise-privacy/) requirements—cannot use the Responses API in a stateful way due to compliance or data retention policies. To support these cases, OpenAI offers [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items), allowing you to keep your workflow stateless while still benefiting from reasoning items.\n",
"\n",
"To use this, simply add `[\"reasoning.encrypted_content\"]` to the `include` field. You'll receive an encrypted version of the reasoning tokens, which you can pass back to the API just as you would with regular reasoning items.\n",
"To use encrypted reasoning items:\n",
"- Add `[\"reasoning.encrypted_content\"]` to the `include` field in your API call.\n",
"- The API will return an encrypted version of the reasoning tokens, which you can pass back in future requests just like regular reasoning items.\n",
"\n",
"For Zero Data Retention (ZDR) organizations, OpenAI enforces `store=false` at the API level. When a request arrives, the API checks for any `encrypted_content` in the payload. If present, it's decrypted in-memory using keys only OpenAI can access. This decrypted reasoning (chain-of-thought) is never written to disk and is used only for generating the next response. Any new reasoning tokens are immediately encrypted and returned to you. All transient data—including decrypted inputs and model outputs—is securely discarded after the response, with no intermediate state persisted, ensuring full ZDR compliance.\n",
"For ZDR organizations, OpenAI enforces `store=false` automatically. When a request includes `encrypted_content`, it is decrypted in-memory (never written to disk), used for generating the next response, and then securely discarded. Any new reasoning tokens are immediately encrypted and returned to you, ensuring no intermediate state is ever persisted.\n",
"\n",
"Here’s a quick update to the earlier code snippet to show how this works:"
"Here’s a quick code update to show how this works:"
]
},
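A minimal sketch of the stateless request shape described above (`stateless_request_kwargs` is a hypothetical helper; the `store` and `include` arguments follow the Responses API docs, and the assembled kwargs would be passed to `client.responses.create(**kwargs)` in the official `openai` SDK):

```python
# Sketch of a ZDR-friendly, stateless request: nothing is stored
# server-side, and encrypted reasoning tokens come back in the response
# so they can be passed into the next turn.
def stateless_request_kwargs(model, input_items):
    return {
        "model": model,
        "input": input_items,
        "store": False,                              # no server-side persistence
        "include": ["reasoning.encrypted_content"],  # return encrypted reasoning
    }
```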
{
@@ -451,7 +454,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Reasoning summary text enables you to design user experiences where users can peek into the model's thought process. For example, in conversations involving multiple function calls, users can see not only which function calls are made, but also the reasoning behind each tool call—without having to wait for the final assistant message. This provides greater transparency and interactivity in your application's UX."
"Reasoning summary text lets you give users a window into the model's thought process. For example, during conversations with multiple function calls, users can see both which functions were called and the reasoning behind each call—without waiting for the final assistant message. This adds transparency and interactivity to your application's user experience."
]
},
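The UX described above can be sketched as a small rendering loop (`narrate` is a hypothetical helper; the item shapes are simplified from Responses API output items, where reasoning items carry a `summary` list of text parts and function calls carry `name` and `arguments`):

```python
# Hypothetical sketch: interleave reasoning summaries with tool calls as
# they appear in `response.output`, so users see the "why" before the
# final assistant message arrives.
def narrate(output_items):
    lines = []
    for item in output_items:
        if item["type"] == "reasoning":
            for part in item.get("summary", []):
                lines.append(f"[thinking] {part['text']}")
        elif item["type"] == "function_call":
            lines.append(f"[tool] {item['name']}({item['arguments']})")
    return lines
```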
{
19 changes: 10 additions & 9 deletions examples/responses_api/responses_example.ipynb
@@ -4,18 +4,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the Responses API\n",
"## What is the Responses API?\n",
"\n",
"The Responses API is a new API that focuses on greater simplicity and greater expressivity when using our APIs. It is designed for multiple tools, multiple turns, and multiple modalities — as opposed to current APIs, which either have these features bolted onto an API designed primarily for text in and out (chat completions) or need a lot bootstrapping to perform simple actions (assistants api).\n",
"The Responses API is a new way to interact with OpenAI models, designed to be simpler and more flexible than previous APIs. It makes it easy to build advanced AI applications that use multiple tools, handle multi-turn conversations, and work with different types of data (not just text).\n",
"\n",
"Here I will show you a couple of new features that the Responses API has to offer and tie it all together at the end.\n",
"`responses` solves for a number of user painpoints with our current set of APIs. During our time with the completions API, we found that folks wanted:\n",
"Unlike older APIs—such as Chat Completions, which were built mainly for text, or the Assistants API, which can require a lot of setup—the Responses API is built from the ground up for:\n",
"\n",
"- the ability to easily perform multi-turn model interactions in a single API call\n",
"- to have access to our hosted tools (file_search, web_search, code_interpreter)\n",
"- granular control over the context sent to the model\n",
"- Seamless multi-turn interactions (carry on a conversation across several steps in a single API call)\n",
"- Easy access to powerful hosted tools (like file search, web search, and code interpreter)\n",
"- Fine-grained control over the context you send to the model\n",
"\n",
"As models start to develop longer running reasoning and thinking capabilities, users will want an async-friendly and stateful primitive. Response solves for this. \n"
"As AI models become more capable of complex, long-running reasoning, developers need an API that is both asynchronous and stateful. The Responses API is designed to meet these needs.\n",
"\n",
"In this guide, you'll see some of the new features the Responses API offers, along with practical examples to help you get started."
]
},
{
@@ -181,7 +182,7 @@
"\n",
"Another benefit of the Responses API is that it adds support for hosted tools like `file_search` and `web_search`. Instead of manually calling the tools, simply pass in the tools and the API will decide which tool to use and use it.\n",
"\n",
"Here is an example of using the `web_search` tool to incorporate web search results into the response. You may already be familiar with how ChatGPT can search the web. You can now build similar experiences too! The web search tool uses the OpenAI Index, the one that powers the web search in ChatGPT, having being optimized for chat applications.\n"
"Here is an example of using the `web_search` tool to incorporate web search results into the response."
]
},
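A sketch of the hosted-tool request shape described above (`web_search_kwargs` is a hypothetical helper; the tool type string follows current Responses API docs, and earlier SDK versions used `"web_search_preview"`, so treat the exact name as an assumption for your version):

```python
# Sketch of a web-search request: pass the tool in `tools` and the API
# decides whether and when to invoke it; the kwargs would be passed to
# client.responses.create(**kwargs).
def web_search_kwargs(model, question):
    return {
        "model": model,
        "input": question,
        "tools": [{"type": "web_search"}],  # hosted tool; no manual tool loop needed
    }
```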
{
1 change: 1 addition & 0 deletions registry.yaml
@@ -4,6 +4,7 @@
# should build pages for, and indicates metadata such as tags, creation date and
# authors for each page.


- title: Better performance from reasoning models using the Responses API
path: examples/responses_api/reasoning_items.ipynb
date: 2025-05-11