diff --git a/examples/responses_api/reasoning_items.ipynb b/examples/responses_api/reasoning_items.ipynb new file mode 100644 index 0000000000..fc5928bb69 --- /dev/null +++ b/examples/responses_api/reasoning_items.ipynb @@ -0,0 +1,490 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Better performance from reasoning models using the Responses API \n", + "\n", + "### Overview\n", + "\n", + "By leveraging the Responses API with OpenAI’s latest reasoning models, you can unlock higher intelligence, lower costs, and more efficient token usage in your applications. The API also enables access to reasoning summaries, supports features like hosted-tool use, and is designed to accommodate upcoming enhancements for even greater flexibility and performance.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "We've recently released two new state-of-the-art reasoning models, o3 and o4-mini, that excel at combining reasoning capabilities with agentic tool use. What many folks don't know is that you can improve their performance by fully leveraging our (relatively) new Responses API. This cookbook shows how to get the most out of these models and explores how reasoning and function calling work behind the scenes. By giving the model access to previous reasoning items, we can ensure it operates at maximum intelligence and lowest cost." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We introduced the Responses API with a separate [cookbook](https://cookbook.openai.com/examples/responses_api/responses_example) and [API reference](https://platform.openai.com/docs/api-reference/responses). The main takeaway: the Responses API is similar to the Completions API, but with improvements and added features. We've also rolled out encrypted content for Responses, making it even more useful for those who can't use the API in a stateful way!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How Reasoning Models work\n", + "\n", + "Before we dive into how the Responses API can help, let's quickly review how [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) work. Models like o3 and o4-mini break problems down step by step, producing an internal chain of thought that encodes their reasoning. For safety, these reasoning tokens are only exposed to users in summarized form." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In a multistep conversation, the reasoning tokens are discarded after each turn while input and output tokens from each step are fed into the next\n", + "\n", + "![reasoning-context](../../images/reasoning-turns.png)\n", + "Diagram borrowed from our [doc](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#how-reasoning-works)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let us examine the response object being returned:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "import os\n", + "client = OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\"))" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.responses.create(\n", + " model=\"o4-mini\",\n", + " input=\"tell me a joke\",\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"id\": \"resp_6820f382ee1c8191bc096bee70894d040ac5ba57aafcbac7\",\n", + " \"created_at\": 1746989954.0,\n", + " \"error\": null,\n", + " \"incomplete_details\": null,\n", + " \"instructions\": null,\n", + " \"metadata\": {},\n", + " \"model\": \"o4-mini-2025-04-16\",\n", + " \"object\": \"response\",\n", + " \"output\": [\n", + " {\n", + " \"id\": \"rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7\",\n", + " \"summary\": [],\n", + " \"type\": \"reasoning\",\n", + " \"status\": null\n", + " },\n", + " {\n", + " \"id\": \"msg_6820f3854688819187769ff582b170a60ac5ba57aafcbac7\",\n", + " \"content\": [\n", + " {\n", + " \"annotations\": [],\n", + " \"text\": \"Why don\\u2019t scientists trust atoms? \\nBecause they make up everything!\",\n", + " \"type\": \"output_text\"\n", + " }\n", + " ],\n", + " \"role\": \"assistant\",\n", + " \"status\": \"completed\",\n", + " \"type\": \"message\"\n", + " }\n", + " ],\n", + " \"parallel_tool_calls\": true,\n", + " \"temperature\": 1.0,\n", + " \"tool_choice\": \"auto\",\n", + " \"tools\": [],\n", + " \"top_p\": 1.0,\n", + " \"max_output_tokens\": null,\n", + " \"previous_response_id\": null,\n", + " \"reasoning\": {\n", + " \"effort\": \"medium\",\n", + " \"generate_summary\": null,\n", + " \"summary\": null\n", + " },\n", + " \"status\": \"completed\",\n", + " \"text\": {\n", + " \"format\": {\n", + " \"type\": \"text\"\n", + " }\n", + " },\n", + " \"truncation\": \"disabled\",\n", + " \"usage\": {\n", + " \"input_tokens\": 10,\n", + " \"input_tokens_details\": {\n", + " \"cached_tokens\": 0\n", + " },\n", + " \"output_tokens\": 148,\n", + " \"output_tokens_details\": {\n", + " \"reasoning_tokens\": 128\n", + " },\n", + " \"total_tokens\": 158\n", + " },\n", + " \"user\": null,\n", + " \"service_tier\": \"default\",\n", + " \"store\": true\n", + "}\n" + ] + } + ], + "source": [ + "import json\n", + "\n", + "print(json.dumps(response.model_dump(), indent=2))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the JSON dump of the response object, you can see that in addition to the `output_text`, the model also produces a reasoning item. This item represents the model's internal reasoning tokens and is exposed as an ID—here, for example, `rs_6820f383d7c08191846711c5df8233bc0ac5ba57aafcbac7`. Because the Responses API is stateful, these reasoning tokens persist: just include their IDs in subsequent messages to give future responses access to the same reasoning items. If you use `previous_response_id` for multi-turn conversations, the model will automatically have access to all previously produced reasoning items.\n", + "\n", + "You can also see how many reasoning tokens the model generated. For example, with 10 input tokens, the response included 148 output tokens—128 of which are reasoning tokens not shown in the final assistant message." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wait—didn’t the diagram show that reasoning from previous turns is discarded? So why bother passing it back in later turns?\n", + "\n", + "Great question! In typical multi-turn conversations, you don’t need to include reasoning items or tokens—the model is trained to produce the best output without them. However, things change when tool use is involved. If a turn includes a function call (which may require an extra round trip outside the API), you do need to include the reasoning items—either via `previous_response_id` or by explicitly adding the reasoning item to `input`. Let’s see how this works with a quick function-calling example." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[ResponseReasoningItem(id='rs_68210c71a95c81919cc44afadb9d220400c77cc15fd2f785', summary=[], type='reasoning', status=None),\n", + " ResponseFunctionToolCall(arguments='{\"latitude\":48.8566,\"longitude\":2.3522}', call_id='call_9ylqPOZUyFEwhxvBwgpNDqPT', name='get_weather', type='function_call', id='fc_68210c78357c8191977197499d5de6ca00c77cc15fd2f785', status='completed')]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import requests\n", + "\n", + "def get_weather(latitude, longitude):\n", + " response = requests.get(f\"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}¤t=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m\")\n", + " data = response.json()\n", + " return data['current']['temperature_2m']\n", + "\n", + "\n", + "tools = [{\n", + " \"type\": \"function\",\n", + " \"name\": \"get_weather\",\n", + " \"description\": \"Get current temperature for provided coordinates in celsius.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"latitude\": {\"type\": \"number\"},\n", + " \"longitude\": {\"type\": \"number\"}\n", + " },\n", + " \"required\": [\"latitude\", \"longitude\"],\n", + " \"additionalProperties\": False\n", + " },\n", + " \"strict\": True\n", + "}]\n", + "\n", + "context = [{\"role\": \"user\", \"content\": \"What's the weather like in Paris today?\"}]\n", + "\n", + "response = client.responses.create(\n", + " model=\"o4-mini\",\n", + " input=context,\n", + " tools=tools,\n", + ")\n", + "\n", + "\n", + "response.output" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After some reasoning, the o4-mini model determines it needs more information and calls a function to get it. We can call the function and return its output to the model. Crucially, to maximize the model’s intelligence, we should include the reasoning item by simply adding all of the output back into the context for the next turn." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The current temperature in Paris is 16.3°C. If you’d like more details—like humidity, wind speed, or a brief description of the sky—just let me know!\n" + ] + } + ], + "source": [ + "context += response.output # Add the response to the context (including the reasoning item)\n", + "\n", + "tool_call = response.output[1]\n", + "args = json.loads(tool_call.arguments)\n", + "\n", + "\n", + "# calling the function\n", + "result = get_weather(args[\"latitude\"], args[\"longitude\"]) \n", + "\n", + "context.append({ \n", + " \"type\": \"function_call_output\",\n", + " \"call_id\": tool_call.call_id,\n", + " \"output\": str(result)\n", + "})\n", + "\n", + "# we are calling the api again with the added function call output. Note that while this is another API call, we consider this as a single turn in the conversation.\n", + "response_2 = client.responses.create(\n", + " model=\"o4-mini\",\n", + " input=context,\n", + " tools=tools,\n", + ")\n", + "\n", + "print(response_2.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "While this toy example may not clearly show the benefits—since the model will likely perform well with or without the reasoning item—our own tests found otherwise. On a more rigorous benchmark like SWE-bench, including reasoning items led to about a **3% improvement** for the same prompt and setup." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Caching\n", + "As illustrated above, reasoning models produce both reasoning tokens and completion tokens that are treated differently in the API today. This also has implications for cache utilization and latency. To illustrate the point, we include this helpful sketch.\n", + "\n", + "![reasoning-context](../../images/responses-diagram.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In turn 2, reasoning items from turn 1 are ignored and stripped, since the model doesn't reuse reasoning items from previous turns. This makes it impossible to get a full cache hit on the fourth API call in the diagram above, as the prompt now omits those reasoning items. However, including them does no harm—the API will automatically remove any reasoning items that aren't relevant for the current turn. Note that caching only matters for prompts longer than 1024 tokens. In our tests, switching from Completions to the Responses API increased cache utilization from 40% to 80%. Better cache utilization means better economics, since cached tokens are billed much less: for `o4-mini`, cached input tokens are 75% cheaper than uncached ones. Latency also improves." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Encrypted Reasoning Items\n", + "\n", + "For organizations that can't use the Responses API statefully due to compliance or data retention requirements (such as [Zero Data Retention](https://openai.com/enterprise-privacy/)), we've introduced [encrypted reasoning items](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#encrypted-reasoning-items). This lets you get all the benefits of reasoning items while keeping your workflow stateless.\n", + "\n", + "To use this, simply add `[\"reasoning.encrypted_content\"]` to the `include` field. You'll receive an encrypted version of the reasoning tokens, which you can pass back to the API just as you would with regular reasoning items.\n", + "\n", + "For Zero Data Retention (ZDR) organizations, OpenAI enforces `store=false` at the API level. When a request arrives, the API checks for any `encrypted_content` in the payload. If present, it's decrypted in-memory using keys only OpenAI can access. This decrypted reasoning (chain-of-thought) is never written to disk and is used only for generating the next response. Any new reasoning tokens are immediately encrypted and returned to you. All transient data—including decrypted inputs and model outputs—is securely discarded after the response, with no intermediate state persisted, ensuring full ZDR compliance.\n", + "\n", + "Here’s a quick update to the earlier code snippet to show how this works:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "context = [{\"role\": \"user\", \"content\": \"What's the weather like in Paris today?\"}]\n", + "\n", + "response = client.responses.create(\n", + " model=\"o3\",\n", + " input=context,\n", + " tools=tools,\n", + " store=False, #store=false, just like how ZDR is enforced\n", + " include=[\"reasoning.encrypted_content\"] # Encrypted chain of thought is passed back in the response\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ResponseReasoningItem(id='rs_6821243503d481919e1b385c2a154d5103d2cbc5a14f3696', summary=[], type='reasoning', status=None, encrypted_content='gAAAAABoISQ24OyVRYbkYfukdJoqdzWT-3uiErKInHDC-lgAaXeky44N77j7aibc2elHISjAvX7OmUwMU1r7NgaiHSVWL5BtWgXVBp4BMFkWZpXpZY7ff5pdPFnW3VieuF2cSo8Ay7tJ4aThGUnXkNM5QJqk6_u5jwd-W9cTHjucw9ATGfGqD2qHrXyj6NEW9RmpWHV2SK41d5TpUYdN0xSuIUP98HBVZ2VGgD4MIocUm6Lx0xhRl9KUx19f7w4Sn7SCpKUQ0zwXze8UsQOVvv1HQxk_yDosbIg1SylEj38H-DNLil6yUFlWI4vGWcPn1bALXphTR2EwYVR52nD1rCFEORUd7prS99i18MUMSAhghIVv9OrpbjmfxJh8bSQaHu1ZDTMWcfC58H3i8KnogmI7V_h2TKAiLTgSQIkYRHnV3hz1XwaUqYAIhBvP6c5UxX-j_tpYpB_XCpD886L0XyJxCmfr9cwitipOhHr8zfLVwMI4ULu-P3ftw7QckIVzf71HFLNixrQlkdgTn-zM6aQl5BZcJgwwn3ylJ5ji4DQTS1H3AiTrFsEt4kyiBcE2d7tYA_m3G8L-e4-TuTDdJZtLaz-q8J12onFaKknGSyU6px8Ki4IPqnWIJw8SaFMJ5fSUYJO__myhp7lbbQwuOZHIQuvKutM-QUuR0cHus_HtfWtZZksqvVCVNBYViBxD2_KvKJvR-nN62zZ8sNiydIclt1yJfIMkiRErfRTzv92hQaUtdqz80UiW7FBcN2Lnzt8awXCz1pnGyWy_hNQe8C7W35zRxJDwFdb-f3VpanJT0tNmU5bfEWSXcIVmiMZL1clwzVNryf9Gk482LaWPwhVYrhv2MkhKMPKdeAZWVhZbgm0eTT8a4DgbwcYRGhoXMGxrXWzOdvAY536DkrI_0xsJk8-Szb5Y2EH0xPxN4-CdB_fMPP60TPEQTOP1Qc64cJcQ9p2JE5Jfz59bubF_QGajC9-FtHkD6Q5pT-6CbhuD6xrFJMgxQPcggSDaWL_4260fZCdf6nzMlwPRD3wrfsxs6rFyd8pLC-2SOh9Iv297xAjes8xcnyqvMKSuCkjARr11gJCe0EXnx87NWt2rfW8ODUU0qFYbjFx8Rj9WJtnvQBNyqp7t5LLLf12S8pyyeKTv0ePqC3xDuWdFKmELDUZjarkkCyMHoO12EbXa6YCpY_MpA01c2vV5plrcouVPSwRK0ahbPs0mQnQnDAkfi2XVS0Bzgk2GpNONGf7KWkzD7uTgDtg9UbWI0v_-f-iiBM2kKDz_dIb1opZfaxZEloyiQ2MnWQj2MRefL7WM_0c3IyTAccICN-diGn2f1im82uL9maELcbYn')\n" + ] + } + ], + "source": [ + "# take a look at the encrypted reasoning item\n", + "print(response.output[0]) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With `include=[\"reasoning.encrypted_content\"]` set, we now see an `encrypted_content` field in the reasoning item being passed back. This encrypted content represents the model's reasoning state, persisted entirely on the client side with OpenAI retaining no data. We can then pass this back just as we did with the reasoning item before." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "It’s currently about 20 °C in Paris.\n" + ] + } + ], + "source": [ + "context += response.output # Add the response to the context (including the encrypted chain of thought)\n", + "tool_call = response.output[1]\n", + "args = json.loads(tool_call.arguments)\n", + "\n", + "\n", + "\n", + "result = 20 #mocking the result of the function call\n", + "\n", + "context.append({ \n", + " \"type\": \"function_call_output\",\n", + " \"call_id\": tool_call.call_id,\n", + " \"output\": str(result)\n", + "})\n", + "\n", + "response_2 = client.responses.create(\n", + " model=\"o3\",\n", + " input=context,\n", + " tools=tools,\n", + " store=False,\n", + " include=[\"reasoning.encrypted_content\"]\n", + ")\n", + "\n", + "print(response_2.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With a simple change to the `include` field, we can now pass back the encrypted reasoning item and use it to improve the model's performance in intelligence, cost, and latency.\n", + "\n", + "Now you should be fully equipped with the knowledge to fully utilize our latest reasoning models!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reasoning Summaries\n", + "\n", + "Another useful feature in the Responses API is that it supports reasoning summaries. While we do not expose the raw chain of thought tokens, users can access their [summaries](https://platform.openai.com/docs/guides/reasoning?api-mode=responses#reasoning-summaries)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "First reasoning summary text:\n", + " **Analyzing biological processes**\n", + "\n", + "I think the user is looking for a clear explanation of the differences between certain processes. I should create a side-by-side comparison that lists out key elements like the formulas, energy flow, locations, reactants, products, organisms involved, electron carriers, and whether the processes are anabolic or catabolic. This structured approach will help in delivering a comprehensive answer. It’s crucial to cover all aspects to ensure the user understands the distinctions clearly.\n" + ] + } + ], + "source": [ + "# Make a hard call to o3 with reasoning summary included\n", + "\n", + "response = client.responses.create(\n", + " model=\"o3\",\n", + " input=\"What are the main differences between photosynthesis and cellular respiration?\",\n", + " reasoning={\"summary\": \"auto\"},\n", + "\n", + " \n", + ")\n", + "\n", + "# Extract the first reasoning summary text from the response object\n", + "first_reasoning_item = response.output[0] # Should be a ResponseReasoningItem\n", + "first_summary_text = first_reasoning_item.summary[0].text if first_reasoning_item.summary else None\n", + "print(\"First reasoning summary text:\\n\", first_summary_text)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Reasoning summary text enables you to design user experiences where users can peek into the model's thought process. For example, in conversations involving multiple function calls, users can see not only which function calls are made, but also the reasoning behind each tool call—without having to wait for the final assistant message. This provides greater transparency and interactivity in your application's UX." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "By leveraging the OpenAI Responses API and the latest reasoning models, you can unlock higher intelligence, improved transparency, and greater efficiency in your applications. Whether you’re utilizing reasoning summaries, encrypted reasoning items for compliance, or optimizing for cost and latency, these tools empower you to build more robust and interactive AI experiences.\n", + "\n", + "Happy building!" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/images/reasoning-turns.png b/images/reasoning-turns.png new file mode 100644 index 0000000000..a8c3223633 Binary files /dev/null and b/images/reasoning-turns.png differ diff --git a/images/responses-diagram.png b/images/responses-diagram.png new file mode 100644 index 0000000000..e8627c3ae8 Binary files /dev/null and b/images/responses-diagram.png differ diff --git a/images/responses_cache.png b/images/responses_cache.png new file mode 100644 index 0000000000..3b23ad48ce Binary files /dev/null and b/images/responses_cache.png differ diff --git a/registry.yaml b/registry.yaml index 8387aa2e1a..c3a34157b5 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,6 +4,16 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. + +- title: Better performance from reasoning models using the Responses API + path: examples/responses_api/reasoning_items.ipynb + date: 2025-05-11 + authors: + - billchen-openai + tags: + - responses + - functions + - title: Context Summarization with Realtime API path: examples/Context_summarization_with_realtime_api.ipynb date: 2025-05-10