From afa91abf36cca22da6f6a87cddb276745894f3cc Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Fri, 25 Apr 2025 19:08:39 +0100 Subject: [PATCH 1/6] Added example notebook for handling function calls with reasoning models --- examples/reasoning_function_calls.ipynb | 561 ++++++++++++++++++++++++ 1 file changed, 561 insertions(+) create mode 100644 examples/reasoning_function_calls.ipynb diff --git a/examples/reasoning_function_calls.ipynb b/examples/reasoning_function_calls.ipynb new file mode 100644 index 0000000000..a5aa120454 --- /dev/null +++ b/examples/reasoning_function_calls.ipynb @@ -0,0 +1,561 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Managing Function Calls With Reasoning Models\n", + "OpenAI now offers [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) which are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.\n", + "> \"_Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They're also the best models for Codex CLI, our lightweight coding agent._\"\n", + "\n", + "For the most part, using these models via the API is very simple and comparable to using familiar classic 'chat' models. \n", + "\n", + "However, there are some nuances to bear in mind, particularly when it comes to using features such as function calling. \n", + "\n", + "All examples in this notebook use the newer [Responses API](https://community.openai.com/t/introducing-the-responses-api/1140929) which provides convenient abstractions for managing conversation state. The principles here are however relevant when using the older chat completions API." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Making API calls to reasoning models" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# pip install openai\n", + "# Import libraries \n", + "import json, openai\n", + "from uuid import uuid4\n", + "from typing import Callable\n", + "\n", + "client = openai.OpenAI()\n", + "MODEL_DEFAULTS = {\n", + " \"model\": \"o4-mini\", # 200,000 token context window\n", + " \"reasoning\": {\"effort\": \"low\", \"summary\": \"auto\"}, # Automatically summarise the reasoning process. 
Can also choose \"detailed\" or \"none\"\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's make a simple call to a reasoning model using the Responses API.\n", + "We specify a low reasoning effort and retrieve the response with the helpful `output_text` attribute.\n", + "We can ask follow up questions and use the `previous_response_id` to let OpenAI manage the conversation history automatically" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Among the last four Summer Olympic host cities (Beijing 2008, London 2012, Rio de Janeiro 2016 and Tokyo 2020), Rio de Janeiro has by far the highest mean annual temperature—around 23 °C, compared with about 16 °C in Tokyo, 13 °C in Beijing and 11 °C in London.\n", + "Of those four, London has the lowest mean annual temperature, at roughly 11 °C.\n" + ] + } + ], + "source": [ + "# Let's keep track of the response ids in a naive way, in case we want to reverse the conversation and pick up from a previous point\n", + "response = client.responses.create(input=\"Which of the last four Olympic host cities has the highest average temperature?\", **MODEL_DEFAULTS)\n", + "print(response.output_text)\n", + "response = client.responses.create(input=\"what about the lowest?\", previous_response_id=response.id, **MODEL_DEFAULTS)\n", + "print(response.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nice and easy!\n", + "\n", + "We're asking relatively complex questions that may requires the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us. We simply wait a little longer before being shown the output. \n", + "However, if we inspect the output we can see that the model has made use of a hidden set of 'reasoning' tokens that were included in the model context window, but not exposed to us as end users.\n", + "We can see these tokens and a summary of the reasoning (but not the literal tokens used) in the response" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "**Determining Olympic cities**\n", + "\n", + "The user is asking about the last four Olympic host cities, assuming it’s for the Summer Olympics. Those would be Beijing in 2008, London in 2012, Rio in 2016, and Tokyo in 2020. They’re interested in the lowest average temperature, which I see is London at around 11°C. Beijing is about 13°C, Tokyo 16°C, but London has the lowest. I should clarify it's the mean annual temperature. So, I'll present it neatly that London is the answer.\n" + ] + }, + { + "data": { + "text/plain": [ + "{'input_tokens': 109,\n", + " 'input_tokens_details': {'cached_tokens': 0},\n", + " 'output_tokens': 89,\n", + " 'output_tokens_details': {'reasoning_tokens': 64},\n", + " 'total_tokens': 198}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(next(rx for rx in response.output if rx.type == 'reasoning').summary[0].text)\n", + "response.usage.to_dict()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is important to know about these reasoning tokens, because it means we will consume our available context window more quickly than with traditional chat models. 
More on this later.\n", + "\n", + "## Calling custom functions\n", + "What happens if we ask the model a complex request that also requires the use of custom tools?\n", + "* Let's imagine we have more questions about Olympic Cities, but we also have an internal database that contains IDs for each city.\n", + "* It's possible that the model will need to invoke our tool partway through its reasoning process before returning a result.\n", + "* Let's make a function that produces a random UUID and ask the model to reason about these UUIDs. \n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "\n", + "def get_city_uuid(city: str) -> str:\n", + " \"\"\"Just a fake tool to return a fake UUID\"\"\"\n", + " uuid = str(uuid4())\n", + " return f\"{city} ID: {uuid}\"\n", + "\n", + "# The tool schema that we will pass to the model\n", + "tools = [\n", + " {\n", + " \"type\": \"function\",\n", + " \"name\": \"get_city_uuid\",\n", + " \"description\": \"Retrieve the internal ID for a city from the internal database. Only invoke this function if the user needs to know the internal ID for a city.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"city\": {\"type\": \"string\", \"description\": \"The name of the city to get information about\"}\n", + " },\n", + " \"required\": [\"city\"]\n", + " }\n", + " }\n", + "]\n", + "\n", + "# This is a general practice - we need a mapping of the tool names we tell the model about, and the functions that implement them.\n", + "tool_mapping = {\n", + " \"get_city_uuid\": get_city_uuid\n", + "}\n", + "\n", + "# Let's add this to our defaults so we don't have to pass it every time\n", + "MODEL_DEFAULTS[\"tools\"] = tools\n", + "\n", + "response = client.responses.create(input=\"What's the internal ID for the lowest-temperature city?\", previous_response_id=response.id, **MODEL_DEFAULTS)\n", + "print(response.output_text)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We didn't get an `output_text` this time. Let's look at the response output" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[ResponseReasoningItem(id='rs_680bcde645a08191bbb8b42ba4613aef07423969e3977116', summary=[], type='reasoning', status=None),\n", + " ResponseFunctionToolCall(arguments='{\"city\":\"London\"}', call_id='call_VcyIJQnP7HW2gge7Nh8HmPNG', name='get_city_uuid', type='function_call', id='fc_680bcde7cda48191ada496d462ca7c5407423969e3977116', status='completed')]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "response.output" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Along with the reasoning step, the model has successfully identified the need for a tool call and passed back instructions to send to our function call. 
\n", + "Let's invoke the function and pass the results back to the model so it can continue.\n", + "Function responses are a special kind of message, so we need to structure our next message as a special kind of input:\n", + "```json\n", + "{\n", + " \"type\": \"function_call_output\",\n", + " \"call_id\": function_call.call_id,\n", + " \"output\": tool_output\n", + "}\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# Extract the function call(s) from the response\n", + "new_conversation_items = []\n", + "function_calls = [rx for rx in response.output if rx.type == 'function_call']\n", + "for function_call in function_calls:\n", + " target_tool = tool_mapping.get(function_call.name)\n", + " if not target_tool:\n", + " raise ValueError(f\"No tool found for function call: {function_call.name}\")\n", + " arguments = json.loads(function_call.arguments) # Load the arguments as a dictionary\n", + " tool_output = target_tool(**arguments) # Invoke the tool with the arguments\n", + " new_conversation_items.append({\n", + " \"type\": \"function_call_output\",\n", + " \"call_id\": function_call.call_id, # We map the call_id back to the original function call\n", + " \"output\": tool_output\n", + " })" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The internal ID for London is ce863d03-9c01-4de2-9af8-96b123852aec.\n" + ] + } + ], + "source": [ + "response = client.responses.create(\n", + " input=new_conversation_items,\n", + " previous_response_id=response.id,\n", + " **MODEL_DEFAULTS\n", + ")\n", + "print(response.output_text)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This works great here - as we know that a single function call is all that is required for the model to respond - but we also need to account for situations where multiple tool calls might need to be executed for the reasoning to complete.\n", + "\n", + "## Executing multiple functions in series\n", + "\n", + "Some OpenAI models support the parameter `parallel_tool_calls` which allows the model to return an array of functions which we can then execute in parallel. 
However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of previous ones.\n",
+    "As such, we ought to define a general pattern which we can use to handle arbitrarily complex reasoning workflows:\n",
+    "* At each step in the conversation, initialise a loop\n",
+    "* If the response contains function calls, we must assume the reasoning is ongoing and we should feed the function results (and any intermediate reasoning) back into the model for further inference\n",
+    "* If there are no function calls and we instead receive a Response.output with a type of 'message', we can safely assume the agent has finished reasoning and we can break out of the loop"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's wrap our logic above into a function which we can use to invoke tool calls.\n",
+    "def invoke_functions_from_response(response,\n",
+    "                                    tool_mapping: dict[str, Callable] = tool_mapping\n",
+    "                                    ) -> list[dict]:\n",
+    "    \"\"\"Extract all function calls from the response, look up the corresponding tool function(s) and execute them.\n",
+    "    (This would be a good place to handle asynchronous tool calls, or ones that take a while to execute.)\n",
+    "    This returns a list of messages to be added to the conversation history.\n",
+    "    \"\"\"\n",
+    "    intermediate_messages = []\n",
+    "    for response_item in response.output:\n",
+    "        if response_item.type == 'function_call':\n",
+    "            target_tool = tool_mapping.get(response_item.name)\n",
+    "            if target_tool:\n",
+    "                try:\n",
+    "                    arguments = json.loads(response_item.arguments)\n",
+    "                    print(f\"Invoking tool: {response_item.name}({arguments})\")\n",
+    "                    tool_output = target_tool(**arguments)\n",
+    "                    intermediate_messages.append({\n",
+    "                        \"type\": \"function_call_output\",\n",
+    "                        \"call_id\": response_item.call_id,\n",
+    "                        \"output\": tool_output\n",
+    "                    })\n",
+    "                except Exception as e:\n",
+    "                    tool_output = f\"Error executing function call: {function_call.name}: {e}\"\n",
+    "            else:\n",
+    "                print(f\"ERROR - No tool registered for function call: {function_call.name}\")\n",
+    "    return intermediate_messages"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's demonstrate the loop concept we discussed before.\n",
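+    "\n",
+    "Before running it, one aside: the loop executes tool calls one at a time, which suits the serial, dependent calls that reasoning models tend to produce. If a model did return several independent calls at once (as the `parallel_tool_calls` parameter allows for some models), you could execute them concurrently. The sketch below is an illustrative addition - the `invoke_functions_concurrently` helper is hypothetical, not part of the OpenAI SDK - and it assumes every tool function is thread-safe:\n",
+    "\n",
+    "```python\n",
+    "import json\n",
+    "from concurrent.futures import ThreadPoolExecutor\n",
+    "\n",
+    "def invoke_functions_concurrently(response, tool_mapping):\n",
+    "    # Illustrative sketch (not from the original cookbook): run independent tool calls in parallel threads\n",
+    "    calls = [item for item in response.output if item.type == 'function_call']\n",
+    "\n",
+    "    def run_one(call):\n",
+    "        tool_output = tool_mapping[call.name](**json.loads(call.arguments))\n",
+    "        return {'type': 'function_call_output', 'call_id': call.call_id, 'output': tool_output}\n",
+    "\n",
+    "    # executor.map preserves the original ordering of the calls\n",
+    "    with ThreadPoolExecutor() as executor:\n",
+    "        return list(executor.map(run_one, calls))\n",
+    "```"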
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Invoking tool: get_city_uuid({'city': 'Turin'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Vancouver'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'London'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Sochi'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Pyeongchang'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Tokyo'})\n", + "More reasoning required, continuing...\n", + "Invoking tool: get_city_uuid({'city': 'Paris'})\n", + "More reasoning required, continuing...\n", + "Here are the internal IDs for the cities that have hosted the Olympics in the last 20 years:\n", + "\n", + "• Turin: 53c0e635-7a1c-478b-84ca-742a6f0df830 \n", + "• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5 \n", + "• Vancouver: cc8be1f1-5154-46f4-8879-451e97f771c7 \n", + "• London: a24addb0-4dd4-444c-a4a9-199612e0aca8 \n", + "• Sochi: da7386b3-2283-45cc-9244-c1e0f4121782 \n", + "• Rio de Janeiro: 01f60ec2-0efd-40b8-bb85-e63c2d2ddf4c \n", + "• Pyeongchang: f5d3687a-0097-4551-800c-aec66c37e8db \n", + "• Tokyo: 15aa0b12-7f7c-43d0-9ba3-b91250cafe48 \n", + "• Paris: 56d062f2-8835-4707-a826-5d68d8be9d3f \n", + "\n", + "Of these, the only city whose ID begins with “2” is:\n", + "• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5\n" + ] + } + ], + "source": [ + "initial_question = \"What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which cities have IDs beginning with the number '2'. Use your internal tools to look up the IDs?\"\n", + "\n", + "# We fetch a response and then kick off a loop to handle the response\n", + "response = client.responses.create(\n", + " input=initial_question,\n", + " **MODEL_DEFAULTS,\n", + ")\n", + "while True: \n", + " function_responses = invoke_functions_from_response(response)\n", + " messages = [rx.to_dict() for rx in response.output if rx.type == 'message']\n", + " if len(function_responses) == 0: # We're done reasoning\n", + " print(response.output_text)\n", + " break\n", + " else:\n", + " print(\"More reasoning required, continuing...\")\n", + " response = client.responses.create(\n", + " input=function_responses,\n", + " previous_response_id=response.id,\n", + " **MODEL_DEFAULTS\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Manual conversation orchestration\n", + "So far so good! It's really cool to watch the model pause execution to run a function before continuing. 
\n", + "In practice the example above is quite trivial, and production use cases may be much more complex:\n", + "* Our context window may grow too large and we may wish to prune older and less relevant messages\n", + "* We may not wish to proceed sequentially using the `previous_response_id` but allow users to navigate back and forth through the conversation and re-generate answers\n", + "* We may wish to store messages in our own database for audit purposes rather than relying on OpenAI's storage and orchestration\n", + "* etc.\n", + "\n", + "In these situations we will treat the API as stateless - rather than using `previous_message_id` we will instead make and maintain an array of conversation items that we add to and pass as input. This allows us full control of the conversation.\n", + "\n", + "This poses some Reasoning model specific nuances to consider. \n", + "* In particular, it is essential that we preserve any reasoning and function call responses in our conversation history.\n", + "* This is how the model keeps track of what chain-of-thought steps it has run through. The API will error if these are not included.\n", + "\n", + "Let's run through the example above again, orchestrating the messages ourselves and tracking token usage.\n", + "\n", + "---\n", + "*Note that the code below is structured for readibility - in practice you may wish to consider a more sophisticated workflow to handle edge cases*" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "*******************************************************************************\n", + "User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.\n", + "*******************************************************************************\n", + "More reasoning required, continuing...\n", + "\n", + "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", + "Invoking tool: get_city_uuid({'city': 'London'})\n", + "Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})\n", + "Invoking tool: get_city_uuid({'city': 'Tokyo'})\n", + "Invoking tool: get_city_uuid({'city': 'Paris'})\n", + "More reasoning required, continuing...\n", + "\n", + "Here are the UUIDs for each Summer Olympic host city since 2005, with the leading numeric prefix highlighted and assessed for primality:\n", + "\n", + "• Beijing (2008): 11ab370c-2f59-4c35-b557-f845e22c847b \n", + " – Leading digits “11” → 11 is prime \n", + "• London (2012): 0fdff00b-cbfb-4b82-bdd8-2107c4100319 \n", + " – Leading digit “0” → 0 is not prime \n", + "• Rio de Janeiro (2016): 9c2202c4-00ab-46ee-a954-a17505e32d64 \n", + " – Leading digit “9” → 9 is not prime \n", + "• Tokyo (2020): c4bf0281-7e84-4489-88e4-750e07211334 \n", + " – No leading digit → N/A \n", + "• Paris (2024): b8c4b88e-dece-435d-b398-94f0ff762c88 \n", + " – No leading digit → N/A \n", + "\n", + "Conclusion: Only Beijing’s ID begins with a prime number (“11”).\n", + "*******************************************************************************\n", + "User message: Great thanks! 
We've just updated the IDs - could you please check again?\n", + "*******************************************************************************\n", + "More reasoning required, continuing...\n", + "\n", + "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", + "Invoking tool: get_city_uuid({'city': 'London'})\n", + "Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})\n", + "Invoking tool: get_city_uuid({'city': 'Tokyo'})\n", + "Invoking tool: get_city_uuid({'city': 'Paris'})\n", + "Here are the updated UUIDs and their leading numeric prefixes:\n", + "\n", + "• Beijing (2008): 30b0886f-c4da-431c-8983-33e8bbb4c352 \n", + " – Leading “30” → 30 is not prime \n", + "• London (2012): 72ff5a9d-d147-4ba8-9a87-64e3572ba3bc \n", + " – Leading “72” → 72 is not prime \n", + "• Rio de Janeiro (2016): 7a45a392-b43a-41be-8eaf-07ec44d42a2b \n", + " – Leading “7” → 7 is prime \n", + "• Tokyo (2020): f725244f-079f-44e1-a91c-5c31c270c209 \n", + " – Leading “f” → no numeric prefix \n", + "• Paris (2024): b0230ad4-bc35-48be-a198-65a9aaf28fb5 \n", + " – Leading “b” → no numeric prefix \n", + "\n", + "Conclusion: After the update, only Rio de Janeiro’s ID begins with a prime number (“7”).\n", + "Total tokens used: 9734 (4.87% of o4-mini's context window)\n" + ] + } + ], + "source": [ + "# Let's initialise our conversation with the first user message\n", + "total_tokens_used = 0\n", + "user_messages = [\n", + " \"Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.\",\n", + " \"Great thanks! We've just updated the IDs - could you please check again?\"\n", + " ]\n", + "\n", + "conversation = []\n", + "for message in user_messages:\n", + " conversation_item = {\n", + " \"role\": \"user\",\n", + " \"type\": \"message\",\n", + " \"content\": message\n", + " }\n", + " print(f\"{'*' * 79}\\nUser message: {message}\\n{'*' * 79}\")\n", + " conversation.append(conversation_item)\n", + " while True: # Response loop\n", + " response = client.responses.create(\n", + " input=conversation,\n", + " **MODEL_DEFAULTS\n", + " )\n", + " total_tokens_used += response.usage.total_tokens\n", + " reasoning = [rx.to_dict() for rx in response.output if rx.type == 'reasoning']\n", + " function_calls = [rx.to_dict() for rx in response.output if rx.type == 'function_call']\n", + " messages = [rx.to_dict() for rx in response.output if rx.type == 'message']\n", + " if len(reasoning) > 0:\n", + " print(\"More reasoning required, continuing...\")\n", + " # Ensure we capture any reasoning steps\n", + " conversation.extend(reasoning)\n", + " print('\\n'.join(s['text'] for r in reasoning for s in r['summary']))\n", + " if len(function_calls) > 0:\n", + " function_outputs = invoke_functions_from_response(response)\n", + " # Preserve order of function calls and outputs in case of multiple function calls (currently not supported by reasoning models, but worth considering)\n", + " interleaved = [val for pair in zip(function_calls, function_outputs) for val in pair]\n", + " conversation.extend(interleaved)\n", + " if len(messages) > 0:\n", + " print(response.output_text)\n", + " conversation.extend(messages)\n", + " if len(function_calls) == 0: # No more functions = We're done reasoning and we're ready for the next user message\n", + " break\n", + "print(f\"Total tokens used: {total_tokens_used} ({total_tokens_used / 200_000:.2%} of o4-mini's context window)\")" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "## Summary\n", + "* Reasoning models can invoke custom functions during their reasoning process, allowing for complex workflows that require external data or operations.\n", + "* These models may require multiple function calls in series, as some steps depend on the results of previous ones, necessitating a loop to handle ongoing reasoning.\n", + "* It's essential to preserve reasoning and function call responses in the conversation history to maintain the chain-of-thought and avoid errors in the reasoning process.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 99a1f84281d3bf9df361aa31534fc694d46fcbfd Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Fri, 25 Apr 2025 19:31:51 +0100 Subject: [PATCH 2/6] Updated authors --- authors.yaml | 5 +++++ registry.yaml | 11 +++++++++++ 2 files changed, 16 insertions(+) diff --git a/authors.yaml b/authors.yaml index abeabf8668..822aed35c4 100644 --- a/authors.yaml +++ b/authors.yaml @@ -297,3 +297,8 @@ brandonbaker-openai: name: "Brandon Baker" website: "https://www.linkedin.com/in/brandonbaker18" avatar: "https://avatars.githubusercontent.com/u/208719822" + +tompakeman-oai: + name: "Tom Pakeman" + website: "https://www.linkedin.com/in/tom-pakeman/" + avatar: "https://avatars.githubusercontent.com/u/204937754" diff --git a/registry.yaml b/registry.yaml index 37fe4657ca..60d419eb26 100644 --- a/registry.yaml +++ b/registry.yaml @@ -1925,3 +1925,14 @@ - katiagg tags: - images + +- title: Handling Function Calls with Reasoning Models + path: examples/reasoning_function_calls.ipynb + date: 2025-04-25 + authors: + - tompakeman-oai + tags: + - reasoning + - functions + - responses + - api From 48aefeef4a444f978f52674b3c3d504c5f51c1c2 Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Mon, 28 Apr 2025 09:42:35 +0100 Subject: [PATCH 3/6] Updated notebook based on PR feedback --- examples/reasoning_function_calls.ipynb | 49 ++++++++++++++++--------- 1 file changed, 32 insertions(+), 17 deletions(-) diff --git a/examples/reasoning_function_calls.ipynb b/examples/reasoning_function_calls.ipynb index a5aa120454..58b45ffa1a 100644 --- a/examples/reasoning_function_calls.ipynb +++ b/examples/reasoning_function_calls.ipynb @@ -5,8 +5,8 @@ "metadata": {}, "source": [ "# Managing Function Calls With Reasoning Models\n", - "OpenAI now offers [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses) which are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.\n", - "> \"_Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. 
They're also the best models for Codex CLI, our lightweight coding agent._\"\n", + "OpenAI now offers function calling using [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses). Reasoning models are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.\n", + "> _Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They're also the best models for Codex CLI, our lightweight coding agent._\n", "\n", "For the most part, using these models via the API is very simple and comparable to using familiar classic 'chat' models. \n", "\n", @@ -30,11 +30,12 @@ "source": [ "# pip install openai\n", "# Import libraries \n", - "import json, openai\n", + "import json\n", + "from openai import OpenAI\n", "from uuid import uuid4\n", "from typing import Callable\n", "\n", - "client = openai.OpenAI()\n", + "client = OpenAI()\n", "MODEL_DEFAULTS = {\n", " \"model\": \"o4-mini\", # 200,000 token context window\n", " \"reasoning\": {\"effort\": \"low\", \"summary\": \"auto\"}, # Automatically summarise the reasoning process. Can also choose \"detailed\" or \"none\"\n", @@ -65,10 +66,17 @@ } ], "source": [ - "# Let's keep track of the response ids in a naive way, in case we want to reverse the conversation and pick up from a previous point\n", - "response = client.responses.create(input=\"Which of the last four Olympic host cities has the highest average temperature?\", **MODEL_DEFAULTS)\n", + "response = client.responses.create(\n", + " input=\"Which of the last four Olympic host cities has the highest average temperature?\",\n", + " **MODEL_DEFAULTS\n", + ")\n", "print(response.output_text)\n", - "response = client.responses.create(input=\"what about the lowest?\", previous_response_id=response.id, **MODEL_DEFAULTS)\n", + "\n", + "response = client.responses.create(\n", + " input=\"what about the lowest?\",\n", + " previous_response_id=response.id,\n", + " **MODEL_DEFAULTS\n", + ")\n", "print(response.output_text)" ] }, @@ -397,8 +405,8 @@ "## Manual conversation orchestration\n", "So far so good! It's really cool to watch the model pause execution to run a function before continuing. 
\n", "In practice the example above is quite trivial, and production use cases may be much more complex:\n", - "* Our context window may grow too large and we may wish to prune older and less relevant messages\n", - "* We may not wish to proceed sequentially using the `previous_response_id` but allow users to navigate back and forth through the conversation and re-generate answers\n", + "* Our context window may grow too large and we may wish to prune older and less relevant messages, or summarize the conversation so far\n", + "* We may wish to allow users to navigate back and forth through the conversation and re-generate answers\n", "* We may wish to store messages in our own database for audit purposes rather than relying on OpenAI's storage and orchestration\n", "* etc.\n", "\n", @@ -526,15 +534,22 @@ "metadata": {}, "source": [ "## Summary\n", - "* Reasoning models can invoke custom functions during their reasoning process, allowing for complex workflows that require external data or operations.\n", - "* These models may require multiple function calls in series, as some steps depend on the results of previous ones, necessitating a loop to handle ongoing reasoning.\n", - "* It's essential to preserve reasoning and function call responses in the conversation history to maintain the chain-of-thought and avoid errors in the reasoning process.\n" + "In this cookbook, we identified how to combine function calling with OpenAI's reasoning models to demonstrate multi-step tasks that are dependent on external data sources. \n", + "\n", + "Importantly, we covered reasoning-model specific nuances in the function calling process, specifically that:\n", + "* The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of previous ones\n", + "* We cannot know how many of these steps there will be, so we must process responses with a loop\n", + "* The responses API makes orchestration easy using the `previous_response_id` parameter, but where manual control is needed, it's important to maintain the correct order of conversation item to preserve the 'chain-of-thought'\n", + "\n", + "---\n", + "\n", + "The examples used here are rather simple, but you can imagine how this technique could be extended to more real-world use cases, such as:\n", + "\n", + "* Looking up a customer's transaction history and recent correspondence to determine if they are eligible for a promotional offer\n", + "* Calling recent transaction logs, geolocation data, and device metadata to assess the likelihood of a transaction being fraudulent\n", + "* Reviewing internal HR databases to fetch an employee’s benefits usage, tenure, and recent policy changes to answer personalized HR questions\n", + "* Reading internal dashboards, competitor news feeds, and market analyses to compile a daily executive briefing tailored to their focus areas" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] } ], "metadata": { From 737751214b34ddffff9bb167cf93ddce93177543 Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Mon, 28 Apr 2025 10:54:32 +0100 Subject: [PATCH 4/6] Minor updates to spelling --- examples/reasoning_function_calls.ipynb | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/examples/reasoning_function_calls.ipynb b/examples/reasoning_function_calls.ipynb index 58b45ffa1a..2c26478919 100644 --- a/examples/reasoning_function_calls.ipynb +++ b/examples/reasoning_function_calls.ipynb @@ -8,11 +8,11 @@ "OpenAI now 
offers function calling using [reasoning models](https://platform.openai.com/docs/guides/reasoning?api-mode=responses). Reasoning models are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.\n", "> _Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They're also the best models for Codex CLI, our lightweight coding agent._\n", "\n", - "For the most part, using these models via the API is very simple and comparable to using familiar classic 'chat' models. \n", + "For the most part, using these models via the API is very simple and comparable to using familiar 'chat' models. \n", "\n", "However, there are some nuances to bear in mind, particularly when it comes to using features such as function calling. \n", "\n", - "All examples in this notebook use the newer [Responses API](https://community.openai.com/t/introducing-the-responses-api/1140929) which provides convenient abstractions for managing conversation state. The principles here are however relevant when using the older chat completions API." + "All examples in this notebook use the newer [Responses API](https://community.openai.com/t/introducing-the-responses-api/1140929) which provides convenient abstractions for managing conversation state. However the principles here are relevant when using the older chat completions API." ] }, { @@ -86,9 +86,10 @@ "source": [ "Nice and easy!\n", "\n", - "We're asking relatively complex questions that may requires the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us. We simply wait a little longer before being shown the output. \n", + "We're asking relatively complex questions that may require the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us - we simply wait a little longer before being shown the response. \n", + "\n", "However, if we inspect the output we can see that the model has made use of a hidden set of 'reasoning' tokens that were included in the model context window, but not exposed to us as end users.\n", - "We can see these tokens and a summary of the reasoning (but not the literal tokens used) in the response" + "We can see these tokens and a summary of the reasoning (but not the literal tokens used) in the response." ] }, { @@ -129,7 +130,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "It is important to know about these reasoning tokens, because it means we will consume our available context window more quickly than with traditional chat models. 
More on this later.\n", + "It is important to know about these reasoning tokens, because it means we will consume our available context window more quickly than with traditional chat models.\n", "\n", "## Calling custom functions\n", "What happens if we ask the model a complex request that also requires the use of custom tools?\n", @@ -182,7 +183,10 @@ "# Let's add this to our defaults so we don't have to pass it every time\n", "MODEL_DEFAULTS[\"tools\"] = tools\n", "\n", - "response = client.responses.create(input=\"What's the internal ID for the lowest-temperature city?\", previous_response_id=response.id, **MODEL_DEFAULTS)\n", + "response = client.responses.create(\n", + " input=\"What's the internal ID for the lowest-temperature city?\",\n", + " previous_response_id=response.id,\n", + " **MODEL_DEFAULTS)\n", "print(response.output_text)\n" ] }, @@ -219,7 +223,8 @@ "metadata": {}, "source": [ "Along with the reasoning step, the model has successfully identified the need for a tool call and passed back instructions to send to our function call. \n", - "Let's invoke the function and pass the results back to the model so it can continue.\n", + "\n", + "Let's invoke the function and send the results to the model so it can continue reasoning.\n", "Function responses are a special kind of message, so we need to structure our next message as a special kind of input:\n", "```json\n", "{\n", @@ -410,7 +415,7 @@ "* We may wish to store messages in our own database for audit purposes rather than relying on OpenAI's storage and orchestration\n", "* etc.\n", "\n", - "In these situations we will treat the API as stateless - rather than using `previous_message_id` we will instead make and maintain an array of conversation items that we add to and pass as input. This allows us full control of the conversation.\n", + "In these situations we may wish to take full control of the conversation. Rather than using `previous_message_id` we can instead treat the API as 'stateless' and make and maintain an array of conversation items that we send to the model as input each time.\n", "\n", "This poses some Reasoning model specific nuances to consider. 
\n", "* In particular, it is essential that we preserve any reasoning and function call responses in our conversation history.\n", From a6c8dd1290344489f259b3f904aa299bc5b799cb Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Tue, 29 Apr 2025 22:19:57 +0100 Subject: [PATCH 5/6] Added fix for error handling --- examples/reasoning_function_calls.ipynb | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/examples/reasoning_function_calls.ipynb b/examples/reasoning_function_calls.ipynb index 2c26478919..127de07e11 100644 --- a/examples/reasoning_function_calls.ipynb +++ b/examples/reasoning_function_calls.ipynb @@ -317,15 +317,21 @@ " arguments = json.loads(response_item.arguments)\n", " print(f\"Invoking tool: {response_item.name}({arguments})\")\n", " tool_output = target_tool(**arguments)\n", - " intermediate_messages.append({\n", - " \"type\": \"function_call_output\",\n", - " \"call_id\": response_item.call_id,\n", - " \"output\": tool_output\n", - " })\n", " except Exception as e:\n", - " tool_output = f\"Error executing function call: {function_call.name}: {e}\"\n", + " msg = f\"Error executing function call: {response_item.name}: {e}\"\n", + " tool_output = msg\n", + " print(msg)\n", " else:\n", - " print(f\"ERROR - No tool registered for function call: {function_call.name}\")\n", + " msg = f\"ERROR - No tool registered for function call: {response_item.name}\"\n", + " tool_output = msg\n", + " print(msg)\n", + " intermediate_messages.append({\n", + " \"type\": \"function_call_output\",\n", + " \"call_id\": response_item.call_id,\n", + " \"output\": tool_output\n", + " })\n", + " elif response_item.type == 'reasoning':\n", + " print(f'Reasoning step: {response_item.summary}')\n", " return intermediate_messages" ] }, From c57430e64ddbe9cc1441453fedee4549b6a1f8c2 Mon Sep 17 00:00:00 2001 From: Tom Pakeman Date: Tue, 13 May 2025 16:55:39 +0100 Subject: [PATCH 6/6] Updated cookbook to include web search example --- examples/reasoning_function_calls.ipynb | 207 ++++++++++++++++-------- 1 file changed, 136 insertions(+), 71 deletions(-) diff --git a/examples/reasoning_function_calls.ipynb b/examples/reasoning_function_calls.ipynb index 127de07e11..88afb557e8 100644 --- a/examples/reasoning_function_calls.ipynb +++ b/examples/reasoning_function_calls.ipynb @@ -24,7 +24,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 13, "metadata": {}, "outputs": [], "source": [ @@ -53,15 +53,23 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Among the last four Summer Olympic host cities (Beijing 2008, London 2012, Rio de Janeiro 2016 and Tokyo 2020), Rio de Janeiro has by far the highest mean annual temperature—around 23 °C, compared with about 16 °C in Tokyo, 13 °C in Beijing and 11 °C in London.\n", - "Of those four, London has the lowest mean annual temperature, at roughly 11 °C.\n" + "Of the last four Summer Olympic host cities—Beijing (2008), London (2012), Rio de Janeiro (2016) and Tokyo (2020)—Rio de Janeiro has by far the highest average annual temperature.\n", + "\n", + "Approximate average annual temperatures: \n", + "• Rio de Janeiro: about 23 °C \n", + "• Tokyo: about 16 °C \n", + "• Beijing: about 12 °C \n", + "• London: about 11 °C \n", + "\n", + "So Rio de Janeiro is the warmest.\n", + "Of those four, London has the lowest average annual temperature, at around 11 °C.\n" ] } ], @@ -94,29 +102,29 @@ }, { "cell_type": 
"code", - "execution_count": 3, + "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "**Determining Olympic cities**\n", + "**Determining lowest temperatures**\n", "\n", - "The user is asking about the last four Olympic host cities, assuming it’s for the Summer Olympics. Those would be Beijing in 2008, London in 2012, Rio in 2016, and Tokyo in 2020. They’re interested in the lowest average temperature, which I see is London at around 11°C. Beijing is about 13°C, Tokyo 16°C, but London has the lowest. I should clarify it's the mean annual temperature. So, I'll present it neatly that London is the answer.\n" + "The user previously mentioned four Summer Olympic host cities and found Rio to have the highest average temperature. Now they're asking about the lowest. Considering the averages: London has about 11°C, Beijing around 12°C, Tokyo about 16°C, and Rio roughly 23°C. So, it seems that London's average annual temperature is the lowest at approximately 11°C. I need to keep the focus on this answer since it fits their pattern of inquiry!\n" ] }, { "data": { "text/plain": [ - "{'input_tokens': 109,\n", + "{'input_tokens': 134,\n", " 'input_tokens_details': {'cached_tokens': 0},\n", " 'output_tokens': 89,\n", " 'output_tokens_details': {'reasoning_tokens': 64},\n", - " 'total_tokens': 198}" + " 'total_tokens': 223}" ] }, - "execution_count": 3, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -141,7 +149,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -199,17 +207,17 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "[ResponseReasoningItem(id='rs_680bcde645a08191bbb8b42ba4613aef07423969e3977116', summary=[], type='reasoning', status=None),\n", - " ResponseFunctionToolCall(arguments='{\"city\":\"London\"}', call_id='call_VcyIJQnP7HW2gge7Nh8HmPNG', name='get_city_uuid', type='function_call', id='fc_680bcde7cda48191ada496d462ca7c5407423969e3977116', status='completed')]" + "[ResponseReasoningItem(id='rs_68236afb98748191bedd406ac09304d00650f58f696c6dc4', summary=[], type='reasoning', status=None),\n", + " ResponseFunctionToolCall(arguments='{\"city\":\"London\"}', call_id='call_MYB0swVrKRdFMM5zW4SWNj8m', name='get_city_uuid', type='function_call', id='fc_68236afc99b88191a8d1c616fa3b27790650f58f696c6dc4', status='completed')]" ] }, - "execution_count": 5, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -237,7 +245,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -259,14 +267,16 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "The internal ID for London is ce863d03-9c01-4de2-9af8-96b123852aec.\n" + "The internal ID for London is:\n", + "\n", + "eaf83b1b-8e3a-469c-a6a7-669c25cb1898\n" ] } ], @@ -285,6 +295,45 @@ "source": [ "This works great here - as we know that a single function call is all that is required for the model to respond - but we also need to account for situations where multiple tool calls might need to be executed for the reasoning to complete.\n", "\n", + "Let's add a second call to run a web search.\n", + "\n", + "OpenAI's web search tool is not available out of the box with reasoning models (as of May 2025 - this may soon 
change) but it's not too hard to create a custom web search function using 4o mini or another web search enabled model." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "def web_search(query: str) -> str:\n", + " \"\"\"Search the web for information and return back a summary of the results\"\"\"\n", + " result = client.responses.create(\n", + " model=\"gpt-4o-mini\",\n", + " input=f\"Search the web for '{query}' and reply with only the result.\",\n", + " tools=[{\"type\": \"web_search_preview\"}],\n", + " )\n", + " return result.output_text\n", + "\n", + "tools.append({\n", + " \"type\": \"function\",\n", + " \"name\": \"web_search\",\n", + " \"description\": \"Search the web for information and return back a summary of the results\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"query\": {\"type\": \"string\", \"description\": \"The query to search the web for.\"}\n", + " },\n", + " \"required\": [\"query\"]\n", + " }\n", + " })\n", + "tool_mapping[\"web_search\"] = web_search\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## Executing multiple functions in series\n", "\n", "Some OpenAI models support the parameter `parallel_tool_calls` which allows the model to return an array of functions which we can then execute in parallel. However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of previous ones.\n", @@ -296,7 +345,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 21, "metadata": {}, "outputs": [], "source": [ @@ -344,13 +393,14 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ + "Reasoning step: [Summary(text=\"**Identifying Olympic cities**\\n\\nThe user is looking for IDs of cities that have hosted the Olympics from 2005 to 2025, which includes both Summer and Winter events. The cities to list are: Turin 2006, Beijing 2008, Vancouver 2010, London 2012, Sochi 2014, Rio de Janeiro 2016, Pyeongchang 2018, and Tokyo 2020 (held in 2021). There's also Paris 2024, which is upcoming but still counts. I should clarify which cities have hosted, considering future hosts only as upcoming.\", type='summary_text'), Summary(text=\"**Organizing Olympic city data**\\n\\nI’m realizing that the range from 2005 to 2025 does include 2024, so Paris is part of the list. I need to avoid duplicating Beijing 2022. The final cities are Turin, Beijing, Vancouver, London, Sochi, Rio de Janeiro, Pyeongchang, Tokyo, and Paris. I should retrieve city UUIDs for each and then search for news stories regarding the Olympics in 2025 for these cities. I’ll do this sequentially, starting with Turin, then moving through the others in order, followed by the news searches. 
Let's get started!\", type='summary_text')]\n", "Invoking tool: get_city_uuid({'city': 'Turin'})\n", "More reasoning required, continuing...\n", "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", @@ -369,25 +419,26 @@ "More reasoning required, continuing...\n", "Invoking tool: get_city_uuid({'city': 'Paris'})\n", "More reasoning required, continuing...\n", - "Here are the internal IDs for the cities that have hosted the Olympics in the last 20 years:\n", - "\n", - "• Turin: 53c0e635-7a1c-478b-84ca-742a6f0df830 \n", - "• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5 \n", - "• Vancouver: cc8be1f1-5154-46f4-8879-451e97f771c7 \n", - "• London: a24addb0-4dd4-444c-a4a9-199612e0aca8 \n", - "• Sochi: da7386b3-2283-45cc-9244-c1e0f4121782 \n", - "• Rio de Janeiro: 01f60ec2-0efd-40b8-bb85-e63c2d2ddf4c \n", - "• Pyeongchang: f5d3687a-0097-4551-800c-aec66c37e8db \n", - "• Tokyo: 15aa0b12-7f7c-43d0-9ba3-b91250cafe48 \n", - "• Paris: 56d062f2-8835-4707-a826-5d68d8be9d3f \n", - "\n", - "Of these, the only city whose ID begins with “2” is:\n", - "• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5\n" + "Reasoning step: []\n", + "Invoking tool: web_search({'query': '2025 news Olympics Turin'})\n", + "More reasoning required, continuing...\n", + "Reasoning step: [Summary(text=\"**Determining news relevance**\\n\\nThis is about the Special Olympics, not the regular Olympics, which makes things a bit tricky. The user asked for news stories about the Olympics in general, and while Special Olympics could fit, I think it’s best to focus on the traditional Olympic Games instead. Since there’s no new news on the Turin Olympics, I’ll skip that. Next stop is checking for updates on Beijing. Let's see what I find!\", type='summary_text')]\n", + "Invoking tool: web_search({'query': '2025 news Olympics Beijing'})\n", + "More reasoning required, continuing...\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': '2025 news Olympics Vancouver'})\n", + "More reasoning required, continuing...\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': '2025 news Olympics London'})\n", + "More reasoning required, continuing...\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': '2025 news Olympics Sochi'})\n", + "More reasoning required, continuing...\n" ] } ], "source": [ - "initial_question = \"What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which cities have IDs beginning with the number '2'. Use your internal tools to look up the IDs?\"\n", + "initial_question = \"What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which of those cities have recent news stories (in 2025) about the Olympics? Use your internal tools to look up the IDs and the web search tool to find the news stories.\"\n", "\n", "# We fetch a response and then kick off a loop to handle the response\n", "response = client.responses.create(\n", @@ -435,7 +486,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -443,56 +494,70 @@ "output_type": "stream", "text": [ "*******************************************************************************\n", - "User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? 
Use your available tools to look up the IDs for each city.\n", + "User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a number and a temperate climate? Use your available tools to look up the IDs for each city and make sure to search the web to find out about the climate.\n", "*******************************************************************************\n", "More reasoning required, continuing...\n", + "**Listing Summer Olympics cities**\n", "\n", - "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", + "The user wants to identify the host cities of the Summer Olympics over the last 20 years. This would include 2024 in Paris, 2021 in Tokyo, 2016 in Rio, 2012 in London, and 2008 in Beijing. While considering the timeframe from 2005 to 2025, cities from 2004 should be excluded since it’s outside of the last 20 years. So, the final list is Beijing, London, Rio de Janeiro, Tokyo, and Paris. I'm checking their climates, noting that Paris and London have temperate climates, while Rio is tropical and Beijing has a monsoon-influenced climate.\n", + "Reasoning step: [Summary(text=\"**Listing Summer Olympics cities**\\n\\nThe user wants to identify the host cities of the Summer Olympics over the last 20 years. This would include 2024 in Paris, 2021 in Tokyo, 2016 in Rio, 2012 in London, and 2008 in Beijing. While considering the timeframe from 2005 to 2025, cities from 2004 should be excluded since it’s outside of the last 20 years. So, the final list is Beijing, London, Rio de Janeiro, Tokyo, and Paris. I'm checking their climates, noting that Paris and London have temperate climates, while Rio is tropical and Beijing has a monsoon-influenced climate.\", type='summary_text')]\n", + "Invoking tool: get_city_uuid({'city': 'Paris'})\n", "Invoking tool: get_city_uuid({'city': 'London'})\n", "Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})\n", "Invoking tool: get_city_uuid({'city': 'Tokyo'})\n", - "Invoking tool: get_city_uuid({'city': 'Paris'})\n", + "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", + "More reasoning required, continuing...\n", + "\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': 'Paris climate classification'})\n", + "More reasoning required, continuing...\n", + "\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': 'London climate classification'})\n", + "More reasoning required, continuing...\n", + "\n", + "Reasoning step: []\n", + "Invoking tool: web_search({'query': 'Beijing climate classification'})\n", "More reasoning required, continuing...\n", + "**Evaluating climate classification**\n", + "\n", + "I’m looking into Beijing's climate and realizing it falls under Dwa, which is classified as humid continental. This means it’s not temperate, as temperate climates usually refer to C climates. So, I’m left with Paris and London as the temperate options. I check Tokyo but see it's a Cfa and doesn’t fit with my criteria. The final answer: Paris and London qualify as temperate, and I’ll include their IDs and climate types.\n", + "Among the last five Summer Olympic host cities (2008–2024), the only ones that\n", + "\n", + "• have Köppen “temperate” (C) climates, and \n", + "• whose internal IDs begin with a digit \n", + "\n", + "are:\n", "\n", - "Here are the UUIDs for each Summer Olympic host city since 2005, with the leading numeric prefix highlighted and assessed for primality:\n", + "1. 
Paris \n", + " – ID: 8c06d343-7532-4b7d-aefe-a434775f2bc1 \n", + " – Climate: Oceanic (Köppen Cfb) – mild temperatures year-round with moderate rainfall \n", "\n", - "• Beijing (2008): 11ab370c-2f59-4c35-b557-f845e22c847b \n", - " – Leading digits “11” → 11 is prime \n", - "• London (2012): 0fdff00b-cbfb-4b82-bdd8-2107c4100319 \n", - " – Leading digit “0” → 0 is not prime \n", - "• Rio de Janeiro (2016): 9c2202c4-00ab-46ee-a954-a17505e32d64 \n", - " – Leading digit “9” → 9 is not prime \n", - "• Tokyo (2020): c4bf0281-7e84-4489-88e4-750e07211334 \n", - " – No leading digit → N/A \n", - "• Paris (2024): b8c4b88e-dece-435d-b398-94f0ff762c88 \n", - " – No leading digit → N/A \n", + "2. London \n", + " – ID: 71210de8-d578-4cbb-869e-2d63268e159e \n", + " – Climate: Temperate oceanic (Köppen Cfb) – cool, wet winters and mild, relatively dry summers \n", "\n", - "Conclusion: Only Beijing’s ID begins with a prime number (“11”).\n", + "Tokyo (Cfa) is temperate but its ID begins with “f.” Beijing (Dwa) and Rio de Janeiro (Aw) are not classified as temperate.\n", "*******************************************************************************\n", "User message: Great thanks! We've just updated the IDs - could you please check again?\n", "*******************************************************************************\n", "More reasoning required, continuing...\n", "\n", - "Invoking tool: get_city_uuid({'city': 'Beijing'})\n", - "Invoking tool: get_city_uuid({'city': 'London'})\n", - "Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})\n", - "Invoking tool: get_city_uuid({'city': 'Tokyo'})\n", + "Reasoning step: []\n", "Invoking tool: get_city_uuid({'city': 'Paris'})\n", - "Here are the updated UUIDs and their leading numeric prefixes:\n", + "Invoking tool: get_city_uuid({'city': 'London'})\n", + "Here are the updated IDs and their climates:\n", + "\n", + "1. Paris \n", + " – New ID: abb950fe-1748-4d99-a341-5b6a3ed3e544 \n", + " – Climate: Oceanic (Köppen Cfb) – mild temperatures year-round with moderate rainfall \n", "\n", - "• Beijing (2008): 30b0886f-c4da-431c-8983-33e8bbb4c352 \n", - " – Leading “30” → 30 is not prime \n", - "• London (2012): 72ff5a9d-d147-4ba8-9a87-64e3572ba3bc \n", - " – Leading “72” → 72 is not prime \n", - "• Rio de Janeiro (2016): 7a45a392-b43a-41be-8eaf-07ec44d42a2b \n", - " – Leading “7” → 7 is prime \n", - "• Tokyo (2020): f725244f-079f-44e1-a91c-5c31c270c209 \n", - " – Leading “f” → no numeric prefix \n", - "• Paris (2024): b0230ad4-bc35-48be-a198-65a9aaf28fb5 \n", - " – Leading “b” → no numeric prefix \n", + "2. London \n", + " – New ID: dc73ca06-ba2d-4561-965e-5bec3d8b4d37 \n", + " – Climate: Temperate oceanic (Köppen Cfb) – cool, wet winters and mild summers \n", "\n", - "Conclusion: After the update, only Rio de Janeiro’s ID begins with a prime number (“7”).\n", - "Total tokens used: 9734 (4.87% of o4-mini's context window)\n" + "Both remain temperate (Cfb) and their IDs still begin with a digit.\n", + "Total tokens used: 13217 (6.61% of o4-mini's context window)\n" ] } ], @@ -500,7 +565,7 @@ "# Let's initialise our conversation with the first user message\n", "total_tokens_used = 0\n", "user_messages = [\n", - " \"Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? 
Use your available tools to look up the IDs for each city.\",\n", + " \"Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a number and a temperate climate? Use your available tools to look up the IDs for each city and make sure to search the web to find out about the climate.\",\n", " \"Great thanks! We've just updated the IDs - could you please check again?\"\n", " ]\n", "\n", @@ -545,7 +610,7 @@ "metadata": {}, "source": [ "## Summary\n", - "In this cookbook, we identified how to combine function calling with OpenAI's reasoning models to demonstrate multi-step tasks that are dependent on external data sources. \n", + "In this cookbook, we identified how to combine function calling with OpenAI's reasoning models to demonstrate multi-step tasks that are dependent on external data sources., including searching the web.\n", "\n", "Importantly, we covered reasoning-model specific nuances in the function calling process, specifically that:\n", "* The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of previous ones\n", @@ -565,7 +630,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "openai", "language": "python", "name": "python3" },