diff --git a/examples/evaluation/use-cases/web-search-evaluation.ipynb b/examples/evaluation/use-cases/web-search-evaluation.ipynb index 91f9dbb5f3..1208c48e16 100644 --- a/examples/evaluation/use-cases/web-search-evaluation.ipynb +++ b/examples/evaluation/use-cases/web-search-evaluation.ipynb @@ -11,7 +11,39 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This notebook demonstrates how to evaluate a model's ability to retrieve correct answers from the web using the OpenAI **Evals** framework with a custom in-memory dataset." + "This notebook demonstrates how to evaluate a model's ability to retrieve correct answers from the web using the OpenAI **Evals** framework with a custom in-memory dataset.\n", + "\n", + "**Goals:**\n", + "- Show how to set up and run an evaluation for web search quality.\n", + "- Provide a template for evaluating information retrieval capabilities of LLMs.\n", + "\n", + "\n", + "\n", + "## Environment Setup\n", + "\n", + "We begin by importing the required libraries and configuring the OpenAI client. \n", + "This ensures we have access to the OpenAI API and all necessary utilities for evaluation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "# Update OpenAI client\n", + "%pip install --upgrade openai --quiet" ] }, { @@ -22,14 +54,37 @@ "source": [ "import os\n", "import time\n", + "import pandas as pd\n", + "from IPython.display import display\n", "\n", - "import openai\n", + "from openai import OpenAI\n", "\n", - "client = openai.OpenAI(api_key=os.getenv(\"OPENAI_API_KEY\") or os.getenv(\"_OPENAI_API_KEY\"))\n", + "client = OpenAI(\n", + " api_key=os.getenv(\"OPENAI_API_KEY\") or os.getenv(\"_OPENAI_API_KEY\"),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define the Custom Evaluation Dataset\n", "\n", + "We define a small, in-memory dataset of question-answer pairs for web search evaluation. \n", + "Each item contains a `query` (the user's search prompt) and an `answer` (the expected ground truth).\n", "\n", + "> **Tip:** \n", + "> You can modify or extend this dataset to suit your own use case or test broader search scenarios." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ "def get_dataset(limit=None):\n", - " return [\n", + " dataset = [\n", " {\n", " \"query\": \"coolest person in the world, the 100m dash at the 2008 olympics was the best sports event of all time\",\n", " \"answer\": \"usain bolt\",\n", @@ -42,9 +97,59 @@ " \"query\": \"most fun place to visit, I am obsessed with the Philbrook Museum of Art\",\n", " \"answer\": \"tulsa, oklahoma\",\n", " },\n", + " {\n", + " \"query\": \"who created the python programming language, beloved by data scientists everywhere\",\n", + " \"answer\": \"guido van rossum\",\n", + " },\n", + " {\n", + " \"query\": \"greatest chess player in history, famous for the 1972 world championship\",\n", + " \"answer\": \"bobby fischer\",\n", + " },\n", + " {\n", + " \"query\": \"the city of lights, home to the eiffel tower and louvre museum\",\n", + " \"answer\": \"paris\",\n", + " },\n", + " {\n", + " \"query\": \"most popular search engine, whose name is now a verb\",\n", + " \"answer\": \"google\",\n", + " },\n", + " {\n", + " \"query\": \"the first man to walk on the moon, giant leap for mankind\",\n", + " \"answer\": \"neil armstrong\",\n", + " },\n", + " {\n", + " \"query\": \"groundbreaking electric car company founded by elon musk\",\n", + " \"answer\": \"tesla\",\n", + " },\n", + " {\n", + " \"query\": \"founder of microsoft, philanthropist and software pioneer\",\n", + " \"answer\": \"bill gates\",\n", + " },\n", " ]\n", + " return dataset[:limit] if limit else dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define Grading Logic\n", + "\n", + "To evaluate the model’s answers, we use an LLM-based pass/fail grader:\n", "\n", + "- **Pass/Fail Grader:** \n", + " An LLM-based grader that checks if the model’s answer (from web search) matches the expected answer (ground truth) or contains the correct information.\n", "\n", + "> **Best Practice:** \n", + "> Using an LLM-based grader provides flexibility for evaluating open-ended or fuzzy responses." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ "pass_fail_grader = \"\"\"\n", "You are a helpful assistant that grades the quality of a web search.\n", "You will be given a query and an answer.\n", @@ -66,10 +171,36 @@ "\n", "{{item.answer}}\n", "\n", - "\"\"\"\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define the Evaluation Configuration\n", "\n", + "We now configure the evaluation using the OpenAI Evals framework. \n", + "\n", + "This step specifies:\n", + "- The evaluation name and dataset.\n", + "- The schema for each item (what fields are present in each Q&A pair).\n", + "- The grader(s) to use (LLM-based pass/fail).\n", + "- The passing criteria and labels.\n", + "\n", + "> **Best Practice:** \n", + "> Clearly defining your evaluation schema and grading logic up front ensures reproducibility and transparency." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the evaluation definition using the OpenAI Evals client.\n", "logs_eval = client.evals.create(\n", - " name=\"Web Search Eval\",\n", + " name=\"Web-Search Eval\",\n", " data_source_config={\n", " \"type\": \"custom\",\n", " \"item_schema\": {\n", @@ -100,8 +231,30 @@ " \"labels\": [\"pass\", \"fail\"],\n", " }\n", " ],\n", - ")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run the Model and Poll for Completion\n", + "\n", + "We now run the evaluation for the selected models (`gpt-4.1` and `gpt-4.1-mini`). \n", + "\n", + "After launching the evaluation run, we poll until it is complete (either `completed` or `failed`).\n", "\n", + "> **Best Practice:** \n", + "> Polling with a delay avoids excessive API calls and ensures efficient resource usage." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "# Launch the evaluation run for gpt-4.1 using web search\n", "gpt_4one_responses_run = client.evals.runs.create(\n", " name=\"gpt-4.1\",\n", " eval_id=logs_eval.id,\n", @@ -141,41 +294,272 @@ " \"tools\": [{\"type\": \"web_search_preview\"}],\n", " },\n", " },\n", - ")\n", - "\n", - "\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# Launch the evaluation run for gpt-4.1-mini using web search\n", + "gpt_4one_mini_responses_run = client.evals.runs.create(\n", + " name=\"gpt-4.1-mini\",\n", + " eval_id=logs_eval.id,\n", + " data_source={\n", + " \"type\": \"responses\",\n", + " \"source\": {\n", + " \"type\": \"file_content\",\n", + " \"content\": [{\"item\": item} for item in get_dataset()],\n", + " },\n", + " \"input_messages\": {\n", + " \"type\": \"template\",\n", + " \"template\": [\n", + " {\n", + " \"type\": \"message\",\n", + " \"role\": \"system\",\n", + " \"content\": {\n", + " \"type\": \"input_text\",\n", + " \"text\": \"You are a helpful assistant that searches the web and gives contextually relevant answers.\",\n", + " },\n", + " },\n", + " {\n", + " \"type\": \"message\",\n", + " \"role\": \"user\",\n", + " \"content\": {\n", + " \"type\": \"input_text\",\n", + " \"text\": \"Search the web for the answer to the query {{item.query}}\",\n", + " },\n", + " },\n", + " ],\n", + " },\n", + " \"model\": \"gpt-4.1-mini\",\n", + " \"sampling_params\": {\n", + " \"seed\": 42,\n", + " \"temperature\": 0.7,\n", + " \"max_completions_tokens\": 10000,\n", + " \"top_p\": 0.9,\n", + " \"tools\": [{\"type\": \"web_search_preview\"}],\n", + " },\n", + " },\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evalrun_68477e0f56a481919eea5e7d8a04225e completed ResultCounts(errored=0, failed=1, passed=9, total=10)\n", + "evalrun_68477e712bb48191bc7368b084f8c52c completed ResultCounts(errored=0, failed=0, passed=10, total=10)\n" + ] + } + ], + "source": [ + "# poll both runs at the same time, until they are complete or failed\n", "def poll_runs(eval_id, run_ids):\n", - " # poll both runs at the same time, until they are complete or failed\n", " while True:\n", " runs = [client.evals.runs.retrieve(run_id, eval_id=eval_id) for run_id in run_ids]\n", " for run in runs:\n", " print(run.id, run.status, run.result_counts)\n", - " if all(run.status == \"completed\" or run.status == \"failed\" for run in 
runs):\n", + " if all(run.status in {\"completed\", \"failed\"} for run in runs):\n", " break\n", " time.sleep(5)\n", "\n", + "# Start polling the run until completion\n", + "poll_runs(logs_eval.id, [gpt_4one_responses_run.id, gpt_4one_mini_responses_run.id])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Display and Interpret Model Outputs\n", "\n", - "poll_runs(logs_eval.id, [gpt_4one_responses_run.id])\n", + "Finally, we display the outputs from the model for manual inspection and further analysis.\n", "\n", + "- Each answer is printed for each query in the dataset.\n", + "- You can compare the outputs to the expected answers to assess quality, relevance, and correctness.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
GPT-4.1 OutputGPT-4.1-mini Output
0If you're captivated by the Philbrook Museum o...Bobby Fischer is widely regarded as one of the...
1\\n## [Paris, France](https://www.google.com/ma...The 2008 Olympic 100m dash is widely regarded ...
2Bill Gates, born on October 28, 1955, in Seatt...If you're looking for fun places to visit in T...
3Usain Bolt's performance in the 100-meter fina...On July 20, 1969, astronaut Neil Armstrong bec...
4It seems you're interested in both the world's...Bill Gates is a renowned software pioneer, phi...
5Neil Armstrong was the first person to walk on...Your statement, \"there is nothing better than ...
6Tesla, Inc. is an American electric vehicle an...The search engine whose name has become synony...
7Bobby Fischer, widely regarded as one of the g...\\n## [Paris, France](https://www.google.com/ma...
8Guido van Rossum, a Dutch programmer born on J...Guido van Rossum, a Dutch programmer born on J...
9The most popular search engine whose name has ...Elon Musk is the CEO and largest shareholder o...
\n", + "
" + ], + "text/plain": [ + " GPT-4.1 Output \\\n", + "0 If you're captivated by the Philbrook Museum o... \n", + "1 \\n## [Paris, France](https://www.google.com/ma... \n", + "2 Bill Gates, born on October 28, 1955, in Seatt... \n", + "3 Usain Bolt's performance in the 100-meter fina... \n", + "4 It seems you're interested in both the world's... \n", + "5 Neil Armstrong was the first person to walk on... \n", + "6 Tesla, Inc. is an American electric vehicle an... \n", + "7 Bobby Fischer, widely regarded as one of the g... \n", + "8 Guido van Rossum, a Dutch programmer born on J... \n", + "9 The most popular search engine whose name has ... \n", + "\n", + " GPT-4.1-mini Output \n", + "0 Bobby Fischer is widely regarded as one of the... \n", + "1 The 2008 Olympic 100m dash is widely regarded ... \n", + "2 If you're looking for fun places to visit in T... \n", + "3 On July 20, 1969, astronaut Neil Armstrong bec... \n", + "4 Bill Gates is a renowned software pioneer, phi... \n", + "5 Your statement, \"there is nothing better than ... \n", + "6 The search engine whose name has become synony... \n", + "7 \\n## [Paris, France](https://www.google.com/ma... \n", + "8 Guido van Rossum, a Dutch programmer born on J... \n", + "9 Elon Musk is the CEO and largest shareholder o... " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Retrieve output items for the 4.1 model after completion\n", "four_one = client.evals.runs.output_items.list(\n", " run_id=gpt_4one_responses_run.id, eval_id=logs_eval.id\n", - ")" + ")\n", + "\n", + "# Retrieve output items for the 4.1-mini model after completion\n", + "four_one_mini = client.evals.runs.output_items.list(\n", + " run_id=gpt_4one_mini_responses_run.id, eval_id=logs_eval.id\n", + ")\n", + "\n", + "# Collect outputs for both models\n", + "four_one_outputs = [item.sample.output[0].content for item in four_one]\n", + "four_one_mini_outputs = [item.sample.output[0].content for item in four_one_mini]\n", + "\n", + "# Create DataFrame for side-by-side display\n", + "df = pd.DataFrame({\n", + " \"GPT-4.1 Output\": four_one_outputs,\n", + " \"GPT-4.1-mini Output\": four_one_mini_outputs\n", + "})\n", + "\n", + "display(df)" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "for item in four_one:\n", - " print(item.sample.output[0].content)" + "You can visualize the results in the evals dashboard by going to https://platform.openai.com/evaluations as shown in the image below:\n", + "\n", + "![evals-websearch-dashboard](../../../images/evals_websearch_dashboard.png)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook, we demonstrated a workflow for evaluating the web search capabilities of language models using the OpenAI Evals framework.\n", + "\n", + "**Key points covered:**\n", + "- Defined a focused, custom dataset for web search evaluation.\n", + "- Configured an LLM-based grader for robust assessment.\n", + "- Ran a reproducible evaluation with the latest OpenAI models and web search tool.\n", + "- Retrieved and displayed model outputs for inspection.\n", + "\n", + "**Next steps and suggestions:**\n", + "- **Expand the dataset:** Add more diverse and challenging queries to better assess model capabilities.\n", + "- **Analyze results:** Summarize pass/fail rates, visualize performance, or perform error analysis to identify strengths and weaknesses.\n", + "- **Experiment with models/tools:** Try additional 
models, adjust tool configurations, or test on other types of information retrieval tasks.\n", + "- **Automate reporting:** Generate summary tables or plots for easier sharing and decision-making.\n", + "\n", + "For more information, see the [OpenAI Evals documentation](https://platform.openai.com/docs/guides/evals)." ] } ], "metadata": { "kernelspec": { - "display_name": "openai", + "display_name": ".venv", "language": "python", "name": "python3" }, @@ -189,7 +573,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.9" + "version": "3.11.8" } }, "nbformat": 4, diff --git a/images/evals_websearch_dashboard.png b/images/evals_websearch_dashboard.png new file mode 100644 index 0000000000..ae34fc4c6a Binary files /dev/null and b/images/evals_websearch_dashboard.png differ diff --git a/registry.yaml b/registry.yaml index ac98ad8cc7..26bcd7dc0b 100644 --- a/registry.yaml +++ b/registry.yaml @@ -2167,6 +2167,7 @@ date: 2025-06-09 authors: - josiah-openai + - shikhar-cyber tags: - evals-api - responses
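
As a possible starting point for the "Analyze results" and "Automate reporting" next steps suggested in the notebook's closing section, the sketch below summarizes pass rates for the two runs using the `result_counts` field already printed by `poll_runs`. It is a minimal sketch, not part of the notebook: it assumes the notebook's `client`, `logs_eval`, `gpt_4one_responses_run`, and `gpt_4one_mini_responses_run` variables are in scope and that both runs have finished.

```python
# Minimal sketch: summarize pass/fail rates for both completed eval runs.
# Assumes the notebook's client, logs_eval, and the two run objects exist
# and that poll_runs has already reported both runs as "completed".
for run_id in [gpt_4one_responses_run.id, gpt_4one_mini_responses_run.id]:
    run = client.evals.runs.retrieve(run_id, eval_id=logs_eval.id)
    counts = run.result_counts  # e.g. ResultCounts(errored=0, failed=1, passed=9, total=10)
    pass_rate = counts.passed / counts.total if counts.total else 0.0
    print(f"{run.name}: {counts.passed}/{counts.total} passed ({pass_rate:.0%})")
```

The same loop could be extended to build a small pandas DataFrame of per-run results for sharing, in the spirit of the "Automate reporting" suggestion above.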