Minor updates and images addition to cookbook #1882

Merged
merged 7 commits into from
Jun 3, 2025
@@ -112,6 +112,15 @@
"source": [
"## Project Lifecycle\n",
"\n",
"Not every project will proceed in the same way, but projects generally have some \n",
"important components in common.\n",
"\n",
"![Project Lifecycle](../../../images/partner_project_lifecycle.png)\n",
"\n",
"The solid arrows show the primary progressions or steps, while the dotted line \n",
"represents the ongoing nature of problem understanding - uncovering more about\n",
"the customer domain will influence every step of the process. We will examine \n",
"several of these iterative cycles of refinement in detail below. \n",
"Not every project will proceed in the same way, but projects generally have some common\n",
"important components.\n",
"\n",
@@ -133,6 +142,11 @@
"It's very rare that a real-world project will start with all the data necessary to get\n",
"to a satisfactory solution, much less to establish confidence.\n",
"\n",
"In our case, we're going to assume that we have a decent sample of system *inputs*, \n",
"in the form of receipt images, but start without any fully annotated data. We find \n",
"this is not unusual when automating an existing process. Instead, \n",
"we'll walk through building out that annotated data as we go along by collaborating with\n",
"domain experts, and make our evals progressively more comprehensive.\n",
"In our case, we're going to assume that we have a decent sample of system *inputs*\n",
"(here, photographs of receipts), but start without any fully annotated data. We'll walk\n",
"through the process of incrementally expanding our test and training sets as we go along\n",
@@ -498,6 +512,21 @@
"### Action Decision\n",
"\n",
"Next, we need to close the loop and get to an actual decision based on receipts. This\n",
"looks pretty similar, so we'll present the code without comment.\n",
"\n",
"Ordinarily one would start with the most capable model (`o3`, at this time) for a \n",
"first pass. Once correctness is established, one would experiment with different models\n",
"to analyze the business impact of any tradeoffs, and consider whether any shortfalls \n",
"can be remedied with iteration. A client may be willing to take a certain accuracy \n",
"hit for lower latency or cost, or it may be more effective to change the architecture\n",
"to hit cost, latency, and accuracy goals. We'll get into how to make these tradeoffs\n",
"explicitly and objectively later on. \n",
"\n",
"For this cookbook, `o3` might be too good. We'll use `o4-mini` for our first pass, so \n",
"that we get a few reasoning errors we can use to illustrate the means of addressing\n",
"them when they occur.\n",
"\n",
"Next, we need to close the loop and get to an actual decision based on receipts. This\n",
"looks pretty similar, so we'll present the code without comment."
]
},
@@ -887,6 +916,10 @@
"metadata": {},
"source": [
"After you run that eval you'll be able to view it in the UI, and should see something\n",
"like the below. \n",
"\n",
"(Note: if you have a Zero-Data-Retention agreement, this data is not stored\n",
"by OpenAI, so it will not be available in this interface.)\n",
"like:\n",
"\n",
"![Summary UI](../../../images/partner_summary_ui.png)\n",
@@ -1617,6 +1650,7 @@
"ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
"```\n",
"\n",
"4. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
"3. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
"\n",
"With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
Binary file modified images/partner_development_flywheel.png
Binary file modified images/partner_model_improvement_waterfall.png
Binary file modified images/partner_process_flowchart.png
Binary file added images/partner_project_lifecycle.png
5 changes: 5 additions & 0 deletions registry.yaml
@@ -9,8 +9,13 @@
date: 2025-06-01
authors:
- shikhar-cyber
- moredatarequired
- tooluser
- eddiesiegel
tags:
- evals
- API Flywheel
- completions
- responses
- functions
- tracing