wip #1841


Merged
merged 4 commits into from
May 14, 2025

Conversation

willhath-openai
Contributor

Summary

Adds example scripts for creating evals from stored responses.

"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's go to the dashboard to see how we did!"
Contributor

Hopefully this eval is fairly repeatable, so ideally there can be a "takeaway" here, like:
"4.1-mini does about as well as our original at lower cost."
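
A takeaway like the one suggested above could be derived from aggregate run results. The following is only a minimal sketch: `RunSummary`, `takeaway`, the 2-point `tolerance`, and every number in the usage example are hypothetical and not taken from this PR or from the Evals dashboard.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Aggregate results for one model on the same eval (hypothetical shape)."""
    model: str
    passed: int
    total: int
    cost_usd: float

    @property
    def pass_rate(self) -> float:
        # Guard against an empty run so the comparison never divides by zero.
        return self.passed / self.total if self.total else 0.0


def takeaway(baseline: RunSummary, candidate: RunSummary, tolerance: float = 0.02) -> str:
    """Return a one-sentence comparison of candidate against baseline."""
    delta = candidate.pass_rate - baseline.pass_rate
    if abs(delta) <= tolerance:
        quality = "about as well as"
    elif delta > 0:
        quality = "better than"
    else:
        quality = "worse than"

    if candidate.cost_usd < baseline.cost_usd:
        cost = "at lower cost"
    elif candidate.cost_usd > baseline.cost_usd:
        cost = "at higher cost"
    else:
        cost = "at the same cost"

    return f"{candidate.model} does {quality} {baseline.model} {cost}."


# Illustrative numbers only; real values would come from the eval runs.
original = RunSummary("original", passed=90, total=100, cost_usd=1.00)
mini = RunSummary("4.1-mini", passed=89, total=100, cost_usd=0.25)
print(takeaway(original, mini))
```

The tolerance is a judgment call: it decides how large a pass-rate gap still counts as "about as well", and it only makes sense if the eval is repeatable enough that run-to-run noise stays below it.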

"outputs": [],
"source": [
"grader_system_prompt = \"\"\"\n",
"We've created a consumer-facing Evals product to help AI integrators quickly and clearly understand their models' real-world performance. Your role is to serve as a Universal Evaluator, automatically grading responses to measure how well each model output addresses user needs and expectations.\n",
Contributor

Is there a more use-case-specific grader we could use here?
The dataset is super cool, so I feel like it might be fun to have a specific grader here rather than a generic one.
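
One way to move from a generic grader prompt to a use-case-specific one is to assemble it from a short use-case description and an explicit rubric. This is a sketch under assumptions: `build_grader_prompt`, the example use case, and the criteria are hypothetical, since the thread does not name the dataset or say what the refined grader looks like.

```python
def build_grader_prompt(use_case: str, criteria: list[str]) -> str:
    """Assemble a grader system prompt scoped to a single use case."""
    if not criteria:
        raise ValueError("at least one criterion is required")

    # Number the criteria so the grader can refer to them individually.
    rubric = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"You are grading model responses for this use case: {use_case}\n"
        "Score the response against each criterion below.\n"
        f"{rubric}\n"
        "Reply with 'pass' only if every criterion is met; otherwise reply 'fail'."
    )


# Hypothetical use case and rubric, for illustration only.
grader_system_prompt = build_grader_prompt(
    "answering customer questions about a product manual",
    [
        "The answer is supported by the provided context.",
        "The answer addresses the question that was asked.",
    ],
)
print(grader_system_prompt)
```

Keeping the rubric as data rather than prose makes it easier to tighten individual criteria once the basic pipeline is working.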

Contributor Author

Yeah, I was gonna refine it once I got it working.

@willhath-openai merged commit 4596343 into main on May 14, 2025
1 check passed
@willhath-openai deleted the dev/willhath/responses+evals branch on May 14, 2025, 18:14

2 participants