Commit 9825f03

Small updates to rft healthbench (#1858)
1 parent a140048 commit 9825f03

2 files changed: +5 -1 lines changed

examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb

Lines changed: 4 additions & 0 deletions
@@ -7,8 +7,12 @@
  "source": [
  "# Reinforcement Fine-Tuning with the OpenAI API for Conversational Reasoning\n",
  "\n",
+ "*This guide is for developers and ML practitioners who have some experience with OpenAIʼs APIs and wish to use their fine-tuned models for research or other appropriate uses. OpenAI’s services are not intended for the personalized treatment or diagnosis of any medical condition and are subject to our [applicable terms](https://openai.com/policies/).*\n",
+ "\n",
  "This notebook demonstrates how to use OpenAI's reinforcement fine-tuning (RFT) to improve a model's conversational reasoning capabilities (specifically asking questions to gain additional context and reduce uncertainty). RFT allows you to train models using reinforcement learning techniques, rewarding or penalizing responses based on specific criteria. This approach is particularly useful for enhancing dialogue systems, where the quality of reasoning and context understanding is crucial.\n",
  "\n",
+ "For a deep dive into the Reinforcement Fine-Tuning API and how to write effective graders, see [Exploring Model Graders for Reinforcement Fine-Tuning](https://cookbook.openai.com/examples/reinforcement_fine_tuning).\n",
+ "\n",
  "### HealthBench\n",
  "\n",
  "This cookbook evaluates and improves model performance on a focused subset of [HealthBench](https://openai.com/index/healthbench/), a benchmark suite for medical QA. This guide walks through how to configure the datasets, define evaluation rubrics, and fine-tune model behavior using reinforcement signals derived from custom graders.\n",
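
The notebook text in this hunk describes rewarding or penalizing responses based on specific criteria, with reinforcement signals derived from custom graders. As a rough illustration only, the sketch below shows one way a points-based rubric could be turned into a reward in [0, 1]. It is not the notebook's grader and does not call the OpenAI API; `Criterion`, `rubric_score`, `EXAMPLE_RUBRIC`, and the string checks are all hypothetical names and heuristics introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One hypothetical rubric item: a description, a point value, and a check."""
    description: str
    points: float  # positive rewards a behavior, negative penalizes it
    check: Callable[[str], bool]


def rubric_score(response: str, rubric: List[Criterion]) -> float:
    """Sum the points of every criterion the response meets, divide by the
    total positive points available, and clip the result to [0, 1]."""
    max_points = sum(c.points for c in rubric if c.points > 0)
    if max_points == 0:
        return 0.0
    earned = sum(c.points for c in rubric if c.check(response))
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric: reward asking for context, penalize a bare assertion.
# Real graders would use far more robust checks than substring matching.
EXAMPLE_RUBRIC = [
    Criterion("Asks a clarifying question", 2.0, lambda r: "?" in r),
    Criterion("Asserts a conclusion without context", -1.0,
              lambda r: "you have" in r.lower()),
]

score_asking = rubric_score("How long have the symptoms lasted?", EXAMPLE_RUBRIC)
score_asserting = rubric_score("You have the flu.", EXAMPLE_RUBRIC)
```

Under these assumptions the question-asking response earns the full reward and the bare assertion is clipped to zero, which is the kind of signal that would push a model toward seeking additional context.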

registry.yaml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
     - fine-tuning
     - reinforcement-learning-graders

-- title: Reinforcement Fine-tuning with the OpenAI API
+- title: Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
   path: examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb
   date: 2025-05-21
   authors:
