Commit 54659c5

Minh/speech transcription tutorial (openai#1807)
1 parent 80251ba commit 54659c5

15 files changed, with 731 additions and 6 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -140,6 +140,7 @@ examples/fine-tuned_qa/local_cache/*
 
 # PyCharm files
 .idea/
+.cursorignore
 
 # VS Code files
 .vscode/

authors.yaml

Lines changed: 11 additions & 6 deletions
@@ -3,6 +3,11 @@
 # You can optionally customize how your information shows up cookbook.openai.com over here.
 # If your information is not present here, it will be pulled from your GitHub profile.
 
+minh-hoque:
+  name: "Minhajul Hoque"
+  website: "https://www.linkedin.com/in/minhajul-hoque-83242b163/"
+  avatar: "https://avatars.githubusercontent.com/u/84698472?v=4"
+
 shikhar-cyber:
   name: "Shikhar Kwatra"
   website: "https://www.linkedin.com/in/shikharkwatra/"
@@ -126,13 +131,13 @@ aaronwilkowitz-openai:
 charuj:
   name: "Charu Jaiswal"
   website: "https://www.linkedin.com/in/charu-j-8a866471"
-  avatar: "https://avatars.githubusercontent.com/u/18404643?v=4"
+  avatar: "https://avatars.githubusercontent.com/u/18404643?v=4"
 
 rupert-openai:
   name: "Rupert Truman"
   website: "https://www.linkedin.com/in/rupert-truman/"
   avatar: "https://avatars.githubusercontent.com/u/171234447"
-
+
 keelan-openai:
   name: "Keelan Schule"
   website: "https://www.linkedin.com/in/keelanschule/"
@@ -171,8 +176,8 @@ evanweiss-openai:
 girishd:
   name: "Girish Dusane"
   website: "https://www.linkedin.com/in/girishdusane/"
-  avatar: "https://avatars.githubusercontent.com/u/272708"
-
+  avatar: "https://avatars.githubusercontent.com/u/272708"
+
 lxing-oai:
   name: "Luke Xing"
   website: "https://www.linkedin.com/in/lukexing/"
@@ -227,7 +232,7 @@ erickgort:
   name: "Erick Gort"
   website: "https://www.linkedin.com/in/erick-gort-32ab1678/"
   avatar: "https://avatars.githubusercontent.com/u/189261906?v=4"
-
+
 kylecote-tray:
   name: "Kyle Cote"
   website: "https://github.com/kylecote-tray"
@@ -297,7 +302,7 @@ rzhao-openai:
   name: "Randy Zhao"
   website: "https://www.linkedin.com/in/randy-zhao-27433616b"
   avatar: "https://avatars.githubusercontent.com/u/208724779?v=4"
-
+
 brandonbaker-openai:
   name: "Brandon Baker"
   website: "https://www.linkedin.com/in/brandonbaker18"

examples/Speech_transcription_methods.ipynb

Lines changed: 672 additions & 0 deletions
Large diffs are not rendered by default.
Binary file not shown.
Binary file not shown.
Binary file not shown.
15.2 KB
40.8 KB
16.7 KB
22.9 KB
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+```{mermaid}
+graph LR
+Mic -- "PCM frames" --> VP["VoicePipeline"]
+VP -- "VAD & resample" --> Buf["Sentence buffer"]
+Buf --> GPT["gpt-4o-transcribe"]
+GPT --> Agent["Agent callbacks"]
+Agent -- "print / reply" --> App
+```
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
```mermaid
2+
sequenceDiagram
3+
participant Mic
4+
participant App
5+
participant WS as "WebSocket"
6+
participant OAI as "Realtime Server"
7+
8+
Mic ->> App: 20–40 ms PCM frames
9+
App ->> WS: Base64-encoded chunks<br/>input_audio_buffer.append
10+
WS ->> OAI: Audio stream
11+
OAI -->> WS: JSON transcription events<br/>(partial & complete)
12+
WS -->> App: Transcript updates
13+
```
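For reviewers, the "App ->> WS" step of the sequence diagram above can be sketched in Python roughly as follows. This is only an illustration, not code from the commit: the helper name `pcm_to_append_events`, the 24 kHz sample rate, and the 16-bit mono format are assumptions; the `input_audio_buffer.append` event type and Base64 encoding come from the diagram.

```python
import base64
import json

SAMPLE_RATE_HZ = 24_000  # assumption: 24 kHz mono input
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
FRAME_MS = 40            # upper end of the 20-40 ms range in the diagram


def pcm_to_append_events(pcm: bytes, frame_ms: int = FRAME_MS) -> list[str]:
    """Split raw PCM into frames and wrap each one in a JSON append event.

    Hypothetical helper: it only builds the messages; sending them over a
    WebSocket is left to the caller.
    """
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    events = []
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        events.append(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    return events


# 100 ms of silence becomes two full 40 ms frames and one 20 ms remainder.
silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 10)
messages = pcm_to_append_events(silence)
```

Keeping message construction separate from the socket makes the framing logic easy to check without a live connection.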
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
```mermaid
2+
flowchart LR
3+
AudioFile["Audio file<br/>(WAV • MP3 • FLAC)"] --> Upload["Binary upload"]
4+
Upload --> API["/v1/audio/transcriptions"]
5+
API --> JSONOutput["JSON transcription<br/>+ metadata"]
6+
JSONOutput --> App["Your application"]
7+
```
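The file-upload flow in the flowchart above might look like this with the official `openai` Python SDK. This is a hedged sketch rather than the notebook's code: `transcribe_file` is a hypothetical helper, the default model name is borrowed from the other diagrams in this commit, and the client is passed in so the function can be exercised without network access.

```python
from pathlib import Path


def transcribe_file(client, audio_path, model: str = "gpt-4o-transcribe") -> str:
    """Upload one finished audio file and return the transcript text.

    `client` is expected to behave like `openai.OpenAI()`; it is injected
    (an assumption of this sketch) so a stub can stand in during tests.
    """
    with Path(audio_path).open("rb") as audio_file:
        # Binary upload to /v1/audio/transcriptions via the SDK.
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Typical use, assuming OPENAI_API_KEY is set and "speech.wav" exists:
#   from openai import OpenAI
#   print(transcribe_file(OpenAI(), "speech.wav"))
```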
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
```mermaid
2+
flowchart LR
3+
A["Finished audio file<br/>(WAV • MP3 • FLAC • …)"]
4+
B["OpenAI STT engine<br/>(gpt-4o-transcribe)"]
5+
C["Your application / UI"]
6+
7+
A -->|HTTP POST<br/>/v1/audio/transcriptions<br/>stream=true| B
8+
B -->|chunked HTTP response<br/>partial & final transcripts| C
9+
```
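On the application side, the "partial & final transcripts" edge of the flowchart above could be consumed along these lines. This is a sketch under assumptions: `collect_transcript` is a hypothetical helper, and the event type names `transcript.text.delta` and `transcript.text.done` are taken to be what the streaming response emits; the stand-in events below replace a real `stream=True` call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_transcript(events: Iterable) -> tuple[list[str], str]:
    """Gather partial text deltas and the final transcript from a stream.

    Assumes each event has a `type` attribute, delta events carry `delta`,
    and the done event carries the full `text`.
    """
    partials: list[str] = []
    final = ""
    for event in events:
        if event.type == "transcript.text.delta":
            partials.append(event.delta)
        elif event.type == "transcript.text.done":
            final = event.text
    # Fall back to the joined deltas if no done event arrived.
    return partials, final or "".join(partials)


# Offline stand-in for the chunked HTTP response.
fake_stream = [
    SimpleNamespace(type="transcript.text.delta", delta="Hello"),
    SimpleNamespace(type="transcript.text.delta", delta=" world"),
    SimpleNamespace(type="transcript.text.done", text="Hello world"),
]
partials, final = collect_transcript(fake_stream)
```

A UI would typically render each delta as it arrives and swap in the final text once the done event is seen.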

registry.yaml

Lines changed: 10 additions & 0 deletions
@@ -4,6 +4,16 @@
 # should build pages for, and indicates metadata such as tags, creation date and
 # authors for each page.
 
+- title: Comparing Speech-to-Text Methods with the OpenAI API
+  path: examples/Speech_transcription_methods.ipynb
+  date: 2025-04-29
+  authors:
+    - minh-hoque
+  tags:
+    - audio
+    - speech
+    - agents-sdk
+
 - title: Practial Guide for Model Selection for Real‑World Use Cases
   path: examples/partners/model_selection_guide/model_selection_guide.ipynb
   date: 2025-05-05
