[Realtime API] gpt-realtime-translate emits transcript.delta containing only U+FFFD for multi-byte UTF-8 input #43806

## Summary
When using the Azure OpenAI Realtime API with the `gpt-realtime-translate` deployment, individual
`session.input_transcript.delta` and `session.output_transcript.delta` events frequently contain
only U+FFFD (Replacement Character) in the `delta` field when the source language is Japanese.
This likely affects any language whose text encodes to multi-byte UTF-8 sequences.

The final transcript delivered via the `*.completed` / `*.done` events is correct, so this is
purely a streaming-delta issue, but it makes a realtime subtitle UI unusable for multi-byte
languages.

## Environment
- Service: Azure OpenAI on Azure AI Foundry (endpoint host: `*.services.ai.azure.com`, eastus2)
- Model deployment: `gpt-realtime-translate` (model: `gpt-realtime-translate-2026-05-06`)
- Client: macOS, Swift / SwiftUI, `URLSessionWebSocketTask` (`.string` messages)
- Source language: Japanese (`ja`), Target language: `en` / `vi` (both reproduce)

## Reproduction
1. Open a WebSocket session with a `gpt-realtime-translate` deployment.
2. Send `session.update` with:
   ```json
   {
     "type": "session.update",
     "session": {
       "audio": {
         "input": { "transcription": { "model": "gpt-realtime-whisper" } },
         "output": { "language": "en" }
       }
     }
   }
   ```
3. Stream Japanese speech (PCM16 24kHz, base64) via `input_audio_buffer.append`.
4. Observe `session.input_transcript.delta` and `session.output_transcript.delta` events.
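
For reference, a minimal Swift sketch of how the messages in steps 2 and 3 might be built before being sent as `.string` WebSocket messages. The helper names (`jsonText`, `sessionUpdate`, `audioAppend`) are hypothetical, and the `audio` field name on `input_audio_buffer.append` is an assumption based on the general Realtime API event shape, not something confirmed for this deployment.

```swift
import Foundation

/// Serializes a JSON object to the text form sent as a `.string` WebSocket message.
func jsonText(_ object: [String: Any]) throws -> String {
    let data = try JSONSerialization.data(withJSONObject: object, options: [.sortedKeys])
    return String(decoding: data, as: UTF8.self)
}

/// Builds the `session.update` message from step 2.
func sessionUpdate(targetLanguage: String) throws -> String {
    let input: [String: Any] = ["transcription": ["model": "gpt-realtime-whisper"]]
    let output: [String: Any] = ["language": targetLanguage]
    let audio: [String: Any] = ["input": input, "output": output]
    let session: [String: Any] = ["audio": audio]
    return try jsonText(["type": "session.update", "session": session])
}

/// Builds an `input_audio_buffer.append` message from PCM16 samples (step 3).
/// Assumption: the base64 payload travels in a field named `audio`.
func audioAppend(samples: [Int16]) throws -> String {
    var pcm = Data(capacity: samples.count * 2)
    for sample in samples {
        // PCM16 is sent as little-endian 16-bit integers.
        withUnsafeBytes(of: sample.littleEndian) { pcm.append(contentsOf: $0) }
    }
    return try jsonText([
        "type": "input_audio_buffer.append",
        "audio": pcm.base64EncodedString(),
    ])
}
```

Each resulting string would then be passed to `URLSessionWebSocketTask.send(_:)` wrapped in `.string(...)`.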

## Expected
Each `delta` field should contain complete grapheme clusters, or at least complete UTF-8
code points, so that concatenating consecutive deltas reproduces the same string returned in
the `transcript` field of the eventual `*.completed` event.
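
The expectation can be stated as a small check. The sketch below is illustrative only: `DeltaAccumulator` is a hypothetical helper, the delta sequences are made-up examples rather than captured data, and it assumes the final transcript carries no extra whitespace or normalization relative to the deltas.

```swift
import Foundation

/// Accumulates streamed deltas for one item so they can be compared
/// with the final transcript.
struct DeltaAccumulator {
    private(set) var text = ""

    mutating func append(_ delta: String) {
        text += delta
    }

    /// True when the concatenated deltas reproduce the final transcript exactly.
    func matches(finalTranscript: String) -> Bool {
        return text == finalTranscript
    }
}

// Expected behaviour: deltas split on character boundaries.
var healthy = DeltaAccumulator()
for delta in ["こん", "にち", "は"] { healthy.append(delta) }

// Observed behaviour: some deltas carry only U+FFFD.
var corrupted = DeltaAccumulator()
for delta in ["こん", "\u{FFFD}", "\u{FFFD}", "は"] { corrupted.append(delta) }

let healthyMatches = healthy.matches(finalTranscript: "こんにちは")
let corruptedMatches = corrupted.matches(finalTranscript: "こんにちは")
```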

## Actual
Many delta events contain exactly `"delta": "�"` (a single Replacement Character).
Concatenating consecutive deltas produces strings with multiple `�` characters. Because
U+FFFD has already replaced the original bytes, the client has no way to recover the
intended text from these deltas.

Sample raw WebSocket payloads, captured at the point where `URLSessionWebSocketTask` delivers
the `.string` message to the application (113 bytes total, `delta` field is a single U+FFFD):

```
{ ... ,"item_id":"...yc0oOJQYT","delta":"�","elapsed_ms":11000, ... }
{ ... ,"item_id":"...8UwU24ILJ","delta":"�","elapsed_ms":11000, ... }
```

The replacement character is already present in the JSON returned by the server; the client
has performed no transformation at the point of detection.
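
A minimal Swift sketch of how the detection can be done, together with one possible explanation for the symptom. The `TranscriptDelta` type and the `item_placeholder` ID are hypothetical, and the second half is an assumption about the cause, not something confirmed about the server: decoding the two halves of a split multi-byte sequence independently yields U+FFFD on both sides.

```swift
import Foundation

// Hypothetical event shape: only fields visible in the captured payloads.
struct TranscriptDelta: Decodable {
    let item_id: String
    let delta: String
}

/// True if the string contains U+FFFD (UTF-8 bytes EF BF BD).
func containsReplacementCharacter(_ s: String) -> Bool {
    return s.unicodeScalars.contains { $0.value == 0xFFFD }
}

// A payload resembling the captured ones (item_id is a made-up placeholder).
let raw = "{\"item_id\":\"item_placeholder\",\"delta\":\"\u{FFFD}\",\"elapsed_ms\":11000}"
let event = try! JSONDecoder().decode(TranscriptDelta.self, from: Data(raw.utf8))
let deltaBytes = Array(event.delta.utf8)

// Assumed cause: "日" is three bytes (E6 97 A5). If those bytes are split
// across two chunks and each chunk is decoded on its own, neither half is
// valid UTF-8, so a repairing decoder substitutes U+FFFD.
let bytes = Array("日".utf8)
let head = String(decoding: bytes[0..<2], as: UTF8.self)
let tail = String(decoding: bytes[2...], as: UTF8.self)
```

If this is what happens upstream, it would explain why the final transcript is intact while the per-chunk deltas are not.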

## Impact
- Realtime subtitle UI displays `�` during speech.
- Streaming display is effectively unusable for Japanese / Chinese / Korean / any multi-byte
  language.
- Final transcripts (`conversation.item.input_audio_transcription.completed`,
  `session.output_transcript.done`) are correct.

## Workaround (current)
- Discard `delta` events that contain U+FFFD and rely on `*.completed` / `*.done` for display.
- This defeats the purpose of streaming.
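
A minimal Swift sketch of the workaround, for anyone hitting the same problem. `SubtitleLine` and its method names are hypothetical; the delta values are illustrative. Deltas containing U+FFFD are dropped, and the displayed text is replaced wholesale once the final transcript arrives.

```swift
import Foundation

/// Subtitle state for one transcript item, following the workaround above.
struct SubtitleLine {
    private(set) var displayed = ""
    private(set) var droppedDeltas = 0

    /// Appends a streamed delta unless it contains U+FFFD.
    mutating func handleDelta(_ delta: String) {
        if delta.unicodeScalars.contains(where: { $0.value == 0xFFFD }) {
            droppedDeltas += 1
            return
        }
        displayed += delta
    }

    /// Replaces whatever was streamed with the final transcript
    /// from the `*.completed` / `*.done` event.
    mutating func handleFinal(_ transcript: String) {
        displayed = transcript
    }
}

var line = SubtitleLine()
for delta in ["こん", "\u{FFFD}", "\u{FFFD}", "は"] { line.handleDelta(delta) }
let streamed = line.displayed   // text shown while streaming, with gaps
line.handleFinal("こんにちは")
```

The streamed text has gaps wherever a delta was dropped, which is why this is only a stopgap.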

## Request
Please consider one of the following:
1. Ensure server-emitted `delta` strings are split only at valid UTF-8 (preferably grapheme
   cluster) boundaries.
2. Provide a `session.update` option to receive `delta` as base64-encoded raw bytes (e.g.
   `delta_b64`) so clients can concatenate and decode them safely with an incremental
   UTF-8 decoder.
3. Alternatively, document that partial transcript deltas are not guaranteed for multi-byte
   languages on Azure and recommend `*.completed` for display.
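
To illustrate option 2, here is a minimal Swift sketch of the client side. It assumes a hypothetical `delta_b64` field carrying base64-encoded raw bytes; no such field exists today. `IncrementalUTF8Decoder` is likewise a hypothetical helper that holds back a trailing incomplete sequence until the remaining bytes arrive.

```swift
import Foundation

/// Buffers raw bytes and emits only text that ends on a complete UTF-8 sequence.
struct IncrementalUTF8Decoder {
    private var pending: [UInt8] = []

    /// Number of trailing bytes that belong to a not-yet-complete sequence.
    private func incompleteSuffixLength() -> Int {
        var i = pending.count - 1
        var seen = 1
        // Walk back over at most four bytes looking for the last lead byte.
        while i >= 0 && seen <= 4 {
            let b = pending[i]
            if b & 0xC0 != 0x80 {   // not a continuation byte
                let expected: Int
                switch b {
                case 0xC0...0xDF: expected = 2
                case 0xE0...0xEF: expected = 3
                case 0xF0...0xF7: expected = 4
                default: expected = 1
                }
                return expected > seen ? seen : 0
            }
            i -= 1
            seen += 1
        }
        return 0
    }

    /// Decodes one base64 chunk and returns the text that is complete so far.
    mutating func decode(base64 chunk: String) -> String {
        guard let data = Data(base64Encoded: chunk) else { return "" }
        pending.append(contentsOf: data)
        let keep = incompleteSuffixLength()
        let text = String(decoding: pending.dropLast(keep), as: UTF8.self)
        pending = Array(pending.suffix(keep))
        return text
    }
}

// "日本" is six bytes; split it in the middle of the first character.
let sample = Array("日本".utf8)
let firstChunk = Data(sample[0..<2]).base64EncodedString()
let secondChunk = Data(sample[2...]).base64EncodedString()

var decoder = IncrementalUTF8Decoder()
let first = decoder.decode(base64: firstChunk)    // incomplete, nothing emitted
let second = decoder.decode(base64: secondChunk)  // completes both characters
```

With this approach the chunk boundary chosen by the server no longer matters, since the client only decodes at code point boundaries.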

