Summary
When using the Agents Realtime SDK with Azure OpenAI Realtime (WebRTC), the SDK emits OpenAI-specific session fields that Azure rejects (e.g., session.type, output_modalities). Also, with WebRTC the session is created during SDP, so initial agent config (instructions/tools) is ignored until the data channel opens; subsequent SDK-driven session.update calls include fields Azure doesn’t accept, preventing config from applying cleanly.
Notes
I recognize these incompatibilities stem primarily from Azure OpenAI’s Realtime implementation differing from OpenAI’s GA interface (for example, rejecting session.type and output_modalities, and disallowing ["audio"]-only modalities). Still, I wanted to open this issue so the community has a clear record of the symptoms, error messages, and a practical workaround. Even if the “fix” ultimately belongs on the Azure side (which btw I have no idea where to report), a provider-aware path or extensibility hook in the SDK would help developers avoid common pitfalls and ship faster across providers.
Environment
- @openai/agents-realtime: 0.1.0
- @openai/agents: 0.1.0
- openai: 5.19.1
- Next.js: 15.5.2
- React: 19.1.1
- Node.js: 20.x
- Client: Chrome (WebRTC, ephemeral client key)
- Provider: Azure OpenAI Realtime (region WebRTC endpoint)
- Model: gpt-realtime
Repro steps
- Browser, WebRTC:
  import { RealtimeAgent, RealtimeSession } from "@openai/agents-realtime";

  const agent = new RealtimeAgent({ name: "Assistant", instructions: "..." });

  const session = new RealtimeSession(agent, {
    model: "gpt-realtime",
    // baseUrl points to Azure WebRTC endpoint
    // e.g. https://<region>.realtimeapi-preview.ai.azure.com/v1/realtimertc
  });

  await session.connect({
    apiKey: "<ephemeral>",
    url: "https://<region>.realtimeapi-preview.ai.azure.com/v1/realtimertc",
  });
- Observe transport events (session.created, then session.update).
Observed behavior
- session.created shows Azure defaults (expected for WebRTC).
- SDK sends session.update with OpenAI-only fields:
  - Azure error: Unknown parameter: session.type.
  - Azure error: Unknown parameter: session.output_modalities.
- Attempting to set modalities: ["audio"]:
  - Azure error: Invalid modalities. Supported: ["text"] or ["audio","text"].
- Agent instructions/tools often don’t take effect because the follow-up session.update is rejected due to unsupported fields.
Example error events (from logs):
Unknown parameter: 'session.type'
Unknown parameter: 'session.output_modalities'
Invalid modalities: ['audio']
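To catch these rejections before a payload leaves the client, a small pre-flight check can help. The sketch below is hypothetical (findAzureProblems and its messages are not part of the SDK or of Azure) and encodes only the three rejections listed above, not Azure's full schema. It also assumes modality order does not matter.

```typescript
type SessionPayload = Record<string, unknown>;

// Fields Azure reported as "Unknown parameter" in the logs above.
const AZURE_UNKNOWN_FIELDS = ["type", "output_modalities"];

// Modality sets Azure reported as supported, stored as sorted, comma-joined keys.
// Assumption: order is irrelevant, so ["text","audio"] is treated like ["audio","text"].
const AZURE_ACCEPTED_MODALITIES = ["text", "audio,text"];

// Returns a human-readable list of problems; an empty list means none of the
// known rejections apply (it does not prove Azure will accept the payload).
function findAzureProblems(session: SessionPayload): string[] {
  const problems: string[] = [];

  for (const field of AZURE_UNKNOWN_FIELDS) {
    if (field in session) {
      problems.push(`unsupported field: session.${field}`);
    }
  }

  const modalities = session.modalities;
  if (Array.isArray(modalities)) {
    const key = modalities.map(String).sort().join(",");
    if (!AZURE_ACCEPTED_MODALITIES.includes(key)) {
      problems.push(`unsupported modalities: ${JSON.stringify(modalities)}`);
    }
  }

  return problems;
}
```

Running it on an outgoing session object during development makes it easier to see which field triggered a rejection without reading transport logs.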
Expected behavior
- SDK should allow provider-aware session.update payloads and avoid sending fields the provider does not support.
- Post-connect updates should apply minimal, Azure-accepted changes (e.g., instructions, tools, tool_choice, voice, model) without error.
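As a rough illustration of what "minimal, Azure-accepted changes" could look like, the sketch below builds a session.update event from an allowlist. The function name toAzureSessionUpdate is hypothetical, and the key list is taken only from the examples in this issue, so it may be incomplete.

```typescript
// Keys this issue observed Azure accepting in session.update (possibly incomplete).
const AZURE_SESSION_KEYS = ["instructions", "tools", "tool_choice", "voice", "model"] as const;
type AzureSessionKey = (typeof AZURE_SESSION_KEYS)[number];

// Copies only allowlisted, defined keys; everything else (session.type,
// output_modalities, audio, tracing, ...) is dropped.
function toAzureSessionUpdate(config: Record<string, unknown>) {
  const session: Partial<Record<AzureSessionKey, unknown>> = {};
  for (const key of AZURE_SESSION_KEYS) {
    if (config[key] !== undefined) {
      session[key] = config[key];
    }
  }
  return { type: "session.update" as const, session };
}
```

An allowlist is used rather than a denylist so that any new OpenAI-only field the SDK starts sending is dropped by default instead of reaching Azure.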
Workarounds that work
- Custom WebRTC transport that strips OpenAI-only fields and allowlists Azure-accepted keys for session.update. Also avoid sending session.type in tracing updates.
  // azure-realtime-transport.ts
  import { OpenAIRealtimeWebRTC } from "@openai/agents-realtime";

  // Exported so the usage example can import it.
  export class AzureRealtimeWebRTC extends OpenAIRealtimeWebRTC {
    // Send tracing config without session.type, which Azure rejects.
    protected _updateTracingConfig(tracing: unknown) {
      this.sendEvent({
        type: "session.update",
        session: { tracing: tracing === "auto" ? "auto" : tracing },
      });
    }

    // Forward only the keys Azure accepts in session.update.
    updateSessionConfig(config: Partial<Record<string, unknown>>) {
      const merged = this._getMergedSessionConfig(config) as Record<string, unknown>;
      const session: Record<string, unknown> = {};
      if (typeof merged.model === "string") session.model = merged.model;
      if (typeof merged.instructions === "string") session.instructions = merged.instructions;
      if (Array.isArray(merged.tools)) session.tools = merged.tools;
      if (typeof merged.tool_choice !== "undefined") session.tool_choice = merged.tool_choice;
      if (typeof merged.voice === "string") session.voice = merged.voice;
      this.sendEvent({ type: "session.update", session }); // no type/output_modalities/audio/tracing
    }

    async connect(options: any) {
      // Ensure Azure SDP URL includes ?model=<deployment>
      const baseUrl = options?.url;
      const model = options?.model;
      let urlToUse = baseUrl;
      if (baseUrl && model) {
        try {
          const u = new URL(baseUrl);
          if (!u.searchParams.has("model")) u.searchParams.set("model", model);
          urlToUse = u.toString();
        } catch {
          /* keep baseUrl */
        }
      }
      return await super.connect({ ...options, url: urlToUse });
    }
  }
- Use minimal initial agent; apply instructions/tools right after connect via a minimal session.update.
- Keep tracing: null to avoid tracing-shaped updates that include unsupported fields.
Usage of workaround:
// usage in a browser (WebRTC)
import { RealtimeAgent, RealtimeSession } from "@openai/agents-realtime";
import { AzureRealtimeWebRTC } from "./azure-realtime-transport";

// Keep initial agent minimal; Azure creates session during SDP, then we update
const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "",
});

// Azure region WebRTC endpoint (no model param needed here; transport adds it)
const baseUrl = "https://<region>.realtimeapi-preview.ai.azure.com/v1/realtimertc";
const model = "gpt-realtime"; // your Azure deployment name

const transport = new AzureRealtimeWebRTC({
  baseUrl,
  useInsecureApiKey: true,
});

const session = new RealtimeSession(agent, {
  model,
  transport,
  config: { tracing: null }, // avoid tracing-shaped updates Azure may reject
});

// Obtain ephemeral client key from your backend first
await session.connect({
  apiKey: "<ephemeral_client_key>",
  url: baseUrl,
  model,
  initialSessionConfig: {
    instructions: "You are a helpful assistant. Always use tools when available.",
    tools: [your_tools],
  },
});
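The second workaround (apply instructions/tools right after connect via a minimal session.update) can be sketched roughly as below. EventSender and applyAgentConfig are hypothetical names. The only thing assumed about the transport is that it exposes a sendEvent method, as the custom transport in this issue does. Whether the real transport's sendEvent signature matches this structural type exactly is not verified here.

```typescript
// Minimal structural type: only the sendEvent method this sketch relies on.
interface EventSender {
  sendEvent(event: { type: string; [key: string]: unknown }): void;
}

// Sends a session.update carrying only instructions and (if any) tools,
// so none of the fields Azure rejected are included.
function applyAgentConfig(
  transport: EventSender,
  instructions: string,
  tools: unknown[] = [],
): void {
  const session: Record<string, unknown> = { instructions };
  if (tools.length > 0) {
    session.tools = tools;
  }
  transport.sendEvent({ type: "session.update", session });
}
```

This would be called once session.connect has resolved, for example applyAgentConfig(transport, "You are a helpful assistant.", tools), since Azure creates the session during SDP and only accepts updates after the data channel is open.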
Why this matters
- The SDK defaults are tuned for OpenAI GA. Azure’s Realtime (preview) differs and rejects certain fields. Without a provider-aware path, common SDK flows (initialSessionConfig, updateAgent) fail on Azure.
Proposed solutions
- Add a provider adapter/flag (provider: "openai" | "azure") that adjusts session.update payloads.
- Or expose a hook to transform outgoing session.update before sendEvent.
- Make tracing updates optional and avoid session.type when disabled; avoid output_modalities for non-OpenAI providers.
- Document provider differences (Azure WebRTC: session created at SDP; valid modalities; minimal post-connect updates).
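To make the first two proposals concrete, here is one possible shape for a provider flag combined with a transform hook. Everything in it (Provider, SessionUpdateTransform, withSessionUpdateHook) is hypothetical and not an existing SDK API. The Azure transform strips only the two fields this issue saw rejected.

```typescript
type Provider = "openai" | "azure";
type SessionUpdate = { type: "session.update"; session: Record<string, unknown> };
type SessionUpdateTransform = (event: SessionUpdate) => SessionUpdate;

// Fields Azure rejected in this issue; other providers may need other lists.
const AZURE_REJECTED = ["type", "output_modalities"];

const providerTransforms: Record<Provider, SessionUpdateTransform> = {
  // OpenAI payloads pass through unchanged.
  openai: (event) => event,
  // Azure payloads get a shallow copy with the rejected fields removed.
  azure: (event) => {
    const session = { ...event.session };
    for (const key of AZURE_REJECTED) {
      delete session[key];
    }
    return { ...event, session };
  },
};

// Wraps a send function so every session.update is shaped for the provider,
// then optionally passed through a user-supplied hook, before being sent.
function withSessionUpdateHook(
  send: (event: SessionUpdate) => void,
  provider: Provider,
  extra?: SessionUpdateTransform,
): (event: SessionUpdate) => void {
  return (event) => {
    const shaped = providerTransforms[provider](event);
    send(extra ? extra(shaped) : shaped);
  };
}
```

With something like this in place, the custom transport subclass above would no longer need to override internal methods. Callers would pick a provider and, if needed, supply their own transform.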