Chat-Supervisor Agent Updates #48

Open
wants to merge 2 commits into
base: main
56 changes: 51 additions & 5 deletions README.md
@@ -17,7 +17,54 @@ Here's a quick [demo video](https://x.com/OpenAIDevs/status/1880306081517432936)
- Start the server with `npm run dev`
- Open your browser to [http://localhost:3000](http://localhost:3000) to see the app. It should automatically connect to the `simpleExample` Agent Set.

## Configuring Agents
# Agentic Patterns

## 1. Chat-Supervisor Pattern

This is demonstrated in the agentConfig [chatSupervisorDemo](src/app/agentConfigs/chatSupervisorDemo/index.ts). The chat agent uses the realtime model to converse with the user and handle basic tasks, and a more intelligent, text-based supervisor model (e.g. `gpt-4.1`) is used extensively to handle all tool calls and more challenging responses. You can define the decision boundary by "opting in" specific tasks to the chat agent as desired. For the demo, the chat agent handles greeting, chitchat, and collecting necessary information for tool calls.

## Example Flow
TODO screenshot

```mermaid
sequenceDiagram
participant User
participant ChatAgent as Chat Agent<br/>(gpt-4o-mini-realtime)
participant Supervisor as Supervisor Agent<br/>(gpt-4.1)
participant Tool as Tool

alt Basic chat or info collection
User->>ChatAgent: User message
ChatAgent->>User: Responds directly
else Requires higher intelligence and/or tool call
User->>ChatAgent: User message
ChatAgent->>User: "Let me think"
ChatAgent->>Supervisor: Forwards message/context
alt Tool call needed
Supervisor->>Tool: Calls tool
Tool->>Supervisor: Returns result
end
Supervisor->>ChatAgent: Returns response
ChatAgent->>User: Delivers response
end
```
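
The decision boundary in the diagram can be sketched in code. This is a minimal illustration, not the demo's implementation: in the demo the boundary is expressed in the chat agent's prompt rather than through an explicit intent classifier, and the names `Intent`, `chatAgentAllowList`, and `handleTurn` are hypothetical. The supervisor is modeled as a synchronous function to keep the sketch short; a real supervisor call would be asynchronous.

```typescript
// Hypothetical sketch of the chat-supervisor routing; names are illustrative,
// not the repo's API.
type Intent = "greeting" | "chitchat" | "collect_info" | "other";

// Tasks the chat agent has been "opted in" to handle on its own.
const chatAgentAllowList: ReadonlySet<Intent> = new Set<Intent>([
  "greeting",
  "chitchat",
  "collect_info",
]);

interface Turn {
  spoken: string[]; // what the user hears, in order
  usedSupervisor: boolean;
}

// Stand-in for the supervisor model: takes context, returns the next message.
type Supervisor = (context: string) => string;

function handleTurn(
  intent: Intent,
  userMessage: string,
  directReply: (message: string) => string,
  supervisor: Supervisor,
): Turn {
  if (chatAgentAllowList.has(intent)) {
    // Basic chat or info collection: the chat agent responds directly.
    return { spoken: [directReply(userMessage)], usedSupervisor: false };
  }
  // Everything else: say a filler phrase first, then defer to the supervisor.
  const filler = "Let me think.";
  const answer = supervisor(userMessage);
  return { spoken: [filler, answer], usedSupervisor: true };
}
```

Widening or narrowing `chatAgentAllowList` corresponds to opting tasks in or out of the chat agent's responsibilities.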

## Benefits
- **Simpler onboarding**: If you already have a performant text-based chat agent, you can give that same prompt and set of tools to the supervisor agent and make some tweaks to the chat agent prompt. The result is a natural voice agent that performs on par with your text agent.
- **Simple ramp to a full realtime agent**: Rather than switching your whole agent to the Realtime API, you can move one task at a time, taking time to validate and build trust for each before deploying to production.
- **High intelligence**: You benefit from the high intelligence, excellent tool calling and instruction following of models like `gpt-4.1` in your voice agents.
- **Lower cost**: If your chat agent is only being used for basic tasks, you can use the realtime-mini model, which, even when combined with GPT-4.1, should be cheaper than using the full 4o-realtime model.
- **User experience**: It's a more natural conversational experience than using a stitched model architecture, where response latency is often 1.5s or longer after a user has finished speaking. In this architecture, the model responds to the user right away, even if it has to lean on the supervisor agent.
- However, more assistant responses will start with "Let me think", rather than responding immediately with the full response.

## Modifying for your own agent
1. Update the `Domain-Specific Agent Instructions` in [supervisorAgent](src/app/agentConfigs/chatSupervisorDemo/supervisorAgent.ts) with your own existing agent prompt and tools. This should contain the "meat" of your voice agent logic and be very specific about what it should and shouldn't do, and how it should respond.
2. Adapt your prompt to be more appropriate for voice. For example, emphasize the importance of being concise and avoiding bulleted or numbered lists.
3. Add your tool definitions to [chatAgentInstructions](src/app/agentConfigs/chatSupervisorDemo/index.ts). We recommend a brief YAML description so that the model doesn't get confused and try to call the tool directly.
4. Customize the chatAgent instructions with your own tone, greeting, etc.
5. To minimize costs, try using `gpt-4o-mini-realtime` for the chatAgent and `gpt-4.1-mini` for the supervisor model. To maximize intelligence on particularly difficult or high-stakes tasks, consider trading off latency and adding chain-of-thought to your supervisor prompt, or using a reasoning supervisor model like `o4-mini`.
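
For step 3, one way to keep the chat agent's tool reference brief is to generate the YAML-style description from a small summary object. The sketch below is an assumption-laden illustration: `ToolSummary`, `ToolParam`, and `describeToolsForPrompt` are hypothetical names that do not exist in the repo, and the tool description text is made up for the example. Only the `findNearestStore` name and its `zip_code` parameter come from the demo prompt.

```typescript
// Hypothetical helper types; not part of the repo.
interface ToolParam {
  type: string;
  required: boolean;
  description: string;
}

interface ToolSummary {
  name: string;
  description: string;
  params: Record<string, ToolParam>;
}

// Renders tool summaries as a short YAML-like block for the chat agent prompt.
// The chat agent only reads this as reference; it never calls these tools itself.
function describeToolsForPrompt(tools: ToolSummary[]): string {
  return tools
    .map((tool) => {
      const paramLines = Object.entries(tool.params).map(
        ([name, p]) =>
          `    ${name}: ${p.type} (${p.required ? "required" : "optional"}) - ${p.description}`,
      );
      return [
        `${tool.name}:`,
        `  description: ${tool.description}`,
        "  params:",
        ...paramLines,
      ].join("\n");
    })
    .join("\n\n");
}

// Example input; the description string is illustrative.
const toolReference = describeToolsForPrompt([
  {
    name: "findNearestStore",
    description: "Find the nearest store location.",
    params: {
      zip_code: {
        type: "string",
        required: true,
        description: "The customer's 5-digit zip code.",
      },
    },
  },
]);
```

The resulting string can be pasted or interpolated into the chat agent instructions, while the full function schemas stay with the supervisor.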

## 2. Agent Handoffs
Configuration in `src/app/agentConfigs/simpleExample.ts`
```javascript
import { AgentConfig } from "@/app/types";
@@ -108,20 +155,19 @@ sequenceDiagram

</details>


### Next steps
# Next steps
- Check out the configs in `src/app/agentConfigs`. The example above is a minimal demo that illustrates the core concepts.
- [frontDeskAuthentication](src/app/agentConfigs/frontDeskAuthentication) Guides the user through a step-by-step authentication flow, confirming each value character-by-character, authenticates the user with a tool call, and then transfers to another agent. Note that the second agent is intentionally "bored" to show how to prompt for personality and tone.
- [customerServiceRetail](src/app/agentConfigs/customerServiceRetail) Also guides through an authentication flow, reads a long offer from a canned script verbatim, and then walks through a complex return flow which requires looking up orders and policies, gathering user context, and checking with `o4-mini` to ensure the return is eligible. To test this flow, say that you'd like to return your snowboard and go through the necessary prompts!

### Defining your own agents
## Defining your own agents
- You can copy these to make your own multi-agent voice app! Once you make a new agent set config, add it to `src/app/agentConfigs/index.ts` and you should be able to select it in the UI in the "Scenario" dropdown menu.
- To see how to define tools and toolLogic, including a background LLM call, see [src/app/agentConfigs/customerServiceRetail/returns.ts](src/app/agentConfigs/customerServiceRetail/returns.ts)
- To see how to define a detailed personality and tone, and use a prompt state machine to collect user information step by step, see [src/app/agentConfigs/frontDeskAuthentication/authentication.ts](src/app/agentConfigs/frontDeskAuthentication/authentication.ts)
- To see how to wire up Agents into a single Agent Set, see [src/app/agentConfigs/frontDeskAuthentication/index.ts](src/app/agentConfigs/frontDeskAuthentication/index.ts)
- If you want help creating your own prompt using these conventions, we've included a metaprompt [here](src/app/agentConfigs/voiceAgentMetaprompt.txt), or you can use our [Voice Agent Metaprompter GPT](https://chatgpt.com/g/g-678865c9fb5c81918fa28699735dd08e-voice-agent-metaprompt-gpt)
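
As a rough sketch of the registration step described above, the snippet below defines a one-agent set and adds it to a registry keyed by name. It uses a local stand-in type (`AgentConfigSketch`) instead of the repo's `AgentConfig`, limited to fields visible in this PR, and the agent, its instructions, and the `myAgentSet` key are hypothetical.

```typescript
// Local stand-in for the repo's AgentConfig type; only fields visible in this
// PR are included, and the real type may differ.
interface AgentConfigSketch {
  name: string;
  publicDescription: string;
  instructions: string;
  tools: unknown[];
  downstreamAgents: AgentConfigSketch[];
}

// A hypothetical single-agent set.
const greeter: AgentConfigSketch = {
  name: "greeter",
  publicDescription: "Greets the user and answers basic questions.",
  instructions: "Greet the user warmly and keep answers concise.",
  tools: [],
  downstreamAgents: [],
};

const myAgentSet: AgentConfigSketch[] = [greeter];

// Mirrors the shape of the registry in src/app/agentConfigs/index.ts:
// each key becomes an entry in the "Scenario" dropdown.
const allAgentSetsSketch: Record<string, AgentConfigSketch[]> = {
  myAgentSet,
};

// The key of the set to load by default.
const defaultAgentSetKeySketch = "myAgentSet";

const defaultAgents = allAgentSetsSketch[defaultAgentSetKeySketch];
```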

### Customizing Output Guardrails
## Customizing Output Guardrails
Assistant messages are checked for safety and compliance using a guardrail function before being finalized in the transcript. This is implemented in [`src/app/hooks/useHandleServerEvent.ts`](src/app/hooks/useHandleServerEvent.ts) as the `processGuardrail` function, which is invoked on each assistant message to run a moderation/classification check. You can review or customize this logic by editing the `processGuardrail` function definition and its invocation inside `useHandleServerEvent`.
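
To make the idea concrete, here is a minimal sketch of a guardrail check that classifies an assistant message and attaches the result before the message is finalized. It is not the repo's `processGuardrail`: the categories, the types, the keyword-based classifier, and the replacement text are all assumptions for illustration. Only the `guardrailResult` field name appears elsewhere in this PR.

```typescript
// Hypothetical sketch only: the categories, types, and function names below are
// assumptions for illustration, not the definitions in useHandleServerEvent.ts.
type GuardrailCategory = "NONE" | "OFFENSIVE" | "OFF_BRAND";

interface GuardrailResult {
  status: "PASS" | "FAIL";
  category: GuardrailCategory;
}

interface CheckedMessage {
  text: string;
  guardrailResult: GuardrailResult;
}

// A classifier maps message text to a category; in practice this would
// likely be a model call rather than a keyword match.
type Classifier = (text: string) => GuardrailCategory;

const keywordClassifier: Classifier = (text) =>
  /\bcompetitor\b/i.test(text) ? "OFF_BRAND" : "NONE";

// Runs the classifier and either passes the message through unchanged or
// replaces it with a safe fallback, recording the result either way.
function checkAssistantMessage(text: string, classify: Classifier): CheckedMessage {
  const category = classify(text);
  if (category === "NONE") {
    return { text, guardrailResult: { status: "PASS", category } };
  }
  return {
    text: "Sorry, I can't help with that.",
    guardrailResult: { status: "FAIL", category },
  };
}
```

Passing the classifier in as a parameter keeps the check easy to swap out, for example to replace the keyword match with a moderation model.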

## UI
@@ -1,12 +1,12 @@
import { AgentConfig } from "@/app/types";
import { getNextResponse } from "./supervisorAgent";
import { getNextResponseFromSupervisor } from "./supervisorAgent";

const mainAgentInstructions = `
You are a helpful junior customer service agent. Your task is to help a customer resolve a user's issue in a way that's helpful, efficient, and correct, deferring heavily to the supervisor agent.
const chatAgentInstructions = `
You are a helpful junior customer service agent. Your task is to maintain a natural conversation flow with the user, help them resolve their query in a way that's helpful, efficient, and correct, and to defer heavily to a more experienced and intelligent Supervisor Agent.

# General Instructions
- You are very new and can only handle basic tasks, and will rely heavily on the supervisor agent via the getNextResponse tool
- By default, you must always use the getNextResponse tool to get your next response, except for very specific exceptions.
- You are very new and can only handle basic tasks, and will rely heavily on the Supervisor Agent via the getNextResponseFromSupervisor tool
- By default, you must always use the getNextResponseFromSupervisor tool to get your next response, except for very specific exceptions.
- You represent a company called NewTelco.
- Maintain an extremely professional, unexpressive, and to-the-point tone at all times.
- Always greet the user with "Hi, you've reached NewTelco, how can I help you?"
@@ -15,7 +15,7 @@ You are a helpful junior customer service agent. Your task is to help a customer
- Do not use any of the information or values from the examples as a reference in conversation.

# Tools
- You can ONLY call getNextResponse
- You can ONLY call getNextResponseFromSupervisor
- Even if you're provided other tools in this prompt as a reference, NEVER call them directly.

# Allow List of Permitted Actions
@@ -26,7 +26,7 @@ You can take the following actions directly, and don't need to use getNextResepo
- Engage in basic chitchat (e.g., "how are you?", "thank you").
- Respond to requests to repeat or clarify information (e.g., "can you repeat that?").

## Collect information for supervisor agent tool calls
## Collect information for Supervisor Agent tool calls
- Request user information needed to call tools. Refer to the Supervisor Tools section below for the full definitions and schema.

### Supervisor Agent Tools
@@ -47,20 +47,20 @@ findNearestStore:
params:
zip_code: string (required) - The customer's 5-digit zip code.

**You must NOT answer, resolve, or attempt to handle ANY other type of request, question, or issue directly. For absolutely everything else, you MUST use the getNextResponse tool to get your response. This includes ANY factual, account-specific, or process-related questions, no matter how minor they may seem.**
**You must NOT answer, resolve, or attempt to handle ANY other type of request, question, or issue yourself. For absolutely everything else, you MUST use the getNextResponseFromSupervisor tool to get your response. This includes ANY factual, account-specific, or process-related questions, no matter how minor they may seem.**

# getNextResponse Usage
- For ALL requests that are not strictly and explicitly listed above, you MUST ALWAYS use the getNextResponse tool, which will ask the supervisor agent for a high-quality response you can use.
# getNextResponseFromSupervisor Usage
- For ALL requests that are not strictly and explicitly listed above, you MUST ALWAYS use the getNextResponseFromSupervisor tool, which will ask the Supervisor Agent for a high-quality response you can use.
- For example, this could be to answer factual questions about accounts or business processes, or asking to take actions.
- Do NOT attempt to answer, resolve, or speculate on any other requests, even if you think you know the answer or it seems simple.
- You should make NO assumptions about what you can or can't do. Always defer to getNextResponse() for all non-trivial queries.
- Before calling getNextResponse, you MUST ALWAYS say something to the user (see the 'Sample Filler Phrases' section). Never call getNextResponse without first saying something to the user.
- You should make NO assumptions about what you can or can't do. Always defer to getNextResponseFromSupervisor() for all non-trivial queries.
- Before calling getNextResponseFromSupervisor, you MUST ALWAYS say something to the user (see the 'Sample Filler Phrases' section). Never call getNextResponseFromSupervisor without first saying something to the user.
- Filler phrases must NOT indicate whether you can or cannot fulfill an action; they should be neutral and not imply any outcome.
- After the filler phrase YOU MUST ALWAYS call the getNextResponse tool.
- This is required for every use of getNextResponse, without exception. Do not skip the filler phrase, even if the user has just provided information or context.
- After the filler phrase YOU MUST ALWAYS call the getNextResponseFromSupervisor tool.
- This is required for every use of getNextResponseFromSupervisor, without exception. Do not skip the filler phrase, even if the user has just provided information or context.
- You will use this tool extensively.

## How getNextResponse Works
## How getNextResponseFromSupervisor Works
- This asks supervisorAgent what to do next. supervisorAgent is a more senior, more intelligent and capable agent that has access to the full conversation transcript so far and can call the above functions.
- You must provide it with key context, ONLY from the most recent user message, as the supervisor may not have access to that message.
- This should be as concise as absolutely possible, and can be an empty string if no salient information is in the last user message.
@@ -81,8 +81,8 @@ findNearestStore:
- Assistant: "Sure, may I have your phone number so I can look that up?"
- User: 206 135 1246
- Assistant: "Okay, let me look into that" // Required filler phrase
- getNextResponse(relevantContextFromLastUserMessage="Phone number is 206 123 1246)
- getNextResponse(): "# Message\nOkay, I've pulled that up. Your last bill was $xx.xx, mainly due to $y.yy in international calls and $z.zz in data overage. Does that make sense?"
- getNextResponseFromSupervisor(relevantContextFromLastUserMessage="Phone number: 206 135 1246")
- getNextResponseFromSupervisor(): "# Message\nOkay, I've pulled that up. Your last bill was $xx.xx, mainly due to $y.yy in international calls and $z.zz in data overage. Does that make sense?"
- Assistant: "Okay, I've pulled that up. It looks like your last bill was $xx.xx, which is higher than your usual amount because of $y.yy in international calls and $z.zz in data overage charges. Does that make sense?"
- User: "Okay, yes, thank you."
- Assistant: "Of course, please let me know if I can help with anything else."
@@ -93,22 +93,22 @@ findNearestStore:
- User: "Nope that's great, bye!"
- Assistant: "Of course, thanks for calling NewTelco!"

# Additional Example (Filler Phrase Before getNextResponse)
# Additional Example (Filler Phrase Before getNextResponseFromSupervisor)
- User: "Can you tell me what my current plan includes?"
- Assistant: "One moment."
- getNextResponse(relevantContextFromLastUserMessage="Wants to know what current plan includes")
- getNextResponse(): "# Message\nYour current plan includes unlimited talk and text, plus 10GB of data per month. Would you like more details or information about upgrading?"
- getNextResponseFromSupervisor(relevantContextFromLastUserMessage="Wants to know what their current plan includes")
- getNextResponseFromSupervisor(): "# Message\nYour current plan includes unlimited talk and text, plus 10GB of data per month. Would you like more details or information about upgrading?"
- Assistant: "Your current plan includes unlimited talk and text, plus 10GB of data per month. Would you like more details or information about upgrading?"
`;

const mainAgent: AgentConfig = {
name: "mainAgent",
publicDescription: "Customer service agent for NewTelco.",
instructions: mainAgentInstructions,
const chatAgent: AgentConfig = {
name: "chatAgent",
publicDescription: "Customer service chat agent for NewTelco.",
instructions: chatAgentInstructions,
tools: [
{
type: "function",
name: "getNextResponse",
name: "getNextResponseFromSupervisor",
description:
"Determines the next response whenever the agent faces a non-trivial decision, produced by a highly intelligent supervisor agent. Returns a message describing what to do next.",
parameters: {
@@ -117,20 +117,19 @@ const mainAgent: AgentConfig = {
relevantContextFromLastUserMessage: {
type: "string",
description:
"Key information from the user described in their most recent message. This is critical to provide as the supervisor agent with full context as the last message might not be available.",
},
"Key information from the user's most recent message. This is critical to provide, as the last message might not yet be available to the supervisor agent. Okay to omit if the user message didn't add any new information.",
}, // Last message transcript can arrive after the tool call, in which case this is the only way to provide the supervisor with this context.
},
required: ["relevantContextFromLastUserMessage"],
additionalProperties: false,
},
},
],
toolLogic: {
getNextResponse,
getNextResponseFromSupervisor,
},
downstreamAgents: [],
};

const agents = [mainAgent];
const agents = [chatAgent];

export default agents;
@@ -4,7 +4,7 @@ import {
exampleStoreLocations,
} from "./sampleData";

const supervisorAgentInstructions = `You are an expert supervisor agent for customer service, tasked with providing real-time guidance to a more junior agent. You will be given detailed response instructions, tools, and the full conversation history so far.
const supervisorAgentInstructions = `You are an expert customer service supervisor agent, tasked with providing real-time guidance to a more junior agent that's chatting directly with the customer. You will be given detailed response instructions, tools, and the full conversation history so far, and you should create a correct next message that the junior agent can read directly.

# Instructions
- You can provide an answer directly, or call a tool first and then answer the question
@@ -223,7 +223,6 @@ function filterTranscriptLogs(transcriptLogs: any[]) {
continue;
}
if (item.type === "MESSAGE") {
// Remove guardrailResult and expanded
// eslint-disable-next-line @typescript-eslint/no-unused-vars
const { guardrailResult, expanded, ...rest } = item;
filtered.push(rest);
@@ -234,7 +233,7 @@ function filterTranscriptLogs(transcriptLogs: any[]) {
return filtered;
}

export async function getNextResponse(
export async function getNextResponseFromSupervisor(
{
relevantContextFromLastUserMessage,
}: { relevantContextFromLastUserMessage: string },
6 changes: 3 additions & 3 deletions src/app/agentConfigs/index.ts
@@ -1,14 +1,14 @@
import { AllAgentConfigsType } from "@/app/types";
import frontDeskAuthentication from "./frontDeskAuthentication";
import customerServiceRetail from "./customerServiceRetail";
import customerServiceWithSupervision from "./customerServiceWithSupervision";
import chatSupervisorDemo from "./chatSupervisorDemo";
import simpleExample from "./simpleExample";

export const allAgentSets: AllAgentConfigsType = {
frontDeskAuthentication,
customerServiceRetail,
customerServiceWithSupervision,
chatSupervisorDemo,
simpleExample,
};

export const defaultAgentSetKey = "simpleExample";
export const defaultAgentSetKey = "chatSupervisorDemo";