Commit 5eb3d28

Update AgentQnA example for v1.1 release (#885)

Signed-off-by: minmin-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

1 parent ced68e1, commit 5eb3d28

17 files changed (+212 additions, -104 deletions)

AgentQnA/README.md

Lines changed: 51 additions & 25 deletions
@@ -81,17 +81,13 @@ flowchart LR
 3. Hierarchical agent can further improve performance.
 Expert worker agents, such as retrieval agent, knowledge graph agent, SQL agent, etc., can provide high-quality output for different aspects of a complex query, and the supervisor agent can aggregate the information together to provide a comprehensive answer.
 
-### Roadmap
+## Deployment with docker
 
-- v0.9: Worker agent uses open-source websearch tool (duckduckgo), agents use OpenAI GPT-4o-mini as llm backend.
-- v1.0: Worker agent uses OPEA retrieval megaservice as tool.
-- v1.0 or later: agents use open-source llm backend.
-- v1.1 or later: add safeguards
+1. Build agent docker image
 
-## Getting started
+Note: this is optional. The docker images will be automatically pulled when running the docker compose commands. This step is only needed if pulling images failed.
 
-1. Build agent docker image </br>
-First, clone the opea GenAIComps repo
+First, clone the opea GenAIComps repo.
 
 ```
 export WORKDIR=<your-work-directory>
@@ -106,35 +102,63 @@ flowchart LR
 docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
 ```
 
-2. Launch tool services </br>
-In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
-
-```
-docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
-```
-
-3. Set up environment for this example </br>
-First, clone this repo
+2. Set up environment for this example </br>
+First, clone this repo.
 
 ```
 cd $WORKDIR
 git clone https://github.com/opea-project/GenAIExamples.git
 ```
 
-Second, set up env vars
+Second, set up env vars.
 
 ```
 export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
-# optional: OPANAI_API_KEY
+# for using open-source llms
+export HUGGINGFACEHUB_API_TOKEN=<your-HF-token>
+export HF_CACHE_DIR=<directory-where-llms-are-downloaded> # so that there is no need to re-download every time
+
+# optional: OPENAI_API_KEY if you want to use OpenAI models
 export OPENAI_API_KEY=<your-openai-key>
 ```
 
-4. Launch agent services</br>
-The configurations of the supervisor agent and the worker agent are defined in the docker-compose yaml file. We currently use openAI GPT-4o-mini as LLM, and we plan to add support for llama3.1-70B-instruct (served by TGI-Gaudi) in a subsequent release.
-To use openai llm, run command below.
+3. Deploy the retrieval tool (i.e., the DocIndexRetriever mega-service)
+
+First, launch the mega-service.
+
+```
+cd $WORKDIR/GenAIExamples/AgentQnA/retrieval_tool
+bash launch_retrieval_tool.sh
+```
+
+Then, ingest data into the vector database. We provide an example here; you can also ingest your own data.
+
+```
+bash run_ingest_data.sh
+```
+
+4. Launch other tools </br>
+In this example, we will use some of the mock APIs provided in the Meta CRAG KDD Challenge to demonstrate the benefits of gaining additional context from mock knowledge graphs.
+
+```
+docker run -d -p=8080:8000 docker.io/aicrowd/kdd-cup-24-crag-mock-api:v0
+```
+
+5. Launch agent services</br>
+We provide two options for the `llm_engine` of the agents: 1. open-source LLMs, 2. OpenAI models via API calls.
+
+To use open-source LLMs on Gaudi2, run the commands below.
+
+```
+cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/hpu/gaudi
+bash launch_tgi_gaudi.sh
+bash launch_agent_service_tgi_gaudi.sh
+```
+
+To use OpenAI models, run the commands below.
 
 ```
-cd docker_compose/intel/cpu/xeon
+cd $WORKDIR/GenAIExamples/AgentQnA/docker_compose/intel/cpu/xeon
 bash launch_agent_service_openai.sh
 ```
 
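After launching, a quick sanity check can confirm that the example's containers actually came up before digging into logs. This is a sketch, using only the container names defined in this commit's compose files (tgi-server exists only for the open-source Gaudi option):

```
# List this example's containers and their status; names come from the
# compose files in this commit. tgi-server is only present on Gaudi.
docker ps --format '{{.Names}}\t{{.Status}}' \
  | grep -E 'rag-agent-endpoint|react-agent-endpoint|tgi-server'
```
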
@@ -143,10 +167,12 @@ flowchart LR
 First look at logs of the agent docker containers:
 
 ```
-docker logs docgrader-agent-endpoint
+# worker agent
+docker logs rag-agent-endpoint
 ```
 
 ```
+# supervisor agent
 docker logs react-agent-endpoint
 ```
 
@@ -170,4 +196,4 @@ curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: app
 
 ## How to register your own tools with agent
 
-You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md#5-customize-agent-strategy).
+You can take a look at the tools yaml and python files in this example. For more details, please refer to the "Provide your own tools" section in the instructions [here](https://github.com/opea-project/GenAIComps/tree/main/comps/agent/langchain/README.md).
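To make the registration flow concrete, here is a hypothetical sketch of a custom tool spec. The field names (description, callable_api, args_schema, return_output) follow the pattern of the example tools shipped under $TOOLSET_PATH, and get_weather / my_tools.py are made-up placeholders; check the linked GenAIComps instructions for the authoritative schema.

```
# Hypothetical custom tool spec, written next to the example tools.
# All names below are placeholders for illustration only.
cat > $TOOLSET_PATH/my_tools.yaml <<'EOF'
get_weather:
  description: Get the current weather for a given city.
  callable_api: my_tools.py:get_weather
  args_schema:
    city:
      type: str
      description: name of the city
  return_output: weather_report
EOF
```
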
AgentQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+# Deployment on Xeon
+
+We deploy the retrieval tool on Xeon. For LLMs, we support OpenAI models via API calls. For instructions on using open-source LLMs, please refer to the deployment guide [here](../../../../README.md).

AgentQnA/docker_compose/intel/cpu/xeon/compose_openai.yaml

Lines changed: 4 additions & 4 deletions
@@ -2,11 +2,10 @@
 # SPDX-License-Identifier: Apache-2.0
 
 services:
-  worker-docgrader-agent:
+  worker-rag-agent:
     image: opea/agent-langchain:latest
-    container_name: docgrader-agent-endpoint
+    container_name: rag-agent-endpoint
     volumes:
-      - ${WORKDIR}/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
       - ${TOOLSET_PATH}:/home/user/tools/
     ports:
       - "9095:9095"
@@ -36,8 +35,9 @@ services:
   supervisor-react-agent:
     image: opea/agent-langchain:latest
     container_name: react-agent-endpoint
+    depends_on:
+      - worker-rag-agent
     volumes:
-      - ${WORKDIR}/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
       - ${TOOLSET_PATH}:/home/user/tools/
     ports:
       - "9090:9090"

AgentQnA/docker_compose/intel/cpu/xeon/launch_agent_service_openai.sh

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ export recursion_limit_worker=12
 export recursion_limit_supervisor=10
 export model="gpt-4o-mini-2024-07-18"
 export temperature=0
-export max_new_tokens=512
+export max_new_tokens=4096
 export OPENAI_API_KEY=${OPENAI_API_KEY}
 export WORKER_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
 export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
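With the services up, the worker agent can be exercised directly at the WORKER_AGENT_URL defined above. A sketch, assuming the {"query": ...} payload format used by the validation commands in the main README:

```
# Sketch: query the worker RAG agent directly on port 9095.
# The payload format is an assumption based on the main README's examples.
curl http://${ip_address}:9095/v1/chat/completions \
  -X POST -H "Content-Type: application/json" \
  -d '{"query": "What is OPEA?"}'
```
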

AgentQnA/docker_compose/intel/hpu/gaudi/compose.yaml

Lines changed: 5 additions & 34 deletions
@@ -2,37 +2,9 @@
 # SPDX-License-Identifier: Apache-2.0
 
 services:
-  tgi-server:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
-    container_name: tgi-server
-    ports:
-      - "8085:80"
-    volumes:
-      - ${HF_CACHE_DIR}:/data
-    environment:
-      no_proxy: ${no_proxy}
-      http_proxy: ${http_proxy}
-      https_proxy: ${https_proxy}
-      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-      HF_HUB_DISABLE_PROGRESS_BARS: 1
-      HF_HUB_ENABLE_HF_TRANSFER: 0
-      HABANA_VISIBLE_DEVICES: all
-      OMPI_MCA_btl_vader_single_copy_mechanism: none
-      PT_HPU_ENABLE_LAZY_COLLECTIVES: true
-      ENABLE_HPU_GRAPH: true
-      LIMIT_HPU_GRAPH: true
-      USE_FLASH_ATTENTION: true
-      FLASH_ATTENTION_RECOMPUTE: true
-    runtime: habana
-    cap_add:
-      - SYS_NICE
-    ipc: host
-    command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192 --sharded true --num-shard ${NUM_SHARDS}
-  worker-docgrader-agent:
+  worker-rag-agent:
     image: opea/agent-langchain:latest
-    container_name: docgrader-agent-endpoint
-    depends_on:
-      - tgi-server
+    container_name: rag-agent-endpoint
     volumes:
       # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
       - ${TOOLSET_PATH}:/home/user/tools/
@@ -41,7 +13,7 @@ services:
     ipc: host
     environment:
       ip_address: ${ip_address}
-      strategy: rag_agent
+      strategy: rag_agent_llama
       recursion_limit: ${recursion_limit_worker}
       llm_engine: tgi
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
@@ -66,8 +38,7 @@ services:
     image: opea/agent-langchain:latest
     container_name: react-agent-endpoint
     depends_on:
-      - tgi-server
-      - worker-docgrader-agent
+      - worker-rag-agent
     volumes:
       # - ${WORKDIR}/GenAIExamples/AgentQnA/docker_image_build/GenAIComps/comps/agent/langchain/:/home/user/comps/agent/langchain/
       - ${TOOLSET_PATH}:/home/user/tools/
@@ -76,7 +47,7 @@ services:
     ipc: host
     environment:
       ip_address: ${ip_address}
-      strategy: react_langgraph
+      strategy: react_llama
       recursion_limit: ${recursion_limit_supervisor}
       llm_engine: tgi
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
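The renamed strategies (rag_agent_llama, react_llama) are, presumably, agent variants tailored to open-source Llama-family models served by TGI rather than to function-calling OpenAI models. To double-check what a running container actually picked up, the environment can be inspected — a quick sketch:

```
# Sketch: inspect the strategy and llm_engine each agent was started with.
docker exec rag-agent-endpoint env | grep -E '^(strategy|llm_engine)='
docker exec react-agent-endpoint env | grep -E '^(strategy|llm_engine)='
```
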

AgentQnA/docker_compose/intel/hpu/gaudi/launch_agent_service_tgi_gaudi.sh

Lines changed: 1 addition & 15 deletions
@@ -15,7 +15,7 @@ export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-70B-Instruct"
 export NUM_SHARDS=4
 export LLM_ENDPOINT_URL="http://${ip_address}:8085"
 export temperature=0.01
-export max_new_tokens=512
+export max_new_tokens=4096
 
 # agent related environment variables
 export TOOLSET_PATH=$WORKDIR/GenAIExamples/AgentQnA/tools/
@@ -27,17 +27,3 @@ export RETRIEVAL_TOOL_URL="http://${ip_address}:8889/v1/retrievaltool"
 export CRAG_SERVER=http://${ip_address}:8080
 
 docker compose -f compose.yaml up -d
-
-sleep 5s
-echo "Waiting tgi gaudi ready"
-n=0
-until [[ "$n" -ge 100 ]] || [[ $ready == true ]]; do
-    docker logs tgi-server &> tgi-gaudi-service.log
-    n=$((n+1))
-    if grep -q Connected tgi-gaudi-service.log; then
-        break
-    fi
-    sleep 5s
-done
-sleep 5s
-echo "Service started successfully"
AgentQnA/docker_compose/intel/hpu/gaudi/launch_tgi_gaudi.sh

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# LLM related environment variables
+export HF_CACHE_DIR=${HF_CACHE_DIR}
+ls $HF_CACHE_DIR
+export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3.1-70B-Instruct"
+export NUM_SHARDS=4
+
+docker compose -f tgi_gaudi.yaml up -d
+
+sleep 5s
+echo "Waiting for tgi-gaudi to be ready"
+n=0
+until [[ "$n" -ge 100 ]] || [[ $ready == true ]]; do
+    docker logs tgi-server &> tgi-gaudi-service.log
+    n=$((n+1))
+    if grep -q Connected tgi-gaudi-service.log; then
+        break
+    fi
+    sleep 5s
+done
+sleep 5s
+echo "Service started successfully"
AgentQnA/docker_compose/intel/hpu/gaudi/tgi_gaudi.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+services:
+  tgi-server:
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    container_name: tgi-server
+    ports:
+      - "8085:80"
+    volumes:
+      - ${HF_CACHE_DIR}:/data
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HF_HUB_DISABLE_PROGRESS_BARS: 1
+      HF_HUB_ENABLE_HF_TRANSFER: 0
+      HABANA_VISIBLE_DEVICES: all
+      OMPI_MCA_btl_vader_single_copy_mechanism: none
+      PT_HPU_ENABLE_LAZY_COLLECTIVES: true
+      ENABLE_HPU_GRAPH: true
+      LIMIT_HPU_GRAPH: true
+      USE_FLASH_ATTENTION: true
+      FLASH_ATTENTION_RECOMPUTE: true
+    runtime: habana
+    cap_add:
+      - SYS_NICE
+    ipc: host
+    command: --model-id ${LLM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192 --sharded true --num-shard ${NUM_SHARDS}
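Once the wait loop in launch_tgi_gaudi.sh reports success, the server can be smoke-tested directly through TGI's native /generate API (host port 8085 maps to the container's port 80):

```
# Sketch: send one short generation request straight to tgi-server.
curl "http://${ip_address}:8085/generate" \
  -X POST -H "Content-Type: application/json" \
  -d '{"inputs": "What is OPEA?", "parameters": {"max_new_tokens": 32}}'
```
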
File renamed without changes.
