
Commit b1bb6db

Add compose example for DocSum amd rocm deployment (#1125)
Signed-off-by: Artem Astafev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 7949045 commit b1bb6db

File tree

4 files changed: +469 -0 lines changed
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
# Build and deploy DocSum Application on AMD GPU (ROCm)

## 🚀 Build Docker Images

First of all, you need to build the Docker images locally.
### 1. Build LLM Image

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-docsum-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/summarization/tgi/langchain/Dockerfile .
```

### 2. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `docsum.py` Python script. Build the MegaService Docker image with the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples
cd GenAIExamples/DocSum/
docker build -t opea/docsum:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

### 3. Build UI Docker Image

Build the frontend Docker image with the command below:

```bash
cd GenAIExamples/DocSum/ui
docker build -t opea/docsum-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile .
```

Then run the command `docker images`; you should have the following Docker images:

1. `opea/llm-docsum-tgi:latest`
2. `opea/docsum:latest`
3. `opea/docsum-ui:latest`

### 4. Build React UI Docker Image

Build the React frontend Docker image with the command below:

```bash
cd GenAIExamples/DocSum/ui
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
docker build -t opea/docsum-react-ui:latest --build-arg BACKEND_SERVICE_ENDPOINT=$BACKEND_SERVICE_ENDPOINT --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
```

Then run the command `docker images`; you should have the following Docker images:

1. `opea/llm-docsum-tgi:latest`
2. `opea/docsum:latest`
3. `opea/docsum-ui:latest`
4. `opea/docsum-react-ui:latest`

## 🚀 Start Microservices and MegaService

### Required Models

The default model is "Intel/neural-chat-7b-v3-3". Change "DOCSUM_LLM_MODEL_ID" in the environment variables below if you want to use another model.
For gated models, you also need to provide a [HuggingFace token](https://huggingface.co/docs/hub/security-tokens) in the "DOCSUM_HUGGINGFACEHUB_API_TOKEN" environment variable.

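For example, to switch to a different (possibly gated) model, export the two variables below before starting the stack. The model name here is purely illustrative:

```bash
# Illustrative model id; any TGI-compatible model on HuggingFace can be used.
export DOCSUM_LLM_MODEL_ID="mistralai/Mistral-7B-Instruct-v0.3"
# Required for gated models.
export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
```
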
### Setup Environment Variables

Since the `compose.yaml` file consumes some environment variables, you need to set them up in advance, as shown below.

```bash
export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HOST_IP=${host_ip}
export DOCSUM_TGI_SERVICE_PORT="18882"
export DOCSUM_TGI_LLM_ENDPOINT="http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export DOCSUM_LLM_SERVER_PORT="8008"
export DOCSUM_BACKEND_SERVER_PORT="8888"
export DOCSUM_FRONTEND_PORT="5173"
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.
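
If you are unsure of the host's external IP, one common way to look it up on most Linux hosts is shown below (this helper is an illustration, not part of the deployment scripts):

```bash
# Take the first address reported for this host; verify it is reachable
# from your browser before using it.
export host_ip=$(hostname -I | awk '{print $1}')
echo ${host_ip}
```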

Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>`, where `<node>` is the card index, starting from 128. See the [ROCm documentation on restricting GPU access](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus) for details.

Example of device isolation for 1 GPU:

```yaml
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```

Example of device isolation for 2 GPUs:

```yaml
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD129:/dev/dri/renderD129
```

More information about accessing and restricting AMD GPUs is available in the [ROCm Docker documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).
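
The `compose.yaml` in this deployment selects devices through the `DOCSUM_CARD_ID` and `DOCSUM_RENDER_ID` variables, so one way to apply the single-GPU isolation above is shown below (the values are an example for the first GPU; run `ls /dev/dri` to see the nodes available on your system):

```bash
# Card and render node of the first GPU.
export DOCSUM_CARD_ID="card0"
export DOCSUM_RENDER_ID="renderD128"
```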

### Start Microservice Docker Containers

```bash
cd GenAIExamples/DocSum/docker_compose/amd/gpu/rocm
docker compose up -d
```

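To confirm everything came up, you can check the container status and follow the TGI logs while the model downloads (standard Docker Compose commands; the service name matches `compose.yaml`):

```bash
docker compose ps                          # all services should be "Up"
docker compose logs -f docsum-tgi-service  # wait until the model finishes loading
```
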
### Validate Microservices

1. TGI Service

```bash
curl http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

2. LLM Microservice

```bash
curl http://${HOST_IP}:${DOCSUM_LLM_SERVER_PORT}/v1/chat/docsum \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```

3. MegaService

```bash
curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum -H "Content-Type: application/json" -d '{
  "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":32, "language":"en", "stream":false
}'
```

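The same endpoint can also stream the summary as it is generated; flipping the `stream` flag from the request above should be enough, assuming the deployed backend supports streaming:

```bash
# Identical request with streaming enabled; the summary arrives in chunks.
curl http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models.", "max_tokens":32, "language":"en", "stream":true}'
```
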
## 🚀 Launch the Svelte UI

Open the URL `http://${host_ip}:5173` in your browser to access the frontend.

![project-screenshot](https://github.com/intel-ai-tce/GenAIExamples/assets/21761437/93b1ed4b-4b76-4875-927e-cc7818b4825b)

Here is an example of summarizing an article.

![image](https://github.com/intel-ai-tce/GenAIExamples/assets/21761437/67ecb2ec-408d-4e81-b124-6ded6b833f55)

## 🚀 Launch the React UI (Optional)

To access the React-based frontend, modify the UI service in the `compose.yaml` file: replace the `docsum-rocm-ui-server` service with the `docsum-rocm-react-ui-server` service, as in the config below:

```yaml
docsum-rocm-react-ui-server:
  image: ${REGISTRY:-opea}/docsum-react-ui:${TAG:-latest}
  container_name: docsum-rocm-react-ui-server
  depends_on:
    - docsum-rocm-backend-server
  ports:
    - "5174:80"
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
    - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
```

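Note that `DOC_BASE_URL` is taken from `BACKEND_SERVICE_ENDPOINT`, so that variable must be exported before running `docker compose up` (this mirrors the export performed by `set_env.sh` below):

```bash
export BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum"
```
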
Open the URL `http://${host_ip}:5174` in your browser to access the frontend.

![project-screenshot](../../../../assets/img/docsum-ui-react.png)
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

services:
  docsum-tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    container_name: docsum-tgi-service
    ports:
      - "${DOCSUM_TGI_SERVICE_PORT}:80"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${DOCSUM_HUGGINGFACEHUB_API_TOKEN}
    volumes:
      - "/var/opea/docsum-service/data:/data"
    shm_size: 1g
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/${DOCSUM_CARD_ID}:/dev/dri/${DOCSUM_CARD_ID}
      - /dev/dri/${DOCSUM_RENDER_ID}:/dev/dri/${DOCSUM_RENDER_ID}
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    ipc: host
    command: --model-id ${DOCSUM_LLM_MODEL_ID}
  docsum-llm-server:
    image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
    container_name: docsum-llm-server
    depends_on:
      - docsum-tgi-service
    ports:
      - "${DOCSUM_LLM_SERVER_PORT}:9000"
    ipc: host
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    cap_add:
      - SYS_PTRACE
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/${DOCSUM_CARD_ID}:/dev/dri/${DOCSUM_CARD_ID}
      - /dev/dri/${DOCSUM_RENDER_ID}:/dev/dri/${DOCSUM_RENDER_ID}
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${DOCSUM_HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
  docsum-backend-server:
    image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
    container_name: docsum-backend-server
    depends_on:
      - docsum-tgi-service
      - docsum-llm-server
    ports:
      - "${DOCSUM_BACKEND_SERVER_PORT}:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${HOST_IP}
      - LLM_SERVICE_HOST_IP=${HOST_IP}
    ipc: host
    restart: always
  docsum-ui-server:
    image: ${REGISTRY:-opea}/docsum-ui:${TAG:-latest}
    container_name: docsum-ui-server
    depends_on:
      - docsum-backend-server
    ports:
      - "${DOCSUM_FRONTEND_PORT}:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - DOC_BASE_URL="http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum"
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
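
Before starting the stack, a quick sanity check is to render the file with all variables substituted (standard Docker Compose behavior; unset variables expand to empty strings):

```bash
cd GenAIExamples/DocSum/docker_compose/amd/gpu/rocm
docker compose config   # prints the fully substituted compose file
```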
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HOST_IP=${host_ip}
export DOCSUM_TGI_SERVICE_PORT="8008"
export DOCSUM_TGI_LLM_ENDPOINT="http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export DOCSUM_LLM_SERVER_PORT="9000"
export DOCSUM_BACKEND_SERVER_PORT="8888"
export DOCSUM_FRONTEND_PORT="5173"
export BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum"
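
A typical way to use this script, assuming it is saved as `set_env.sh` in the deployment directory (`host_ip` and `your_hf_api_token` must be set first; the values below are placeholders):

```bash
export host_ip="192.168.1.10"       # replace with your external IP address
export your_hf_api_token="hf_xxx"   # replace with your HuggingFace token
source ./set_env.sh
docker compose up -d
```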
