# Build Mega Service of AudioQnA on AMD ROCm GPU

This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice
pipeline on a server with an AMD ROCm GPU platform.

## 🚀 Build Docker images

### 1. Install GenAIComps from Source

```bash
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```

### 2. Build ASR Image

```bash
docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/dependency/Dockerfile .

docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
```

### 3. Build LLM Image

```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

Note:
For the ROCm compose example, the AMD-optimized image published by Hugging Face will be used for the TGI service: `ghcr.io/huggingface/text-generation-inference:2.3.1-rocm` (https://github.com/huggingface/text-generation-inference).
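
Optionally, you can pre-pull this TGI image so `docker compose up` does not need to download it on first start:

```bash
# Pre-pull the AMD-optimized TGI image referenced in the note above (optional)
docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
```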

### 4. Build TTS Image

```bash
docker build -t opea/speecht5:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/dependency/Dockerfile .

docker build -t opea/tts:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/tts/speecht5/Dockerfile .
```

### 5. Build MegaService Docker Image

To construct the Mega Service, we utilize the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline within the `audioqna.py` Python script. Build the MegaService Docker image using the command below:

```bash
git clone https://github.com/opea-project/GenAIExamples.git
cd GenAIExamples/AudioQnA/
docker build --no-cache -t opea/audioqna:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
```

Then run the command `docker images`; you should see the following images ready:

1. `opea/whisper:latest`
2. `opea/asr:latest`
3. `opea/llm-tgi:latest`
4. `opea/speecht5:latest`
5. `opea/tts:latest`
6. `opea/audioqna:latest`

## 🚀 Set the environment variables

Before starting the services with `docker compose`, make sure the following environment variables are set correctly.

```bash
export host_ip=<your External Public IP> # export host_ip=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=<your HF token>

export TGI_LLM_ENDPOINT=http://$host_ip:3006
export LLM_MODEL_ID=Intel/neural-chat-7b-v3-3

export ASR_ENDPOINT=http://$host_ip:7066
export TTS_ENDPOINT=http://$host_ip:7055

export MEGA_SERVICE_HOST_IP=${host_ip}
export ASR_SERVICE_HOST_IP=${host_ip}
export TTS_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}

export ASR_SERVICE_PORT=3001
export TTS_SERVICE_PORT=3002
export LLM_SERVICE_PORT=3007
```

Alternatively, use the `set_env.sh` file to set up the environment variables.
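
For example, assuming `set_env.sh` is located in the ROCm compose directory of the GenAIExamples repository (adjust the path if your copy differs), you can source it instead of exporting the variables manually:

```bash
# Assumption: set_env.sh lives next to the ROCm compose file; adjust the path if needed
cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
source ./set_env.sh
```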

Note: Please replace `host_ip` with your external IP address; do not use localhost.

Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the card index, starting from 128 (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).

Example of device isolation for 1 GPU:

  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128

Example of device isolation for 2 GPUs:

  - /dev/dri/card0:/dev/dri/card0
  - /dev/dri/renderD128:/dev/dri/renderD128
  - /dev/dri/card1:/dev/dri/card1
  - /dev/dri/renderD129:/dev/dri/renderD129

Please find more information about accessing and restricting AMD GPUs at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus.
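
To see which `card`/`renderD` device nodes are actually present on your host (and therefore which pairs to map into the containers), list the DRI devices:

```bash
# List the available DRI device nodes; render nodes start at renderD128 for the first GPU
ls -l /dev/dri
```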

## 🚀 Start the MegaService

```bash
cd GenAIExamples/AudioQnA/docker_compose/amd/gpu/rocm/
docker compose up -d
```
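
After the containers start, you can verify that all services came up and inspect their logs, for example:

```bash
# Show the status of the containers started by this compose file
docker compose ps

# Follow the logs of all services (Ctrl+C to stop following)
docker compose logs -f
```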

In the following cases, you should build the Docker images from source yourself:

- The Docker image download failed.
- You want to use a specific version of a Docker image.

Please refer to the 'Build Docker images' section above.

## 🚀 Consume the AudioQnA Service

Test the AudioQnA megaservice by recording a .wav file, encoding the file into the base64 format, and then sending the
base64 string to the megaservice endpoint. The megaservice will return a spoken response as a base64 string. To listen
to the response, decode the base64 string and save it as a .wav file.

```bash
curl http://${host_ip}:3008/v1/audioqna \
  -X POST \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "max_tokens":64}' \
  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```
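
For example, to send your own recording instead of the short sample payload above, you can first encode a local file to base64 (the file name `input.wav` is just an example):

```bash
# Encode a local recording to a single-line base64 string (input.wav is an example name)
BASE64_AUDIO=$(base64 -w 0 input.wav)

curl http://${host_ip}:3008/v1/audioqna \
  -X POST \
  -d "{\"audio\": \"${BASE64_AUDIO}\", \"max_tokens\":64}" \
  -H 'Content-Type: application/json' | sed 's/^"//;s/"$//' | base64 -d > output.wav
```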

## 🚀 Test MicroServices

```bash
# whisper service
curl http://${host_ip}:7066/v1/asr \
  -X POST \
  -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

# asr microservice
curl http://${host_ip}:3001/v1/audio/transcriptions \
  -X POST \
  -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \
  -H 'Content-Type: application/json'

# tgi service
curl http://${host_ip}:3006/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

# llm microservice
curl http://${host_ip}:3007/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
  -H 'Content-Type: application/json'

# speecht5 service
curl http://${host_ip}:7055/v1/tts \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'

# tts microservice
curl http://${host_ip}:3002/v1/audio/speech \
  -X POST \
  -d '{"text": "Who are you?"}' \
  -H 'Content-Type: application/json'
```
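
If any of the requests above fails, checking the logs of the corresponding container is usually the quickest way to find the cause, for example:

```bash
# List the running containers to find the exact container name
docker ps

# Inspect the logs of a specific container (substitute the name reported by `docker ps`)
docker logs <container-name>
```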