Commit 95d0268

EC-RAG bug fix for users (#2164)
Signed-off-by: Yongbozzz <[email protected]>
1 parent a89c4a4 commit 95d0268

49 files changed: +1152 −301 lines

EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md

File mode changed: 100644 → 100755
Lines changed: 69 additions & 34 deletions
@@ -12,19 +12,19 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m
 1. [Prerequisites](#prerequisites)
 2. [Access the Code](#access-the-code)
-3. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
-4. [Configure the Deployment Environment](#configure-the-deployment-environment)
-5. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
-6. [Check the Deployment Status](#check-the-deployment-status)
-7. [Test the Pipeline](#test-the-pipeline)
+3. [Prepare models](#prepare-models)
+4. [Prepare env variables and configurations](#prepare-env-variables-and-configurations)
+5. [Configure the Deployment Environment](#configure-the-deployment-environment)
+6. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
+7. [Access UI](#access-ui)
 8. [Cleanup the Deployment](#cleanup-the-deployment)

 ### Prerequisites

 EC-RAG supports vLLM deployment (default method) and local OpenVINO deployment for Intel Arc GPU. Prerequisites are shown below:
 Hardware: Intel Arc A770
 OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
-Driver & libraries: please refer to [Installing Client GPUs](https://dgpu-docs.intel.com/driver/client/overview.html) for detailed driver & libraries setup
+Driver & libraries: please refer to [Installing GPU Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & library setup

 The steps below use **vLLM** as the inference engine; if you want to use **OpenVINO** instead, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)

@@ -34,7 +34,7 @@ Clone the GenAIExample repository and access the EdgeCraftRAG Intel® Arc® plat
 ```
 git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/
+cd GenAIExamples/EdgeCraftRAG
 ```

 Checkout a released version, such as v1.3:
@@ -43,60 +43,95 @@ Checkout a released version, such as v1.3:
 git checkout v1.3
 ```

-### Generate a HuggingFace Access Token
+### Prepare models

-Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
+```bash
+# Prepare models for embedding and reranking:
+export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
+mkdir -p $MODEL_PATH
+pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
+optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
+
+# Prepare the LLM model
+export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
+pip install modelscope
+modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
+# Optionally, you can also download models with huggingface:
+# pip install -U huggingface_hub
+# huggingface-cli download $LLM_MODEL --local-dir "${MODEL_PATH}/${LLM_MODEL}"
+```

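Before moving on, it is worth confirming the exports actually produced OpenVINO IR files. A minimal sanity check, assuming the default `MODEL_PATH` layout above (the `openvino_model.xml`/`.bin` names follow optimum-cli's usual output naming; adjust if your version differs):

```bash
# Verify the embedding and reranker exports produced OpenVINO IR files
for m in BAAI/bge-small-en-v1.5 BAAI/bge-reranker-large; do
  ls -lh "${MODEL_PATH}/${m}/openvino_model.xml" "${MODEL_PATH}/${m}/openvino_model.bin" \
    || echo "Export of ${m} looks incomplete"
done
# The downloaded LLM should at least contain its config
ls "${MODEL_PATH}/${LLM_MODEL}/config.json"
```
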
-### Configure the Deployment Environment
+### Prepare env variables and configurations

 Below steps are for single Intel Arc GPU inference; if you want to set up multi Intel Arc GPU inference, please refer to [Multi-ARC Setup](../../../../docs/Advanced_Setup.md#multi-arc-setup)
-To set up environment variables for deploying EdgeCraftRAG service, source the set_env.sh script in this directory:

-```
-source set_env.sh
+#### Prepare env variables for vLLM deployment
+
+```bash
+ip_address=$(hostname -I | awk '{print $1}')
+# Use `ip a` to check your active ip
+export HOST_IP=$ip_address # Your host ip
+
+# Check group id of video and render
+export VIDEOGROUPID=$(getent group video | cut -d: -f3)
+export RENDERGROUPID=$(getent group render | cut -d: -f3)
+
+# If you have a proxy configured, uncomment the lines below
+# export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
+# export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
+# If you have a HF mirror configured, it will be imported to the container
+# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint
+
+# Make sure all 3 folders have 1000:1000 permission; otherwise:
+# chown 1000:1000 ${MODEL_PATH} ${PWD} # the default value of DOC_PATH and TMPFILE_PATH is PWD, so here we give permission to ${PWD}
+# In addition, also make sure the .cache folder has 1000:1000 permission; otherwise:
+# chown 1000:1000 -R $HOME/.cache
 ```

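If you are unsure the ownership requirements above are met, a quick check before launching (a sketch; paths follow the defaults above):

```bash
# Print owner:group for the three folders the containers write to;
# each line should start with 1000:1000
stat -c '%u:%g %n' "${MODEL_PATH}" "${PWD}" "${HOME}/.cache"
```
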
 For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)

-### Deploy the Service Using Docker Compose
-
-To deploy the EdgeCraftRAG service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
+#### Generate nginx config file

 ```bash
-docker compose up -d
+export VLLM_SERVICE_PORT_0=8100 # You can set your own port for the vLLM service
+# Generate your nginx config file
+# nginx-conf-generator.sh requires 2 parameters: DP_NUM and output filepath
+bash nginx/nginx-conf-generator.sh 1 nginx/nginx.conf
+# Set NGINX_CONFIG_PATH
+export NGINX_CONFIG_PATH="${PWD}/nginx/nginx.conf"
 ```

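Optionally, sanity-check the generated file before wiring it in. A sketch assuming the generator emits a complete nginx.conf (the stock `nginx:stable` image and mount path here are illustrative, not part of the EC-RAG docs):

```bash
# Dry-run parse of the generated config in a disposable container
docker run --rm -v "${NGINX_CONFIG_PATH}:/etc/nginx/nginx.conf:ro" nginx:stable nginx -t
```
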
The EdgeCraftRAG docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Arc® Platform
105+
### Deploy the Service Using Docker Compose
70106

71-
### Check the Deployment Status
107+
```bash
108+
# EC-RAG support Milvus as persistent database, by default milvus is disabled, you can choose to set MILVUS_ENABLED=1 to enable it
109+
export MILVUS_ENABLED=0
110+
# If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
111+
# export DOCKER_VOLUME_DIRECTORY= # change to your preference
72112

73-
After running docker compose, check if all the containers launched via docker compose have started:
113+
# EC-RAG support chat history round setting, by default chat history is disabled, you can set CHAT_HISTORY_ROUND to control it
114+
# export CHAT_HISTORY_ROUND= # change to your preference
74115

116+
# Launch EC-RAG service with compose
117+
docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
75118
```
76-
docker ps -a
77-
```
78-
79-
For the default deployment, the following 5 containers should be running:
80119

81-
### Test the Pipeline
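Even though the old status-check section is gone, verifying the launch is still worthwhile. A sketch (container names vary by release, so the filter is deliberately loose):

```bash
# List EC-RAG related containers and their state
docker ps --format 'table {{.Names}}\t{{.Status}}' | grep -iE 'edgecraftrag|vllm|nginx|milvus'
# Inspect a container that failed to start, e.g.:
# docker logs <container_name> --tail 50
```
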
120+
### Access UI
82121

83-
Once the EdgeCraftRAG service are running, test the pipeline using the following command:
84-
85-
```bash
86-
curl http://${host_ip}:16011/v1/chatqna -H 'Content-Type: application/json' -d '{
87-
"messages":"What is the test id?","max_tokens":5 }'
88-
```
122+
Open your browser, access http://${HOST_IP}:8082
89123

90-
For detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
124+
> Your browser should be running on the same host of your console, otherwise you will need to access UI with your host domain name instead of ${HOST_IP}.
91125
92-
**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file.
126+
Below is the UI front page, for detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
127+
![front_page](../../../../assets/img/front_page.png)
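A quick headless check that the UI is serving (8082 is the default `UI_SERVICE_PORT`; expect 200):

```bash
curl -s -o /dev/null -w '%{http_code}\n' http://${HOST_IP}:8082
```
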

 ### Cleanup the Deployment

 To stop the containers associated with the deployment, execute the following command:

 ```
-docker compose -f compose.yaml down
+docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml down
 ```

 All the EdgeCraftRAG containers will be stopped and then removed on completion of the "down" command.

EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml

Lines changed: 2 additions & 2 deletions
@@ -162,8 +162,8 @@ services:
       SERVED_MODEL_NAME: ${LLM_MODEL}
       TENSOR_PARALLEL_SIZE: ${TENSOR_PARALLEL_SIZE:-1}
       MAX_NUM_SEQS: ${MAX_NUM_SEQS:-64}
-      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-5000}
-      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-5000}
+      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-10240}
+      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-10240}
       LOAD_IN_LOW_BIT: ${LOAD_IN_LOW_BIT:-fp8}
       CCL_DG2_USM: ${CCL_DG2_USM:-""}
       PORT: ${VLLM_SERVICE_PORT_0:-8100}
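Since these are `${VAR:-default}` substitutions, the new 10240 defaults can still be overridden per deployment without editing the YAML; for example:

```bash
# Override the vLLM batching/context limits for this deployment only
export MAX_NUM_BATCHED_TOKENS=8192
export MAX_MODEL_LEN=8192
docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
```
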

EdgeCraftRAG/docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh

Lines changed: 2 additions & 2 deletions
@@ -181,8 +181,8 @@ for ((i = 0; i < PORT_NUM; i++)); do
       SERVED_MODEL_NAME: \${LLM_MODEL}
       TENSOR_PARALLEL_SIZE: \${TENSOR_PARALLEL_SIZE:-1}
       MAX_NUM_SEQS: \${MAX_NUM_SEQS:-64}
-      MAX_NUM_BATCHED_TOKENS: \${MAX_NUM_BATCHED_TOKENS:-5000}
-      MAX_MODEL_LEN: \${MAX_MODEL_LEN:-5000}
+      MAX_NUM_BATCHED_TOKENS: \${MAX_NUM_BATCHED_TOKENS:-10240}
+      MAX_MODEL_LEN: \${MAX_MODEL_LEN:-10240}
       LOAD_IN_LOW_BIT: \${LOAD_IN_LOW_BIT:-fp8}
       CCL_DG2_USM: \${CCL_DG2_USM:-""}
       PORT: \${VLLM_SERVICE_PORT_$i:-8$((i+1))00}

EdgeCraftRAG/docs/API_Guide.md

Lines changed: 44 additions & 0 deletions
@@ -97,6 +97,50 @@ curl -X GET http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -
 curl -X DELETE http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" | jq '.'
 ```

+## Knowledge Base Management
+
+### Create a knowledge base
+
+```bash
+curl -X POST http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" -d '{"name": "default_kb","description": "Your knowledge base Description","active":true}' | jq '.'
+```
+
+### Update a knowledge base
+
+```bash
+curl -X PATCH http://${HOST_IP}:16010/v1/knowledge/patch -H "Content-Type: application/json" -d '{"name": "default_kb","description": "Your knowledge base Description","active":true}' | jq '.'
+```
+
+### Check all knowledge bases
+
+```bash
+curl -X GET http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" | jq '.'
+```
+
+### Activate a knowledge base
+
+```bash
+curl -X PATCH http://${HOST_IP}:16010/v1/knowledge/patch -H "Content-Type: application/json" -d '{"name": "default_kb","active":true}' | jq '.'
+```
+
+### Remove a knowledge base
+
+```bash
+curl -X DELETE http://${HOST_IP}:16010/v1/knowledge/default_kb -H "Content-Type: application/json" | jq '.'
+```
+
+### Add a file to a knowledge base
+
+```bash
+curl -X POST http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+```
+
+### Delete a file from a knowledge base
+
+```bash
+curl -X DELETE http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+```
+
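Taken together, these endpoints support a simple end-to-end flow. A sketch chaining them (`demo_kb` and the doc folder are illustrative; the final query reuses the `/v1/chatqna` mega-service endpoint on port 16011 documented elsewhere in these docs):

```bash
# Create an active knowledge base, attach a folder of docs, then ask a question
curl -X POST http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" \
  -d '{"name": "demo_kb","description": "demo","active":true}' | jq '.'
curl -X POST http://${HOST_IP}:16010/v1/knowledge/demo_kb/files -H "Content-Type: application/json" \
  -d '{"local_path": "docs/demo"}' | jq '.' # must live under the mounted DOC_PATH
curl http://${HOST_IP}:16011/v1/chatqna -H 'Content-Type: application/json' \
  -d '{"messages":"What do the demo docs say?","max_tokens":128}'
```
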
 ## File Management

 ### Add a text

EdgeCraftRAG/docs/Advanced_Setup.md

Lines changed: 12 additions & 6 deletions
@@ -94,7 +94,7 @@ export RENDERGROUPID=$(getent group render | cut -d: -f3)

 # By default, the ports of the containers are set, uncomment if you want to change
 # export MEGA_SERVICE_PORT=16011
-# export PIPELINE_SERVICE_PORT=16011
+# export PIPELINE_SERVICE_PORT=16010
 # export UI_SERVICE_PORT="8082"

 # Make sure all 3 folders have 1000:1000 permission, otherwise
@@ -111,6 +111,12 @@ export MILVUS_ENABLED=0
 # If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
 # export DOCKER_VOLUME_DIRECTORY= # change to your preference

+# EC-RAG supports a chat history round setting; by default chat history is disabled, set CHAT_HISTORY_ROUND to control it
+# export CHAT_HISTORY_ROUND= # change to your preference
+
+# EC-RAG supports pipeline performance benchmarking; use ENABLE_BENCHMARK=true/false to turn benchmarking on or off
+# export ENABLE_BENCHMARK= # change to your preference
+
 # Launch EC-RAG service with compose
 docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```
@@ -119,7 +125,7 @@ docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d

 EC-RAG supports running inference with multi-ARC in multiple isolated containers
 Docker image preparation is the same as in the local inference section; please refer to [Build Docker Images](#1-optional-build-docker-images-for-mega-service-server-and-ui-by-your-own)
-Model preparation is the same as in the vLLM inference section; please refer to [Prepare models](../README.md#2-prepare-models)
+Model preparation is the same as in the vLLM inference section; please refer to [Prepare models](../docker_compose/intel/gpu/arc/README.md#2-prepare-models)
 After docker image and model preparation, please follow the steps below to run the multi-ARC setup (the steps show 2 vLLM containers (2 DP) with multiple Intel Arc GPUs):

 ### 1. Prepare env variables and configurations
@@ -148,7 +154,7 @@ export RENDERGROUPID=$(getent group render | cut -d: -f3)

 # By default, the ports of the containers are set, uncomment if you want to change
 # export MEGA_SERVICE_PORT=16011
-# export PIPELINE_SERVICE_PORT=16011
+# export PIPELINE_SERVICE_PORT=16010
 # export UI_SERVICE_PORT="8082"

 # Make sure all 3 folders have 1000:1000 permission, otherwise
@@ -167,8 +173,8 @@ export SELECTED_XPU_1=1 # Which GPU to select to run for container 1

 # Below are the extra env you can set for vllm
 export MAX_NUM_SEQS=64 # MAX_NUM_SEQS value
-export MAX_NUM_BATCHED_TOKENS=4000 # MAX_NUM_BATCHED_TOKENS value
-export MAX_MODEL_LEN=3000 # MAX_MODEL_LEN value
+export MAX_NUM_BATCHED_TOKENS=5000 # MAX_NUM_BATCHED_TOKENS value
+export MAX_MODEL_LEN=5000 # MAX_MODEL_LEN value
 export LOAD_IN_LOW_BIT=fp8 # the weight type value, expected: sym_int4, asym_int4, sym_int5, asym_int5 or sym_int8
 export CCL_DG2_USM="" # Need to set to 1 on Core to enable USM (Shared Memory GPUDirect). Xeon supports P2P and doesn't need this.
 ```
@@ -189,4 +195,4 @@ bash docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh $DP_NUM docker_com

 ### 3. Start Edge Craft RAG Services with Docker Compose

-This section is the same as the default vLLM inference section; please refer to [Start Edge Craft RAG Services with Docker Compose](../README.md#4-start-edge-craft-rag-services-with-docker-compose)
+This section is the same as the default vLLM inference section; please refer to [Start Edge Craft RAG Services with Docker Compose](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)
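Putting the multi-ARC pieces together: set the env above, generate a compose file for your DP count, then launch it. A sketch for 2 containers (the output filepath is illustrative; pass whichever path you prefer):

```bash
export DP_NUM=2
# Generate a compose file for DP_NUM vLLM containers, then launch it
bash docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh $DP_NUM docker_compose/intel/gpu/arc/compose_vllm_multi.yaml
docker compose -f docker_compose/intel/gpu/arc/compose_vllm_multi.yaml up -d
```
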

EdgeCraftRAG/docs/Explore_Edge_Craft_RAG.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 To create a default pipeline, you need to click the `Create Pipeline` button in the `Pipeline Setting` page.
 ![create_pipeline](../assets/img/create_pipeline.png)

-Then follow the pipeline create guide in UI to set your pipeline; please note that in `Indexer Type` you can set MilvusVector as the indexer (make sure Milvus is enabled before setting MilvusVector as the indexer; you can refer to [Enable Milvus](../README.md#4-start-edge-craft-rag-services-with-docker-compose)).
+Then follow the pipeline create guide in UI to set your pipeline; please note that in `Indexer Type` you can set MilvusVector as the indexer (make sure Milvus is enabled before setting MilvusVector as the indexer; you can refer to [Enable Milvus](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)).
 If choosing MilvusVector, you need to verify the vector uri first: input 'Your_IP:milvus_port', then click the `Test` button. Note that milvus_port is 19530
 ![milvus](../assets/img/milvus.png)
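Before clicking `Test` in the UI, you can confirm the Milvus port is reachable from your host. A sketch (19530 is the default port noted above):

```bash
# Succeeds if Milvus is listening; refuses or times out otherwise
nc -zv ${HOST_IP} 19530
```
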

EdgeCraftRAG/edgecraftrag/api/v1/data.py

Lines changed: 13 additions & 2 deletions
@@ -6,6 +6,7 @@
 from edgecraftrag.api_schema import DataIn, FilesIn
 from edgecraftrag.context import ctx
 from fastapi import FastAPI, File, HTTPException, UploadFile, status
+from werkzeug.utils import secure_filename

 data_app = FastAPI()

@@ -103,9 +104,19 @@ async def upload_file(file_name: str, file: UploadFile = File(...)):
     try:
         # DIR for server to save files uploaded by UI
         UI_DIRECTORY = os.getenv("TMPFILE_PATH", "/home/user/ui_cache")
-        UPLOAD_DIRECTORY = os.path.join(UI_DIRECTORY, file_name)
+        UPLOAD_DIRECTORY = os.path.normpath(os.path.join(UI_DIRECTORY, file_name))
+        if not UPLOAD_DIRECTORY.startswith(os.path.abspath(UI_DIRECTORY) + os.sep):
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid file_name: directory traversal detected"
+            )
         os.makedirs(UPLOAD_DIRECTORY, exist_ok=True)
-        file_path = os.path.join(UPLOAD_DIRECTORY, file.filename)
+        # Sanitize the uploaded file's name
+        safe_filename = secure_filename(file.filename)
+        file_path = os.path.normpath(os.path.join(UPLOAD_DIRECTORY, safe_filename))
+        # Ensure file_path is within UPLOAD_DIRECTORY
+        if not file_path.startswith(os.path.abspath(UPLOAD_DIRECTORY)):
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid uploaded file name")
         with open(file_path, "wb") as buffer:
             buffer.write(await file.read())
         return file_path
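To see the fix in action you can poke the upload endpoint directly. A sketch with curl; the route `/v1/data/file/{file_name}` and port 16010 are assumptions for illustration (the route decorator is not shown in this hunk, so check data.py for the real path):

```bash
echo hello > /tmp/test.txt
# A hostile multipart filename is flattened by secure_filename() to "evil.txt"
curl -X POST "http://${HOST_IP}:16010/v1/data/file/mydir" \
  -F 'file=@/tmp/test.txt;filename=../../evil.txt' -w '\n%{http_code}\n'
# A file_name that tries to climb out of TMPFILE_PATH should now return HTTP 400
curl -X POST "http://${HOST_IP}:16010/v1/data/file/%2E%2E%2Fetc" \
  -F 'file=@/tmp/test.txt' -w '\n%{http_code}\n'
```
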
