EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md
This section describes how to quickly deploy and test the EdgeCraftRAG service.

1. [Prerequisites](#prerequisites)
2. [Access the Code](#access-the-code)
3. [Prepare models](#prepare-models)
4. [Prepare env variables and configurations](#prepare-env-variables-and-configurations)
5. [Configure the Deployment Environment](#configure-the-deployment-environment)
6. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
7. [Access UI](#access-ui)
8. [Cleanup the Deployment](#cleanup-the-deployment)
### Prerequisites

EC-RAG supports vLLM deployment (the default method) and local OpenVINO deployment for Intel Arc GPU. The prerequisites are listed below:

Hardware: Intel Arc A770

OS: Ubuntu Server 22.04.1 or newer (at least a 6.2 LTS kernel)

Driver & libraries: please refer to [Installing GPU Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver and library setup

The steps below are based on **vLLM** as the inference engine; if you want to choose **OpenVINO**, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)
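Once the driver stack is installed, a quick sanity check can confirm the Arc GPU is enumerated. This is a sketch assuming the `clinfo` utility is installed (it is not part of this README's instructions):

```bash
# Lists OpenCL devices; the Arc A770 should appear if the driver stack is healthy
clinfo | grep -i "device name"
```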
Clone the GenAIExamples repository and access the EdgeCraftRAG Intel® Arc® platform directory:

```bash
cd GenAIExamples/EdgeCraftRAG
```

Check out a released version, such as v1.3:

```bash
git checkout v1.3
```
### Prepare models

```bash
# Prepare models for embedding, reranking:
export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
```
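The models themselves are not fetched by the line above. As an illustrative sketch (the model IDs below are assumptions, not mandated by this README), you could pre-download an embedding and a reranking model into `${MODEL_PATH}` with `huggingface-cli`:

```bash
# Model IDs are examples only; substitute whatever your pipeline configures
huggingface-cli download BAAI/bge-small-en-v1.5 --local-dir "${MODEL_PATH}/BAAI/bge-small-en-v1.5"
huggingface-cli download BAAI/bge-reranker-large --local-dir "${MODEL_PATH}/BAAI/bge-reranker-large"
```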
The steps below are for single Intel Arc GPU inference; if you want to set up inference with multiple Intel Arc GPUs, please refer to [Multi-ARC Setup](../../../../docs/Advanced_Setup.md#multi-arc-setup)
#### Prepare env variables for vLLM deployment

```bash
ip_address=$(hostname -I | awk '{print $1}')
# Use `ip a` to check your active ip
export HOST_IP=$ip_address # Your host ip

# Check group id of video and render
export VIDEOGROUPID=$(getent group video | cut -d: -f3)
export RENDERGROUPID=$(getent group render | cut -d: -f3)

# If you have a proxy configured, uncomment the line below
# If you have a HF mirror configured, it will be imported to the container
# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint

# Make sure all 3 folders have 1000:1000 permission, otherwise
# chown 1000:1000 ${MODEL_PATH} ${PWD} # the default value of DOC_PATH and TMPFILE_PATH is PWD, so here we give permission to ${PWD}
# In addition, also make sure the .cache folder has 1000:1000 permission, otherwise
# chown 1000:1000 -R $HOME/.cache
```
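Since the comments above require 1000:1000 ownership on the model, document, and cache folders, a quick verification with standard coreutils (nothing EC-RAG-specific) can save a failed startup:

```bash
# Print owner:group for each folder; all should report 1000:1000
stat -c '%u:%g %n' "${MODEL_PATH}" "${PWD}" "$HOME/.cache"
```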
For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)
#### Generate nginx config file
```bash
export VLLM_SERVICE_PORT_0=8100 # You can set your own port for vllm service
# Generate your nginx config file
# nginx-conf-generator.sh requires 2 parameters: DP_NUM and output filepath
```
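A minimal illustrative invocation, assuming a single vLLM instance (DP_NUM=1), an output file named `nginx.conf`, and that the script sits in the working directory (all three are placeholders, not prescribed by this README):

```bash
# DP_NUM=1 (one vLLM data-parallel instance); output path is arbitrary
bash nginx-conf-generator.sh 1 ./nginx.conf
```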
122
+
Open your browser, access http://${HOST_IP}:8082
89
123
90
-
For detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
124
+
> Your browser should be running on the same host of your console, otherwise you will need to access UI with your host domain name instead of ${HOST_IP}.
91
125
92
-
**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file.
126
+
Below is the UI front page, for detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
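If the page does not load, a plain reachability check from the console helps distinguish a network issue from a service issue (ordinary curl; no EC-RAG-specific endpoint assumed):

```bash
curl -sSf http://${HOST_IP}:8082 >/dev/null && echo "UI is reachable"
```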
### Add file to knowledge base

```bash
curl -X POST http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
```
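For example, if the mounted doc path contains a folder named `my_docs` (a hypothetical name used here for illustration), the request becomes:

```bash
curl -X POST http://${HOST_IP}:16010/v1/knowledge/default_kb/files \
  -H "Content-Type: application/json" \
  -d '{"local_path": "docs/my_docs"}' | jq '.'
```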
### Delete file from knowledge base

```bash
curl -X DELETE http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
```
```bash
# By default, the ports of the containers are set; uncomment if you want to change them
# export MEGA_SERVICE_PORT=16011
# export PIPELINE_SERVICE_PORT=16010
# export UI_SERVICE_PORT="8082"

# Make sure all 3 folders have 1000:1000 permission, otherwise
# chown 1000:1000 ${MODEL_PATH} ${PWD}

export MILVUS_ENABLED=0
# If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
# export DOCKER_VOLUME_DIRECTORY= # change to your preference

# EC-RAG supports a chat history round setting; chat history is disabled by default, and you can set CHAT_HISTORY_ROUND to control it
# export CHAT_HISTORY_ROUND= # change to your preference

# EC-RAG supports a pipeline performance benchmark; use ENABLE_BENCHMARK=true/false to turn the benchmark on/off
# export ENABLE_BENCHMARK= # change to your preference

# Launch EC-RAG service with compose
docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
```
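To confirm the containers came up, the standard Docker Compose status subcommand works against the same compose file:

```bash
docker compose -f docker_compose/intel/gpu/arc/compose.yaml ps
```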
EC-RAG supports running inference with multi-ARC in multiple isolated containers.
Docker image preparation is the same as in the local inference section; please refer to [Build Docker Images](#1-optional-build-docker-images-for-mega-service-server-and-ui-by-your-own).
Model preparation is the same as in the vLLM inference section; please refer to [Prepare models](../docker_compose/intel/gpu/arc/README.md#2-prepare-models).

After Docker image and model preparation, please follow the steps below to run the multi-ARC setup (the steps below show 2 vLLM containers (2 DP) with multiple Intel Arc GPUs):
### 3. Start Edge Craft RAG Services with Docker Compose
This section is the same as the default vLLM inference section; please refer to [Start Edge Craft RAG Services with Docker Compose](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)
Then follow the pipeline creation guide in the UI to set up your pipeline. Note that in `Indexer Type` you can set MilvusVector as the indexer (please make sure Milvus is enabled before setting MilvusVector as the indexer; you can refer to [Enable Milvus](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)).

If you choose MilvusVector, you need to verify the vector URI first: input 'Your_IP:milvus_port' and then click the `Test` button. Note that milvus_port is 19530.
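For example, if your host IP is 192.168.1.10 (a placeholder), you would enter `192.168.1.10:19530` in the URI field before clicking `Test`.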