Commit 95d0268

EC-RAG bug fix for users (#2164)
Signed-off-by: Yongbozzz <[email protected]>
1 parent a89c4a4 commit 95d0268

49 files changed: +1152 −301 lines

EdgeCraftRAG/docker_compose/intel/gpu/arc/README.md

File mode changed: 100644 → 100755
Lines changed: 69 additions & 34 deletions
@@ -12,19 +12,19 @@ This section describes how to quickly deploy and test the EdgeCraftRAG service m
 1. [Prerequisites](#prerequisites)
 2. [Access the Code](#access-the-code)
-3. [Generate a HuggingFace Access Token](#generate-a-huggingface-access-token)
-4. [Configure the Deployment Environment](#configure-the-deployment-environment)
-5. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
-6. [Check the Deployment Status](#check-the-deployment-status)
-7. [Test the Pipeline](#test-the-pipeline)
+3. [Prepare models](#prepare-models)
+4. [Prepare env variables and configurations](#prepare-env-variables-and-configurations)
+5. [Configure the Deployment Environment](#configure-the-deployment-environment)
+6. [Deploy the Service Using Docker Compose](#deploy-the-service-using-docker-compose)
+7. [Access UI](#access-ui)
 8. [Cleanup the Deployment](#cleanup-the-deployment)

 ### Prerequisites

 EC-RAG supports vLLM deployment (default method) and local OpenVINO deployment for Intel Arc GPU. Prerequisites are shown below:
 Hardware: Intel Arc A770
 OS: Ubuntu Server 22.04.1 or newer (at least 6.2 LTS kernel)
-Driver & libraries: please refer to [Installing Client GPUs](https://dgpu-docs.intel.com/driver/client/overview.html) for detailed driver & libraries setup
+Driver & libraries: please refer to [Installing GPU Drivers](https://dgpu-docs.intel.com/driver/installation-rolling.html#installing-gpu-drivers) for detailed driver & library setup

 The steps below use **vLLM** as the inference engine; if you want to use **OpenVINO** instead, please refer to [OpenVINO Local Inference](../../../../docs/Advanced_Setup.md#openvino-local-inference)

@@ -34,7 +34,7 @@ Clone the GenAIExample repository and access the EdgeCraftRAG Intel® Arc® plat
 ```
 git clone https://github.com/opea-project/GenAIExamples.git
-cd GenAIExamples/EdgeCraftRAG/docker_compose/intel/gpu/arc/
+cd GenAIExamples/EdgeCraftRAG
 ```

 Checkout a released version, such as v1.3:
@@ -43,60 +43,95 @@ Checkout a released version, such as v1.3:
 git checkout v1.3
 ```

-### Generate a HuggingFace Access Token
+### Prepare models

-Some HuggingFace resources, such as some models, are only accessible if you have an access token. If you do not already have a HuggingFace access token, you can create one by first creating an account by following the steps provided at [HuggingFace](https://huggingface.co/) and then generating a [user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
+```bash
+# Prepare models for embedding and reranking:
+export MODEL_PATH="${PWD}/models" # Your model path for embedding, reranking and LLM models
+mkdir -p $MODEL_PATH
+pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+optimum-cli export openvino -m BAAI/bge-small-en-v1.5 ${MODEL_PATH}/BAAI/bge-small-en-v1.5 --task sentence-similarity
+optimum-cli export openvino -m BAAI/bge-reranker-large ${MODEL_PATH}/BAAI/bge-reranker-large --task text-classification
+
+# Prepare the LLM model
+export LLM_MODEL="Qwen/Qwen3-8B" # Your model id
+pip install modelscope
+modelscope download --model $LLM_MODEL --local_dir "${MODEL_PATH}/${LLM_MODEL}"
+# Optionally, you can also download models with huggingface:
+# pip install -U huggingface_hub
+# huggingface-cli download $LLM_MODEL --local-dir "${MODEL_PATH}/${LLM_MODEL}"
+```

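Before moving on, it is worth confirming the exports actually produced OpenVINO IR files. A minimal sanity check, assuming the default `MODEL_PATH` layout above (the `openvino_model.xml`/`.bin` names follow optimum-cli's usual output naming; adjust if your version differs):

```bash
# Verify the embedding and reranker exports produced OpenVINO IR files
for m in BAAI/bge-small-en-v1.5 BAAI/bge-reranker-large; do
  ls -lh "${MODEL_PATH}/${m}/openvino_model.xml" "${MODEL_PATH}/${m}/openvino_model.bin" \
    || echo "Export of ${m} looks incomplete"
done
# The downloaded LLM should at least contain its config
ls "${MODEL_PATH}/${LLM_MODEL}/config.json"
```
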
-### Configure the Deployment Environment
+### Prepare env variables and configurations

 Below steps are for single Intel Arc GPU inference; if you want to set up multi Intel Arc GPU inference, please refer to [Multi-ARC Setup](../../../../docs/Advanced_Setup.md#multi-arc-setup)
-To set up environment variables for deploying EdgeCraftRAG service, source the set_env.sh script in this directory:

-```
-source set_env.sh
+#### Prepare env variables for vLLM deployment
+
+```bash
+ip_address=$(hostname -I | awk '{print $1}')
+# Use `ip a` to check your active ip
+export HOST_IP=$ip_address # Your host ip
+
+# Check group id of video and render
+export VIDEOGROUPID=$(getent group video | cut -d: -f3)
+export RENDERGROUPID=$(getent group render | cut -d: -f3)
+
+# If you have a proxy configured, uncomment the lines below
+# export no_proxy=${no_proxy},${HOST_IP},edgecraftrag,edgecraftrag-server
+# export NO_PROXY=${NO_PROXY},${HOST_IP},edgecraftrag,edgecraftrag-server
+# If you have a HF mirror configured, it will be imported to the container
+# export HF_ENDPOINT=https://hf-mirror.com # your HF mirror endpoint
+
+# Make sure all 3 folders have 1000:1000 permission; otherwise:
+# chown 1000:1000 ${MODEL_PATH} ${PWD} # the default value of DOC_PATH and TMPFILE_PATH is PWD, so here we give permission to ${PWD}
+# In addition, also make sure the .cache folder has 1000:1000 permission; otherwise:
+# chown 1000:1000 -R $HOME/.cache
 ```

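If you are unsure the ownership requirements above are met, a quick check before launching (a sketch; paths follow the defaults above):

```bash
# Print owner:group for the three folders the containers write to;
# each line should start with 1000:1000
stat -c '%u:%g %n' "${MODEL_PATH}" "${PWD}" "${HOME}/.cache"
```
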
 For more advanced env variables and configurations, please refer to [Prepare env variables for vLLM deployment](../../../../docs/Advanced_Setup.md#prepare-env-variables-for-vllm-deployment)

-### Deploy the Service Using Docker Compose
-
-To deploy the EdgeCraftRAG service, execute the `docker compose up` command with the appropriate arguments. For a default deployment, execute:
+#### Generate nginx config file

 ```bash
-docker compose up -d
+export VLLM_SERVICE_PORT_0=8100 # You can set your own port for the vLLM service
+# Generate your nginx config file
+# nginx-conf-generator.sh requires 2 parameters: DP_NUM and output filepath
+bash nginx/nginx-conf-generator.sh 1 nginx/nginx.conf
+# Set NGINX_CONFIG_PATH
+export NGINX_CONFIG_PATH="${PWD}/nginx/nginx.conf"
 ```

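Optionally, sanity-check the generated file before wiring it in. A sketch assuming the generator emits a complete nginx.conf (the stock `nginx:stable` image and mount path here are illustrative, not part of the EC-RAG docs):

```bash
# Dry-run parse of the generated config in a disposable container
docker run --rm -v "${NGINX_CONFIG_PATH}:/etc/nginx/nginx.conf:ro" nginx:stable nginx -t
```
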
The EdgeCraftRAG docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Arc® Platform
105+
### Deploy the Service Using Docker Compose
70106

71-
### Check the Deployment Status
107+
```bash
108+
# EC-RAG support Milvus as persistent database, by default milvus is disabled, you can choose to set MILVUS_ENABLED=1 to enable it
109+
export MILVUS_ENABLED=0
110+
# If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
111+
# export DOCKER_VOLUME_DIRECTORY= # change to your preference
72112

73-
After running docker compose, check if all the containers launched via docker compose have started:
113+
# EC-RAG support chat history round setting, by default chat history is disabled, you can set CHAT_HISTORY_ROUND to control it
114+
# export CHAT_HISTORY_ROUND= # change to your preference
74115

116+
# Launch EC-RAG service with compose
117+
docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
75118
```
76-
docker ps -a
77-
```
78-
79-
For the default deployment, the following 5 containers should be running:
80119

81-
### Test the Pipeline
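Even though the old status-check section is gone, verifying the launch is still worthwhile. A sketch (container names vary by release, so the filter is deliberately loose):

```bash
# List EC-RAG related containers and their state
docker ps --format 'table {{.Names}}\t{{.Status}}' | grep -iE 'edgecraftrag|vllm|nginx|milvus'
# Inspect a container that failed to start, e.g.:
# docker logs <container_name> --tail 50
```
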
120+
### Access UI
82121

83-
Once the EdgeCraftRAG service are running, test the pipeline using the following command:
84-
85-
```bash
86-
curl http://${host_ip}:16011/v1/chatqna -H 'Content-Type: application/json' -d '{
87-
"messages":"What is the test id?","max_tokens":5 }'
88-
```
122+
Open your browser, access http://${HOST_IP}:8082
89123

90-
For detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
124+
> Your browser should be running on the same host of your console, otherwise you will need to access UI with your host domain name instead of ${HOST_IP}.
91125
92-
**Note** The value of _host_ip_ was set using the _set_env.sh_ script and can be found in the _.env_ file.
126+
Below is the UI front page, for detailed operations on UI and EC-RAG settings, please refer to [Explore_Edge_Craft_RAG](../../../../docs/Explore_Edge_Craft_RAG.md)
127+
![front_page](../../../../assets/img/front_page.png)
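A quick headless check that the UI is serving (8082 is the default `UI_SERVICE_PORT`; expect 200):

```bash
curl -s -o /dev/null -w '%{http_code}\n' http://${HOST_IP}:8082
```
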

 ### Cleanup the Deployment

 To stop the containers associated with the deployment, execute the following command:

 ```
-docker compose -f compose.yaml down
+docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml down
 ```

 All the EdgeCraftRAG containers will be stopped and then removed on completion of the "down" command.

EdgeCraftRAG/docker_compose/intel/gpu/arc/compose_vllm.yaml

Lines changed: 2 additions & 2 deletions
@@ -162,8 +162,8 @@ services:
       SERVED_MODEL_NAME: ${LLM_MODEL}
       TENSOR_PARALLEL_SIZE: ${TENSOR_PARALLEL_SIZE:-1}
       MAX_NUM_SEQS: ${MAX_NUM_SEQS:-64}
-      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-5000}
-      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-5000}
+      MAX_NUM_BATCHED_TOKENS: ${MAX_NUM_BATCHED_TOKENS:-10240}
+      MAX_MODEL_LEN: ${MAX_MODEL_LEN:-10240}
       LOAD_IN_LOW_BIT: ${LOAD_IN_LOW_BIT:-fp8}
       CCL_DG2_USM: ${CCL_DG2_USM:-""}
       PORT: ${VLLM_SERVICE_PORT_0:-8100}
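Since these are `${VAR:-default}` substitutions, the new 10240 defaults can still be overridden per deployment without editing the YAML; for example:

```bash
# Override the vLLM batching/context limits for this deployment only
export MAX_NUM_BATCHED_TOKENS=8192
export MAX_MODEL_LEN=8192
docker compose -f docker_compose/intel/gpu/arc/compose_vllm.yaml up -d
```
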

EdgeCraftRAG/docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh

Lines changed: 2 additions & 2 deletions
@@ -181,8 +181,8 @@ for ((i = 0; i < PORT_NUM; i++)); do
       SERVED_MODEL_NAME: \${LLM_MODEL}
       TENSOR_PARALLEL_SIZE: \${TENSOR_PARALLEL_SIZE:-1}
       MAX_NUM_SEQS: \${MAX_NUM_SEQS:-64}
-      MAX_NUM_BATCHED_TOKENS: \${MAX_NUM_BATCHED_TOKENS:-5000}
-      MAX_MODEL_LEN: \${MAX_MODEL_LEN:-5000}
+      MAX_NUM_BATCHED_TOKENS: \${MAX_NUM_BATCHED_TOKENS:-10240}
+      MAX_MODEL_LEN: \${MAX_MODEL_LEN:-10240}
       LOAD_IN_LOW_BIT: \${LOAD_IN_LOW_BIT:-fp8}
       CCL_DG2_USM: \${CCL_DG2_USM:-""}
       PORT: \${VLLM_SERVICE_PORT_$i:-8$((i+1))00}

EdgeCraftRAG/docs/API_Guide.md

Lines changed: 44 additions & 0 deletions
@@ -97,6 +97,50 @@ curl -X GET http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -
 curl -X DELETE http://${HOST_IP}:16010/v1/settings/models/BAAI/bge-reranker-large -H "Content-Type: application/json" | jq '.'
 ```

+## Knowledge Base Management
+
+### Create a knowledge base
+
+```bash
+curl -X POST http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" -d '{"name": "default_kb","description": "Your knowledge base Description","active":true}' | jq '.'
+```
+
+### Update a knowledge base
+
+```bash
+curl -X PATCH http://${HOST_IP}:16010/v1/knowledge/patch -H "Content-Type: application/json" -d '{"name": "default_kb","description": "Your knowledge base Description","active":true}' | jq '.'
+```
+
+### Check all knowledge bases
+
+```bash
+curl -X GET http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" | jq '.'
+```
+
+### Activate a knowledge base
+
+```bash
+curl -X PATCH http://${HOST_IP}:16010/v1/knowledge/patch -H "Content-Type: application/json" -d '{"name": "default_kb","active":true}' | jq '.'
+```
+
+### Remove a knowledge base
+
+```bash
+curl -X DELETE http://${HOST_IP}:16010/v1/knowledge/default_kb -H "Content-Type: application/json" | jq '.'
+```
+
+### Add a file to a knowledge base
+
+```bash
+curl -X POST http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+```
+
+### Delete a file from a knowledge base
+
+```bash
+curl -X DELETE http://${HOST_IP}:16010/v1/knowledge/default_kb/files -H "Content-Type: application/json" -d '{"local_path": "docs/#REPLACE WITH YOUR DIR WITHIN MOUNTED DOC PATH#"}' | jq '.'
+```
+
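Taken together, these endpoints support a simple end-to-end flow. A sketch chaining them (`demo_kb` and the doc folder are illustrative; the final query reuses the `/v1/chatqna` mega-service endpoint on port 16011 documented elsewhere in these docs):

```bash
# Create an active knowledge base, attach a folder of docs, then ask a question
curl -X POST http://${HOST_IP}:16010/v1/knowledge -H "Content-Type: application/json" \
  -d '{"name": "demo_kb","description": "demo","active":true}' | jq '.'
curl -X POST http://${HOST_IP}:16010/v1/knowledge/demo_kb/files -H "Content-Type: application/json" \
  -d '{"local_path": "docs/demo"}' | jq '.' # must live under the mounted DOC_PATH
curl http://${HOST_IP}:16011/v1/chatqna -H 'Content-Type: application/json' \
  -d '{"messages":"What do the demo docs say?","max_tokens":128}'
```
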
 ## File Management

 ### Add a text

EdgeCraftRAG/docs/Advanced_Setup.md

Lines changed: 12 additions & 6 deletions
@@ -94,7 +94,7 @@ export RENDERGROUPID=$(getent group render | cut -d: -f3)

 # By default, the ports of the containers are set, uncomment if you want to change
 # export MEGA_SERVICE_PORT=16011
-# export PIPELINE_SERVICE_PORT=16011
+# export PIPELINE_SERVICE_PORT=16010
 # export UI_SERVICE_PORT="8082"

 # Make sure all 3 folders have 1000:1000 permission, otherwise
@@ -111,6 +111,12 @@ export MILVUS_ENABLED=0
 # If you enable Milvus, the default storage path is PWD, uncomment if you want to change:
 # export DOCKER_VOLUME_DIRECTORY= # change to your preference

+# EC-RAG supports a chat history round setting; by default chat history is disabled, set CHAT_HISTORY_ROUND to control it
+# export CHAT_HISTORY_ROUND= # change to your preference
+
+# EC-RAG supports pipeline performance benchmarking; use ENABLE_BENCHMARK=true/false to turn benchmarking on or off
+# export ENABLE_BENCHMARK= # change to your preference
+
 # Launch EC-RAG service with compose
 docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d
 ```
@@ -119,7 +125,7 @@ docker compose -f docker_compose/intel/gpu/arc/compose.yaml up -d

 EC-RAG supports running inference with multi-ARC in multiple isolated containers
 Docker image preparation is the same as in the local inference section; please refer to [Build Docker Images](#1-optional-build-docker-images-for-mega-service-server-and-ui-by-your-own)
-Model preparation is the same as in the vLLM inference section; please refer to [Prepare models](../README.md#2-prepare-models)
+Model preparation is the same as in the vLLM inference section; please refer to [Prepare models](../docker_compose/intel/gpu/arc/README.md#2-prepare-models)
 After docker image and model preparation, please follow the steps below to run the multi-ARC setup (the steps show 2 vLLM containers (2 DP) with multiple Intel Arc GPUs):

 ### 1. Prepare env variables and configurations
@@ -148,7 +154,7 @@ export RENDERGROUPID=$(getent group render | cut -d: -f3)

 # By default, the ports of the containers are set, uncomment if you want to change
 # export MEGA_SERVICE_PORT=16011
-# export PIPELINE_SERVICE_PORT=16011
+# export PIPELINE_SERVICE_PORT=16010
 # export UI_SERVICE_PORT="8082"

 # Make sure all 3 folders have 1000:1000 permission, otherwise
@@ -167,8 +173,8 @@ export SELECTED_XPU_1=1 # Which GPU to select to run for container 1

 # Below are the extra env you can set for vllm
 export MAX_NUM_SEQS=64 # MAX_NUM_SEQS value
-export MAX_NUM_BATCHED_TOKENS=4000 # MAX_NUM_BATCHED_TOKENS value
-export MAX_MODEL_LEN=3000 # MAX_MODEL_LEN value
+export MAX_NUM_BATCHED_TOKENS=5000 # MAX_NUM_BATCHED_TOKENS value
+export MAX_MODEL_LEN=5000 # MAX_MODEL_LEN value
 export LOAD_IN_LOW_BIT=fp8 # the weight type value, expected: sym_int4, asym_int4, sym_int5, asym_int5 or sym_int8
 export CCL_DG2_USM="" # Need to set to 1 on Core to enable USM (Shared Memory GPUDirect). Xeon supports P2P and doesn't need this.
 ```
@@ -189,4 +195,4 @@ bash docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh $DP_NUM docker_com

 ### 3. Start Edge Craft RAG Services with Docker Compose

-This section is the same as the default vLLM inference section; please refer to [Start Edge Craft RAG Services with Docker Compose](../README.md#4-start-edge-craft-rag-services-with-docker-compose)
+This section is the same as the default vLLM inference section; please refer to [Start Edge Craft RAG Services with Docker Compose](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)
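Putting the multi-ARC pieces together: set the env above, generate a compose file for your DP count, then launch it. A sketch for 2 containers (the output filepath is illustrative; pass whichever path you prefer):

```bash
export DP_NUM=2
# Generate a compose file for DP_NUM vLLM containers, then launch it
bash docker_compose/intel/gpu/arc/multi-arc-yaml-generator.sh $DP_NUM docker_compose/intel/gpu/arc/compose_vllm_multi.yaml
docker compose -f docker_compose/intel/gpu/arc/compose_vllm_multi.yaml up -d
```
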

EdgeCraftRAG/docs/Explore_Edge_Craft_RAG.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 To create a default pipeline, you need to click the `Create Pipeline` button in the `Pipeline Setting` page.
 ![create_pipeline](../assets/img/create_pipeline.png)

-Then follow the pipeline create guide in UI to set your pipeline; please note that in `Indexer Type` you can set MilvusVector as the indexer (make sure Milvus is enabled before setting MilvusVector as the indexer; you can refer to [Enable Milvus](../README.md#4-start-edge-craft-rag-services-with-docker-compose)).
+Then follow the pipeline create guide in UI to set your pipeline; please note that in `Indexer Type` you can set MilvusVector as the indexer (make sure Milvus is enabled before setting MilvusVector as the indexer; you can refer to [Enable Milvus](../docker_compose/intel/gpu/arc/README.md#deploy-the-service-using-docker-compose)).
 If choosing MilvusVector, you need to verify the vector uri first: input 'Your_IP:milvus_port', then click the `Test` button. Note that milvus_port is 19530
 ![milvus](../assets/img/milvus.png)
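Before clicking `Test` in the UI, you can confirm the Milvus port is reachable from your host. A sketch (19530 is the default port noted above):

```bash
# Succeeds if Milvus is listening; refuses or times out otherwise
nc -zv ${HOST_IP} 19530
```
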

EdgeCraftRAG/edgecraftrag/api/v1/data.py

Lines changed: 13 additions & 2 deletions
@@ -6,6 +6,7 @@
 from edgecraftrag.api_schema import DataIn, FilesIn
 from edgecraftrag.context import ctx
 from fastapi import FastAPI, File, HTTPException, UploadFile, status
+from werkzeug.utils import secure_filename

 data_app = FastAPI()

@@ -103,9 +104,19 @@ async def upload_file(file_name: str, file: UploadFile = File(...)):
     try:
         # DIR for server to save files uploaded by UI
         UI_DIRECTORY = os.getenv("TMPFILE_PATH", "/home/user/ui_cache")
-        UPLOAD_DIRECTORY = os.path.join(UI_DIRECTORY, file_name)
+        UPLOAD_DIRECTORY = os.path.normpath(os.path.join(UI_DIRECTORY, file_name))
+        if not UPLOAD_DIRECTORY.startswith(os.path.abspath(UI_DIRECTORY) + os.sep):
+            raise HTTPException(
+                status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid file_name: directory traversal detected"
+            )
         os.makedirs(UPLOAD_DIRECTORY, exist_ok=True)
-        file_path = os.path.join(UPLOAD_DIRECTORY, file.filename)
+        # Sanitize the uploaded file's name
+        safe_filename = secure_filename(file.filename)
+        file_path = os.path.normpath(os.path.join(UPLOAD_DIRECTORY, safe_filename))
+        # Ensure file_path is within UPLOAD_DIRECTORY
+        if not file_path.startswith(os.path.abspath(UPLOAD_DIRECTORY)):
+            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid uploaded file name")
         with open(file_path, "wb") as buffer:
             buffer.write(await file.read())
         return file_path
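To see the fix in action you can poke the upload endpoint directly. A sketch with curl; the route `/v1/data/file/{file_name}` and port 16010 are assumptions for illustration (the route decorator is not shown in this hunk, so check data.py for the real path):

```bash
echo hello > /tmp/test.txt
# A hostile multipart filename is flattened by secure_filename() to "evil.txt"
curl -X POST "http://${HOST_IP}:16010/v1/data/file/mydir" \
  -F 'file=@/tmp/test.txt;filename=../../evil.txt' -w '\n%{http_code}\n'
# A file_name that tries to climb out of TMPFILE_PATH should now return HTTP 400
curl -X POST "http://${HOST_IP}:16010/v1/data/file/%2E%2E%2Fetc" \
  -F 'file=@/tmp/test.txt' -w '\n%{http_code}\n'
```
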
