SYNC Server LLM is a gRPC-based server that performs document retrieval and summarization. It leverages Qdrant for vector search and OpenAI models to generate summaries of retrieved content based on user-provided keywords.
```bash
git clone --recurse-submodules https://github.com/NCTU-SYNC/sync-server-llm.git
cd sync-server-llm
uv sync --no-dev --frozen
uv run gen-protos
```

Before running the server, you need to:
- Configure the server settings in `configs/config.toml`
- Create a `.env` file with the following environment variables:

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | Your OpenAI API key |
| `QDRANT_HOST` | The Qdrant host address |
| `QDRANT_PORT` | The Qdrant REST API port |
| `QDRANT_COLLECTION` | The Qdrant collection name |
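For reference, a `.env` file might look like the following. All values are placeholders: substitute your own API key and Qdrant settings (6333 shown here is Qdrant's default REST port, and the collection name is illustrative):

```env
OPENAI_API_KEY=sk-...
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=documents
```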
You can run SYNC Server LLM using one of the following methods:
To run the server directly with uv:

```bash
uv run scripts/serve.py --config configs/config.toml
```

Notes:

- Make sure to set up and run the Qdrant server before starting (see the example below)
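If you don't already have a Qdrant instance available, one common way to start one locally is via the official Docker image (this is Qdrant's standard quickstart, not something specific to this repository; 6333 is the default REST port):

```bash
docker run -p 6333:6333 qdrant/qdrant
```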
To run the server with Docker:

- Build the Docker image:

```bash
docker build -t sync/backend-llm .
```

- Run the container:

```bash
docker run -p 50051:50051 \
  --env-file .env \
  -v $(pwd)/path/to/configs:/app/configs/config.toml \
  -v $(pwd)/path/to/hf_cache:/tmp/llama_index \
  sync/backend-llm
```

Notes:

- For Windows users, add `--gpus=all` to use GPU capabilities (requires Docker with GPU support)
- We strongly recommend mounting the `hf_cache` directory to avoid re-downloading Hugging Face models on container restart
- Make sure to set up and run the Qdrant server before starting
A `docker-compose.yaml` file is included in the repository to simplify deploying the server together with the Qdrant database.
- Build the services:

```bash
docker-compose build
```

- Start the services:

```bash
docker-compose up -d
```
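The repository's `docker-compose.yaml` is the authoritative definition; as a rough sketch of the shape such a setup takes (service names, ports, and volumes below are illustrative assumptions, not the repository's actual file), it pairs the server with a Qdrant service along these lines:

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yaml
# for the actual service definitions.
services:
  server:
    build: .
    ports:
      - "50051:50051"      # gRPC port exposed by the server
    env_file: .env
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant   # official Qdrant image
    ports:
      - "6333:6333"        # Qdrant REST API (default port)
```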
To test the server, you can use the provided client example:

```bash
uv run scripts/client.py
```

Refer to the protobuf files in the `protos/` directory for the features provided by the server.
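If you want to write your own client instead of using `scripts/client.py`, the general pattern with `grpcio` looks like the sketch below. The module, stub, message, and field names (`summarizer_pb2`, `SummarizerStub`, `SummarizeRequest`, `keywords`) are hypothetical placeholders; substitute the actual names generated from the definitions in `protos/`:

```python
import grpc

# Hypothetical generated modules -- the real module, service, and message
# names come from the .proto definitions in protos/.
import summarizer_pb2
import summarizer_pb2_grpc


def main():
    # Connect to the server on its default port (50051, per the Docker example above).
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = summarizer_pb2_grpc.SummarizerStub(channel)
        # Field names here are illustrative assumptions.
        request = summarizer_pb2.SummarizeRequest(keywords=["example", "topic"])
        response = stub.Summarize(request)
        print(response)


if __name__ == "__main__":
    main()
```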