`AudioQnA/docker_compose/intel/cpu/xeon/README.md`
In the context of deploying an AudioQnA pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks or a single-language English TTS / multi-language TTS component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git).
| File                                             | Description                                                                                                                                                                                                                            |
| ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml)                   | Default compose file using vLLM as the serving framework and Redis as the vector database                                                                                                                                               |
| [compose_tgi.yaml](./compose_tgi.yaml)           | The LLM serving framework is TGI. All other configurations remain the same as the default                                                                                                                                               |
| [compose_multilang.yaml](./compose_multilang.yaml) | The TTS component is GPT-SoVITS. All other configurations remain the same as the default                                                                                                                                              |
| [compose_remote.yaml](./compose_remote.yaml)     | The LLM used is hosted on a remote server and an endpoint is used to access this model. Additional environment variables need to be set before running. See [instructions](#running-llm-models-with-remote-endpoints) below.           |
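For example, to bring up the pipeline with one of the alternate configurations, pass the corresponding compose file explicitly. The commands below are a minimal sketch and assume the usual environment setup (e.g. sourcing `set_env.sh`) has already been done:

```bash
# Start the pipeline with the TGI-based configuration instead of the default vLLM one
docker compose -f compose_tgi.yaml up -d

# Tear the same configuration down when finished
docker compose -f compose_tgi.yaml down
```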
### Running LLM models with remote endpoints
When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to [Intel® AI for Enterprise Inference](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html) offerings.
Set the following environment variables.
- `REMOTE_ENDPOINT` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.example.com). **Note:** If the API for the models does not use LiteLLM, the second part of the model card needs to be appended to the URL. For example, set `REMOTE_ENDPOINT` to https://api.example.com/Llama-3.3-70B-Instruct if the model card is `meta-llama/Llama-3.3-70B-Instruct`.
- `API_KEY` is the access token or key used to access the model(s) on the server.
- `LLM_MODEL_ID` is the model card, which may need to be overwritten depending on what it is set to in `set_env.sh`.
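As a rough sketch, the variables can be exported before starting the remote-endpoint compose file. The endpoint, key, and model shown below are placeholders; substitute the values for your own deployment:

```bash
# Placeholder values -- replace with the base URL, key, and model card for your remote server
export REMOTE_ENDPOINT=https://api.example.com
export API_KEY=<your-api-key>
export LLM_MODEL_ID=meta-llama/Llama-3.3-70B-Instruct

# Launch the pipeline against the remote endpoint
docker compose -f compose_remote.yaml up -d
```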