Commit 1ff85f6

Upgrade TGI Gaudi version to v2.0.6 (#1088)
Signed-off-by: lvliang-intel <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
1 parent f7a7f8a commit 1ff85f6
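
This commit is a single mechanical change repeated across 74 files: retagging the `ghcr.io/huggingface/tgi-gaudi` image from 2.0.5 to 2.0.6. A minimal sketch of how such a bump could be scripted is below; the `demo/` directory and file are illustrative stand-ins (not repo paths), and GNU `sed -i` is assumed:

```shell
#!/bin/sh
set -e
# Sketch: retag ghcr.io/huggingface/tgi-gaudi from 2.0.5 to 2.0.6 everywhere.
# demo/ is a hypothetical stand-in for the repo tree, not part of the commit.
OLD_TAG="2.0.5"
NEW_TAG="2.0.6"

mkdir -p demo
printf 'image: ghcr.io/huggingface/tgi-gaudi:%s\n' "$OLD_TAG" > demo/compose.yaml

# Rewrite every file under demo/ that still references the old tag.
grep -rl "tgi-gaudi:${OLD_TAG}" demo | while read -r f; do
  sed -i "s|tgi-gaudi:${OLD_TAG}|tgi-gaudi:${NEW_TAG}|g" "$f"
done

# A final grep for the old tag is a cheap guard that nothing was missed.
grep "tgi-gaudi" demo/compose.yaml
```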

74 files changed (+94 -85 lines)

AgentQnA/docker_compose/intel/hpu/gaudi/tgi_gaudi.yaml (1 addition, 1 deletion)

@@ -3,7 +3,7 @@

 services:
   tgi-server:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
     container_name: tgi-server
     ports:
       - "8085:80"

AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml (1 addition, 1 deletion)

@@ -51,7 +51,7 @@ services:
     environment:
       TTS_ENDPOINT: ${TTS_ENDPOINT}
   tgi-service:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
     container_name: tgi-gaudi-server
     ports:
       - "3006:80"

AudioQnA/kubernetes/intel/README_gmc.md (1 addition, 1 deletion)

@@ -25,7 +25,7 @@ The AudioQnA uses the below prebuilt images if you choose a Xeon deployment
 Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
 For Gaudi:

-- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.5
+- tgi-service: ghcr.io/huggingface/tgi-gaudi:2.0.6
 - whisper-gaudi: opea/whisper-gaudi:latest
 - speecht5-gaudi: opea/speecht5-gaudi:latest

AudioQnA/kubernetes/intel/hpu/gaudi/manifest/audioqna.yaml (1 addition, 1 deletion)

@@ -271,7 +271,7 @@ spec:
       - envFrom:
         - configMapRef:
             name: audio-qna-config
-        image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+        image: ghcr.io/huggingface/tgi-gaudi:2.0.6
        name: llm-dependency-deploy-demo
        securityContext:
          capabilities:

AudioQnA/tests/test_compose_on_gaudi.sh (1 addition, 1 deletion)

@@ -22,7 +22,7 @@ function build_docker_images() {
     service_list="audioqna whisper-gaudi asr llm-tgi speecht5-gaudi tts"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
     docker images && sleep 1s
 }

AudioQnA/tests/test_compose_on_xeon.sh (1 addition, 1 deletion)

@@ -22,7 +22,7 @@ function build_docker_images() {
     service_list="audioqna whisper asr llm-tgi speecht5 tts"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6
     docker images && sleep 1s
 }

AvatarChatbot/docker_compose/intel/hpu/gaudi/compose.yaml (1 addition, 1 deletion)

@@ -54,7 +54,7 @@ services:
     environment:
       TTS_ENDPOINT: ${TTS_ENDPOINT}
   tgi-service:
-    image: ghcr.io/huggingface/tgi-gaudi:2.0.5
+    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
     container_name: tgi-gaudi-server
     ports:
       - "3006:80"

AvatarChatbot/tests/test_compose_on_gaudi.sh (1 addition, 1 deletion)

@@ -29,7 +29,7 @@ function build_docker_images() {
     service_list="avatarchatbot whisper-gaudi asr llm-tgi speecht5-gaudi tts wav2lip-gaudi animation"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6

     docker images && sleep 1s
 }

AvatarChatbot/tests/test_compose_on_xeon.sh (1 addition, 1 deletion)

@@ -29,7 +29,7 @@ function build_docker_images() {
     service_list="avatarchatbot whisper asr llm-tgi speecht5 tts wav2lip animation"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

-    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.5
+    docker pull ghcr.io/huggingface/tgi-gaudi:2.0.6

     docker images && sleep 1s
 }

ChatQnA/benchmark/accuracy/README.md (1 addition, 1 deletion)

@@ -48,7 +48,7 @@ To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-
 docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2

 # for better performance, set `PREFILL_BATCH_BUCKET_SIZE`, `BATCH_BUCKET_SIZE`, `max-batch-total-tokens`, `max-batch-prefill-tokens`
-docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.5 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
+docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} -e PREFILL_BATCH_BUCKET_SIZE=1 -e BATCH_BUCKET_SIZE=8 --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.6 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 2048
 ```

 ### Prepare Dataset
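
A bump like this has to land in every manifest at once, or compose files and k8s manifests drift onto different image versions. A minimal consistency check is sketched below; the `demo/` files are hypothetical stand-ins for the repo's compose files, and the single-expected-tag policy is an assumption, not something this commit enforces:

```shell
#!/bin/sh
set -e
# Sketch: fail if more than one tgi-gaudi tag is referenced across a tree.
# demo/ stands in for the repository; these paths are illustrative only.
mkdir -p demo
printf 'image: ghcr.io/huggingface/tgi-gaudi:2.0.6\n' > demo/a.yaml
printf 'image: ghcr.io/huggingface/tgi-gaudi:2.0.6\n' > demo/b.yaml

# Extract every referenced tag, dedupe, and count distinct values.
tags=$(grep -rho 'tgi-gaudi:[^ "]*' demo | sort -u)
count=$(printf '%s\n' "$tags" | wc -l)

if [ "$count" -ne 1 ]; then
  echo "inconsistent tgi-gaudi tags:" "$tags" >&2
  exit 1
fi
printf '%s\n' "$tags" > tag_check_result.txt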

0 commit comments