
Commit 72dac20

Preparing for release 1.7.0 (candle update + modernbert). (#570)
1 parent 24ba906 commit 72dac20

5 files changed: +13 −13 lines


Cargo.toml

Lines changed: 1 addition & 1 deletion

````diff
@@ -26,7 +26,7 @@ default-members = [
 resolver = "2"
 
 [workspace.package]
-version = "1.6.1"
+version = "1.7.0"
 edition = "2021"
 authors = ["Olivier Dehaene", "Nicolas Patry", "Alvaro Bartolome"]
 homepage = "https://github.com/huggingface/text-embeddings-inference"
````

docs/openapi.json

Lines changed: 1 addition & 1 deletion

````diff
@@ -10,7 +10,7 @@
       "name": "Apache 2.0",
       "url": "https://www.apache.org/licenses/LICENSE-2.0"
     },
-    "version": "1.6.0"
+    "version": "1.7.0"
   },
   "paths": {
     "/decode": {
````

docs/source/en/private_models.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -37,5 +37,5 @@ model=<your private model>
 volume=$PWD/data
 token=<your cli Hugging Face Hub token>
 
-docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
````
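For context, the `<your cli Hugging Face Hub token>` placeholder is typically filled from the token that `huggingface-cli login` saves locally; a minimal sketch, assuming the default token location (your `HF_HOME` may place it elsewhere):

```bash
# Read the token saved by `huggingface-cli login` (default path is an
# assumption) and launch the 1.7 image against a private model.
token=$(cat ~/.cache/huggingface/token)
docker run --gpus all -e HF_TOKEN=$token -p 8080:80 -v $PWD/data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id <your private model>
```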

docs/source/en/quick_tour.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -33,7 +33,7 @@ Finally, deploy your model. Let's say you want to use `BAAI/bge-large-en-v1.5`.
 model=BAAI/bge-large-en-v1.5
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
 <Tip>
````
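Once the container from this hunk is up, the embedding route can be exercised directly; a minimal sketch, following the request shape the TEI quick tour documents:

```bash
# Embed a single input against the freshly deployed 1.7 container.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```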
````diff
@@ -66,7 +66,7 @@ Let's say you want to use `BAAI/bge-reranker-large`:
 model=BAAI/bge-reranker-large
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
 Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list
````
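The `rerank` endpoint mentioned in the context line above can be called like this; a minimal sketch, assuming the query/texts payload shape from the TEI quick tour:

```bash
# Rank two candidate texts against a query with the reranker model.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```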
````diff
@@ -87,7 +87,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
````
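The `predict` call that context line refers to looks like this; a minimal sketch, following the TEI quick tour's request shape:

```bash
# Classify an input with the go_emotions model via the predict route.
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```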
````diff
@@ -139,5 +139,5 @@ git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
 volume=$PWD
 
 # Mount the models directory inside the container with a volume and set the model ID
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id /data/gte-base-en-v1.5
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/gte-base-en-v1.5
 ```
````

docs/source/en/supported_models.md

Lines changed: 6 additions & 6 deletions

````diff
@@ -66,13 +66,13 @@ Find the appropriate Docker image for your hardware in the following table:
 
 | Architecture                        | Image                                                                    |
 |-------------------------------------|--------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.6                    |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.7                    |
 | Volta                               | NOT SUPPORTED                                                            |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.6 (experimental)  |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.6                        |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.6                     |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.6                     |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.6 (experimental)  |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.7 (experimental)  |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.7                        |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.7                     |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.7                     |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.7 (experimental)  |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
````
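For reference, combining the Turing row from the updated table with the `USE_FLASH_ATTENTION` toggle from the warning looks like this; a minimal sketch reusing the docker flags from the quick tour (the model ID is just the quick tour's example):

```bash
# Run the 1.7 Turing image on a T4, opting in to Flash Attention v1
# as described in the warning above.
docker run --gpus all -p 8080:80 -v $PWD/data:/data --pull always \
    -e USE_FLASH_ATTENTION=True \
    ghcr.io/huggingface/text-embeddings-inference:turing-1.7 \
    --model-id BAAI/bge-large-en-v1.5
```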
