
Commit b3b799f

Update text-embeddings-router --help output (#603)
1 parent 079eff0 commit b3b799f

2 files changed: +50 −68 lines

README.md

Lines changed: 23 additions & 35 deletions
````diff
@@ -129,31 +129,26 @@ NVIDIA drivers on your machine need to be compatible with CUDA version 12.2 or h
 
 To see all options to serve your models:
 
-```shell
-text-embeddings-router --help
-```
+```console
+$ text-embeddings-router --help
+Text Embedding Webserver
 
-```
 Usage: text-embeddings-router [OPTIONS]
 
 Options:
       --model-id <MODEL_ID>
-          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `thenlper/gte-base`.
-          Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of
-          transformers
+          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `BAAI/bge-large-en-v1.5`. Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of transformers
 
           [env: MODEL_ID=]
-          [default: thenlper/gte-base]
+          [default: BAAI/bge-large-en-v1.5]
 
       --revision <REVISION>
-          The actual revision of the model if you're referring to a model on the hub. You can use a specific commit id
-          or a branch like `refs/pr/2`
+          The actual revision of the model if you're referring to a model on the hub. You can use a specific commit id or a branch like `refs/pr/2`
 
           [env: REVISION=]
 
       --tokenization-workers <TOKENIZATION_WORKERS>
-          Optionally control the number of tokenizer workers used for payload tokenization, validation and truncation.
-          Default to the number of CPU cores on the machine
+          Optionally control the number of tokenizer workers used for payload tokenization, validation and truncation. Default to the number of CPU cores on the machine
 
           [env: TOKENIZATION_WORKERS=]
 
@@ -175,14 +170,11 @@ Options:
           Possible values:
           - cls: Select the CLS token as embedding
           - mean: Apply Mean pooling to the model embeddings
-          - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only
-            available if the loaded model is a `ForMaskedLM` Transformer model
+          - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only available if the loaded model is a `ForMaskedLM` Transformer model
           - last-token: Select the last token as embedding
 
       --max-concurrent-requests <MAX_CONCURRENT_REQUESTS>
-          The maximum amount of concurrent requests for this particular deployment.
-          Having a low limit will refuse clients requests instead of having them wait for too long and is usually good
-          to handle backpressure correctly
+          The maximum amount of concurrent requests for this particular deployment. Having a low limit will refuse clients requests instead of having them wait for too long and is usually good to handle backpressure correctly
 
           [env: MAX_CONCURRENT_REQUESTS=]
           [default: 512]
@@ -194,8 +186,7 @@ Options:
 
           For `max_batch_tokens=1000`, you could fit `10` queries of `total_tokens=100` or a single query of `1000` tokens.
 
-          Overall this number should be the largest possible until the model is compute bound. Since the actual memory
-          overhead depends on the model implementation, text-embeddings-inference cannot infer this number automatically.
+          Overall this number should be the largest possible until the model is compute bound. Since the actual memory overhead depends on the model implementation, text-embeddings-inference cannot infer this number automatically.
 
           [env: MAX_BATCH_TOKENS=]
           [default: 16384]
@@ -223,9 +214,7 @@ Options:
 
           Must be a key in the `sentence-transformers` configuration `prompts` dictionary.
 
-          For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the
-          sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because
-          the prompt text will be prepended before any text to encode.
+          For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode.
 
           The argument '--default-prompt-name <DEFAULT_PROMPT_NAME>' cannot be used with '--default-prompt <DEFAULT_PROMPT>`
 
@@ -234,9 +223,7 @@ Options:
       --default-prompt <DEFAULT_PROMPT>
           The prompt that should be used by default for encoding. If not set, no prompt will be applied.
 
-          For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be
-          encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text
-          to encode.
+          For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode.
 
           The argument '--default-prompt <DEFAULT_PROMPT>' cannot be used with '--default-prompt-name <DEFAULT_PROMPT_NAME>`
 
@@ -260,15 +247,13 @@ Options:
           [default: 3000]
 
       --uds-path <UDS_PATH>
-          The name of the unix socket some text-embeddings-inference backends will use as they communicate internally
-          with gRPC
+          The name of the unix socket some text-embeddings-inference backends will use as they communicate internally with gRPC
 
           [env: UDS_PATH=]
           [default: /tmp/text-embeddings-inference-server]
 
       --huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>
-          The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk
-          for instance
+          The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk for instance
 
           [env: HUGGINGFACE_HUB_CACHE=]
 
@@ -283,8 +268,7 @@ Options:
       --api-key <API_KEY>
           Set an api key for request authorization.
 
-          By default the server responds to every request. With an api key set, the requests must have the Authorization
-          header set with the api key as Bearer token.
+          By default the server responds to every request. With an api key set, the requests must have the Authorization header set with the api key as Bearer token.
 
           [env: API_KEY=]
 
@@ -294,8 +278,6 @@ Options:
           [env: JSON_OUTPUT=]
 
       --disable-spans
-          Disables the span logging trace
-
           [env: DISABLE_SPANS=]
 
       --otlp-endpoint <OTLP_ENDPOINT>
@@ -309,8 +291,8 @@ Options:
           [env: OTLP_SERVICE_NAME=]
           [default: text-embeddings-inference.server]
 
-      --prometheus-port <PORT>
-          The Prometheus metrics port to listen on
+      --prometheus-port <PROMETHEUS_PORT>
+          The Prometheus port to listen on
 
           [env: PROMETHEUS_PORT=]
           [default: 9000]
@@ -319,6 +301,12 @@ Options:
           Unused for gRPC servers
 
           [env: CORS_ALLOW_ORIGIN=]
+
+  -h, --help
+          Print help (see a summary with '-h')
+
+  -V, --version
+          Print version
 ```
 
 ### Docker Images
````
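Not part of the commit, but for orientation: the flags documented in the updated help text above can be combined on the command line. A minimal sketch, using only option names and default values that appear in the help output (the values shown simply restate the documented defaults):

```console
$ text-embeddings-router \
    --model-id BAAI/bge-large-en-v1.5 \
    --max-concurrent-requests 512 \
    --prometheus-port 9000
```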

docs/source/en/cli_arguments.md

Lines changed: 27 additions & 33 deletions
````diff
@@ -18,31 +18,26 @@ rendered properly in your Markdown viewer.
 
 To see all options to serve your models, run the following:
 
-```shell
-text-embeddings-router --help
-```
+```console
+$ text-embeddings-router --help
+Text Embedding Webserver
 
-```
 Usage: text-embeddings-router [OPTIONS]
 
 Options:
       --model-id <MODEL_ID>
-          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `thenlper/gte-base`.
-          Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of
-          transformers
+          The name of the model to load. Can be a MODEL_ID as listed on <https://hf.co/models> like `BAAI/bge-large-en-v1.5`. Or it can be a local directory containing the necessary files as saved by `save_pretrained(...)` methods of transformers
 
           [env: MODEL_ID=]
-          [default: thenlper/gte-base]
+          [default: BAAI/bge-large-en-v1.5]
 
       --revision <REVISION>
-          The actual revision of the model if you're referring to a model on the hub. You can use a specific commit id
-          or a branch like `refs/pr/2`
+          The actual revision of the model if you're referring to a model on the hub. You can use a specific commit id or a branch like `refs/pr/2`
 
           [env: REVISION=]
 
       --tokenization-workers <TOKENIZATION_WORKERS>
-          Optionally control the number of tokenizer workers used for payload tokenization, validation and truncation.
-          Default to the number of CPU cores on the machine
+          Optionally control the number of tokenizer workers used for payload tokenization, validation and truncation. Default to the number of CPU cores on the machine
 
           [env: TOKENIZATION_WORKERS=]
 
@@ -64,14 +59,11 @@ Options:
           Possible values:
           - cls: Select the CLS token as embedding
           - mean: Apply Mean pooling to the model embeddings
-          - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only
-            available if the loaded model is a `ForMaskedLM` Transformer model
+          - splade: Apply SPLADE (Sparse Lexical and Expansion) to the model embeddings. This option is only available if the loaded model is a `ForMaskedLM` Transformer model
           - last-token: Select the last token as embedding
 
       --max-concurrent-requests <MAX_CONCURRENT_REQUESTS>
-          The maximum amount of concurrent requests for this particular deployment.
-          Having a low limit will refuse clients requests instead of having them wait for too long and is usually good
-          to handle backpressure correctly
+          The maximum amount of concurrent requests for this particular deployment. Having a low limit will refuse clients requests instead of having them wait for too long and is usually good to handle backpressure correctly
 
           [env: MAX_CONCURRENT_REQUESTS=]
           [default: 512]
@@ -83,8 +75,7 @@ Options:
 
           For `max_batch_tokens=1000`, you could fit `10` queries of `total_tokens=100` or a single query of `1000` tokens.
 
-          Overall this number should be the largest possible until the model is compute bound. Since the actual memory
-          overhead depends on the model implementation, text-embeddings-inference cannot infer this number automatically.
+          Overall this number should be the largest possible until the model is compute bound. Since the actual memory overhead depends on the model implementation, text-embeddings-inference cannot infer this number automatically.
 
           [env: MAX_BATCH_TOKENS=]
           [default: 16384]
@@ -112,9 +103,7 @@ Options:
 
           Must be a key in the `sentence-transformers` configuration `prompts` dictionary.
 
-          For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the
-          sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because
-          the prompt text will be prepended before any text to encode.
+          For example if ``default_prompt_name`` is "query" and the ``prompts`` is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode.
 
           The argument '--default-prompt-name <DEFAULT_PROMPT_NAME>' cannot be used with '--default-prompt <DEFAULT_PROMPT>`
 
@@ -123,9 +112,7 @@ Options:
       --default-prompt <DEFAULT_PROMPT>
           The prompt that should be used by default for encoding. If not set, no prompt will be applied.
 
-          For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be
-          encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text
-          to encode.
+          For example if ``default_prompt`` is "query: " then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text will be prepended before any text to encode.
 
           The argument '--default-prompt <DEFAULT_PROMPT>' cannot be used with '--default-prompt-name <DEFAULT_PROMPT_NAME>`
 
@@ -149,15 +136,13 @@ Options:
           [default: 3000]
 
       --uds-path <UDS_PATH>
-          The name of the unix socket some text-embeddings-inference backends will use as they communicate internally
-          with gRPC
+          The name of the unix socket some text-embeddings-inference backends will use as they communicate internally with gRPC
 
           [env: UDS_PATH=]
           [default: /tmp/text-embeddings-inference-server]
 
       --huggingface-hub-cache <HUGGINGFACE_HUB_CACHE>
-          The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk
-          for instance
+          The location of the huggingface hub cache. Used to override the location if you want to provide a mounted disk for instance
 
           [env: HUGGINGFACE_HUB_CACHE=]
 
@@ -172,8 +157,7 @@ Options:
       --api-key <API_KEY>
           Set an api key for request authorization.
 
-          By default the server responds to every request. With an api key set, the requests must have the Authorization
-          header set with the api key as Bearer token.
+          By default the server responds to every request. With an api key set, the requests must have the Authorization header set with the api key as Bearer token.
 
           [env: API_KEY=]
 
@@ -183,8 +167,6 @@ Options:
           [env: JSON_OUTPUT=]
 
       --disable-spans
-          Disables the span logging trace
-
           [env: DISABLE_SPANS=]
 
       --otlp-endpoint <OTLP_ENDPOINT>
@@ -198,8 +180,20 @@ Options:
           [env: OTLP_SERVICE_NAME=]
           [default: text-embeddings-inference.server]
 
+      --prometheus-port <PROMETHEUS_PORT>
+          The Prometheus port to listen on
+
+          [env: PROMETHEUS_PORT=]
+          [default: 9000]
+
       --cors-allow-origin <CORS_ALLOW_ORIGIN>
           Unused for gRPC servers
 
           [env: CORS_ALLOW_ORIGIN=]
+
+  -h, --help
+          Print help (see a summary with '-h')
+
+  -V, --version
+          Print version
 ```
````
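As the `[env: ...]` entries in the help output indicate, each option can also be supplied through its corresponding environment variable instead of a command-line flag. Again not part of the commit, just a minimal sketch using only variable names and defaults that appear in the help text above:

```console
$ MODEL_ID=BAAI/bge-large-en-v1.5 \
  MAX_BATCH_TOKENS=16384 \
  PROMETHEUS_PORT=9000 \
  text-embeddings-router
```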
