Note: The `embed` method is versatile: it can be used with any embeddings service (e.g. the OpenAI embeddings API), not just Baseten deployments.
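As an illustrative sketch, pointing the client at an OpenAI-compatible endpoint might look like the following (the constructor arguments below are assumptions for illustration, not part of this README):

```python
# Hypothetical illustration: any OpenAI-compatible /v1/embeddings endpoint
# works. Constructor arguments are assumptions; adapt to the actual client API.
client = InferenceClient(
    base_url="https://api.openai.com",  # not a Baseten deployment
    api_key="sk-...",
)
response = client.embed(
    input=["Hello world"],
    model="text-embedding-3-small",  # an OpenAI embedding model
)
```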
#### Asynchronous Embedding
```python
import asyncio

async def async_embed():
    # The body of this example is elided in this excerpt; this is a sketch
    # using the async `aembed` method (argument names are assumptions).
    response = await client.aembed(input=["Hello world"], model="my_model")
    print(response)

# asyncio.run(async_embed())
```
#### Embedding Benchmarks
A comparison against the `openai` Python package (`pip install openai`) for `/v1/embeddings`, measured with `./scripts/compare_latency_openai.py` using a `mini_batch_size` of 128 and 4 server-side replicas. Results against the OpenAI API are similar; note that OpenAI allows a maximum `mini_batch_size` of 2048.
| Number of inputs / embeddings | Number of Tasks | InferenceClient (s) | AsyncOpenAI (s) | Speedup |
|---|---|---|---|---|
#### Synchronous Batch POST

Note: The `batch_post` method is generic: it can send POST requests to any URL, not just Baseten endpoints, and both the request payload and the response can be arbitrary JSON.
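As a sketch of how such a generic POST might look (only the `batch_post` method name comes from this README; the argument names and payload shapes below are assumptions for illustration):

```python
# Sketch only: the exact signature may differ. Payloads can be arbitrary JSON.
payloads = [{"text": "first request"}, {"text": "second request"}]

# Assumed: batch_post sends one POST per payload and returns the JSON
# responses in the same order.
responses = client.batch_post(
    url_path="/v1/embeddings",  # any endpoint, not just a Baseten one
    payloads=payloads,
)
print(responses[0])
```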
### Development

```bash
# Build and install the Rust extension in development mode
maturin develop
cargo fmt

# Run tests
pytest tests
```
### Error Handling
The client can raise several types of errors. Here's how to handle common ones:
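The full example is elided in this excerpt; a minimal sketch, assuming the synchronous methods surface HTTP failures as `requests.exceptions.HTTPError` (which the surrounding text confirms) and that `client` was constructed as above:

```python
import requests

try:
    response = client.embed(input=["Hello world"], model="my_model")
except requests.exceptions.HTTPError as e:
    # Raised for non-2xx responses; inspect e.response for details.
    print(f"Request failed: {e}")
```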
For asynchronous methods (`aembed`, `arerank`, `aclassify`, `abatch_post`), the same exceptions will be raised by the `await` call and can be caught using a `try...except` block within an `async def` function.
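For instance, a sketch of the asynchronous pattern (same assumptions as the synchronous example above):

```python
import asyncio
import requests

async def safe_embed():
    try:
        return await client.aembed(input=["Hello world"], model="my_model")
    except requests.exceptions.HTTPError as e:
        # The await re-raises the same exceptions as the synchronous call.
        print(f"Request failed: {e}")

# asyncio.run(safe_embed())
```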