Commit 3e91470

Documentation / readme update (#1681)
1 parent 40392dc commit 3e91470

File tree

4 files changed: +53 -39 lines changed

baseten-inference-client/Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

baseten-inference-client/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "baseten_inference_client"
-version = "0.0.1-rc3"
+version = "0.0.1-rc4"
 edition = "2021"
 
 [dependencies]

baseten-inference-client/README.md

Lines changed: 49 additions & 35 deletions
@@ -21,8 +21,8 @@ base_url_embed = "https://model-yqv0rjjw.api.baseten.co/environments/production/
 # base_url_embed = "https://api.openai.com" or "https://api.mixedbread.com"
 client = InferenceClient(base_url=base_url_embed, api_key=api_key)
 ```
-
-### Synchronous Embedding
+### Embeddings
+#### Synchronous Embedding
 
 ```python
 texts = ["Hello world", "Example text", "Another sample"]
@@ -58,7 +58,7 @@ if numpy_array.shape[0] > 0:
 
 Note: The embed method is versatile and can be used with any embeddings service, e.g. OpenAI API embeddings, not just for Baseten deployments.
 
-### Asynchronous Embedding
+#### Asynchronous Embedding
 
 ```python
 async def async_embed():
@@ -76,8 +76,22 @@ async def async_embed():
 # asyncio.run(async_embed())
 ```
 
-### Synchronous Batch POST
+#### Embedding Benchmarks
+Comparison against `pip install openai` for `/v1/embeddings`, measured with `./scripts/compare_latency_openai.py` at a mini_batch_size of 128 against 4 server-side replicas. Results against the OpenAI API are similar; OpenAI allows a maximum mini_batch_size of 2048.
+
+| Number of inputs / embeddings | Number of Tasks | InferenceClient (s) | AsyncOpenAI (s) | Speedup |
+|------------------------------:|----------------:|--------------------:|----------------:|--------:|
+| 128 | 1 | 0.12 | 0.13 | 1.08× |
+| 512 | 4 | 0.14 | 0.21 | 1.50× |
+| 8,192 | 64 | 0.83 | 1.95 | 2.35× |
+| 131,072 | 1,024 | 4.63 | 39.07 | 8.44× |
+| 2,097,152 | 16,384 | 70.92 | 903.68 | 12.74× |
+
+### General Batch POST
 
+The batch_post method is generic. It can be used to send POST requests to any URL, not limited to Baseten endpoints. The input and output can be any JSON item.
+
+#### Synchronous Batch POST
 ```python
 payload1 = {"model": "my_model", "input": ["Batch request sample 1"]}
 payload2 = {"model": "my_model", "input": ["Batch request sample 2"]}
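The benchmark table added above times large batched embedding workloads. As a rough illustration of the pattern being measured (not the commit's benchmark script), here is a minimal sketch; the `client.embed` keyword names `input`, `model`, `batch_size`, and `max_concurrent_requests` are assumptions, since the full call signature is not shown in this diff:

```python
import time

# Illustrative sketch only; the embed() keyword names below are assumptions.
texts = ["sample text"] * 131_072           # 131,072 inputs, as in the fourth table row

start = time.perf_counter()
response = client.embed(
    input=texts,                            # list of input strings
    model="my_model",                       # hypothetical model identifier
    batch_size=128,                         # the mini_batch_size used in the benchmark
    max_concurrent_requests=1024,           # roughly one in-flight request per task
)
print(f"Embedded {len(texts)} inputs in {time.perf_counter() - start:.2f}s")
```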
@@ -90,10 +104,7 @@ response1, response2 = client.batch_post(
 print("Batch POST responses:", response1, response2)
 ```
 
-Note: The batch_post method is generic. It can be used to send POST requests to any URL,
-not limited to Baseten endpoints.
-
-### Asynchronous Batch POST
+#### Asynchronous Batch POST
 
 ```python
 async def async_batch_post():
@@ -109,8 +120,10 @@ async def async_batch_post():
 # To run:
 # asyncio.run(async_batch_post())
 ```
+### Reranking
+Reranking is compatible with BEI or text-embeddings-inference.
 
-### Synchronous Reranking
+#### Synchronous Reranking
 
 ```python
 query = "What is the best framework?"
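The reranking snippet is only partially visible in this hunk. A minimal sketch of a synchronous rerank call follows, where the `rerank` keyword names (`query`, `texts`) are assumptions, while `rerank_response.data`, `res.index`, and `res.score` are taken from the diff itself:

```python
# Illustrative sketch only; argument names are assumptions.
query = "What is the best framework?"
texts = ["Framework A is fast.", "Framework B is reliable.", "Framework C is popular."]

rerank_response = client.rerank(query=query, texts=texts)  # hypothetical signature

# Each result carries the index of the original text and its relevance score.
for res in rerank_response.data:
    print(f"Index: {res.index} Score: {res.score}")
```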
@@ -127,7 +140,7 @@ for res in rerank_response.data:
 print(f"Index: {res.index} Score: {res.score}")
 ```
 
-### Asynchronous Reranking
+#### Asynchronous Reranking
 
 ```python
 async def async_rerank():
@@ -148,7 +161,9 @@ async def async_rerank():
 # asyncio.run(async_rerank())
 ```
 
-### Synchronous Classification
+### Classification
+Predict (classification endpoint) is compatible with BEI or text-embeddings-inference.
+#### Synchronous Classification
 
 ```python
 texts_to_classify = [
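Likewise, only the opening of the classification snippet appears in this hunk. A minimal sketch of what the synchronous call might look like, where the `classify` call shape is an assumption and `classify_response.data`, `result.label`, and `result.score` come from the diff:

```python
# Illustrative sketch only; the classify() call shape is an assumption.
texts_to_classify = [
    "This product is excellent!",
    "I am disappointed with the service.",
]

classify_response = client.classify(texts_to_classify)  # hypothetical signature

# Results come back grouped per input text.
for group in classify_response.data:
    for result in group:
        print(f"Label: {result.label}, Score: {result.score}")
```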
@@ -167,8 +182,7 @@ for group in classify_response.data:
 print(f"Label: {result.label}, Score: {result.score}")
 ```
 
-### Asynchronous Classification
-
+#### Asynchronous Classification
 ```python
 async def async_classify():
     texts = ["Async positive", "Async negative"]
@@ -187,28 +201,7 @@ async def async_classify():
 ```
 
 
-## Development
-
-```bash
-# Install prerequisites
-sudo apt-get install patchelf
-# Install cargo if not already installed.
-
-# Set up a Python virtual environment
-python -m venv .venv
-source .venv/bin/activate
-
-# Install development dependencies
-pip install maturin[patchelf] pytest requests numpy
-
-# Build and install the Rust extension in development mode
-maturin develop
-cargo fmt
-# Run tests
-pytest tests
-```
-
-## Error Handling
+### Error Handling
 
 The client can raise several types of errors. Here's how to handle common ones:
 
@@ -245,6 +238,27 @@ except requests.exceptions.HTTPError as e:
 
 For asynchronous methods (`aembed`, `arerank`, `aclassify`, `abatch_post`), the same exceptions will be raised by the `await` call and can be caught using a `try...except` block within an `async def` function.
 
+## Development
+
+```bash
+# Install prerequisites
+sudo apt-get install patchelf
+# Install cargo if not already installed.
+
+# Set up a Python virtual environment
+python -m venv .venv
+source .venv/bin/activate
+
+# Install development dependencies
+pip install maturin[patchelf] pytest requests numpy
+
+# Build and install the Rust extension in development mode
+maturin develop
+cargo fmt
+# Run tests
+pytest tests
+```
+
 ## Contributions
 Feel free to contribute to this repo, tag @michaelfeil for review.
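The README paragraph above describes asynchronous error handling only in prose; here is a minimal sketch of that pattern. `requests.exceptions.HTTPError` is taken from the hunk header, while the `aembed` keyword names and the `ValueError` case are assumptions:

```python
import asyncio
import requests

async def safe_embed():
    try:
        # The aembed() keyword names here are assumptions, not the documented signature.
        await client.aembed(input=["Hello world"], model="my_model")
        print("Embedding request succeeded")
    except ValueError as e:
        print("Invalid input:", e)
    except requests.exceptions.HTTPError as e:
        print("HTTP error from the server:", e)

# To run:
# asyncio.run(safe_embed())
```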

baseten-inference-client/src/lib.rs

Lines changed: 2 additions & 2 deletions
@@ -668,7 +668,7 @@ impl InferenceClient {
         if payloads.is_empty() {
             return Err(PyValueError::new_err("Payloads list cannot be empty"));
         }
-        InferenceClient::validate_concurrency_parameters(max_concurrent_requests, 1)?; // Batch size is effectively 1
+        InferenceClient::validate_concurrency_parameters(max_concurrent_requests, 1000)?; // set batch size to 1000 to allow larger batches
         let timeout_duration = InferenceClient::validate_and_get_timeout_duration(timeout_s)?;
 
         // Depythonize all payloads in the current thread (GIL is held)
@@ -749,7 +749,7 @@ impl InferenceClient {
         if payloads.is_empty() {
             return Err(PyValueError::new_err("Payloads list cannot be empty"));
         }
-        InferenceClient::validate_concurrency_parameters(max_concurrent_requests, 1)?; // Batch size is effectively 1
+        InferenceClient::validate_concurrency_parameters(max_concurrent_requests, 1000)?; // set batch size to 1000 to allow larger batches
         let timeout_duration = InferenceClient::validate_and_get_timeout_duration(timeout_s)?;
 
         // Depythonize all payloads in the current thread (GIL is held by `py` argument)
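For context, `batch_post` is the method whose validation these two hunks relax (the hard-coded batch-size argument passed to the check goes from 1 to 1000). A rough usage sketch of sending many payloads with bounded concurrency follows; `max_concurrent_requests` appears in the diff, while the positional `url_path`/`payloads` call shape is an assumption:

```python
# Illustrative sketch only; the batch_post() call shape is an assumption.
payloads = [
    {"model": "my_model", "input": [f"Batch request sample {i}"]}
    for i in range(500)
]

responses = client.batch_post(
    "/v1/embeddings",              # hypothetical URL path; any POST endpoint works
    payloads,                      # arbitrary JSON payloads
    max_concurrent_requests=64,    # bounded client-side concurrency
)
print("Received", len(responses), "responses")
```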
