Commit 82dbd7e

Improve upsert throughput by 3x (#334)
## Problem

Python SDK upsert throughput is low compared to other SDKs. For example, I can achieve 880 vector upserts/sec with the Python SDK, compared to 3500 upserts/sec with the Java SDK.

Profiling the Python SDK performing these upserts shows a large percentage of time spent in gRPC / protobuf serialisation / deserialisation.

## Solution

Upgrade protobuf from v3 to v4. This adds a number of performance improvements in parsing / serialization, as documented at https://protobuf.dev/news/2022-05-06/#python-updates

This increases upsert() throughput by about 3x (measured by upserting 1M 768-dimension vectors to a pod-based index in batches of 500):

* Before: 880 vectors/sec
* After: 2580 vectors/sec

As per the documentation, this results in an incompatible change with the _generated_ Python code, so this depends on a related change to pinecone-protos to change the version of protobuf used to generate the Python code there.

## Type of Change

- [x] None of the above: Performance improvement.

## Test Plan

Use existing regression tests.
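The measurement described above (upserting in batches of 500 and reporting vectors/sec) could be reproduced with a harness along these lines. This is only a minimal sketch: `measure_upsert_throughput`, `batched`, and `speedup` are hypothetical helper names, and the `upsert` argument stands in for whatever client call is being timed; it is not the script used to produce the numbers in this commit.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical vector representation: (id, values).
Vector = Tuple[str, List[float]]


def batched(items: Sequence[Vector], batch_size: int) -> Iterator[Sequence[Vector]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def measure_upsert_throughput(
    upsert: Callable[[Sequence[Vector]], object],
    vectors: Sequence[Vector],
    batch_size: int = 500,
) -> float:
    """Send all vectors through `upsert` in batches; return vectors per second."""
    started = time.perf_counter()
    sent = 0
    for batch in batched(vectors, batch_size):
        upsert(batch)  # assumed to block until the batch is accepted
        sent += len(batch)
    elapsed = time.perf_counter() - started
    return sent / elapsed if elapsed > 0 else float("inf")


def speedup(before: float, after: float) -> float:
    """Ratio of two throughput figures."""
    return after / before


if __name__ == "__main__":
    # Stand-in for a real client call; it only records batch sizes.
    seen: List[int] = []
    data = [(str(i), [0.0] * 768) for i in range(1200)]
    rate = measure_upsert_throughput(lambda b: seen.append(len(b)), data)
    print(f"{rate:.0f} vectors/sec over {len(seen)} batches")
    print(f"reported speedup: {speedup(880, 2580):.2f}x")
```

With the figures reported above, `speedup(880, 2580)` comes to about 2.93, which is where the rounded "3x" comes from.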
1 parent e123da1 commit 82dbd7e

7 files changed: +866 -2076 lines changed

.github/workflows/testing-dependency.yaml

Lines changed: 6 additions & 8 deletions
@@ -56,12 +56,11 @@ jobs:
           # - 4.1.0
           - 4.3.3
         protobuf_version:
-          - 3.20.3
+          - 4.25.3
+        protoc-gen-openapiv2:
+          - 0.0.1
         googleapis_common_protos_version:
-          - 1.53.0
           - 1.62.0
-        grpc_gateway_protoc_gen_openapiv2_version:
-          - 0.1.0
     steps:
       - uses: actions/checkout@v4
       - uses: ./.github/actions/test-dependency-grpc
@@ -92,12 +91,11 @@ jobs:
           - 3.1.3
           - 4.3.3
         protobuf_version:
-          - 3.20.3
+          - 4.25.3
+        protoc-gen-openapiv2:
+          - 0.0.1
         googleapis_common_protos_version:
-          - 1.53.0
           - 1.62.0
-        grpc_gateway_protoc_gen_openapiv2_version:
-          - 0.1.0
     steps:
       - uses: actions/checkout@v4
       - uses: ./.github/actions/test-dependency-grpc
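
Since the workflow now pins protobuf 4.25.3 in place of 3.20.3, one way to confirm that an environment has the 4.x line installed is a small major-version check. The sketch below is an illustration, not part of this commit: `major_version`, `matches_pinned_major`, and `installed_protobuf_version` are hypothetical helpers, and only `importlib.metadata` from the standard library is assumed.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".", 1)[0])


def matches_pinned_major(installed: str, pinned: str = "4.25.3") -> bool:
    """True when the installed version shares the pinned major version."""
    return major_version(installed) == major_version(pinned)


def installed_protobuf_version() -> Optional[str]:
    """Look up the installed protobuf distribution, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_protobuf_version()
    if found is None:
        print("protobuf is not installed")
    elif matches_pinned_major(found):
        print(f"protobuf {found} matches the pinned 4.x line")
    else:
        print(f"protobuf {found} does not match the pinned 4.x line")
```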

pinecone/core/grpc/protos/vector_service_pb2.py

Lines changed: 189 additions & 1628 deletions
Some generated files are not rendered by default.
