## Problem
gRPC has a built-in retry mechanism[1], which we configure to
automatically retry on UNAVAILABLE status responses from Pinecone.
However, we have observed that the VectorService/Upsert method is _not_
being retried automatically, and an exception is thrown to the
application instead:

    Traceback (most recent call last):
      File ".venv/lib/python3.11/site-packages/pinecone/grpc/base.py", line 150, in wrapped
        return func(
               ^^^^^
      File ".venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
        return _end_unary_response_blocking(state, call, False, None)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File ".venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
        raise _InactiveRpcError(state) # pytype: disable=not-instantiable
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "unavailable"
        debug_error_string = "UNKNOWN:Error received from peer ipv4:34.223.120.220:443 {created_time:"2024-05-10T11:54:43.047741403+00:00", grpc_status:14, grpc_message:"unavailable"}"

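For context, the retry behaviour mentioned above is expressed in gRPC as a service config with a `retryPolicy`. The sketch below shows the general shape of such a config for retrying UNAVAILABLE; the service name `VectorService` (without a proto package prefix) and the backoff values are illustrative assumptions, not the client's actual settings.

```python
import json

# Assumed fully-qualified service name; the real proto package for
# Pinecone's VectorService may add a prefix (e.g. "<package>.VectorService").
SERVICE_NAME = "VectorService"


def build_retry_service_config(max_attempts: int = 5) -> str:
    """Return a gRPC service config (as JSON) that retries UNAVAILABLE.

    The backoff values here are illustrative only.
    """
    config = {
        "methodConfig": [
            {
                # Naming only the service applies the policy to all of its
                # methods, including Upsert.
                "name": [{"service": SERVICE_NAME}],
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "1s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
    return json.dumps(config)
```

The resulting string would be passed to the channel as the `("grpc.service_config", ...)` option. Note that a policy like this is necessary but not sufficient: as described next, gRPC can still decline to retry if the request outgrows its retry buffer.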
Enabling gRPC's tracing[2] by setting the environment variables
`GRPC_VERBOSITY=debug GRPC_TRACE=all` (warning: this is _very_ verbose!)
showed that when we do get a StatusCode.UNAVAILABLE, a retry is not
considered because the request is too large ("committing" in this context
means the call is committed to its current attempt, which effectively
disables further retries):

    0514 14:00:43.870499051 4093173 retry_filter_legacy_call_data.cc:1855] chand=0x7ff708006080 calld=0x56377b0b11e0: exceeded retry buffer size, committing

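Because the full trace is so noisy, it can help to enable it only for a reproducer process and then filter for the line above. A minimal sketch follows; the helper names `tracing_env` and `find_retry_commits` are hypothetical, and the marker string is copied from the trace line shown.

```python
import os
from typing import Dict, Iterable, List, Optional

# Message logged by gRPC core's retry filter when an RPC outgrows its
# per-RPC retry buffer (copied from the trace output above).
COMMIT_MARKER = "exceeded retry buffer size, committing"


def tracing_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with gRPC's verbose tracing enabled."""
    env = dict(os.environ if base is None else base)
    env["GRPC_VERBOSITY"] = "debug"
    env["GRPC_TRACE"] = "all"  # very verbose
    return env


def find_retry_commits(log_lines: Iterable[str]) -> List[str]:
    """Return the trace lines showing an RPC was committed (no more retries)."""
    return [line.rstrip("\n") for line in log_lines if COMMIT_MARKER in line]
```

For example, a (hypothetical) reproducer script could be launched with `subprocess.run([sys.executable, "repro.py"], env=tracing_env(), stderr=subprocess.PIPE, text=True)`, and its `stderr.splitlines()` passed to `find_retry_commits`. Passing the variables via `env` rather than mutating `os.environ` keeps the tracing out of the parent process, since gRPC reads them when the library initializes.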
Per gRPC's channel arguments[3], the maximum buffer size is controlled by:

    /** Per-RPC retry buffer size, in bytes. Default is 256 KiB. */
    #define GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE "grpc.per_rpc_retry_buffer_size"

Given that Upsert messages are frequently larger than 256 KiB (it is
common to batch up to the 2 MB limit), we will fail to retry any batch
larger than 256 KiB.
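To give a sense of how quickly a batch crosses the default limit, here is a back-of-the-envelope estimate. It assumes dense values are serialized as packed 32-bit floats (4 bytes each) and deliberately ignores IDs, metadata, sparse values and protobuf framing, so real requests are somewhat larger; the 1536-dimension figure is only an illustrative embedding size, and the `<=` comparison is an approximation of gRPC's internal accounting.

```python
# Default per-RPC retry buffer in gRPC core: 256 KiB.
DEFAULT_RETRY_BUFFER_BYTES = 256 * 1024


def approx_dense_payload_bytes(num_vectors: int, dimension: int) -> int:
    """Rough lower bound on an upsert's size: float32 values only.

    Ignores IDs, metadata, sparse values and protobuf framing.
    """
    return num_vectors * dimension * 4  # 4 bytes per float32


def is_retryable_size(message_bytes: int,
                      buffer_bytes: int = DEFAULT_RETRY_BUFFER_BYTES) -> bool:
    """Approximately: does the message fit in the retry buffer?"""
    return message_bytes <= buffer_bytes
```

With 1536-dimensional vectors, 42 vectors (258,048 bytes) still fit, but 43 (264,192 bytes) do not, even before counting IDs or metadata; a 100-vector batch is already about 614 KB, far below the 2 MB batch limit yet well past the point where retries stop.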
## Solution
Address this by raising the per-RPC retry buffer size to match the
maximum message size we support (currently 128 MB, more than sufficient
to retry any UpsertRequest).
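A minimal sketch of what this looks like as gRPC channel arguments in Python. The channel-argument names are standard gRPC ones; the helper `build_channel_options`, the exact byte value for "128 MB" (taken here as 128 MiB), and the surrounding option set are assumptions for illustration, not the client's actual code.

```python
from typing import List, Tuple, Union

# Assumed byte value for the 128 MB maximum message size.
MAX_MSG_BYTES = 128 * 1024 * 1024

ChannelOption = Tuple[str, Union[int, str]]


def build_channel_options(service_config_json: str,
                          max_msg_bytes: int = MAX_MSG_BYTES) -> List[ChannelOption]:
    """Channel arguments that keep large upserts eligible for retry."""
    return [
        ("grpc.max_send_message_length", max_msg_bytes),
        ("grpc.max_receive_message_length", max_msg_bytes),
        # Raise the per-RPC retry buffer (default 256 KiB) to the max message
        # size, so any request we are allowed to send can also be replayed.
        ("grpc.per_rpc_retry_buffer_size", max_msg_bytes),
        ("grpc.enable_retries", 1),
        ("grpc.service_config", service_config_json),
    ]
```

These options would be passed as `grpc.secure_channel(target, grpc.ssl_channel_credentials(), options=build_channel_options(cfg))`. Tying the retry buffer to the maximum message size means the buffer limit can never be the reason a sendable request is not retried; the trade-off is that each in-flight RPC may keep its serialized request buffered until the call commits.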
[1]: https://grpc.io/docs/guides/retry/
[2]: https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
[3]: https://github.com/grpc/grpc/blob/befeeba0f57c6ed3608935d8317fd26289e7e080/include/grpc/impl/channel_arg_names.h#L321
## Type of Change
- [x] Bug fix (non-breaking change which fixes an issue)
## Test Plan
There is no existing test infrastructure to automate testing of this
(there is no way to inject errors); manually verified that the previously
seen (intermittent) UNAVAILABLE responses are now correctly retried.