-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
Expected behavior
When the Redis container is paused (not stopped), the connection attempt should fail, triggering the retry mechanism. The retry number should increase monotonically until until the specified maximum number of retries is reached.
Example logs of expected behavior:
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 2/5. Backing off for 1.0 seconds
INFO - Attempt 3/5. Backing off for 2.0 seconds
INFO - Attempt 4/5. Backing off for 4.0 seconds
The print statement was added at the end of the following except
block:
Lines 60 to 70 in ea01a30
while True: | |
try: | |
return do() | |
except self._supported_errors as error: | |
failures += 1 | |
fail(error) | |
if self._retries >= 0 and failures > self._retries: | |
raise error | |
backoff = self._backoff.compute(failures) | |
if backoff > 0: | |
sleep(backoff) |
Actual behavior
Instead of progressing through the retry attempts, the retry mechanism gets stuck at the first attempt, repeating indefinitely.
Example logs of actual behavior:
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
Root Cause
The issue occurs because the sock.connect
(line 575 of the _connect
method) succeeds even when the container is paused
. However, subsequent read operations fail with Timeout
.
Lines 728 to 763 in ea01a30
def _connect(self): | |
"Create a TCP socket connection" | |
# we want to mimic what socket.create_connection does to support | |
# ipv4/ipv6, but we want to set options prior to calling | |
# socket.connect() | |
err = None | |
for res in socket.getaddrinfo( | |
self.host, self.port, self.socket_type, socket.SOCK_STREAM | |
): | |
family, socktype, proto, canonname, socket_address = res | |
sock = None | |
try: | |
sock = socket.socket(family, socktype, proto) | |
# TCP_NODELAY | |
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) | |
# TCP_KEEPALIVE | |
if self.socket_keepalive: | |
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) | |
for k, v in self.socket_keepalive_options.items(): | |
sock.setsockopt(socket.IPPROTO_TCP, k, v) | |
# set the socket_connect_timeout before we connect | |
sock.settimeout(self.socket_connect_timeout) | |
# connect | |
sock.connect(socket_address) | |
# set the socket_timeout now that we're connected | |
sock.settimeout(self.socket_timeout) | |
return sock | |
except OSError as _: | |
err = _ | |
if sock is not None: | |
sock.close() |
Possible solution
To properly detect when the connection is truly established, we can send a PING
command immediately after connect()
and verify the response.
Add the following after sock.connect()
to ensure the connection is functional:
ping_parts = self._command_packer.pack("PING")
for part in ping_parts:
sock.sendall(part)
response = sock.recv(7)
if not str_if_bytes(response).startswith("+PONG"):
raise OSError(f"Redis handshake failed: unexpected response {response!r}")
Additional Comments
-
There may be a better way to handle the read operation for the
PING
response using existing methods, but calling_send_ping
directly does not work in this case. -
This issue also affects the asynchronous version of
redis-py
.
Let me know if you'd like a clearer example to reproduce the behavior.