Description
Problem
Hi Jedis experts! I am testing how Jedis behaves when a master node fails. I noticed that when the master goes down, there is a decent chance that Jedis throws redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to XXX. Recovery can take an extremely long time, and I could not find a convenient way to refresh the slot cache manually or to have Jedis refresh it automatically.
Here are my thoughts on how to tackle it:
- We can expose a renewSlotCache() function in JedisCluster: GitHub diff. When the client runs into redis.clients.jedis.exceptions.JedisConnectionException, it can call JedisCluster.renewSlotCache() and then retry.
- In ClusterConnectionProvider.getConnectionFromSlot(), we can ping the returned connection to check that it is alive. However, given that a Redis master failure is rare, this can be too expensive, so I prefer option 1.
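The retry flow in option 1 could look roughly like the sketch below. It is a standalone illustration that avoids a Jedis dependency: the class RenewAndRetry and the method callWithRenew are hypothetical names, and IllegalStateException stands in for JedisConnectionException. With the proposed diff applied, the call might be written as callWithRenew(() -> cluster.getConnectionFromSlot(slot), cluster::renewSlotCache, JedisConnectionException.class), assuming renewSlotCache() is exposed as suggested.

```java
import java.util.function.Supplier;

/** Sketch of option 1: on a connection failure, refresh the slot cache once and retry. */
final class RenewAndRetry {

    /**
     * Runs the action. If it throws the given connection-failure type,
     * runs renewSlotCache and tries the action exactly once more.
     */
    static <T> T callWithRenew(Supplier<T> action,
                               Runnable renewSlotCache,
                               Class<? extends RuntimeException> connectionFailure) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            if (!connectionFailure.isInstance(e)) {
                throw e; // unrelated error: do not mask it with a retry
            }
            renewSlotCache.run(); // hypothetical hook, e.g. cluster::renewSlotCache
            return action.get();  // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) {
        // Simulated stale slot cache: the action fails until the cache is renewed.
        boolean[] renewed = {false};
        String result = callWithRenew(
                () -> {
                    if (!renewed[0]) {
                        throw new IllegalStateException("Failed to connect (simulated)");
                    }
                    return "connected";
                },
                () -> renewed[0] = true,
                IllegalStateException.class);
        System.out.println(result); // prints "connected"
    }
}
```

Retrying only once keeps the cost of a failure bounded; if the cluster has not finished failing over yet, the second exception still reaches the caller.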
Any advice on the best way to renew the slot cache is greatly appreciated!
My Redis Cluster setup
3 masters and 3 replicas, each started with this configuration:
redis-server --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes ${ADDITIONAL_OPTIONS} --cluster-require-full-coverage no --cluster-node-timeout 15000
Reproduce with Sample Code
This is a minimal reproduction:
- Write a key to slot 5139 and sleep for 50 seconds. My timeout is --cluster-node-timeout 15000, so a 50 s sleep should be more than enough for the failover to finish.
- Manually run redis-cli -p 30001 cluster shards to find the master node for this slot, then run redis-cli -p XXX shutdown to kill this master.
- Try cluster.getConnectionFromSlot(slot);
- Try cluster.getConnectionFromSlot(slot); again.
package com.example;

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.util.JedisClusterCRC16;

public class MinimalRedisClusterTest {
    public static void main(String[] args) {
        // Define your cluster nodes. Adjust ports if needed.
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("127.0.0.1", 30001));
        nodes.add(new HostAndPort("127.0.0.1", 30002));
        nodes.add(new HostAndPort("127.0.0.1", 30003));
        nodes.add(new HostAndPort("127.0.0.1", 30004));
        nodes.add(new HostAndPort("127.0.0.1", 30005));
        nodes.add(new HostAndPort("127.0.0.1", 30006));

        // Create a JedisCluster instance
        int connectionTimeout = 2000; // ms
        int soTimeout = 2000; // ms
        int maxAttempts = 5; // maximum number of attempts per command
        String password = null;
        JedisCluster cluster = new JedisCluster(nodes, connectionTimeout, soTimeout, maxAttempts, password, null);

        String key = "test-key";
        int slot = JedisClusterCRC16.getSlot(key);
        System.out.println("Key: " + key + " is in slot: " + slot);

        try {
            // Sleep for 50 seconds: time to shut down the master and let the failover finish
            Thread.sleep(50000);
            // Now, try to get the connection from the slot.
            // (Shutdown the Redis master that holds this slot before running this line!)
            try {
                System.out.println("Attempting to get connection from slot...");
                cluster.getConnectionFromSlot(slot);
                System.out.println("Got connection successfully.");
            } catch (Exception e) {
                System.err.println("Error while getting connection from slot:");
                e.printStackTrace();
            }
        } catch (Exception ex) {
            System.err.println("Error initializing JedisCluster:");
            ex.printStackTrace();
        }

        // Second retry
        try {
            try {
                System.out.println("Attempting to get connection from slot...");
                cluster.getConnectionFromSlot(slot);
                System.out.println("Got connection successfully.");
            } catch (Exception e) {
                System.err.println("Error while getting connection from slot:");
                e.printStackTrace();
            }
        } catch (Exception ex) {
            System.err.println("Error initializing JedisCluster:");
            ex.printStackTrace();
        }
    }
}
Both connection attempts fail:
Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30001.
at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:68)
at redis.clients.jedis.DefaultJedisSocketFactory.createSocket(DefaultJedisSocketFactory.java:94)
at redis.clients.jedis.Connection.connect(Connection.java:232)
at redis.clients.jedis.Connection.initializeFromClientConfig(Connection.java:455)
at redis.clients.jedis.Connection.<init>(Connection.java:77)
at redis.clients.jedis.ConnectionFactory.lambda$connectionSupplier$1(ConnectionFactory.java:72)
at redis.clients.jedis.ConnectionFactory.makeObject(ConnectionFactory.java:96)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:571)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:298)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:223)
at redis.clients.jedis.util.Pool.getResource(Pool.java:38)
at redis.clients.jedis.ConnectionPool.getResource(ConnectionPool.java:52)
at redis.clients.jedis.providers.ClusterConnectionProvider.getConnectionFromSlot(ClusterConnectionProvider.java:173)
at redis.clients.jedis.JedisCluster.getConnectionFromSlot(JedisCluster.java:352)
at com.example.MinimalRedisClusterTest.main(MinimalRedisClusterTest.java:35)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:279)
at java.base/java.lang.Thread.run(Thread.java:1575)
Suppressed: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:549)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:592)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:760)
at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:80)
... 16 more
Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30001.
...
Results for Option 1
I tested the diff from option 1: https://github.com/redis/jedis/compare/master...taowang487:taowang487/renewSlotCache?expand=1
In my sample code, I renew the slot cache before the second connection attempt:
diff --git a/src/main/java/com/example/MinimalRedisClusterTest.java b/src/main/java/com/example/MinimalRedisClusterTest.java
index fdf9284..516202c 100644
--- a/src/main/java/com/example/MinimalRedisClusterTest.java
+++ b/src/main/java/com/example/MinimalRedisClusterTest.java
@@ -52,7 +52,7 @@ public class MinimalRedisClusterTest {
// (Shutdown the Redis master that holds this slot before running this line!)
try {
System.out.println("Attempting to get connection from slot...");
+ cluster.renewSlotCache();
cluster.getConnectionFromSlot(slot);
System.out.println("Got connection successfully.");
} catch (Exception e) {
Now the second connection attempt succeeds:
Key: test-key is in slot: 5139
Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30006.
at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:68)
at redis.clients.jedis.DefaultJedisSocketFactory.createSocket(DefaultJedisSocketFactory.java:94)
at redis.clients.jedis.Connection.connect(Connection.java:232)
at redis.clients.jedis.Connection.initializeFromClientConfig(Connection.java:455)
at redis.clients.jedis.Connection.<init>(Connection.java:77)
at redis.clients.jedis.ConnectionFactory.lambda$connectionSupplier$1(ConnectionFactory.java:72)
at redis.clients.jedis.ConnectionFactory.makeObject(ConnectionFactory.java:96)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:571)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:298)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:223)
at redis.clients.jedis.util.Pool.getResource(Pool.java:38)
at redis.clients.jedis.ConnectionPool.getResource(ConnectionPool.java:52)
at redis.clients.jedis.providers.ClusterConnectionProvider.getConnectionFromSlot(ClusterConnectionProvider.java:174)
at redis.clients.jedis.JedisCluster.getConnectionFromSlot(JedisCluster.java:352)
at com.example.MinimalRedisClusterTest.main(MinimalRedisClusterTest.java:35)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:279)
at java.base/java.lang.Thread.run(Thread.java:1575)
Suppressed: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:549)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:592)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:760)
at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:80)
... 16 more
Attempting to get connection from slot...
Got connection successfully.
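
For anyone reproducing this with a different key: the slot number printed above comes from the Redis Cluster key hashing rule, CRC16 of the key (or of its hash tag, if present) modulo 16384. Below is a minimal standalone sketch of that rule. It is not the Jedis implementation (the sample code uses JedisClusterCRC16.getSlot for this), and the class name SlotSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Standalone sketch of Redis Cluster key-to-slot hashing (CRC16/XMODEM mod 16384). */
final class SlotSketch {

    /** Bitwise CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant). */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Maps a key to a slot, hashing only the hash tag when a non-empty {...} is present. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // "123456789" is the CRC16 check string from the Redis Cluster specification (0x31C3).
        System.out.println(slot("123456789")); // prints 12739
        // Keys sharing a hash tag land in the same slot.
        System.out.println(slot("{user1000}.following") == slot("user1000")); // prints true
    }
}
```

Knowing the slot up front makes it easier to pick the right master from the cluster shards output before shutting it down.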