Best way to "renewSlotCache" in JedisCluster? #4154

Open
@taowang487

Problem

Hi Jedis experts! I am testing how Jedis behaves when a master node fails. When the master goes down, there is a good chance that Jedis throws redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to XXX. Recovery can take a very long time, and I could not find a convenient way to refresh the slot cache manually or to have Jedis refresh it automatically.

Here are my thoughts on how to tackle it:

  1. Expose a renewSlotCache() method in JedisCluster: GitHub diff. When the client runs into redis.clients.jedis.exceptions.JedisConnectionException, it can call JedisCluster.renewSlotCache() and then retry.
  2. In ClusterConnectionProvider.getConnectionFromSlot(), ping the returned connection to check that it is alive. However, since Redis master failures are rare, pinging on every call may be too expensive, so I prefer option 1.
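
From the caller's side, option 1 amounts to "catch, refresh, retry once". The following is only a minimal sketch of that pattern, written without any Jedis dependency so it runs on its own: RefreshAndRetry and callWithRefresh are hypothetical names, and IllegalStateException stands in for JedisConnectionException in the demo.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RefreshAndRetry {

  /**
   * Runs op; if it throws an exception of type retryOn, runs refresh once
   * and tries op a second time. Any other exception propagates unchanged.
   */
  static <T> T callWithRefresh(Supplier<T> op, Runnable refresh,
                               Class<? extends RuntimeException> retryOn) {
    try {
      return op.get();
    } catch (RuntimeException e) {
      if (!retryOn.isInstance(e)) {
        throw e;
      }
      refresh.run();   // for example, renew the slot cache
      return op.get(); // second and final attempt; failures propagate
    }
  }

  public static void main(String[] args) {
    // Simulated stale cache: the operation fails until refresh has run.
    AtomicInteger refreshes = new AtomicInteger();
    String value = callWithRefresh(
        () -> {
          if (refreshes.get() == 0) {
            // Stand-in for JedisConnectionException in this self-contained demo.
            throw new IllegalStateException("stale route");
          }
          return "ok";
        },
        refreshes::incrementAndGet,
        IllegalStateException.class);
    System.out.println(value + " after " + refreshes.get() + " refresh(es)");
  }
}
```

With the patched JedisCluster from option 1, a call could then look like callWithRefresh(() -> cluster.get("test-key"), cluster::renewSlotCache, JedisConnectionException.class). Note that renewSlotCache() on JedisCluster is the method proposed in this issue, not something assumed to exist in a released version.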

Any advice on the best way to renew the slot cache is greatly appreciated!

My Redis Cluster setup

3 masters and 3 replicas, each started with this command:

redis-server --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes ${ADDITIONAL_OPTIONS} --cluster-require-full-coverage no --cluster-node-timeout 15000

Reproduce with Sample Code

Here is a minimal reproduction:

  • Compute the slot for a key (test-key maps to slot 5139) and sleep for 50 seconds. My --cluster-node-timeout is 15000 ms, so a 50 s sleep should be more than enough for the failover to finish.
  • During the sleep, manually run redis-cli -p 30001 cluster shards to find the master node for this slot, then run redis-cli -p XXX shutdown to kill that master.
  • Call cluster.getConnectionFromSlot(slot);
  • Call cluster.getConnectionFromSlot(slot); again.

package com.example;

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.util.JedisClusterCRC16;

public class MinimalRedisClusterTest {
  public static void main(String[] args) {
    // Define your cluster nodes. Adjust ports if needed.
    Set<HostAndPort> nodes = new HashSet<>();
    nodes.add(new HostAndPort("127.0.0.1", 30001));
    nodes.add(new HostAndPort("127.0.0.1", 30002));
    nodes.add(new HostAndPort("127.0.0.1", 30003));
    nodes.add(new HostAndPort("127.0.0.1", 30004));
    nodes.add(new HostAndPort("127.0.0.1", 30005));
    nodes.add(new HostAndPort("127.0.0.1", 30006));

    // Create a JedisCluster instance
    int connectionTimeout = 2000; // ms
    int soTimeout = 2000;          // ms
    int maxAttempts = 5;           // 💥 Here is where you set maxAttempts
    String password = null;

    JedisCluster cluster = new JedisCluster(nodes, connectionTimeout, soTimeout, maxAttempts, password, null);
    String key = "test-key";
    int slot = JedisClusterCRC16.getSlot(key);
    System.out.println("Key: " + key + " is in slot: " + slot);

    try {
      Thread.sleep(50000); // Wait 50 s: shut down the slot's master now and let the failover finish

      // Now, try to get the connection from the slot.
      // (Shutdown the Redis master that holds this slot before running this line!)
      try {
        System.out.println("Attempting to get connection from slot...");
        cluster.getConnectionFromSlot(slot);
        System.out.println("Got connection successfully.");
      } catch (Exception e) {
        System.err.println("Error while getting connection from slot:");
        e.printStackTrace();
      }
    } catch (Exception ex) {
      System.err.println("Error while waiting before the first attempt:");
      ex.printStackTrace();
    }

    // Second attempt, with no slot-cache refresh in between
    try {
      System.out.println("Attempting to get connection from slot...");
      cluster.getConnectionFromSlot(slot);
      System.out.println("Got connection successfully.");
    } catch (Exception e) {
      System.err.println("Error while getting connection from slot:");
      e.printStackTrace();
    }
  }
}
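
For readers who want to see where the slot number comes from: the Redis Cluster specification maps a key to CRC16(key) mod 16384, using the XMODEM variant of CRC16 and hashing only the hash tag when the key contains a non-empty {...} section. The sketch below reimplements that rule with the standard library only; KeySlot, crc16, and slot are illustrative names, not the Jedis implementation behind JedisClusterCRC16.getSlot.

```java
import java.nio.charset.StandardCharsets;

final class KeySlot {
  static final int SLOT_COUNT = 16384;

  /** CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection. */
  static int crc16(byte[] data) {
    int crc = 0;
    for (byte b : data) {
      crc ^= (b & 0xFF) << 8;
      for (int i = 0; i < 8; i++) {
        crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
        crc &= 0xFFFF; // keep the register at 16 bits
      }
    }
    return crc;
  }

  /** Hash slot for a key, honoring the hash-tag rule from the cluster spec. */
  static int slot(String key) {
    int open = key.indexOf('{');
    if (open >= 0) {
      int close = key.indexOf('}', open + 1);
      // Only a non-empty tag between the first '{' and the next '}' counts.
      if (close > open + 1) {
        key = key.substring(open + 1, close);
      }
    }
    return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
  }

  public static void main(String[] args) {
    // The issue reports slot 5139 for this key when computed by Jedis.
    System.out.println("test-key -> slot " + slot("test-key"));
  }
}
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot, so a single master failure affects all of them together.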

Both connection attempts then fail:

Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30001.
        at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:68)
        at redis.clients.jedis.DefaultJedisSocketFactory.createSocket(DefaultJedisSocketFactory.java:94)
        at redis.clients.jedis.Connection.connect(Connection.java:232)
        at redis.clients.jedis.Connection.initializeFromClientConfig(Connection.java:455)
        at redis.clients.jedis.Connection.<init>(Connection.java:77)
        at redis.clients.jedis.ConnectionFactory.lambda$connectionSupplier$1(ConnectionFactory.java:72)
        at redis.clients.jedis.ConnectionFactory.makeObject(ConnectionFactory.java:96)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:571)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:298)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:223)
        at redis.clients.jedis.util.Pool.getResource(Pool.java:38)
        at redis.clients.jedis.ConnectionPool.getResource(ConnectionPool.java:52)
        at redis.clients.jedis.providers.ClusterConnectionProvider.getConnectionFromSlot(ClusterConnectionProvider.java:173)
        at redis.clients.jedis.JedisCluster.getConnectionFromSlot(JedisCluster.java:352)
        at com.example.MinimalRedisClusterTest.main(MinimalRedisClusterTest.java:35)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:279)
        at java.base/java.lang.Thread.run(Thread.java:1575)
        Suppressed: java.net.ConnectException: Connection refused
                at java.base/sun.nio.ch.Net.pollConnect(Native Method)
                at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
                at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:549)
                at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:592)
                at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
                at java.base/java.net.Socket.connect(Socket.java:760)
                at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:80)
                ... 16 more
Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30001.
        ...

Results for Option 1

I tested the diff from option 1: https://github.com/redis/jedis/compare/master...taowang487:taowang487/renewSlotCache?expand=1

In my sample code, I renew the slot cache before the second connection attempt:

diff --git a/src/main/java/com/example/MinimalRedisClusterTest.java b/src/main/java/com/example/MinimalRedisClusterTest.java
index fdf9284..516202c 100644
--- a/src/main/java/com/example/MinimalRedisClusterTest.java
+++ b/src/main/java/com/example/MinimalRedisClusterTest.java
@@ -52,7 +52,7 @@ public class MinimalRedisClusterTest {
       // (Shutdown the Redis master that holds this slot before running this line!)
       try {
         System.out.println("Attempting to get connection from slot...");
+        cluster.renewSlotCache();
         cluster.getConnectionFromSlot(slot);
         System.out.println("Got connection successfully.");
       } catch (Exception e) {

Now the second connection attempt succeeds:

Key: test-key is in slot: 5139
Attempting to get connection from slot...
Error while getting connection from slot:
redis.clients.jedis.exceptions.JedisConnectionException: Failed to connect to 127.0.0.1:30006.
        at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:68)
        at redis.clients.jedis.DefaultJedisSocketFactory.createSocket(DefaultJedisSocketFactory.java:94)
        at redis.clients.jedis.Connection.connect(Connection.java:232)
        at redis.clients.jedis.Connection.initializeFromClientConfig(Connection.java:455)
        at redis.clients.jedis.Connection.<init>(Connection.java:77)
        at redis.clients.jedis.ConnectionFactory.lambda$connectionSupplier$1(ConnectionFactory.java:72)
        at redis.clients.jedis.ConnectionFactory.makeObject(ConnectionFactory.java:96)
        at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:571)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:298)
        at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:223)
        at redis.clients.jedis.util.Pool.getResource(Pool.java:38)
        at redis.clients.jedis.ConnectionPool.getResource(ConnectionPool.java:52)
        at redis.clients.jedis.providers.ClusterConnectionProvider.getConnectionFromSlot(ClusterConnectionProvider.java:174)
        at redis.clients.jedis.JedisCluster.getConnectionFromSlot(JedisCluster.java:352)
        at com.example.MinimalRedisClusterTest.main(MinimalRedisClusterTest.java:35)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:279)
        at java.base/java.lang.Thread.run(Thread.java:1575)
        Suppressed: java.net.ConnectException: Connection refused
                at java.base/sun.nio.ch.Net.pollConnect(Native Method)
                at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
                at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:549)
                at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:592)
                at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
                at java.base/java.net.Socket.connect(Socket.java:760)
                at redis.clients.jedis.DefaultJedisSocketFactory.connectToFirstSuccessfulHost(DefaultJedisSocketFactory.java:80)
                ... 16 more
Attempting to get connection from slot...
Got connection successfully.
