Skip to content

System events requests replied by subset of servers #6919

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
giovanni-riela opened this issue May 22, 2025 · 0 comments
Open

System events requests replied by subset of servers #6919

giovanni-riela opened this issue May 22, 2025 · 0 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@giovanni-riela
Copy link

Observed behavior

Short description

Each nats pod knows each other but when querying "all servers" subject I obtain 2 replies from just one or (very rarely) 2 of them. The main effect of this is I miss the window of closure of some connections because the wrong server replied. This applies to all $SYS.REQ.SERVER.PING.* and $SYS.REQ.ACCOUNT.> topics.

Behavior

I am working with a nats cluster with 3 pods and trying to poll information about account connections (in_bytes, out_msgs, etc.). After noticing my DB was not closing some connections and tried manually making the request via CLI and this is what I got:

nats --server nats://localhost:4222,nats://localhost:4223,nats://localhost:4224 --creds dev_core.creds req --timeout 10s --replies 3 '$SYS.REQ.SERVER.PING.CONNZ' '' | jq -r .server

17:49:46 Sending request on "$SYS.REQ.SERVER.PING.CONNZ"
17:49:46 Received with rtt 100.912292ms
17:49:46 Received with rtt 107.530959ms
{
  "name": "nats-1",
  "host": "0.0.0.0",
  "id": "<SERVER_ID>",
  "cluster": "nats",
  "ver": "2.11.1-binary",
  "jetstream": true,
  "flags": 7,
  "seq": 1155199,
  "time": "2025-05-22T15:49:46.332294185Z"
}
{
  "name": "nats-1",
  "host": "0.0.0.0",
  "id": "<SERVER_ID>",
  "cluster": "nats",
  "ver": "2.11.1-binary",
  "jetstream": true,
  "flags": 7,
  "seq": 1155200,
  "time": "2025-05-22T15:49:46.334570425Z"
}

After a few retries:

nats --server nats://localhost:4222,nats://localhost:4223,nats://localhost:4224 --creds dev_core.creds req --replies 3 '$SYS.REQ.SERVER.PING.CONNZ' '' | jq -r .server

17:54:53 Sending request on "$SYS.REQ.SERVER.PING.CONNZ"
17:54:53 Received with rtt 88.672084ms
17:54:53 Received with rtt 93.720625ms
{
  "name": "nats-2",
  "host": "0.0.0.0",
  "id": "<SERVER_ID>",
  "cluster": "nats",
  "ver": "2.11.1-binary",
  "jetstream": true,
  "flags": 7,
  "seq": 7640492,
  "time": "2025-05-22T15:54:53.134881309Z"
}
{
  "name": "nats-2",
  "host": "0.0.0.0",
  "id": "<SERVER_ID>",
  "cluster": "nats",
  "ver": "2.11.1-binary",
  "jetstream": true,
  "flags": 7,
  "seq": 7640493,
  "time": "2025-05-22T15:54:53.139403479Z"
}

And querying for a specific connection

nats --server nats://localhost:4222,nats://localhost:4223,nats://localhost:4224 --creds dev_core.creds req --
replies 3 '$SYS.REQ.SERVER.PING.CONNZ' '{"cid": 1558}' | jq -r
                    
17:57:08 Sending request on "$SYS.REQ.SERVER.PING.CONNZ"
17:57:08 Received with rtt 28.971875ms
17:57:08 Received with rtt 28.991375ms
{
  "server": {
    "name": "nats-0",
    "host": "0.0.0.0",
    "id": "<SERVER_ID>",
    "cluster": "nats",
    "ver": "2.11.1-binary",
    "jetstream": true,
    "flags": 7,
    "seq": 2204899,
    "time": "2025-05-22T15:57:08.170533416Z"
  },
  "data": {
    "server_id": "<SERVER_ID>",
    "now": "2025-05-22T15:57:08.170504616Z",
    "num_connections": 0,
    "total": 608,
    "offset": 0,
    "limit": 1,
    "connections": []
  }
}
{
  "server": {
    "name": "nats-0",
    "host": "0.0.0.0",
    "id": "<SERVER_ID>",
    "cluster": "nats",
    "ver": "2.11.1-binary",
    "jetstream": true,
    "flags": 7,
    "seq": 2204900,
    "time": "2025-05-22T15:57:08.170621396Z"
  },
  "data": {
    "server_id": "<SERVER_ID>",
    "now": "2025-05-22T15:57:08.170521466Z",
    "num_connections": 0,
    "total": 2,
    "offset": 0,
    "limit": 1,
    "connections": []
  }
}

Second try:

nats --server nats://localhost:4222,nats://localhost:4223,nats://localhost:4224 --creds dev_core.creds req --
replies 3 '$SYS.REQ.SERVER.PING.CONNZ' '{"cid": 1558}' | jq -r 

17:57:10 Sending request on "$SYS.REQ.SERVER.PING.CONNZ"
17:57:10 Received with rtt 30.211917ms
17:57:10 Received with rtt 30.287042ms
{
  "server": {
    "name": "nats-1",
    "host": "0.0.0.0",
    "id": "<SERVER_ID>",
    "cluster": "nats",
    "ver": "2.11.1-binary",
    "jetstream": true,
    "flags": 7,
    "seq": 1161292,
    "time": "2025-05-22T15:57:10.586998265Z"
  },
  "data": {
    "server_id": "<SERVER_ID>",
    "now": "2025-05-22T15:57:10.586970545Z",
    "num_connections": 1,
    "total": 631,
    "offset": 0,
    "limit": 1,
    "connections": [
      {
        "cid": 1558,
        "kind": "Client",
        "type": "nats",
        "ip": "10.224.0.3",
        "port": 35608,
        "start": "2025-05-22T15:04:21.74362088Z",
        "last_activity": "2025-05-22T15:04:21.74861689Z",
        "rtt": "149µs",
        "uptime": "52m48s",
        "idle": "52m48s",
        "pending_bytes": 0,
        "in_msgs": 1,
        "out_msgs": 3,
        "in_bytes": 1238,
        "out_bytes": 1025,
        "subscriptions": 1,
        "name": "nsc_push",
        "lang": "go",
        "version": "1.23.0"
      }
    ]
  }
}
{
  "server": {
    "name": "nats-1",
    "host": "0.0.0.0",
    "id": "<SERVER_ID>",
    "cluster": "nats",
    "ver": "2.11.1-binary",
    "jetstream": true,
    "flags": 7,
    "seq": 1161293,
    "time": "2025-05-22T15:57:10.587076094Z"
  },
  "data": {
    "server_id": "<SERVER_ID>",
    "now": "2025-05-22T15:57:10.586989605Z",
    "num_connections": 1,
    "total": 1,
    "offset": 0,
    "limit": 1,
    "connections": [
      {
        "cid": 1558,
        "kind": "Client",
        "type": "nats",
        "ip": "10.224.0.3",
        "port": 35608,
        "start": "2025-05-22T15:04:21.74362088Z",
        "last_activity": "2025-05-22T15:04:21.74861689Z",
        "rtt": "149µs",
        "uptime": "52m48s",
        "idle": "52m48s",
        "pending_bytes": 0,
        "in_msgs": 1,
        "out_msgs": 3,
        "in_bytes": 1238,
        "out_bytes": 1025,
        "subscriptions": 1,
        "name": "nsc_push",
        "lang": "go",
        "version": "1.23.0"
      }
    ]
  }
}

I will be posting more examples if required, but I get the same pattern happens when calling nats server list: only one of the servers is randomly shown (even when specifying the number of expected servers). The same exact thing happens even if I do not include all 3 addresses in the --server flag and a random server replies at each call.

Expected behavior

When making a request to $SYS.REQ.SERVER.PING.* or $SYS.REQ.ACCOUNT.> endpoints obtain responses from all 3 servers.

Server and client version

Server Version: 2.11.1-binary
Client Version (golang): v1.41.2
Client Version (NATS CLI): 0.2.3

Host environment

Server config:

{
  "cluster": {
    "name": "nats",
    "no_advertise": true,
    "port": 6222,
    "routes": [
      "nats://nats-0.nats-headless:6222",
      "nats://nats-1.nats-headless:6222",
      "nats://nats-2.nats-headless:6222"
    ]
  },
  "http_port": 8222,
  "jetstream": {
    "max_file_store": 1Gi,
    "max_memory_store": 0,
    "store_dir": "/data"
  },
  "lame_duck_duration": "30s",
  "lame_duck_grace_period": "10s",
  "pid_file": "/var/run/nats/nats.pid",
  "port": 4222,
  "resolver": {
    "dir": "/data/resolver",
    "interval": "2m",
    "timeout": "1.9s",
    "type": "full"
  },
  "server_name": $SERVER_NAME
}

Steps to reproduce

Setup Cluster and query System events subjects

@giovanni-riela giovanni-riela added the defect Suspected defect such as a bug or regression label May 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect Suspected defect such as a bug or regression
Projects
None yet
Development

No branches or pull requests

1 participant