CASSGO-72 Connection trouble with Amazon Keyspaces #1873

@yomipq

I am having trouble connecting to Amazon Keyspaces (for Apache Cassandra). My program on EC2 connects to Amazon Keyspaces without any issues for a while, but after a few days or weeks it loses the connection and every query fails with the error below.

gocql: no hosts available in the pool

Go version: 1.23.4
GoCQL version: 1.7.0

I built the program with gocql_debug enabled and got the following logs.

2025/03/25 03:20:29 gocql: Session.handleNodeConnected: 172.16.1.14:9142
2025/03/25 03:20:29 gocql: conns of pool after stopped "172.16.1.14": 2
2025/03/25 03:20:29 gocql: Session.handleNodeConnected: 172.16.1.28:9142
2025/03/25 03:20:29 gocql: conns of pool after stopped "172.16.1.28": 2
2025/03/25 03:21:29 Session.ring:[172.16.1.14:UP][172.16.1.28:UP]

...

2025/03/26 15:11:11 gocql: unable to dial "[HostInfo hostname=\"\" connectAddress=\"127.0.0.1\" peer=\"<nil>\" rpc_address=\"127.0.0.1\" broadcast_address=\"127.0.0.1\" preferred_ip=\"<nil>\" connect_addr=\"127.0.0.1\" connect_addr_source=\"connect_address\" port=9142 data_centre=\"ap-northeast-1\" rack=\"ap-northeast-1\" host_id=\"be0f3a14-e107-3fee-a5e5-415c10539abd\" version=\"v3.11.2\" state=UP num_tokens=0]": dial tcp 127.0.0.1:9142: connect: connection refused
2025/03/26 15:11:11 gocql: filling stopped "127.0.0.1": dial tcp 127.0.0.1:9142: connect: connection refused
2025/03/26 15:11:11 gocql: conns of pool after stopped "127.0.0.1": 0
2025/03/26 15:11:11 gocql: Session.handleNodeDown: 127.0.0.1:9142
2025/03/26 15:11:11 gocql: unable to refresh ring: get existing host=[HostInfo hostname="" connectAddress="172.16.1.14" peer="172.16.1.14" rpc_address="172.16.1.14" broadcast_address="<nil>" preferred_ip="172.16.1.14" connect_addr="172.16.1.14" connect_addr_source="connect_address" port=9142 data_centre="ap-northeast-1" rack="ap-northeast-1" host_id="be0f3a14-e107-3fee-a5e5-415c10539abd" version="v3.11.2" state=UP num_tokens=1] from prevHosts: cannot find host
2025/03/26 15:11:29 Session.ring:[127.0.0.1:DOWN][172.16.1.28:UP]

...

2025/03/26 22:43:35 gocql: unable to dial "[HostInfo hostname=\"\" connectAddress=\"127.0.0.1\" peer=\"<nil>\" rpc_address=\"127.0.0.1\" broadcast_address=\"127.0.0.1\" preferred_ip=\"<nil>\" connect_addr=\"127.0.0.1\" connect_addr_source=\"connect_address\" port=9142 data_centre=\"ap-northeast-1\" rack=\"ap-northeast-1\" host_id=\"b666465e-cb85-3efa-b3ab-f6cf139e5a39\" version=\"v3.11.2\" state=UP num_tokens=0]": dial tcp 127.0.0.1:9142: connect: connection refused
2025/03/26 22:43:35 gocql: filling stopped "127.0.0.1": dial tcp 127.0.0.1:9142: connect: connection refused
2025/03/26 22:43:35 gocql: conns of pool after stopped "127.0.0.1": 0
2025/03/26 22:43:35 gocql: Session.handleNodeDown: 127.0.0.1:9142
2025/03/26 22:43:35 gocql: unable to refresh ring: get existing host=[HostInfo hostname="" connectAddress="172.16.1.28" peer="172.16.1.28" rpc_address="172.16.1.28" broadcast_address="<nil>" preferred_ip="172.16.1.28" connect_addr="172.16.1.28" connect_addr_source="connect_address" port=9142 data_centre="ap-northeast-1" rack="ap-northeast-1" host_id="b666465e-cb85-3efa-b3ab-f6cf139e5a39" version="v3.11.2" state=UP num_tokens=1] from prevHosts: cannot find host
2025/03/26 22:44:29 Session.ring:[127.0.0.1:DOWN][127.0.0.1:DOWN]

On startup, the session has two hosts, 172.16.1.14 and 172.16.1.28. After a while, the connection to 172.16.1.14 was lost with the error "cannot find host", and the driver tried to reconnect to 127.0.0.1 instead of 172.16.1.14. Some time later, the connection to 172.16.1.28 was lost with the same error, and the driver again tried to reconnect to 127.0.0.1 instead of 172.16.1.28. As a result, all connections were lost.
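To make this progression easier to spot in a longer log, a small standalone helper could scan the "Session.ring:" lines and count how many ring entries point at a loopback address. This is only a sketch that uses the Go standard library; the names ringEntry, parseRing and loopbackCount are made up for illustration and are not part of gocql, and the pattern assumes IPv4 addresses as in the logs above.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
	"strings"
)

// ringEntry is one [address:STATE] pair from a "Session.ring:" debug line.
// (Hypothetical type for this sketch, not a gocql type.)
type ringEntry struct {
	Addr  string
	State string
}

// ringEntryRe matches entries such as [172.16.1.14:UP]. IPv4 only.
var ringEntryRe = regexp.MustCompile(`\[([0-9.]+):([A-Z]+)\]`)

// parseRing returns the ring entries found in a log line, or nil if the
// line is not a "Session.ring:" line.
func parseRing(line string) []ringEntry {
	const marker = "Session.ring:"
	i := strings.Index(line, marker)
	if i < 0 {
		return nil
	}
	var out []ringEntry
	for _, m := range ringEntryRe.FindAllStringSubmatch(line[i+len(marker):], -1) {
		out = append(out, ringEntry{Addr: m[1], State: m[2]})
	}
	return out
}

// loopbackCount reports how many entries have a loopback address.
func loopbackCount(entries []ringEntry) int {
	n := 0
	for _, e := range entries {
		if ip := net.ParseIP(e.Addr); ip != nil && ip.IsLoopback() {
			n++
		}
	}
	return n
}

func main() {
	// The three Session.ring lines from the logs above.
	lines := []string{
		"2025/03/25 03:21:29 Session.ring:[172.16.1.14:UP][172.16.1.28:UP]",
		"2025/03/26 15:11:29 Session.ring:[127.0.0.1:DOWN][172.16.1.28:UP]",
		"2025/03/26 22:44:29 Session.ring:[127.0.0.1:DOWN][127.0.0.1:DOWN]",
	}
	for _, l := range lines {
		entries := parseRing(l)
		fmt.Printf("%d of %d ring entries are loopback\n", loopbackCount(entries), len(entries))
	}
}
```

Run against the three ring lines above, the loopback count goes from 0 to 1 to 2, which matches the point where queries start failing.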

So here are my questions:

First, in what situation does the error "cannot find host" occur? Is it an expected error? I read the source code, but I couldn't follow it well.
Second, what makes it reconnect to 127.0.0.1 instead of the original address? Is this expected behavior?
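One detail in the logs that may be relevant to both questions: the host dumped in the "unable to dial" line and the host in the following "unable to refresh ring" line carry the same host_id but different connectAddress values. The sketch below groups connect addresses by host_id to make that visible. It is a log-reading aid only, written against the standard library; hostIdentity and addressesByHostID are hypothetical names, and the sample lines are abbreviated to the two fields the code reads.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	connectAddrRe = regexp.MustCompile(`connectAddress="([^"]*)"`)
	hostIDRe      = regexp.MustCompile(`host_id="([^"]*)"`)
)

// hostIdentity extracts host_id and connectAddress from a log line that
// contains a HostInfo dump. ok is false if either field is missing.
func hostIdentity(line string) (hostID, addr string, ok bool) {
	// "unable to dial" lines escape their quotes (\"); normalise them first.
	line = strings.ReplaceAll(line, `\"`, `"`)
	id := hostIDRe.FindStringSubmatch(line)
	a := connectAddrRe.FindStringSubmatch(line)
	if id == nil || a == nil {
		return "", "", false
	}
	return id[1], a[1], true
}

// addressesByHostID collects the distinct connect addresses seen for each
// host_id, in first-seen order.
func addressesByHostID(lines []string) map[string][]string {
	out := make(map[string][]string)
	for _, l := range lines {
		id, addr, ok := hostIdentity(l)
		if !ok {
			continue
		}
		seen := false
		for _, existing := range out[id] {
			if existing == addr {
				seen = true
				break
			}
		}
		if !seen {
			out[id] = append(out[id], addr)
		}
	}
	return out
}

func main() {
	// Abbreviated from the 2025/03/26 15:11:11 log lines above.
	lines := []string{
		`gocql: unable to dial "[HostInfo connectAddress=\"127.0.0.1\" host_id=\"be0f3a14-e107-3fee-a5e5-415c10539abd\"]"`,
		`gocql: unable to refresh ring: get existing host=[HostInfo connectAddress="172.16.1.14" host_id="be0f3a14-e107-3fee-a5e5-415c10539abd"]`,
	}
	for id, addrs := range addressesByHostID(lines) {
		fmt.Println(id, addrs)
	}
}
```

For the sample lines, host_id be0f3a14-e107-3fee-a5e5-415c10539abd is listed with both 127.0.0.1 and 172.16.1.14. Whether that mismatch is what triggers "cannot find host" is exactly what I am unsure about.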

If anyone has any idea, please let me know.
