Description
Bug report
- I confirm this is a bug with Supabase, not with my own application.
- I confirm I have searched the Docs, GitHub Discussions, and Discord.
Describe the bug
Currently, https://github.com/supabase/realtime/blob/cd04f2f744834296b5a4b3e360e95c3fab5f9165/rel/env.sh.eex prevents running the Postgres (or any other) cluster strategy.
None of the specific cases in that script can be met or configured on a self-hosted instance. It is difficult, fragile, or outright impossible to get the `ip` variable to actually populate, so it falls back to `127.0.0.1`.
This produces cluster attempt logs such as `SYN[realtime@127.0.0.1]`, and the cluster strategy breaks.
To Reproduce
I'm moving a lot of this over to a local k8s cluster, so these reproduction steps may not be as clear as they should be.
I think the Supabase docker compose file could be tweaked with `CLUSTER_STRATEGIES=POSTGRES` to try to get the cluster strategy to work.
The Realtime config will have to be duplicated to run two instances; because containers with a broken cluster strategy will fight over the same replication slot, `SLOT_NAME_SUFFIX` will need to be unique to each container (see the sketch below).
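Roughly what I mean is something like this. This is only a sketch, not a tested command line: the container names and suffix values are made up, and all the usual Realtime settings (database connection, JWT secret, and so on) from the standard self-hosted compose file are omitted.

```sh
# Rough sketch only: two Realtime containers against the same database,
# each with its own replication slot suffix. All of the usual Realtime
# env vars (database connection, JWT secret, etc.) are omitted here and
# would come from the standard self-hosted compose config.
docker run -d --name realtime-one \
  -e CLUSTER_STRATEGIES=POSTGRES \
  -e SLOT_NAME_SUFFIX=one \
  supabase/realtime

docker run -d --name realtime-two \
  -e CLUSTER_STRATEGIES=POSTGRES \
  -e SLOT_NAME_SUFFIX=two \
  supabase/realtime
```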
Both containers will connect to their respective replication slots and will handle Postgres realtime updates fine.
However, broadcast between the instances will not work (broadcast only works within a single instance).
I'm not sure how to direct traffic between the two instances here (previously I have used an external HAProxy; k8s handles that automatically as a Service).
This is because the first step of `env.sh.eex` is to try to extract the instance's IP address from `/etc/hosts`. If the entry doesn't exactly match the fly.io layout, it falls through to an empty string.
Later on, because `ip` is an empty string and no other conditions are met, it defaults to `127.0.0.1` (see the paraphrased sketch below).
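Paraphrased, the flow I'm describing looks roughly like this. This is not the actual script contents; the `fly-local-6pn` hosts entry and the `realtime@` node prefix are my assumptions about the Fly.io layout.

```sh
# Paraphrase of the flow, not the real env.sh.eex.
# "fly-local-6pn" is an assumed Fly.io-specific /etc/hosts entry.
ip="$(awk '/fly-local-6pn/ {print $1; exit}' /etc/hosts)"

# Outside Fly.io the lookup matches nothing, $ip stays empty, and the
# node name silently collapses to the loopback address.
if [ -z "$ip" ]; then
  ip=127.0.0.1
fi

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE="realtime@${ip}"   # node base name assumed
```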
Expected behavior
A way to set `ip`, `RELEASE_DISTRIBUTION`, and `RELEASE_NODE` manually, allowing more advanced self-hosters to configure clustering. Some additional logging around this would also be helpful; a sketch of what I mean follows below.
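Something along these lines (the `NODE_IP` variable and the pre-set `RELEASE_NODE` check are names I'm suggesting, not existing Realtime options, and the existing lookup shown is the assumed shape from above):

```sh
# Sketch only: honour operator-supplied values before any Fly.io-specific
# lookup. NODE_IP and the RELEASE_NODE override are suggested names.
if [ -z "${RELEASE_NODE:-}" ]; then
  if [ -n "${NODE_IP:-}" ]; then
    # e.g. injected in k8s via the Downward API (fieldRef: status.podIP)
    ip="${NODE_IP}"
  else
    # fall back to the existing /etc/hosts lookup (assumed shape)
    ip="$(awk '/fly-local-6pn/ {print $1; exit}' /etc/hosts)"
  fi

  if [ -z "$ip" ]; then
    echo "realtime: could not determine node IP, falling back to 127.0.0.1" >&2
    ip=127.0.0.1
  fi

  export RELEASE_NODE="realtime@${ip}"
fi

export RELEASE_DISTRIBUTION="${RELEASE_DISTRIBUTION:-name}"
```

In k8s the `NODE_IP` value could be injected via the Downward API (`fieldRef: status.podIP`); in a plain docker compose setup it could simply be set per container.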
Additional Context
I commented on issue #760 (specifically #760 (comment)) regarding this, with a fix that is working for me.
That comment includes logs of the cluster strategy working between multiple instances, with lots of lines like `Node realtime@<pod IP> has joined the cluster, sending discover message`, which are completely absent when the script fails to configure an IP and falls back to `127.0.0.1`.
I have rebuilt the image "internally" with this new `env.sh.eex` and have been using and testing it. There is now only one replication slot in use, and broadcast between instances appears to be working correctly.
I'm not great with bash and can't test this within your environment. I'm also not great with GitHub pull requests, so I'll let you shape the final fix :)
Again, sorry for the rough bug report, but hopefully it is enough.