Janitor's 10-minute delay causes silent failures in local development #1369
Comments
Hey @jumski, thanks for the detailed report. When we seed the environment it should create the migrations and initial partitions. We've recently fixed an issue in the seeding process which might have impacted your environment? 🤔 See https://github.com/supabase/realtime/blob/main/run.sh#L65-L68: when Realtime boots you should see the "Seeding selfhosted Realtime" message in the logs.

In production we also ensure that the migrations are executed right away when tenants are created, as you can see here, and this should cover the first partitions. Oh, and finally (which should catch most problems): when Realtime connects to the database it checks that all migrations ran and that partitions were created: realtime/lib/realtime/tenants/connect.ex, lines 213 to 214 at 386aeb3.
That said, I don't know if this covers all the scenarios you described. Maybe we could start the Janitor on boot for development purposes? 🤔 Maybe for self-host it should run migrations AND check for partitions as well? I will try to replicate the scenarios you described.
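For anyone who wants to verify those connect-time checks from the database side, a rough SQL equivalent (a sketch only: `realtime.schema_migrations` is my assumption for where Realtime's Ecto migrations are tracked, and the name may vary by version):

```sql
-- Most recently applied Realtime migrations (table name is an assumption;
-- Ecto tracks migrations as (version, inserted_at) rows).
select version, inserted_at
from realtime.schema_migrations
order by version desc
limit 5;
```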
Hey @edgurgel! Thanks for the effort of explaining the process. I noticed this behaviour when running pgTAP tests via `supabase db test` on CLI 2.21.1. This is my `config.toml` (everything disabled except db and realtime):

```toml
project_id = "core"

[db]
port = 50422
shadow_port = 50420
major_version = 15

# disable unused features

[db.seed]
enabled = true
sql_paths = [
  "seed.sql",
]

[realtime]
enabled = true

[api]
port = 50421 # Add custom port for Kong
enabled = false # Keep disabled as per your config

[db.pooler]
enabled = false

[edge_runtime]
enabled = false

[studio]
enabled = false

[inbucket]
enabled = false

[analytics]
enabled = false

[storage]
enabled = false

[auth]
enabled = false
```

I confirm I see the "Seeding selfhosted Realtime" message in the logs, but the partition is not there after a clean start. I think it is reasonable to assume people run `db reset` often, and it would be great to have the partition up already. At least that's my workflow (make changes to migrations -> `db reset` -> `db test`), and with this workflow I need a workaround in my tests to ensure the partition is up. IMO starting the Janitor ASAP / on boot would be best.
@jumski v2.35.4 should hopefully have solved this issue. Please feel free to reopen this ticket if the issue persists.
Thanks @edgurgel! Will be testing this soon.
Bug report
Describe the bug
The Janitor process in Realtime has a 10-minute startup delay (`janitor_run_after_in_ms: 600000`) before creating partitions for the `realtime.messages` table. This causes silent failures when using `realtime.send()` in local development after database resets, as the function expects partitions to exist but does not create them on demand.

To Reproduce
Steps to reproduce the behavior:
Scenario 1: Fresh Install or First Startup

1. Call `realtime.send()` in a function or query
2. Check `realtime.messages` and find no data

Scenario 2: After Database Reset
1. Run `supabase db reset` (which wipes out existing partitions)
2. Call `realtime.send()` in a function or query

Example SQL that fails silently:
Note: Simply restarting Supabase without a database reset doesn't cause this issue, as partitions persist in the database. The Janitor proactively creates partitions for today and the next 3 days, so day transitions are handled automatically once it has run initially.
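To see whether the partitions are actually there, one can list the children of `realtime.messages` through the standard Postgres catalogs:

```sql
-- List the partitions currently attached to realtime.messages.
-- An empty result right after `supabase db reset` confirms the gap.
select child.relname
from pg_inherits
join pg_class parent on parent.oid = pg_inherits.inhparent
join pg_class child  on child.oid  = pg_inherits.inhrelid
join pg_namespace ns on ns.oid     = parent.relnamespace
where ns.nspname = 'realtime'
  and parent.relname = 'messages'
order by child.relname;
```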
Expected behavior
Either:

- the `realtime.send()` function should create the necessary partition if it doesn't exist, or
- `realtime.send()` should return an error rather than failing silently.

System information
Additional context
I found this issue while writing pgTAP tests that needed to verify Realtime notifications were working. Messages were silently failing to be stored because partitions didn't exist.
I've verified that the Janitor is enabled in the Supabase CLI environment (`RUN_JANITOR=true`), but the 10-minute delay is causing problems for development workflows where database resets are common.

While this 10-minute delay might be reasonable in production, where services run continuously, it's particularly problematic in development environments where developers frequently reset their databases.
The current workaround is to manually create the partition:
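A sketch of that workaround, assuming `realtime.messages` is range-partitioned by day on `inserted_at` and that partitions follow the Janitor's `messages_yyyy_mm_dd` naming (both assumptions; the scheme may differ across versions):

```sql
-- Create today's partition up front if the Janitor hasn't run yet.
do $$
declare
  partition_name text := 'messages_' || to_char(current_date, 'yyyy_mm_dd');
begin
  execute format(
    'create table if not exists realtime.%I partition of realtime.messages
       for values from (%L) to (%L)',
    partition_name,
    current_date,
    current_date + interval '1 day'
  );
end $$;
```

In a pgTAP workflow this can run in the test's setup step so that `realtime.send()` calls have somewhere to land.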
Note: I'm not an Elixir developer, so I've used Claude and ripgrep to help analyze the codebase. Please correct me if I've misunderstood anything about how the Janitor configuration works.