
Janitor's 10-minute delay causes silent failures in local development #1369


Closed
jumski opened this issue May 9, 2025 · 4 comments · Fixed by #1383
Labels
bug Something isn't working

Comments

@jumski

jumski commented May 9, 2025

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

The Janitor process in Realtime has a 10-minute startup delay (janitor_run_after_in_ms: 600000) before creating partitions for the realtime.messages table. This causes silent failures when using realtime.send() in local development after database resets, as the function expects partitions to exist but doesn't create them on demand.

To Reproduce

Steps to reproduce the behavior:

Scenario 1: Fresh Install or First Startup

  1. Set up Supabase locally
  2. Immediately try to use realtime.send() in a function or query
  3. The function appears to succeed but no messages are stored
  4. Check realtime.messages and find no data

Scenario 2: After Database Reset

  1. Run supabase db reset (which wipes out existing partitions)
  2. Try to use realtime.send() in a function or query
  3. The function appears to succeed but no messages are stored
  4. Behavior persists until the Janitor's next scheduled run (up to 4 hours later)

Example SQL that fails silently:

SELECT realtime.send(
  jsonb_build_object('data', 'test'),  -- payload
  'test_event',                        -- event
  'test_topic',                        -- topic
  true                                 -- private
);
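
To confirm the silent failure, a couple of checks can be run afterwards (a sketch; it assumes the default realtime schema, a topic column on realtime.messages, and the standard PostgreSQL partition catalogs):

-- Was anything stored? (expect zero rows for 'test_topic' right after a reset)
SELECT count(*) FROM realtime.messages WHERE topic = 'test_topic';

-- Which partitions are currently attached to realtime.messages?
SELECT child.relname
FROM pg_inherits i
JOIN pg_class child ON child.oid = i.inhrelid
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_namespace ns ON ns.oid = parent.relnamespace
WHERE ns.nspname = 'realtime' AND parent.relname = 'messages';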

Note: Simply restarting Supabase without a database reset doesn't cause this issue, as partitions persist in the database. The Janitor proactively creates partitions for today and the next 3 days, so day transitions are handled automatically once it has run initially.

Expected behavior

Either:

  1. The Janitor should run with minimal delay in development environments
  2. The realtime.send() function should create the necessary partition if it doesn't exist
  3. At minimum, realtime.send() should return an error rather than failing silently

System information

  • OS: Linux
  • Version of Supabase CLI: v2.21.1

Additional context

I found this issue while writing pgTAP tests that needed to verify Realtime notifications were working. Messages were silently failing to be stored because partitions didn't exist.

I've verified that the Janitor is enabled in the Supabase CLI environment (RUN_JANITOR=true), but the 10-minute delay is causing problems for development workflows where database resets are common.

While this 10-minute delay might be reasonable in production where services run continuously, it's particularly problematic in development environments where developers frequently reset their databases.

The current workaround is to manually create the partition:

DO $$
DECLARE
  today date := current_date;
  partition_name text := 'messages_' || to_char(today, 'YYYY_MM_DD');
BEGIN
  -- Create today's partition of realtime.messages so realtime.send() has somewhere to insert
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
    partition_name,
    today,
    today + interval '1 day'
  );
END;
$$;
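
For a pgTAP workflow, the same workaround can run inside the test itself before asserting on realtime.messages. A minimal sketch (the file path and the assertion are illustrative, and it assumes realtime.send() writes the row synchronously, as the repro above implies):

-- supabase/tests/realtime_send_test.sql (illustrative path)
BEGIN;
SELECT plan(1);

-- Ensure today's partition exists before exercising realtime.send()
DO $$
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
    'messages_' || to_char(current_date, 'YYYY_MM_DD'),
    current_date,
    current_date + 1
  );
END;
$$;

SELECT realtime.send(jsonb_build_object('data', 'test'), 'test_event', 'test_topic', true);

SELECT is(
  (SELECT count(*)::int FROM realtime.messages WHERE topic = 'test_topic'),
  1,
  'realtime.send() stored a message'
);

SELECT * FROM finish();
ROLLBACK;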

Note: I'm not an Elixir developer, so I've used Claude and ripgrep to help analyze the codebase. Please correct me if I've misunderstood anything about how the Janitor configuration works.

@jumski added the bug label May 9, 2025
@edgurgel
Member

edgurgel commented May 18, 2025

Hey @jumski,

Thanks for the detailed report.

When we seed the environment it should create the migrations & initial partitions. We've recently fixed an issue in the seeding process which might've impacted your environment? 🤔 See https://github.com/supabase/realtime/blob/main/run.sh#L65-L68

When Realtime boots you should see "Seeding selfhosted Realtime" and hopefully no errors. That assumes SEED_SELF_HOST=true, which should be the case.

In production we also ensure that the migrations are executed right away when tenants are created, as you can see here, and this should cover the first partitions.

Oh and finally (which should catch most problems), when Realtime connects to the database it checks that all migrations ran and that partitions were created:

with res when res in [:ok, :noop] <- Migrations.run_migrations(tenant),
     :ok <- Migrations.create_partitions(db_conn_pid) do

So as long as there is some activity that causes Realtime to connect, it should have partitions set up.
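
From the database side, a quick way to check whether that connect-time path has already run is something like this (a sketch; it assumes the tenant migrations are tracked in an Ecto-style realtime.schema_migrations table):

-- Latest applied Realtime tenant migration (assumes realtime.schema_migrations exists)
SELECT max(version) FROM realtime.schema_migrations;

-- Number of partitions currently attached to realtime.messages
SELECT count(*)
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_namespace ns ON ns.oid = parent.relnamespace
WHERE ns.nspname = 'realtime' AND parent.relname = 'messages';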

That said, I don't know if this covers all the scenarios you described. Maybe we could start the Janitor on boot for development purposes? 🤔

Maybe for self-host it should run migrations AND check for partitions as well? I will try to replicate the scenarios you described.

@jumski
Author

jumski commented May 20, 2025

Hey @edgurgel! Thanks for taking the time to explain the process.

I noticed this behaviour when running pgTAP tests via supabase db test with CLI 2.21.1.

This is my config.toml (everything disabled except db and realtime):

project_id = "core"

[db]
port = 50422
shadow_port = 50420
major_version = 15

# disable unused features
[db.seed]
enabled = true
sql_paths = [
  "seed.sql",
]

[realtime]
enabled = true

[api]
port = 50421  # Add custom port for Kong
enabled = false  # Keep disabled as per your config
[db.pooler]
enabled = false
[edge_runtime]
enabled = false
[studio]
enabled = false
[inbucket]
enabled = false
[analytics]
enabled = false
[storage]
enabled = false
[auth]
enabled = false

I confirm I see the "Seeding selfhosted Realtime" message in the logs, but the partition is not there after a clean start.

I think it is reasonable to assume people run db reset often, and it would be great to have the partition ready right away. At least that's my workflow - make changes to migrations -> db reset -> db test - and with this workflow I need a workaround in my tests to ensure the partition exists.

IMO starting the Janitor ASAP / on boot would be best.
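
Since [db.seed] is enabled in the config above, another option would be to pre-create the partitions from seed.sql, which runs on every db reset. A sketch (it assumes realtime.messages already exists when seed.sql runs; the to_regclass guard skips the block otherwise, and the 4-day window mirrors what the Janitor maintains):

-- seed.sql: pre-create realtime.messages partitions for today plus the next 3 days
DO $$
DECLARE
  d date;
BEGIN
  -- Guard: skip if Realtime's own migrations have not created the table yet
  IF to_regclass('realtime.messages') IS NULL THEN
    RETURN;
  END IF;
  FOR d IN SELECT generate_series(current_date, current_date + 3, interval '1 day')::date LOOP
    EXECUTE format(
      'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
      'messages_' || to_char(d, 'YYYY_MM_DD'),
      d,
      d + 1
    );
  END LOOP;
END;
$$;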

@edgurgel
Member

@jumski v2.35.4 should hopefully have solved this issue. Please feel free to reopen this ticket if the issue persists.

@jumski
Author

jumski commented May 28, 2025

Thanks @edgurgel! Will be testing this soon.
