
Janitor's 10-minute delay causes silent failures in local development #1369


Closed
jumski opened this issue May 9, 2025 · 4 comments · Fixed by #1383
Labels
bug Something isn't working

Comments

@jumski

jumski commented May 9, 2025

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

The Janitor process in Realtime has a 10-minute startup delay (janitor_run_after_in_ms: 600000) before creating partitions for the realtime.messages table. This causes silent failures when using realtime.send() in local development after database resets, as the function expects partitions to exist but doesn't create them on demand.

To Reproduce

Steps to reproduce the behavior:

Scenario 1: Fresh Install or First Startup

  1. Set up Supabase locally
  2. Immediately try to use realtime.send() in a function or query
  3. The function appears to succeed but no messages are stored
  4. Check realtime.messages and find no data

Scenario 2: After Database Reset

  1. Run supabase db reset (which wipes out existing partitions)
  2. Try to use realtime.send() in a function or query
  3. The function appears to succeed but no messages are stored
  4. Behavior persists until the Janitor's next scheduled run (up to 4 hours later)

Example SQL that fails silently:

SELECT realtime.send(
  jsonb_build_object('data', 'test'),  -- payload
  'test_event',                        -- event
  'test_topic',                        -- topic
  true                                 -- private
);
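
To confirm the silent failure, a couple of checks can be run afterwards (a sketch; it assumes the default realtime schema, a topic column on realtime.messages, and the standard PostgreSQL partition catalogs):

-- Was anything stored? (expect zero rows for 'test_topic' right after a reset)
SELECT count(*) FROM realtime.messages WHERE topic = 'test_topic';

-- Which partitions are currently attached to realtime.messages?
SELECT child.relname
FROM pg_inherits i
JOIN pg_class child ON child.oid = i.inhrelid
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_namespace ns ON ns.oid = parent.relnamespace
WHERE ns.nspname = 'realtime' AND parent.relname = 'messages';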

Note: Simply restarting Supabase without a database reset doesn't cause this issue, as partitions persist in the database. The Janitor proactively creates partitions for today and the next 3 days, so day transitions are handled automatically once it has run initially.

Expected behavior

Either:

  1. The Janitor should run with minimal delay in development environments
  2. The realtime.send() function should create the necessary partition if it doesn't exist
  3. At minimum, realtime.send() should return an error rather than failing silently

System information

  • OS: Linux
  • Version of Supabase CLI: v2.21.1

Additional context

I found this issue while writing pgTAP tests that needed to verify Realtime notifications were working. Messages were silently failing to be stored because partitions didn't exist.

I've verified that the Janitor is enabled in the Supabase CLI environment (RUN_JANITOR=true), but the 10-minute delay is causing problems for development workflows where database resets are common.

While this 10-minute delay might be reasonable in production where services run continuously, it's particularly problematic in development environments where developers frequently reset their databases.

The current workaround is to manually create the partition:

DO $$
DECLARE
  today date := current_date;
  partition_name text := 'messages_' || to_char(today, 'YYYY_MM_DD');
BEGIN
  -- Create today's partition of realtime.messages so realtime.send() has somewhere to insert
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
    partition_name,
    today,
    today + interval '1 day'
  );
END;
$$;
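
For a pgTAP workflow, the same workaround can run inside the test itself before asserting on realtime.messages. A minimal sketch (the file path and the assertion are illustrative, and it assumes realtime.send() writes the row synchronously, as the repro above implies):

-- supabase/tests/realtime_send_test.sql (illustrative path)
BEGIN;
SELECT plan(1);

-- Ensure today's partition exists before exercising realtime.send()
DO $$
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
    'messages_' || to_char(current_date, 'YYYY_MM_DD'),
    current_date,
    current_date + 1
  );
END;
$$;

SELECT realtime.send(jsonb_build_object('data', 'test'), 'test_event', 'test_topic', true);

SELECT is(
  (SELECT count(*)::int FROM realtime.messages WHERE topic = 'test_topic'),
  1,
  'realtime.send() stored a message'
);

SELECT * FROM finish();
ROLLBACK;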

Note: I'm not an Elixir developer, so I've used Claude and ripgrep to help analyze the codebase. Please correct me if I've misunderstood anything about how the Janitor configuration works.

@jumski added the bug label May 9, 2025
@edgurgel
Member

edgurgel commented May 18, 2025

Hey @jumski,

Thanks for the detailed report.

When we seed the environment it should create the migrations & initial partitions. We've recently fixed an issue in the seeding process which might've impacted your environment? 🤔 See https://github.com/supabase/realtime/blob/main/run.sh#L65-L68

When Realtime boots you should see "Seeding selfhosted Realtime" and hopefully no errors. That assumes SEED_SELF_HOST=true, which should be the case.

In production we also ensure that the migrations are executed right away when tenants are created, as you can see here, and this should cover the first partitions.

Oh and finally (which should catch most problems), when Realtime connects to the database it checks that all migrations ran and that partitions were created:

with res when res in [:ok, :noop] <- Migrations.run_migrations(tenant),
     :ok <- Migrations.create_partitions(db_conn_pid) do

So as long as there is some activity that causes Realtime to connect, it should have partitions set up.
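
From the database side, a quick way to check whether that connect-time path has already run is something like this (a sketch; it assumes the tenant migrations are tracked in an Ecto-style realtime.schema_migrations table):

-- Latest applied Realtime tenant migration (assumes realtime.schema_migrations exists)
SELECT max(version) FROM realtime.schema_migrations;

-- Number of partitions currently attached to realtime.messages
SELECT count(*)
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
JOIN pg_namespace ns ON ns.oid = parent.relnamespace
WHERE ns.nspname = 'realtime' AND parent.relname = 'messages';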

That said, I don't know if this covers all the scenarios you described. Maybe we could start the Janitor on boot for development purposes? 🤔

Maybe for self-host it should run migrations AND check for partitions as well? I will try to replicate the scenarios you described.

@jumski
Author

jumski commented May 20, 2025

Hey @edgurgel! Thanks for taking the time to explain the process.

I noticed this behaviour when running pgTAP tests via supabase db test with CLI 2.21.1.

This is my config.toml (everything disabled except db and realtime):

project_id = "core"

[db]
port = 50422
shadow_port = 50420
major_version = 15

# disable unused features
[db.seed]
enabled = true
sql_paths = [
  "seed.sql",
]

[realtime]
enabled = true

[api]
port = 50421  # Add custom port for Kong
enabled = false  # Keep disabled as per your config
[db.pooler]
enabled = false
[edge_runtime]
enabled = false
[studio]
enabled = false
[inbucket]
enabled = false
[analytics]
enabled = false
[storage]
enabled = false
[auth]
enabled = false

I confirm I see the "Seeding selfhosted Realtime" message in the logs, but the partition is not there after a clean start.

I think it is reasonable to assume people run db reset often, and it would be great to have the partition ready right away. At least that's my workflow - make changes to migrations -> db reset -> db test - and with this workflow I need a workaround in my tests to ensure the partition exists.

IMO starting the Janitor ASAP / on boot would be best.
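
Since [db.seed] is enabled in the config above, another option would be to pre-create the partitions from seed.sql, which runs on every db reset. A sketch (it assumes realtime.messages already exists when seed.sql runs; the to_regclass guard skips the block otherwise, and the 4-day window mirrors what the Janitor maintains):

-- seed.sql: pre-create realtime.messages partitions for today plus the next 3 days
DO $$
DECLARE
  d date;
BEGIN
  -- Guard: skip if Realtime's own migrations have not created the table yet
  IF to_regclass('realtime.messages') IS NULL THEN
    RETURN;
  END IF;
  FOR d IN SELECT generate_series(current_date, current_date + 3, interval '1 day')::date LOOP
    EXECUTE format(
      'CREATE TABLE IF NOT EXISTS realtime.%I PARTITION OF realtime.messages FOR VALUES FROM (%L) TO (%L)',
      'messages_' || to_char(d, 'YYYY_MM_DD'),
      d,
      d + 1
    );
  END LOOP;
END;
$$;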

@edgurgel
Member

@jumski v2.35.4 should hopefully have solved this issue. Please feel free to reopen this ticket if the issue persists.

@jumski
Author

jumski commented May 28, 2025

Thanks @edgurgel! Will be testing this soon.
