Skip to content

Postgres takes 5 minutes to start up and incorrectly reports readiness #3798

Open
@mausch

Description

@mausch

Overview

Create a trivial PostgresCluster with a single instance. Scale it down to 0. Scale it back up to 1 (actually I think the operator automatically brings it back to 1).
Logs show that the instance takes 5 minutes to actually be ready. During those 5 minutes the pod reports as ready and live, so applications try to connect to it. Because it's not actually ready all connections fail.

So the issues here are:

  • 5 minutes is an inordinate amount of time to start Postgres. A regular Postgres instance takes less than 10 seconds to start. I understand this is probably due to leader election?
  • Readiness/liveness seem incorrect ( see related issue No disk space crashloop but pod healthy #3788 )

Logs:

2023-12-05 08:21:11,373 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-12-05 08:21:11,385 WARNING: Postgresql is not running.
2023-12-05 08:21:11,385 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:11,436 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202307071
  Database system identifier: 7307677244695171151
  Database cluster state: shut down
  pg_control last modified: Mon Dec  4 17:12:57 2023
  Latest checkpoint location: 36/59000028
  Latest checkpoint's REDO location: 36/59000028
  Latest checkpoint's REDO WAL file: 000000040000003600000059
  Latest checkpoint's TimeLineID: 4
  Latest checkpoint's PrevTimeLineID: 4
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:13401
  Latest checkpoint's NextOID: 4996957
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 722
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Mon Dec  4 17:12:57 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 30f36bb6d2b271d381adba6cd1fcdbbd15c6e243a7f735f84e6675fe1eef2a16
2023-12-05 08:21:11,438 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:11,639 INFO: starting as readonly because i had the session lock
2023-12-05 08:21:13.034 UTC [122] LOG:  pgaudit extension initialized
2023-12-05 08:21:13,066 INFO: postmaster pid=122
/tmp/postgres:5432 - no response
2023-12-05 08:21:13.116 UTC [122] LOG:  redirecting log output to logging collector process
2023-12-05 08:21:13.116 UTC [122] HINT:  Future log output will appear in directory "log".
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-12-05 08:21:21,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:21,508 INFO: updated leader lock during restarting after failure
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
[...keeps repeating...]
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-12-05 08:26:11,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:11,441 INFO: updated leader lock during restarting after failure
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - accepting connections
2023-12-05 08:26:21,422 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:21,422 INFO: establishing a new patroni connection to the postgres cluster
2023-12-05 08:26:21,700 INFO: promoted self to leader because I had the session lock
server promoting
2023-12-05 08:26:31,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:31,484 INFO: updated leader lock during promote
2023-12-05 08:26:41,438 INFO: no action. I am (postgres-instance1-8mfm-0), the leader with the lock
2023-12-05 08:26:51,439 INFO: no action. I am (postgres-instance1-8mfm-0), the leader with the lock

Environment

Please provide the following details:

  • Platform: k3s
  • Platform Version: 1.28
  • PGO Image Tag: ubi8-16.0-3.4-0
  • Postgres Version: 16
  • Storage: EBS

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions