Open
Description
Overview
Create a trivial PostgresCluster with a single instance. Scale it down to 0. Scale it back up to 1 (actually I think the operator automatically brings it back to 1).
Logs show that the instance takes 5 minutes to actually be ready. During those 5 minutes the pod reports as ready and live, so applications try to connect to it. Because it's not actually ready all connections fail.
So the issues here are:
- 5 minutes is an inordinate amount of time to start Postgres. A regular Postgres instance takes less than 10 seconds to start. I understand this is probably due to leader election?
- Readiness/liveness seem incorrect ( see related issue No disk space crashloop but pod healthy #3788 )
Logs:
2023-12-05 08:21:11,373 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-12-05 08:21:11,385 WARNING: Postgresql is not running.
2023-12-05 08:21:11,385 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:11,436 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202307071
Database system identifier: 7307677244695171151
Database cluster state: shut down
pg_control last modified: Mon Dec 4 17:12:57 2023
Latest checkpoint location: 36/59000028
Latest checkpoint's REDO location: 36/59000028
Latest checkpoint's REDO WAL file: 000000040000003600000059
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:13401
Latest checkpoint's NextOID: 4996957
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 722
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Mon Dec 4 17:12:57 2023
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: 30f36bb6d2b271d381adba6cd1fcdbbd15c6e243a7f735f84e6675fe1eef2a16
2023-12-05 08:21:11,438 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:11,639 INFO: starting as readonly because i had the session lock
2023-12-05 08:21:13.034 UTC [122] LOG: pgaudit extension initialized
2023-12-05 08:21:13,066 INFO: postmaster pid=122
/tmp/postgres:5432 - no response
2023-12-05 08:21:13.116 UTC [122] LOG: redirecting log output to logging collector process
2023-12-05 08:21:13.116 UTC [122] HINT: Future log output will appear in directory "log".
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-12-05 08:21:21,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:21:21,508 INFO: updated leader lock during restarting after failure
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
[...keeps repeating...]
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-12-05 08:26:11,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:11,441 INFO: updated leader lock during restarting after failure
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - accepting connections
2023-12-05 08:26:21,422 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:21,422 INFO: establishing a new patroni connection to the postgres cluster
2023-12-05 08:26:21,700 INFO: promoted self to leader because I had the session lock
server promoting
2023-12-05 08:26:31,382 INFO: Lock owner: postgres-instance1-8mfm-0; I am postgres-instance1-8mfm-0
2023-12-05 08:26:31,484 INFO: updated leader lock during promote
2023-12-05 08:26:41,438 INFO: no action. I am (postgres-instance1-8mfm-0), the leader with the lock
2023-12-05 08:26:51,439 INFO: no action. I am (postgres-instance1-8mfm-0), the leader with the lock
Environment
Please provide the following details:
- Platform: k3s
- Platform Version: 1.28
- PGO Image Tag: ubi8-16.0-3.4-0
- Postgres Version: 16
- Storage: EBS