Commit 1f06f33

docs: update the database restore procedure (#115)

1 parent 5e42009 commit 1f06f33

2 files changed: +62 -23 lines changed

docs/for-ops/disaster-recovery/overview.md

Lines changed: 8 additions & 7 deletions
@@ -11,10 +11,11 @@ This area covers some potential scenarios when a complete or partial restore of
This guide has the following prerequisites and limitations that should be checked regularly:

1. The following items should be backed up regularly by the platform administrator:
- - The Kubernetes secret ending in "-wildcard-cert" in namespace "istio-system" (if installed via the Linode cloud console, or using your own certificate).
- - The Kubernetes secret "otomi-sops-secrets" in namespace "otomi-pipelines".
- - A download of the complete values in Platform -> Maintenance. Depending on whether these are downloaded with or without secrets, some passwords might have to be reset after recovery.
- - Optionally manual backups of databases, as covered in this guide for the CloudNative PostgreSQL Operator, should be taken.
+
+ - The Kubernetes secret ending in "-wildcard-cert" in namespace "istio-system" (if installed via the Linode cloud console, or using your own certificate).
+ - The Kubernetes secret "otomi-sops-secrets" in namespace "otomi-pipelines".
+ - A download of the complete values in Platform -> Maintenance. Depending on whether these are downloaded with or without secrets, some passwords might have to be reset after recovery.
+ - Optionally, manual backups of databases should be taken, as covered in this guide for the CloudNative PostgreSQL Operator.

2. Object storage needs to be set up for all backup types referred to. Credentials should be added to Platform Settings -> Object Storage.

@@ -30,6 +31,6 @@ This guide has the following prerequisites and limitations that should be checke

## Guides

- * [Gitea](gitea.md): Restoring the platform's Gitea database and repositories from the application backup
- * [Databases](platform-databases.md): Backup and restore of the CNPG databases
- * [Reinstall](platform-reinstall.md): Restoring the complete platform, including settings and data
+ - [Gitea](gitea.md): Restoring the platform's Gitea database and repositories from the application backup
+ - [Databases](platform-databases.md): Backup and restore of the CNPG databases
+ - [Reinstall](platform-reinstall.md): Restoring the complete platform, including settings and data

docs/for-ops/disaster-recovery/platform-databases.md

Lines changed: 54 additions & 16 deletions
@@ -8,18 +8,31 @@ Generally it is recommended to get familiar with the [CNPG documentation](https:

## Initial notes

+ Since the procedure requires patching the existing Kubernetes resources, the apl-operator must be scaled down to prevent overwrites:
+
+ ```
+ kubectl patch application -n argocd apl-operator-apl-operator --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
+ kubectl scale --replicas=0 -n apl-operator deployment apl-operator
+ ```
+
+ Once the restore procedure is completed, enable the apl-operator:
+
+ ```
+ kubectl scale --replicas=1 -n apl-operator deployment apl-operator
+ ```
+
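
Since the automated sync policy is removed from the apl-operator application above, it presumably also needs to be restored once the operator is scaled back up; a hedged sketch, mirroring the re-enable patches used later in this guide:

```
kubectl patch application -n argocd apl-operator-apl-operator --patch '[{"op": "add", "path": "/spec/syncPolicy/automated", "value": {"prune": true, "allowEmpty": true}}]' --type=json
```
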
Changes to the `values` repository can usually be made through the Gitea UI after signing in with the `platform-admin` user. As this requires both Keycloak and Gitea to be operating normally, the risk can be reduced by creating an application token and pulling/pushing local changes to the repository. In Gitea, go to the user settings, click on the `Applications` tab, enter a token name and select `repo` as the scope. After creating this token, you can include it in the repository URL, e.g.

```sh
git clone https://<token>@gitea.example.com/otomi/values.git
```
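
Local edits can then be committed and pushed back over the same token-authenticated remote; a minimal sketch, assuming the default branch is `main`:

```sh
git commit -am "adjust database recovery settings"
git push origin main
```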

- In the event that platform-critical services Gitea and Keycloak are not able to start, required changes to the database configuration can be applied directly in the following Argo CD applications in the `argocd` namespace. This change persists and is synchronized into the cluster until the next following Tekton pipeline overwrites them:
+ In the event that platform-critical services Gitea and Keycloak are not able to start, required changes to the database configuration can be applied directly in the following Argo CD applications in the `argocd` namespace:

- * Gitea database: `gitea-gitea-otomi-db`
- * Keycloak database: `keycloak-keycloak-otomi-db`
+ - Gitea database: `gitea-gitea-otomi-db`
+ - Keycloak database: `keycloak-keycloak-otomi-db`

- Where applicable, in these manifests the `initdb` section in `clusterSpec.bootstrap` can be replaced with `recovery` and `externalClusters` just as instructed below. Note that `recovery` and `externalClusters` do not need to be reflected in the values file later, since they are only considered when initializing the cluster; even when Tekton does revert these changes, after a successful recovery this no longer has an effect.
+ Where applicable, in these manifests the `initdb` section in `clusterSpec.bootstrap` can be replaced with `recovery` and `externalClusters` just as instructed below. Note that `recovery` and `externalClusters` do not need to be reflected in the values file later, since they are only considered when initializing the cluster.
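
For orientation, a minimal sketch of what such a patched section under `clusterSpec` may look like; the source name, `serverName`, destination path, endpoint format, and secret key names are illustrative assumptions, not platform defaults:

```yaml
# Hypothetical sketch: `initdb` swapped for `recovery` plus `externalClusters`.
bootstrap:
  recovery:
    source: origin            # must match the name below under externalClusters
    database: gitea
    owner: gitea
externalClusters:
  - name: origin
    barmanObjectStore:
      serverName: gitea-db    # assumed; typically left unchanged
      destinationPath: s3://<bucket-name>/gitea-1
      endpointURL: https://<location>.linodeobjects.com   # assumed endpoint format
      s3Credentials:
        accessKeyId:
          name: linode-creds
          key: S3_ACCESS_KEY  # assumed key name in the secret
        secretAccessKey:
          name: linode-creds
          key: S3_SECRET_KEY  # assumed key name in the secret
```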

## Regular recovery with backup in same cluster

@@ -47,6 +60,7 @@ Note that the time stamps of the backup names are universal time (UTC).
```sh
kubectl get backup -n <app>
```
+
where `<app>` is to be replaced with `gitea`, `harbor`, or `keycloak`.
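
Since the backup timestamps are UTC, a hedged convenience variant is to sort by creation time (`--sort-by` is a standard `kubectl get` flag), so the most recent backup appears last:

```sh
kubectl get backup -n <app> --sort-by=.metadata.creationTimestamp
```
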
### Adjustments to the backup configuration
@@ -60,12 +74,12 @@ Example:
```yaml
# ...
platformBackups:
- database:
-   gitea:
-     enabled: false
-     retentionPolicy: 7d
-     schedule: 0 0 * * *
-     pathSuffix: gitea-1
+   database:
+     gitea:
+       enabled: false
+       retentionPolicy: 7d
+       schedule: 0 0 * * *
+       pathSuffix: gitea-1
# ...
```

@@ -76,6 +90,7 @@ The following change only has an effect on an initial database cluster. Therefor
In the file `env/databases/<app>.yaml`, update the structure of `databases.<app>.recovery` as follows, depending on the app, inserting the backup name as determined above:

For Gitea:
+
```yaml
databases:
  gitea:
@@ -90,6 +105,7 @@ databases:
```

For Harbor:
+
```yaml
databases:
  harbor:
@@ -102,6 +118,7 @@ databases:
```

For Keycloak:
+
```yaml
databases:
  keycloak:
@@ -120,31 +137,34 @@ Note that ArgoCD may show a sync error, pointing out that there are multiple `bo
Check the Tekton pipelines to ensure that the values changes have been deployed as expected. After this, during a backup or recovery of the database, the application should be shut down to avoid write operations that could lead to inconsistencies.
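
A hedged way to check the pipelines from the CLI, assuming Tekton's `PipelineRun` resources live in the `otomi-pipelines` namespace mentioned earlier:

```sh
kubectl get pipelineruns -n otomi-pipelines --sort-by=.metadata.creationTimestamp
```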

For temporarily disabling Gitea:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd gitea-gitea --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
- ## Scale Gitea statefulset to zero
- kubectl patch statefulset -n gitea gitea --patch '[{"op": "replace", "path": "/spec/replicas", "value": 0}]' --type=json
+ ## Scale Gitea deployment to zero
+ kubectl scale deployment -n gitea gitea --replicas=0
## Verify that pods are shut down
- kubectl get statefulset -n gitea gitea # Should show READY 0/0
+ kubectl get deployment -n gitea gitea # Should show READY 0/0
```

For temporarily disabling Keycloak:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd keycloak-keycloak-operator --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
## Scale Keycloak statefulset to zero
- kubectl patch keycloak -n keycloak keycloak --patch '[{"op": "replace", "path": "/spec/instances", "value": 0}]' --type=json
+ kubectl scale statefulset -n keycloak keycloak --replicas=0
## Verify that pods are shut down
kubectl get statefulset -n keycloak keycloak # Should show READY 0/0
```

For temporarily disabling Harbor:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd harbor-harbor --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
## Scale Harbor deployment to zero
- kubectl patch deploy -n harbor harbor-core --patch '[{"op": "replace", "path": "/spec/replicas", "value": 0}]' --type=json
+ kubectl scale deployment -n harbor harbor-core --replicas=0
## Verify that pods are shut down
kubectl get deploy -n harbor harbor-core # Should show READY 0/0
```
@@ -154,6 +174,7 @@ kubectl get deploy -n harbor harbor-core # Should show READY 0/0
After deploying the values changes and shutting down applications accessing the database, delete the database cluster.

For Gitea:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd gitea-gitea-otomi-db --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
@@ -164,6 +185,7 @@ kubectl patch application -n argocd gitea-gitea-otomi-db --patch '[{"op": "add",
```

For Harbor:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd harbor-harbor-otomi-db --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
@@ -174,6 +196,7 @@ kubectl patch application -n argocd harbor-harbor-otomi-db --patch '[{"op": "add
```

For Keycloak:
+
```sh
## Disable ArgoCD auto-sync during the changes
kubectl patch application -n argocd keycloak-keycloak-otomi-db --patch '[{"op": "remove", "path": "/spec/syncPolicy/automated"}]' --type=json
@@ -188,6 +211,7 @@ The cluster should now be recreated from the backup. Wait until the `Cluster` st
### Restarting services

For restoring Gitea processes:
+
```sh
## Re-enable ArgoCD auto-sync, which should also change the Gitea statefulset to scale up
kubectl patch application -n argocd gitea-gitea --patch '[{"op": "add", "path": "/spec/syncPolicy/automated", "value": {"prune": true, "allowEmpty": true}}]' --type=json
@@ -196,6 +220,7 @@ kubectl patch statefulset -n gitea gitea --patch '[{"op": "replace", "path": "/s
```

For restoring Keycloak processes:
+
```sh
## Re-enable ArgoCD auto-sync
kubectl patch application -n argocd keycloak-keycloak-operator-cr --patch '[{"op": "add", "path": "/spec/syncPolicy/automated", "value": {"prune": true, "allowEmpty": true}}]' --type=json
@@ -206,6 +231,7 @@ kubectl delete deploy -n apl-keycloak-operator apl-keycloak-operator
```

For restoring Harbor processes:
+
```sh
## Re-enable ArgoCD auto-sync
kubectl patch application -n argocd harbor-harbor --patch '[{"op": "add", "path": "/spec/syncPolicy/automated", "value": {"prune": true, "allowEmpty": true}}]' --type=json
@@ -220,6 +246,7 @@ The following instructions for example apply for Gitea in the last step of [rein
Adjust the object storage parameters below as needed, at least replacing the `<bucket-name>` and `<location>` placeholders. Typically `serverName` should remain unchanged. The `linode-creds` secret holds the account credentials set up by the platform and can be reused, provided they have access to the storage.

env/databases/gitea.yaml:
+
```yaml
databases:
  gitea:
@@ -249,6 +276,7 @@ databases:
```

env/databases/harbor.yaml:
+
```yaml
databases:
  harbor:
@@ -278,6 +306,7 @@ databases:
```

env/databases/keycloak.yaml:
+
```yaml
databases:
  keycloak:
@@ -320,7 +349,7 @@ databases:
owner: gitea
recoveryTarget:
  # Time-based target for the recovery
- targetTime: "2023-03-06 08:00:39+01"
+ targetTime: '2023-03-06 08:00:39+01'
externalClusters:
# ...
```
@@ -344,17 +373,20 @@ In the following steps, the `-n` suffix of each pod name (e.g. `gitea-db-n`) nee
### Gitea database

Determine the primary instance:
+
```sh
kubectl get cluster -n gitea gitea-db
```

Backup:
+
```sh
kubectl exec -n gitea gitea-db-n -c postgres \
  -- pg_dump -Fc -d gitea > gitea.dump
```

Restore:
+
```sh
kubectl exec -i -n gitea gitea-db-n -c postgres \
  -- pg_restore --no-owner --role=gitea -d gitea --verbose --clean < gitea.dump
```
@@ -363,17 +395,20 @@ kubectl exec -i -n gitea gitea-db-n postgres \
### Keycloak database

Determine the primary instance:
+
```sh
kubectl get cluster -n keycloak keycloak-db
```

Backup:
+
```sh
kubectl exec -n keycloak keycloak-db-n -c postgres \
  -- pg_dump -Fc -d keycloak > keycloak.dump
```

Restore:
+
```sh
kubectl exec -i -n keycloak keycloak-db-n -c postgres \
  -- pg_restore --no-owner --role=keycloak -d keycloak --verbose --clean < keycloak.dump
```
@@ -382,17 +417,20 @@ kubectl exec -i -n keycloak keycloak-db-n postgres \
### Harbor database

Determine the primary instance:
+
```sh
kubectl get cluster -n harbor harbor-otomi-db
```

Backup:
+
```sh
kubectl exec -n harbor harbor-otomi-db-n -c postgres \
  -- pg_dump -Fc -d harbor > harbor.dump
```

Restore:
+
```sh
kubectl exec -i -n harbor harbor-otomi-db-n -c postgres \
  -- pg_restore --no-owner --role=harbor -d harbor --verbose --clean < harbor.dump
```
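
After a restore completes, a quick hedged sanity check (using the same assumed pod and container names as above) is to list the tables of the restored database:

```sh
kubectl exec -n harbor harbor-otomi-db-n -c postgres -- psql -d harbor -c '\dt'
```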
