Description
What steps did you take and what happened?
While using CAPO, I noticed a cluster that was reconciling non-stop and eating up a ton of CPU. After further troubleshooting, it looks like the reconciler does not actually use the latest version of the CRD when making its request. My guess is that the object is still stored as v1alpha6 in the db but is presented to the user as v1alpha7.
You can see that v1alpha7 is the newest version:
❯ kubectl -n magnum-system get crd/openstackclusters.infrastructure.cluster.x-k8s.io -oyaml | grep 'cluster.x-k8s.io/v1beta1'
cluster.x-k8s.io/v1beta1: v1alpha5_v1alpha6_v1alpha7
The Cluster resource agrees with this too:
❯ kubectl -n magnum-system get cluster/kube-cmd33 -oyaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
...
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
    kind: OpenStackCluster
    name: kube-cmd33-4k46x
    namespace: magnum-system
...
However, with the verbosity turned all the way up, you can see that the update request it makes targets v1alpha6. I snipped this from the logs:
I1014 02:10:26.010999 1 round_trippers.go:466] curl -v -XPATCH -H "User-Agent: manager/v1.5.1 cluster-api-controller-manager (linux/amd64) cluster.x-k8s.io/db17cb2" -H "Authorization: Bearer <masked>" -H "Content-Type: application/apply-patch+yaml" -H "Accept: application/json" 'https://10.96.0.1:443/apis/infrastructure.cluster.x-k8s.io/v1alpha6/namespaces/magnum-system/openstackclusters/kube-cmd33-4k46x?fieldManager=capi-topology&force=true'
Because of that, it almost always 'notices' a change and loops endlessly. I tried diffing the current object against the info that it is sending:
❯ diff -uNr /tmp/currobj.yml /tmp/dryrunapplied.yml
--- /tmp/currobj.yml 2023-10-13 21:45:47
+++ /tmp/dryrunapplied.yml 2023-10-13 21:46:00
@@ -1,13 +1,14 @@
-apiVersion: infrastructure.cluster.x-k8s.io/v1alpha7
+apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackCluster
metadata:
annotations:
cluster.x-k8s.io/cloned-from-groupkind: OpenStackClusterTemplate.infrastructure.cluster.x-k8s.io
cluster.x-k8s.io/cloned-from-name: magnum-v0.9.1
+ topology.cluster.x-k8s.io/dry-run: ""
creationTimestamp: "2023-07-23T09:03:45Z"
finalizers:
- openstackcluster.infrastructure.cluster.x-k8s.io
- generation: 3429950
+ generation: 3430011
labels:
cluster.x-k8s.io/cluster-name: kube-cmd33
topology.cluster.x-k8s.io/owned: ""
@@ -20,7 +21,7 @@
kind: Cluster
name: kube-cmd33
uid: 0108ecb7-9e6e-4045-a5f0-811a8aade488
- resourceVersion: "89143700"
+ resourceVersion: "89143889"
uid: 0abd98ab-6010-43be-b028-44a0df84e597
spec:
allowAllInClusterTraffic: false
@@ -154,16 +155,16 @@
network:
id: a91dc22f-86fc-4677-938b-f15da173178e
name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
- subnets:
- - cidr: 10.0.0.0/24
+ router:
+ id: dcf60b96-6ceb-42fe-8d17-7f2c1b8b99a8
+ ips:
+ - 46.246.75.135
+ name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
+ subnet:
+ cidr: 10.0.0.0/24
id: 0ddb7a30-1bcb-4940-83b6-bf91ddadec8b
name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
ready: true
- router:
- id: dcf60b96-6ceb-42fe-8d17-7f2c1b8b99a8
- ips:
- - A.B.C.D
- name: k8s-clusterapi-cluster-magnum-system-kube-cmd33
workerSecurityGroup:
id: c1237980-280d-44de-9ff2-4fe5a4e20d9a
name: k8s-cluster-magnum-system-kube-cmd33-secgroup-worker
So because there was a change in the OpenStackCluster, and it is (somehow) pulling v1alpha6 while v1alpha7 is the expected version, it just keeps looping. I feel like there is a spot here that fails to pull the up-to-date version of the infrastructureRef.
What did you expect to happen?
No reconcile loop, and none of this showing up in the logs:
I1014 02:13:28.108463 1 reconcile_state.go:284] "Patching OpenStackCluster/kube-cmd33-4k46x" controller="topology/cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="magnum-system/kube-cmd33" namespace="magnum-system" name="kube-cmd33" reconcileID=2e8e2bc8-cc63-4b35-8c25-b83181bbd883 resource={Group:infrastructure.cluster.x-k8s.io Version:v1alpha6 Resource:OpenStackCluster} OpenStackCluster="magnum-system/kube-cmd33-4k46x"
This repeats non-stop.
Cluster API version
Cluster API 1.5.1 + CAPO 0.8.0
Kubernetes version
No response
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.