Description
What steps did you take and what happened?
Always reproducible.
In the current ClusterCache implementation, the kubeconfig is only refreshed after a health check fails. This causes a few issues:
- It reports a probe failure caused by the kubeconfig no longer being valid, which should be considered a false alarm. By default the kubeconfig is rotated infrequently (every few months), but in our scenario we need to rotate it at a relatively fast pace and allow users to rotate it on demand, which makes these false alarms more frequent.
- In our scenario, we have a relay proxy set up between the management cluster and the target cluster, and the kubeconfig's server address points to the proxy address. This address may change every few hours. When it changes, existing connections to the old address keep working, but the old address no longer accepts new connections. ClusterCache still holds the old address, so when the etcd client code gets a rest config from ClusterCache, it cannot establish a connection because the address has already expired.
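For reference, this is the disconnect logic in ClusterCache's reconcile loop in v1.9.5; a disconnect (and thus a kubeconfig refresh on the next connect) only happens after the health probe fails: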
```go
// Run the health probe
tooManyConsecutiveFailures, unauthorizedErrorOccurred := accessor.HealthCheck(ctx)
if tooManyConsecutiveFailures || unauthorizedErrorOccurred {
	// Disconnect if the health probe failed (either with unauthorized or consecutive failures >= HealthProbe.FailureThreshold).
	accessor.Disconnect(ctx)

	// Store that disconnect was done.
	didDisconnect = true
	connected = false //nolint:ineffassign // connected is *currently* not used below, let's update it anyway
}
```
What did you expect to happen?
We could check the kubeconfig Secret's resourceVersion in ClusterCache's reconcile loop; when it changes, refresh the kubeconfig and reconnect.
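A minimal sketch of what that could look like, assuming a per-cluster map from cluster key to the last observed Secret resourceVersion (the `kubeconfigWatcher` type, its `lastResourceVersion` field, and `reconcileSketch` below are hypothetical names, not the actual ClusterCache internals):

```go
// Rough sketch only: kubeconfigWatcher, lastResourceVersion, and
// reconcileSketch are illustrative names, not existing ClusterCache code.
package clustercache

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// accessorDisconnector is the minimal surface of the cluster accessor this sketch needs.
type accessorDisconnector interface {
	Disconnect(ctx context.Context)
}

type kubeconfigWatcher struct {
	// Client reads the kubeconfig Secret from the management cluster.
	Client client.Client

	// lastResourceVersion holds the resourceVersion of the kubeconfig Secret
	// observed on the previous reconcile, per cluster.
	lastResourceVersion map[types.NamespacedName]string
}

// kubeconfigChanged returns true if the kubeconfig Secret's resourceVersion
// differs from the one seen on the previous reconcile.
func (w *kubeconfigWatcher) kubeconfigChanged(ctx context.Context, cluster types.NamespacedName) (bool, error) {
	if w.lastResourceVersion == nil {
		w.lastResourceVersion = map[types.NamespacedName]string{}
	}

	secret := &corev1.Secret{}
	secretKey := client.ObjectKey{
		Namespace: cluster.Namespace,
		// Cluster API stores the kubeconfig in the "<cluster-name>-kubeconfig" Secret.
		Name: cluster.Name + "-kubeconfig",
	}
	if err := w.Client.Get(ctx, secretKey, secret); err != nil {
		return false, err
	}

	last, seen := w.lastResourceVersion[cluster]
	w.lastResourceVersion[cluster] = secret.ResourceVersion
	return seen && last != secret.ResourceVersion, nil
}

// reconcileSketch shows where the check could sit in the reconcile loop:
// before the health probe, so a rotated kubeconfig triggers a clean
// disconnect and the next reconcile connects with the fresh kubeconfig.
func reconcileSketch(ctx context.Context, w *kubeconfigWatcher, accessor accessorDisconnector, cluster types.NamespacedName) (ctrl.Result, error) {
	changed, err := w.kubeconfigChanged(ctx, cluster)
	if err != nil {
		return ctrl.Result{}, err
	}
	if changed {
		accessor.Disconnect(ctx)
		// The existing connect path would pick up the refreshed kubeconfig
		// on the next reconcile.
		return ctrl.Result{Requeue: true}, nil
	}
	return ctrl.Result{}, nil
}
```

An alternative would be to watch the kubeconfig Secret directly and map events back to the owning cluster, but comparing the resourceVersion inside the existing reconcile loop keeps the change local to ClusterCache.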
Cluster API version
v1.9.5
Kubernetes version
v1.30
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.