Occasional flake with OSProvisioningClientError #496

Open
kubernetes-sigs/image-builder
#1712
@jsturtevant

Description

We are seeing an occasional VM provisioning error:

"{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed
 with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"OSProvisioningClientError\",\"message\":\"OS 
Provisioning for VM 'capz-conf-6l2q7' did not finish in the allotted time. However, the VM guest agent was detected 
running. This suggests the guest OS has not been properly prepared to be used as a VM image (with 
CreateOption=FromImage). To resolve this issue, either use the VHD as is with CreateOption=Attach or prepare it 
properly for use as an image:\\r\\n * Instructions for Windows: https://learn.microsoft.com/azure/virtual-machines/windows/prepare-for-upload-vhd-image\\r\\n * Instructions for Linux: 
https://learn.microsoft.com/azure/virtual-machines/linux/create-upload-generic \"}]}}",
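The payload above is a JSON object serialized inside a JSON string, so pulling out the detail code takes two decoding passes. A minimal sketch of that, for anyone who wants to count how often this flake shows up in build logs: `raw` is an abbreviated copy of the error (the message text is trimmed), and `detail_codes` is a hypothetical helper name, not something from the CAPZ codebase.

```python
import json

# Abbreviated copy of the payload from this report; message text is trimmed.
raw = (
    r'"{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",'
    r'\"message\":\"The resource operation completed with terminal provisioning state.\",'
    r'\"details\":[{\"code\":\"OSProvisioningClientError\",'
    r'\"message\":\"OS Provisioning for VM did not finish in the allotted time.\"}]}}"'
)


def detail_codes(quoted_payload):
    """Return the error codes listed under error.details (hypothetical helper)."""
    inner = json.loads(quoted_payload)  # first pass: unwrap the quoted string
    body = json.loads(inner)            # second pass: parse the actual object
    return [d["code"] for d in body["error"].get("details", [])]


print(detail_codes(raw))  # prints ['OSProvisioningClientError']
```

When copying the payload from a log, drop the trailing comma first, since it is not part of the JSON string.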

Interestingly, the other node (which uses the same VHD image) came up just fine:

capz-conf-ry2zwl-control-plane-kpfmh   Ready    control-plane   3h48m   v1.33.0-beta.0.679+5c7491bf0874a8   10.0.0.4      <none>        Ubuntu 24.04.2 LTS               6.8.0-1021-azure   containerd://1.7.20
capz-conf-ss4bz                        Ready    <none>          3h45m   v1.33.0-beta.0.679+5c7491bf0874a8   10.1.0.5      <none>    

https://storage.googleapis.com/kubernetes-ci-logs/logs/ci-kubernetes-e2e-capz-master-windows/1905560033546997760/build-log.txt

We didn't get logs from the successful nodes since the logging code had a panic:

Done cleaning up after docker in docker.
Collecting logs for cluster capz-conf-ry2zwl in namespace default and dumping logs to /logs/artifacts
E0328 14:08:11.184919   42840 reflector.go:158] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243: Failed to watch *v1.Pod: Get \"https://capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io:443/api/v1/pods?resourceVersion=91160&timeoutSeconds=438&watch=true\": dial tcp: lookup capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io on 172.20.0.10:53: no such host" logger="UnhandledError"

The cluster name looks suspicious: capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o
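To make the observation concrete, here is a rough sketch that pulls the first DNS label out of the failed watch URL and flags the doubled prefix. The `looks_doubled` heuristic is purely an assumption for illustration; whether the repeated `capz-conf` is a bug or just how the name is generated is not established here.

```python
from urllib.parse import urlsplit

# URL copied from the failed watch in the log above.
watch_url = (
    "https://capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io:443"
    "/api/v1/pods?resourceVersion=91160&timeoutSeconds=438&watch=true"
)
cluster_name = "capz-conf-ry2zwl"  # from the "Collecting logs for cluster" line


def first_label(url):
    """Return the leftmost DNS label of the URL's host."""
    return urlsplit(url).hostname.split(".")[0]


def looks_doubled(label, name):
    """Hypothetical heuristic: an empty hyphen segment, or the name's prefix appearing twice."""
    prefix = name.rsplit("-", 1)[0]  # "capz-conf" for the cluster name above
    return "--" in label or label.count(prefix) > 1


label = first_label(watch_url)
print(label, looks_doubled(label, cluster_name))
```

For the URL in this report both conditions hit: the label contains `--` and the `capz-conf` prefix occurs twice.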
