When deleting nodegroups with eksctl delete nodegroup, workloads can be terminated without being safely rescheduled to surviving nodes. This happens because the ASG can launch new nodes into nodegroups that are being drained (triggered by cluster autoscaler, AZ rebalancing, health checks, etc.), and those new nodes receive evicted workloads but are then deleted by CloudFormation stack deletion.
Steps to reproduce
- Have a cluster with multiple nodegroups running workloads
- Run eksctl delete nodegroup targeting multiple nodegroups simultaneously
- The drain process cordons existing nodes and evicts pods
- Evicted pods become Pending → the cluster autoscaler (or other ASG scaling triggers) scales up the same nodegroups being deleted, launching new uncordoned nodes
- Evicted pods are scheduled onto these new nodes
- Drain completes — the new nodes may or may not be caught by the drain loop's re-list
- CloudFormation stack deletion terminates all instances, including the new ones carrying workloads
Expected behavior
Workloads should be safely moved to nodes outside the deletion set before nodegroup deletion proceeds. No workloads should be running on any node in the targeted nodegroups when CloudFormation stack deletion begins.
Actual behavior
New nodes are launched into the nodegroups being deleted after the existing nodes are cordoned. These new nodes are not cordoned, receive evicted workloads, and are subsequently terminated by CloudFormation stack deletion — causing unexpected workload disruption.
Analysis
The drain loop in pkg/drain/nodegroup.go:110-172 re-lists nodes on each iteration to handle "accidental scale-up" (per the comment on line 110), but this is a race condition — new nodes can appear after the final re-list but before CloudFormation deletion. More fundamentally, the drain operates only at the Kubernetes node level (cordon + evict) and does nothing to prevent the ASG from launching replacement instances.
Suggested fix
Primary: Suspend ASG scaling processes before drain
Before draining, call SuspendProcesses on each nodegroup's ASG, suspending at minimum Launch, ReplaceUnhealthy, and AZRebalance. This prevents the ASG from launching new instances during drain, regardless of what triggered the scale-up.
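For illustration, here is a minimal sketch of that pre-drain call using the AWS SDK for Go v2. The helper name, the hard-coded ASG name, and the wiring into the delete flow are assumptions made for the example; the existing suspendProcesses task mentioned below would be the natural code to reuse.

```go
// Sketch only: a pre-drain hook that suspends scale-up processes on one ASG.
// The function name and wiring are hypothetical; eksctl's existing
// suspendProcesses task (pkg/eks/nodegroup.go) already resolves the ASG name.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/autoscaling"
)

// suspendScaleUp blocks the processes that could add or replace instances in
// the ASG while its nodes are drained. Running instances are unaffected, and
// there is no need to resume: CloudFormation deletes the ASG afterwards.
func suspendScaleUp(ctx context.Context, client *autoscaling.Client, asgName string) error {
	_, err := client.SuspendProcesses(ctx, &autoscaling.SuspendProcessesInput{
		AutoScalingGroupName: aws.String(asgName),
		ScalingProcesses:     []string{"Launch", "ReplaceUnhealthy", "AZRebalance"},
	})
	return err
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// "my-nodegroup-asg" is a placeholder; eksctl would resolve the real name
	// from the nodegroup's CloudFormation stack before draining.
	if err := suspendScaleUp(ctx, autoscaling.NewFromConfig(cfg), "my-nodegroup-asg"); err != nil {
		log.Fatal(err)
	}
}
```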
The building blocks already exist in eksctl:
- SuspendProcesses is already in the ASG API interface (pkg/awsapi/autoscaling.go:862)
- A working suspendProcesses task exists (pkg/eks/nodegroup.go:185-213) that resolves the ASG name from the CF stack and calls SuspendProcesses
- ASG name lookup via stackCollection.GetAutoScalingGroupName() is already implemented
- Process name validation is already in place (pkg/apis/eksctl.io/v1alpha5/validation.go:1555)
This approach is:
- Autoscaler-agnostic — works at the AWS ASG level, not tied to cluster autoscaler or Karpenter
- Non-destructive — suspending Launch does not affect existing running instances
- Self-cleaning — CloudFormation stack deletion removes the ASG anyway, no need to resume processes
For managed nodegroups, the underlying ASG name can be retrieved via the EKS DescribeNodegroup API (resources.autoScalingGroups field).
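A short sketch of that lookup with the AWS SDK for Go v2 follows (the package and helper names are illustrative, and error handling is abbreviated):

```go
// Sketch only: resolve the ASGs behind an EKS managed nodegroup so their
// scale-up processes can be suspended before draining.
package nodegroup

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

// managedNodegroupASGs reads the backing ASG names from the
// DescribeNodegroup response (resources.autoScalingGroups).
func managedNodegroupASGs(ctx context.Context, client *eks.Client, cluster, nodegroup string) ([]string, error) {
	out, err := client.DescribeNodegroup(ctx, &eks.DescribeNodegroupInput{
		ClusterName:   aws.String(cluster),
		NodegroupName: aws.String(nodegroup),
	})
	if err != nil {
		return nil, err
	}
	if out.Nodegroup == nil || out.Nodegroup.Resources == nil {
		return nil, nil
	}
	var names []string
	for _, asg := range out.Nodegroup.Resources.AutoScalingGroups {
		names = append(names, aws.ToString(asg.Name))
	}
	return names, nil
}
```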
Secondary: Fail-safe check after drain
As a simpler interim safeguard or defense-in-depth measure: after all nodegroup drains complete but before deletion begins, re-list nodes for each nodegroup. If any new undrained nodes exist, fail with a clear error message so the user can retry.
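A minimal version of that check with client-go could look like this (package and function names are illustrative; the node label shown is the one eksctl applies to unmanaged nodegroup nodes, while managed nodegroups carry eks.amazonaws.com/nodegroup):

```go
// Sketch only: post-drain safety check that fails the deletion if any node in
// the target nodegroup is still schedulable (i.e. was launched after cordon).
package drainsafety

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// verifyNodegroupDrained re-lists the nodegroup's nodes after drain completes
// and returns an error if an uncordoned node is found, so the user can retry
// instead of losing workloads to the stack deletion.
func verifyNodegroupDrained(ctx context.Context, clientset kubernetes.Interface, nodegroup string) error {
	// alpha.eksctl.io/nodegroup-name is the label eksctl sets on unmanaged
	// nodegroup nodes; adjust the selector for managed nodegroups.
	selector := fmt.Sprintf("alpha.eksctl.io/nodegroup-name=%s", nodegroup)
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, node := range nodes.Items {
		if !node.Spec.Unschedulable {
			return fmt.Errorf("node %q in nodegroup %q is still schedulable after drain; aborting deletion", node.Name, nodegroup)
		}
	}
	return nil
}
```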
eksctl version: 0.224.0
kubectl version: v1.34