
every TF apply changes managed node group block device mappings #3426

@fideloper

Description

I'm having an issue seemingly only on EKS with K8s 1.32 (1.31 clusters don't seem affected). We run many EKS clusters, so we have a basis for comparison.

Every terraform run, across multiple 1.32 clusters, terraform attempts to roll the managed node group created via the module. It seems to get confused on the eks_managed_node_groups.block_device_mappings map.

Not sure what dumb thing I might be doing 😂

  • ✋ I have searched the open/closed issues and my issue is not listed.

For terraform like this:

eks_managed_node_groups = merge({
    "${var.main_node_group_name}" = {
      
      min_size     = var.min_size
      max_size     = var.max_size
      desired_size = var.desired_size

      iam_role_name            = "some-node-role-name"
      iam_role_use_name_prefix = false
      iam_role_tags            = var.additional_iam_role_tags

      ebs_optimized = true
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 2
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 125
            encrypted             = true
            delete_on_termination = true
          }
        }
        xvdb = {
          device_name = "/dev/xvdb"
          ebs = {
            volume_size           = var.node_storage
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 125
            encrypted             = true
            delete_on_termination = true
          }
        }
      }
   
      instance_types = [var.node_instance_type]
      ami_type       = var.ami_type
      platform       = "bottlerocket"

    }
  }, var.additional_managed_node_groups)

We don't use any additional_managed_node_groups to override this section in the clusters in question.
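For reference, var.additional_managed_node_groups in our wrapper is assumed to look roughly like this (a sketch, not the exact definition); it defaults to an empty map, so merge() leaves the main node group untouched:

variable "additional_managed_node_groups" {
  description = "Extra managed node group definitions merged over the default group"
  type        = any
  default     = {}
}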

On every terraform plan/apply, we see this:

(Screenshot: terraform plan output showing the block_device_mappings diff on the managed node group's launch template.)

As a result, the apply takes a painfully long time, as the update to the ASG template replaces all the nodes in the managed node group.

🔴 There is no actual change, but we see some confusion around the devices: the plan appears to swap the device configurations around, yet the end result is the exact same configuration.

Versions

  • Module version [Required]:

Configured as version = "~> 20.0", which I believe pulls the latest 20.x release on each TF run (since it runs automated without caching module files locally).
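If it helps to rule out module-version drift between automated runs, a pin like the following keeps every run on the same release (assuming the standard registry source; the version number below is a placeholder for whichever 20.x release is currently in use):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.0.0" # placeholder: pin to the exact 20.x release in use instead of "~> 20.0"

  # ... rest of the configuration, including the eks_managed_node_groups block above ...
}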

  • Terraform version:

Terraform v1.9.4

  • Provider version(s):
Terraform v1.9.4
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.100.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.7
+ provider registry.terraform.io/hashicorp/null v3.2.4
+ provider registry.terraform.io/hashicorp/time v0.13.1
+ provider registry.terraform.io/hashicorp/tls v4.1.0

Theories?

Any theories on this? I can dig in more to get reproduction steps if needed, but I'd have to unwind some complicated stuff as we wrap a module around this one.

My best guess is that the AWS API is returning the block devices in a different order than this map expects. I don't quite understand the difference between k8s 1.31 and 1.32 (or how that correlates to this issue).
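One way to test the ordering theory would be to read back the launch template the module created and dump the order in which AWS returns the mappings. A rough sketch (the launch template name is a placeholder):

data "aws_launch_template" "node_group" {
  name = "placeholder-node-group-launch-template" # the template created for the managed node group
}

output "block_device_mapping_order" {
  # Device names in the order the AWS API returns them, e.g. ["/dev/xvda", "/dev/xvdb"]
  value = [for m in data.aws_launch_template.node_group.block_device_mappings : m.device_name]
}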
