EKS Module v21: Fixing eks_managed_node_group Errors
Hey folks! 👋 I've been wrestling with an issue when using eks_managed_node_group with version 21 of the EKS module, and I figured I'd share my experience and how I (kinda) fixed it. If you're using Terraform (or OpenTofu) and running into this, maybe it can help you too.
The Problem: "Invalid count argument" ⚠️
So, the scenario is this: I'm using the hashicorp/aws provider, specifically version ~> 6, along with version ~> 21 of the EKS module. My EKS cluster is on Kubernetes 1.32. When I run tofu plan (I'm using OpenTofu, but Terraform behaves essentially the same way), I keep getting this gnarly error:
│ Error: Invalid count argument
│
│ on .terraform/modules/eks.eks/modules/eks-managed-node-group/main.tf line 2, in data "aws_partition" "current":
│ 2: count = var.create && var.partition == "" ? 1 : 0
│
│ The "count" value depends on resource attributes that cannot be determined until apply, so OpenTofu cannot predict how many instances will be created.
│
│ To work around this, use the planning option -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current to first apply
│ without this object, and then apply normally to converge.
╵
╷
│ Error: Invalid count argument
│
│ on .terraform/modules/eks.eks/modules/eks-managed-node-group/main.tf line 5, in data "aws_caller_identity" "current":
│ 5: count = var.create && var.account_id == "" ? 1 : 0
│
│ The "count" value depends on resource attributes that cannot be determined until apply, so OpenTofu cannot predict how many instances will be created.
│
│ To work around this, use the planning option -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current to first
│ apply without this object, and then apply normally to converge.
Basically, the count argument in the eks-managed-node-group module is causing issues during the plan phase. OpenTofu (or Terraform) can't figure out how many instances of those data sources to create because the attributes feeding the conditional aren't known until apply. It's a classic chicken-and-egg problem, right? 🤔
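To see why the plan chokes, here's a minimal, self-contained reproduction of the same error class. This is just an illustration I put together, not the EKS module's actual code: it wires a data source's count to a value that only exists after apply.

```hcl
# Assumes the hashicorp/aws and hashicorp/random providers are configured.
resource "random_id" "suffix" {
  byte_length = 4
}

data "aws_partition" "current" {
  # On a fresh plan, random_id.suffix.hex is unknown until the resource is
  # created, so the plan cannot decide between 0 and 1 instances and fails
  # with "Invalid count argument".
  count = random_id.suffix.hex == "" ? 0 : 1
}
```

In the EKS module's case, one of the values feeding its conditional (via `var.partition` / `var.account_id` in the eks-managed-node-group submodule) is apparently unknown at plan time, which triggers the same failure.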
I tried the usual suspects, like deleting the .terraform directory and running tofu init followed by tofu plan, but no dice. The init command worked fine, but the plan command still threw the same errors. Frustrating, I know! 😫
Quick Note on the Problem
If you hit a problem like this, the first thing to try is the quick reset below (the full command sequence is shown right after the list).
- Remove the local `.terraform/` directory (ONLY if your state is stored remotely, which hopefully you are already doing as a best practice!): `rm -rf .terraform/`
- Re-initialize the project root to pull down the modules again: `terraform init`
- Re-attempt your `terraform plan` or `apply` and check whether the issue persists
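For copy-paste convenience, that reset sequence looks like this (I'm showing the `tofu` commands since that's what I run; plain `terraform` works the same way):

```sh
# Only do this if your state lives in a remote backend, per the note above.
rm -rf .terraform/

# Re-download modules and providers
tofu init   # or: terraform init

# Check whether the error is still there
tofu plan   # or: terraform plan
```

In my case this didn't make the errors go away, but it's a cheap sanity check before digging deeper.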
Diving into the Details: Versions & Code 💻
Here’s a breakdown of the versions I'm using. It's always good to be specific when troubleshooting this stuff, as you probably already know.
- Module version: ~> 21 (terraform-aws-modules/eks/aws, v21.x)
- Terraform version: OpenTofu v1.10.3 (shouldn't matter, but good to know!)
- Provider versions:
  + provider registry.opentofu.org/gavinbunney/kubectl v1.16.0
  + provider registry.opentofu.org/hashicorp/aws v6.18.0
  + provider registry.opentofu.org/hashicorp/cloudinit v2.3.7
  + provider registry.opentofu.org/hashicorp/helm v2.17.0
  + provider registry.opentofu.org/hashicorp/kubernetes v2.38.0
  + provider registry.opentofu.org/hashicorp/null v3.2.4
  + provider registry.opentofu.org/hashicorp/time v0.13.1
  + provider registry.opentofu.org/hashicorp/tls v4.1.0
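For context, those pins roughly correspond to a versions.tf like this. This is a sketch rather than my exact file; the required_version floor is just what matches OpenTofu v1.10.3:

```hcl
terraform {
  required_version = ">= 1.10" # running OpenTofu v1.10.3

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6" # resolves to v6.18.0 in my lock file
    }
  }
}
```

The EKS module itself is then pinned to the v21 line, passed in via var.module_version in the snippet below (something like "~> 21.0").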
And here’s the relevant code snippet from my Terraform configuration. It's a simplified version focusing on the eks_managed_node_groups part:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = var.module_version
cluster_name = var.cluster_name
cluster_version = var.cluster_version
enable_irsa = true
vpc_id = local.vpc_id
subnet_ids = data.aws_subnets.private_subnets.ids
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
bootstrap_self_managed_addons = var.bootstrap_self_managed_addons
cluster_addons = {
kube-proxy = {}
vpc-cni = {}
}
eks_managed_node_groups = {
bootstrap = {
name = "bootstrap"
use_custom_launch_template = false
disk_size = 20
capacity_type = "SPOT"
force_update_version = true
ami_type = "BOTTLEROCKET_x86_64"
platform = "bottlerocket"
instance_types = ["m5.large", "m5a.large"]
iam_role_attach_cni_policy = true
min_size = 0
max_size = 2
desired_size = 0
use_custom_launch_template = true
enable_bootstrap_user_data = true
}
}
}
This is a pretty standard setup, right? We're defining a managed node group named "bootstrap" with a few configurations like instance types, disk size, and AMI type.
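For reference, the snippet above leans on a few inputs that live elsewhere in my config. Roughly, they look like the sketch below; the subnet lookup, its tag filter, and the defaults other than the versions are illustrative, not my exact code:

```hcl
variable "module_version" {
  type    = string
  default = "~> 21.0" # the EKS module v21 line mentioned above
}

variable "cluster_name" {
  type = string
}

variable "cluster_version" {
  type    = string
  default = "1.32"
}

variable "bootstrap_self_managed_addons" {
  type    = bool
  default = false # illustrative; set to whatever your cluster needs
}

# Illustrative subnet lookup -- the real filter depends on how your VPC is tagged.
# local.vpc_id comes from a VPC lookup defined elsewhere in my config.
data "aws_subnets" "private_subnets" {
  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }

  tags = {
    Tier = "private"
  }
}
```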
The Expected vs. Actual: What Should Happen? 💡
- Expected behavior: When I run `tofu plan`, I should get a nice, clean list of resources that will be created, modified, or destroyed. No errors, just a clear picture of what's about to happen.
- Actual behavior: I get the plan output, but it's immediately followed by the "Invalid count argument" errors. 😠 It's like the plan is giving me the information, but then throwing a wrench in the works with those errors.
Workarounds and Potential Fixes 🤔
Unfortunately, I don't have a perfect solution, but here's what I've found so far to work around the issue. These are not ideal, but they might help you move forward.
- The `-exclude` option: The error messages themselves provide a hint. You can try the `-exclude` planning option with `tofu plan`. For instance: `tofu plan -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current` and `tofu plan -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current`. This tells OpenTofu to skip those specific data sources during the plan phase. It worked for a while, but eventually it started failing with other, similar errors. 😢 (See the combined command sketch after this list.)
- Apply, then plan again (not ideal, but works): A more involved workaround is to apply the changes and then run `tofu plan` again. This leverages the fact that the troublesome values are known after the initial `apply`. It's not the most elegant approach, but it can get the cluster up and running and unblock you.
- Inspect the module code: I've looked through the `eks-managed-node-group` module code to see how the `count` arguments are used. It seems like there are dependencies on variables that aren't fully initialized during the plan phase. I didn't find the root cause, but digging into the module code may give you more insight or turn up other possible workarounds.
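Putting the first two workarounds together, the flow I ended up with looks roughly like this. `-exclude` is an OpenTofu planning option that, like `-target`, can be repeated; the single quotes just keep the shell from mangling the brackets in the resource addresses:

```sh
# Workaround 1: plan while skipping the data sources the errors point at
tofu plan \
  -exclude='module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current' \
  -exclude='module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current'

# Workaround 2: once an apply has gone through, the troublesome values
# are known, and a normal plan should stop tripping over the count arguments
tofu apply
tofu plan
```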
In conclusion: Need a Fix! 🚀
So, there you have it, folks. I'm hoping this helps some of you. I'm also hoping the maintainers of the EKS module will have a look and get this fixed! If you have any other insights or solutions, please share them! Let's get this thing working smoothly. Happy coding! 😊