EKS Module v21: Fixing eks_managed_node_group Errors
Hey folks! 👋 I've been wrestling with an issue when using eks_managed_node_group with version 21 of the EKS module, and I figured I'd share my experience and how I (kinda) fixed it. If you're using Terraform (or OpenTofu) and running into this, maybe it can help you too.
The Problem: "Invalid count argument" ⚠️
So, the scenario is this: I'm using the hashicorp/aws provider, specifically version ~> 6, along with version ~> 21 of the EKS module. My EKS cluster is on Kubernetes 1.32. When I run tofu plan (I'm using OpenTofu, but Terraform behaves essentially the same way), I keep getting this gnarly error:
│ Error: Invalid count argument
│
│ on .terraform/modules/eks.eks/modules/eks-managed-node-group/main.tf line 2, in data "aws_partition" "current":
│ 2: count = var.create && var.partition == "" ? 1 : 0
│
│ The "count" value depends on resource attributes that cannot be determined until apply, so OpenTofu cannot predict how many instances will be created.
│
│ To work around this, use the planning option -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current to first apply
│ without this object, and then apply normally to converge.
╵
╷
│ Error: Invalid count argument
│
│ on .terraform/modules/eks.eks/modules/eks-managed-node-group/main.tf line 5, in data "aws_caller_identity" "current":
│ 5: count = var.create && var.account_id == "" ? 1 : 0
│
│ The "count" value depends on resource attributes that cannot be determined until apply, so OpenTofu cannot predict how many instances will be created.
│
│ To work around this, use the planning option -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current to first
│ apply without this object, and then apply normally to converge.
Basically, the count argument in the eks-managed-node-group module is causing issues during the plan phase. OpenTofu (or Terraform) can't figure out how many instances of those data sources to create because the attributes feeding the conditional aren't known until apply. It's a classic chicken-and-egg problem, right? 🤔
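To see why the plan chokes, here's a minimal, self-contained reproduction of the same error class. This is just an illustration I put together, not the EKS module's actual code: it wires a data source's count to a value that only exists after apply.

```hcl
# Assumes the hashicorp/aws and hashicorp/random providers are configured.
resource "random_id" "suffix" {
  byte_length = 4
}

data "aws_partition" "current" {
  # On a fresh plan, random_id.suffix.hex is unknown until the resource is
  # created, so the plan cannot decide between 0 and 1 instances and fails
  # with "Invalid count argument".
  count = random_id.suffix.hex == "" ? 0 : 1
}
```

In the EKS module's case, one of the values feeding its conditional (via `var.partition` / `var.account_id` in the eks-managed-node-group submodule) is apparently unknown at plan time, which triggers the same failure.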
I tried the usual suspects, like deleting the .terraform directory and running tofu init followed by tofu plan, but no dice. The init command worked fine, but the plan command still threw the same errors. Frustrating, I know! 😫
Quick Note on the Problem
If you hit a problem like this, the first thing to try is the quick reset below (the full command sequence is shown right after the list).
- Remove the local `.terraform/` directory (ONLY if your state is stored remotely, which hopefully you are already doing as a best practice!): `rm -rf .terraform/`
- Re-initialize the project root to pull down the modules again: `terraform init`
- Re-attempt your `terraform plan` or `apply` and check whether the issue persists
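For copy-paste convenience, that reset sequence looks like this (I'm showing the `tofu` commands since that's what I run; plain `terraform` works the same way):

```sh
# Only do this if your state lives in a remote backend, per the note above.
rm -rf .terraform/

# Re-download modules and providers
tofu init   # or: terraform init

# Check whether the error is still there
tofu plan   # or: terraform plan
```

In my case this didn't make the errors go away, but it's a cheap sanity check before digging deeper.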
Diving into the Details: Versions & Code 💻
Here’s a breakdown of the versions I'm using. It's always good to be specific when troubleshooting this stuff, as you probably already know.
- Module version: ~> 21 (terraform-aws-modules/eks/aws, v21.x)
- Terraform version: OpenTofu v1.10.3 (shouldn't matter, but good to know!)
- Provider versions:
  + provider registry.opentofu.org/gavinbunney/kubectl v1.16.0
  + provider registry.opentofu.org/hashicorp/aws v6.18.0
  + provider registry.opentofu.org/hashicorp/cloudinit v2.3.7
  + provider registry.opentofu.org/hashicorp/helm v2.17.0
  + provider registry.opentofu.org/hashicorp/kubernetes v2.38.0
  + provider registry.opentofu.org/hashicorp/null v3.2.4
  + provider registry.opentofu.org/hashicorp/time v0.13.1
  + provider registry.opentofu.org/hashicorp/tls v4.1.0
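For context, those pins roughly correspond to a versions.tf like this. This is a sketch rather than my exact file; the required_version floor is just what matches OpenTofu v1.10.3:

```hcl
terraform {
  required_version = ">= 1.10" # running OpenTofu v1.10.3

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6" # resolves to v6.18.0 in my lock file
    }
  }
}
```

The EKS module itself is then pinned to the v21 line, passed in via var.module_version in the snippet below (something like "~> 21.0").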
And here’s the relevant code snippet from my Terraform configuration. It's a simplified version focusing on the eks_managed_node_groups part:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = var.module_version
cluster_name = var.cluster_name
cluster_version = var.cluster_version
enable_irsa = true
vpc_id = local.vpc_id
subnet_ids = data.aws_subnets.private_subnets.ids
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
bootstrap_self_managed_addons = var.bootstrap_self_managed_addons
cluster_addons = {
kube-proxy = {}
vpc-cni = {}
}
eks_managed_node_groups = {
bootstrap = {
name = "bootstrap"
use_custom_launch_template = false
disk_size = 20
capacity_type = "SPOT"
force_update_version = true
ami_type = "BOTTLEROCKET_x86_64"
platform = "bottlerocket"
instance_types = ["m5.large", "m5a.large"]
iam_role_attach_cni_policy = true
min_size = 0
max_size = 2
desired_size = 0
use_custom_launch_template = true
enable_bootstrap_user_data = true
}
}
}
This is a pretty standard setup, right? We're defining a managed node group named "bootstrap" with a few configurations like instance types, disk size, and AMI type.
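For reference, the snippet above leans on a few inputs that live elsewhere in my config. Roughly, they look like the sketch below; the subnet lookup, its tag filter, and the defaults other than the versions are illustrative, not my exact code:

```hcl
variable "module_version" {
  type    = string
  default = "~> 21.0" # the EKS module v21 line mentioned above
}

variable "cluster_name" {
  type = string
}

variable "cluster_version" {
  type    = string
  default = "1.32"
}

variable "bootstrap_self_managed_addons" {
  type    = bool
  default = false # illustrative; set to whatever your cluster needs
}

# Illustrative subnet lookup -- the real filter depends on how your VPC is tagged.
# local.vpc_id comes from a VPC lookup defined elsewhere in my config.
data "aws_subnets" "private_subnets" {
  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }

  tags = {
    Tier = "private"
  }
}
```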
The Expected vs. Actual: What Should Happen? 💡
- Expected behavior: When I run `tofu plan`, I should get a nice, clean list of resources that will be created, modified, or destroyed. No errors, just a clear picture of what's about to happen.
- Actual behavior: I get the plan output, but it's immediately followed by the "Invalid count argument" errors. 😠 It's like the plan is giving me the information, but then throwing a wrench in the works with those errors.
Workarounds and Potential Fixes 🤔
Unfortunately, I don't have a perfect solution, but here's what I've found so far to work around the issue. These are not ideal, but they might help you move forward.
- The `-exclude` option: The error messages themselves provide a hint. You can try the `-exclude` planning option with `tofu plan`. For instance: `tofu plan -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current` and `tofu plan -exclude=module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current`. This tells OpenTofu to skip those specific data sources during the plan phase. It worked for a while, but eventually it started failing with other, similar errors. 😢 (See the combined command sketch after this list.)
- Apply, then plan again (not ideal, but works): A more involved workaround is to apply the changes and then run `tofu plan` again. This leverages the fact that the troublesome values are known after the initial `apply`. It's not the most elegant approach, but it can get the cluster up and running and unblock you.
- Inspect the module code: I've looked through the `eks-managed-node-group` module code to see how the `count` arguments are used. It seems like there are dependencies on variables that aren't fully initialized during the plan phase. I didn't find the root cause, but digging into the module code may give you more insight or turn up other possible workarounds.
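Putting the first two workarounds together, the flow I ended up with looks roughly like this. `-exclude` is an OpenTofu planning option that, like `-target`, can be repeated; the single quotes just keep the shell from mangling the brackets in the resource addresses:

```sh
# Workaround 1: plan while skipping the data sources the errors point at
tofu plan \
  -exclude='module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_partition.current' \
  -exclude='module.eks.module.eks.module.eks_managed_node_group["bootstrap"].data.aws_caller_identity.current'

# Workaround 2: once an apply has gone through, the troublesome values
# are known, and a normal plan should stop tripping over the count arguments
tofu apply
tofu plan
```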
In conclusion: Need a Fix! 🚀
So, there you have it, folks. I'm hoping this helps some of you. I'm also hoping the maintainers of the EKS module will have a look and get this fixed! If you have any other insights or solutions, please share them! Let's get this thing working smoothly. Happy coding! 😊