Troubleshooting ingress and ALB reconciliation (AWS)
This guide explains how the nullplatform agent detects and prevents Ingress and ALB reconciliation issues in agent-backed Kubernetes scopes on AWS, what problems can occur, and how to troubleshoot them.
To do this, the agent runs networking validations during deployment. These validations catch AWS ALB and Ingress reconciliation failures early, before they result in unreachable applications or a blocked Load Balancer Ingress Controller.
When enabled, the agent verifies the Ingress and ALB reconciliation process and detects errors that may occur during reconciliation, including (but not limited to) the following:
- Your Ingress hostname can be reconciled to a valid ALB rule.
- Required SSL/TLS certificates exist in ACM.
- ALB limits and traffic configuration issues that could prevent successful reconciliation.
If a mismatch or invalid state is found, the deployment fails early with a clear error message and troubleshooting information in the logs.
The problem: failed Ingress and ALB reconciliation
When deploying an application, Kubernetes relies on the AWS ALB Ingress Controller to reconcile Ingress
resources with an Application Load Balancer (ALB).
In some situations, this reconciliation fails due to invalid or constrained AWS or Ingress configuration, such as:
- An Ingress hostname without a matching SSL/TLS certificate in AWS Certificate Manager (ACM).
- The ALB reaching its maximum number of rules (commonly 100).
- Insufficient IP capacity or other AWS-side limitations.
When reconciliation fails, it can lead to two critical outcomes:
- The deployed application is not reachable through its expected domain.
- The ALB Ingress Controller enters an invalid or blocked state, preventing subsequent Ingresses from being reconciled on the same Load Balancer.
These failures are often difficult to diagnose after the fact and may impact unrelated services sharing the ALB.
What the networking validation checks
To reduce the risk of these failures, the agent includes a networking validation step that runs during deployment.
At a high level, the validation:
- Observes Ingress reconciliation events in the cluster.
- Identifies the ALB rule associated with the scope’s hostname.
- Compares the expected traffic configuration (for example, blue/green traffic weights) with the actual ALB configuration in AWS.
- Fails the deployment early if a mismatch or invalid state is detected.
By stopping the deployment at this stage, the validation prevents broken rollouts from progressing and avoids leaving the ALB in a bad or locked state.
How validation errors surface
If a networking validation fails, the deployment itself fails and the error is surfaced directly in the deployment logs.
A common example is a certificate mismatch:
- The Ingress hostname does not match any valid SSL/TLS certificate in ACM.
- The ALB listener rule cannot be created or updated.
In these cases, the logs clearly report:
- The detected root cause.
- The specific hostname, listener, or rule that failed validation.
- Suggested next steps to correct the configuration.
Troubleshooting a failed networking validation
When a deployment fails due to a networking validation error, use the following steps to diagnose and resolve the issue:
-
Review the deployment logs first The validation error message points directly to the failing condition (for example, a missing certificate or rule limit).
-
Review SSL/TLS certificates and hostname alignment Verify that the ACM certificates configured in the target AWS account and region correctly match the hostname defined in the
Ingress. The hostname reflects the desired domain, so the certificate must be issued for that exact configuration. -
Check ALB rule limits and capacity Confirm that the ALB has not reached its maximum number of listener rules and that there are no AWS-side capacity constraints.
After correcting the issue, re-run the deployment. No manual cleanup is typically required unless the ALB was left in a partially configured state.
Requirements
Because the validation inspects ALB configuration directly, the agent’s IAM role must include read-only permissions for Elastic Load Balancing.
Attach the following policy to the agent role:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowALBValidation",
"Effect": "Allow",
"Action": [
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeRules"
],
"Resource": "*"
}
]
}
These permissions allow the agent to list Load Balancers and inspect listeners and rules required for validation.
Enabling networking validations
Networking validations are disabled by default and must be explicitly enabled.
Option 1: Enable via Helm (recommended)
If you installed the agent using Helm, enable the validation by setting the following value:
helm upgrade nullplatform-agent nullplatform/nullplatform-agent \
--reuse-values \
--set configuration.values.ALB_RECONCILIATION_ENABLED=true \
--set configuration.values.REGION=us-east-1
📖 Check our Agent docs for info.
Option 2: Enable via override values
If you manage configuration through a repository, add the following to your values.yaml:
configuration:
ALB_RECONCILIATION_ENABLED: true
REGION: true
📖 Check our Override workflows docs for info.
Once enabled, the validation runs automatically on subsequent deployments.