Boosting AWS Resilience with ALB Least Outstanding Requests

Overview

In AWS environments, the Application Load Balancer (ALB) is a fundamental component for distributing traffic and ensuring application availability. However, its target groups default to a simple Round Robin algorithm, which hands each new request to the next target in a fixed rotation. This approach treats all backend instances and all requests as equal, an assumption that rarely holds true in modern, complex applications.

The flaw in Round Robin is its inability to adapt. A target instance struggling with a resource-intensive task will continue to receive the same number of new requests as an idle instance. This can lead to request queues, increased latency, and even service timeouts. Inefficient traffic distribution creates performance bottlenecks and operational waste, as teams are often forced to over-provision resources to handle worst-case scenarios.

A more intelligent approach is to configure the ALB’s target groups to use the Least Outstanding Requests algorithm. This dynamic method routes new traffic to the target with the fewest in-flight requests. It inherently accounts for variations in instance performance and request complexity, automatically sending work to the resources best equipped to handle it at that moment. This simple change is a powerful lever for improving both resilience and cost efficiency.
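On AWS this is a single target-group attribute. As a minimal sketch with boto3 (the target group ARN is a placeholder and there is no error handling; this is a configuration call against a live account, not production tooling):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARN -- substitute your own target group.
TARGET_GROUP_ARN = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/my-app/0123456789abcdef"
)

# Switch the routing algorithm away from the round_robin default.
elbv2.modify_target_group_attributes(
    TargetGroupArn=TARGET_GROUP_ARN,
    Attributes=[
        {
            "Key": "load_balancing.algorithm.type",
            "Value": "least_outstanding_requests",
        }
    ],
)
```

The equivalent CLI call is `aws elbv2 modify-target-group-attributes --target-group-arn <arn> --attributes Key=load_balancing.algorithm.type,Value=least_outstanding_requests`, and the same setting is available in the console under the target group's attributes.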

Why It Matters for FinOps

From a FinOps perspective, inefficient load balancing is a direct source of cloud waste. When traffic is distributed unevenly, some servers in a fleet become overwhelmed while others sit underutilized. The common response is to scale out the entire fleet, adding more instances to absorb the load. This increases the AWS bill without addressing the root cause of the inefficiency.

By implementing the Least Outstanding Requests algorithm, organizations can achieve higher aggregate utilization from their existing resources. The load is spread based on actual capacity, not a rigid sequence. This allows for right-sizing compute fleets and can make it more viable to use cost-effective options like AWS Spot Instances, as the ALB can gracefully manage the performance variability of a mixed-instance fleet.

Beyond direct cost savings, this configuration reduces business risk. Downtime caused by cascading failures has a significant financial impact, from lost revenue to reputational damage. By preventing server overload and improving system stability, this algorithm acts as a crucial governance guardrail that supports business continuity and protects the bottom line.

What Counts as “Idle” in This Article

In the context of load balancing, “idle” doesn’t mean a server is powered off. Instead, it refers to a target’s real-time processing capacity. An “idle” or available target is one with a low number of active, in-flight requests. Conversely, a “busy” target is one that is overwhelmed, with a long queue of requests it has yet to process.

The primary signal of a busy target is not just its CPU or memory usage, but the count of its outstanding requests. An intelligent load balancer monitors this count for each target in its group. A rising count on a specific instance is a clear indicator that it is either processing a series of complex tasks or experiencing performance degradation. This metric allows the load balancer to make dynamic routing decisions to avoid sending more work to an already-strained resource.

Common Scenarios

Scenario 1: Uneven Request Costs in Microservices

Microservice architectures often have endpoints with vastly different computational costs. For example, one API call might fetch simple user data, while another generates a complex report. With Round Robin, a server that receives several report-generation requests in a row will become a bottleneck, slowing down responses for all subsequent requests sent its way. Least Outstanding Requests prevents this by diverting traffic away from the busy server until it catches up.
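This scenario can be sketched with a toy, time-stepped simulation: two targets, requests alternating between a heavy endpoint (5 units of work) and a light one (1 unit), with one request arriving and each target completing one unit of work per step. The model and numbers are illustrative only, not how the ALB is implemented:

```python
from collections import deque

def simulate(num_targets, costs, policy):
    """Assign one request per step, then let every target finish one
    unit of work; return the worst per-target backlog observed."""
    queues = [deque() for _ in range(num_targets)]
    worst_backlog = 0
    for step, cost in enumerate(costs):
        if policy == "round_robin":
            target = step % num_targets
        else:  # least outstanding requests: fewest queued requests wins
            target = min(range(num_targets), key=lambda t: len(queues[t]))
        queues[target].append(cost)
        for q in queues:  # each target completes one unit of work
            if q:
                q[0] -= 1
                if q[0] == 0:
                    q.popleft()
        worst_backlog = max(worst_backlog, max(sum(q) for q in queues))
    return worst_backlog

# Alternating heavy (report) and light (lookup) requests across two targets.
costs = [5 if i % 2 == 0 else 1 for i in range(20)]
rr = simulate(2, costs, "round_robin")
lor = simulate(2, costs, "least_outstanding")
print(f"worst backlog -- round robin: {rr}, least outstanding: {lor}")
```

Round Robin keeps feeding every heavy request to the same target, so its backlog climbs for the whole run, while least-outstanding routing holds the worst backlog markedly smaller.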

Scenario 2: Container Warm-Up in Amazon EKS

In containerized environments using Amazon EKS, pods are constantly being created and destroyed. A newly launched container may need a “warm-up” period to load caches or complete just-in-time compilation. During this phase, it processes requests slowly. The Least Outstanding Requests algorithm naturally throttles traffic to these “cold” containers, giving them time to warm up without being overwhelmed, thereby improving the stability of the entire cluster.

Scenario 3: Mixed EC2 Instance Fleets

Organizations often operate compute fleets with mixed EC2 instance types, especially when migrating to new instance generations or leveraging Spot Instances for cost savings. Round Robin treats a small, older instance and a large, powerful one as equals. The Least Outstanding Requests algorithm automatically sends more traffic to the higher-capacity instances that can process it faster, optimizing the performance of the entire heterogeneous fleet.

Risks and Trade-offs

While switching to the Least Outstanding Requests algorithm is a low-risk, high-reward change, there are factors to consider. The primary concern is application state. If an application relies on “sticky sessions” (session affinity) to ensure a user is always sent to the same server, that affinity setting will override the load balancing algorithm for established sessions. It’s crucial to verify that state management is handled correctly before making the change.
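A quick way to check this before changing anything is to read the target group's attributes. A read-only boto3 sketch (the ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARN -- substitute your own target group.
resp = elbv2.describe_target_group_attributes(
    TargetGroupArn=(
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/my-app/0123456789abcdef"
    )
)
attrs = {a["Key"]: a["Value"] for a in resp["Attributes"]}

if attrs.get("stickiness.enabled") == "true":
    print("Sticky sessions enabled: established sessions bypass the algorithm.")
```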

Additionally, this routing method relies on accurate health checks. If health checks are too lenient, a “grey failure”—a server that is online but performing poorly—might remain in the target group. While the algorithm will route traffic away from it, a more robust health check would remove it from rotation entirely. The change can also expose previously hidden performance issues in specific nodes, so monitoring is key after implementation.
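Tightening health checks is a separate call on the target group itself. A hedged sketch, where the /healthz path, interval, and threshold counts are illustrative values rather than recommendations:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARN and illustrative values -- tune for your workload.
elbv2.modify_target_group(
    TargetGroupArn=(
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/my-app/0123456789abcdef"
    ),
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/healthz",       # application-level check, not a bare TCP probe
    HealthCheckIntervalSeconds=10,
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=2,
)
```

Pointing the check at an application-level endpoint helps catch the grey failures described above, since a TCP handshake can succeed on a server that is otherwise degraded.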

Recommended Guardrails

To ensure consistent resilience and cost-efficiency, organizations should establish governance guardrails for their AWS load balancing configurations.

  • Policy: Mandate the use of the Least Outstanding Requests algorithm as the default for all new Application Load Balancer target groups, especially for production workloads.
  • Tagging: Implement a consistent tagging strategy to assign clear ownership for each ALB and its associated target groups. This clarifies responsibility for configuration and monitoring.
  • Budgeting & Alerts: Use AWS Budgets and CloudWatch alarms to monitor for anomalies. Set alerts on key metrics like UnHealthyHostCount, target response time (p99 latency), and spikes in fleet size, which could indicate underlying issues that load balancing is masking.
  • Review Process: Incorporate a check for the load balancing algorithm into infrastructure-as-code (IaC) pull request reviews and automated security posture management tools.
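An audit of existing target groups can be scripted against the ELBv2 API. A sketch assuming boto3 and read-only credentials (absence of the attribute is treated as the default):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Walk every target group and flag those still on the round_robin default.
paginator = elbv2.get_paginator("describe_target_groups")
for page in paginator.paginate():
    for tg in page["TargetGroups"]:
        attrs = elbv2.describe_target_group_attributes(
            TargetGroupArn=tg["TargetGroupArn"]
        )["Attributes"]
        algo = next(
            (a["Value"] for a in attrs
             if a["Key"] == "load_balancing.algorithm.type"),
            "round_robin",
        )
        if algo == "round_robin":
            print(f"{tg['TargetGroupName']}: still using round_robin")
```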

Provider Notes

AWS

In AWS, this configuration is managed at the Target Group level, not on the Application Load Balancer (ALB) itself. Each target group associated with an ALB listener can be configured independently. After making the change, performance should be monitored using Amazon CloudWatch metrics. Key indicators to watch include TargetResponseTime, RequestCountPerTarget, and UnHealthyHostCount, which provide insight into how evenly the load is being distributed and how the backend is performing.
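The p99 latency can be pulled per target group with a CloudWatch query. A sketch with placeholder dimension values; note that CloudWatch expects the trailing portions of the load balancer and target group ARNs, not the full ARNs:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Placeholder dimension values -- use your own resources' ARN suffixes.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/ApplicationELB",
    MetricName="TargetResponseTime",
    Dimensions=[
        {"Name": "LoadBalancer", "Value": "app/my-alb/50dc6c495c0c9188"},
        {"Name": "TargetGroup", "Value": "targetgroup/my-app/0123456789abcdef"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    ExtendedStatistics=["p99"],  # percentiles come back under ExtendedStatistics
)
for point in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
    print(point["Timestamp"], point["ExtendedStatistics"]["p99"])
```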

Binadox Operational Playbook

Binadox Insight: The default Round Robin algorithm in AWS ALBs creates performance hotspots and financial waste by treating all servers equally. Switching to Least Outstanding Requests aligns traffic distribution with actual server capacity, directly improving application resilience and lowering compute costs.

Binadox Checklist:

  • Inventory all Application Load Balancers and their associated Target Groups.
  • For each Target Group, inspect the load balancing algorithm attribute.
  • Identify all production Target Groups still using the default “Round Robin” setting.
  • Assess application statefulness and confirm compatibility with dynamic routing.
  • Plan and implement the change to “Least Outstanding Requests” during a low-traffic window.
  • Monitor CloudWatch metrics post-change to validate performance improvements.

Binadox KPIs to Track:

  • p99 Latency: Track the 99th percentile response time to see a reduction in “tail latency” for users.
  • CPU Utilization Variance: Measure the standard deviation of CPU usage across all targets in a group; it should decrease as load becomes more balanced.
  • Unhealthy Host Count: Monitor this metric to ensure the new traffic pattern isn’t causing previously stable nodes to fail.
  • Cost Per Request/Transaction: Correlate performance improvements with unit economics to quantify the financial benefit.
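The CPU utilization variance KPI from the list above is straightforward to compute from exported per-target samples; the numbers below are invented for illustration:

```python
import statistics

# Hypothetical average CPU utilization (%) per target over the same window.
cpu_before = [88, 31, 76, 24, 91, 35]  # Round Robin: hot and cold spots
cpu_after = [57, 52, 60, 55, 58, 54]   # Least Outstanding Requests

spread_before = statistics.pstdev(cpu_before)
spread_after = statistics.pstdev(cpu_after)
print(f"CPU stddev before: {spread_before:.1f}, after: {spread_after:.1f}")
```

A falling standard deviation across the fleet is the signal that load is actually evening out rather than merely moving around.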

Binadox Common Pitfalls:

  • Ignoring Sticky Sessions: Changing the algorithm without understanding how your application handles user state can lead to broken user experiences.
  • Inadequate Health Checks: Relying on basic TCP health checks that don’t accurately reflect application health can undermine the algorithm’s effectiveness.
  • “Set and Forget” Mentality: Failing to monitor performance metrics after the change, thereby missing opportunities to further optimize or catch new issues.
  • Applying Changes Blindly: Not considering the specific workload; while broadly beneficial, some very uniform, short-lived request patterns might see negligible benefit.

Conclusion

Moving from Round Robin to the Least Outstanding Requests algorithm is a simple yet powerful optimization for any workload running behind an AWS Application Load Balancer. It transforms the load balancer from a static distributor into an intelligent traffic manager that adapts to real-world conditions.

By making this change, FinOps and engineering teams can build more resilient, performant, and cost-effective systems. The first step is to audit your current environment to identify where this opportunity for improvement exists. This proactive measure strengthens your cloud foundation, ensuring your infrastructure works smarter, not just harder.