Cloud-native applications are being built at a good clip. While they're not quite dominating app portfolios just yet, they are increasing in number. Interest in containers is closely associated with cloud-native (microservices-based) architectures because of the inherent dependency on infrastructure for communications and scale.
Typically, scaling microservices is achieved through horizontal cloning. That is, if we need more instances of a given service, we simply clone it and add the clone to the pool of resources from which a load balancer picks one to respond to each request. Easy peasy. When those microservices closely represent functional elements, this model works even better.
The load balancer in question is often a component of the container orchestrator and defaults to the industry-standard, TCP-based round robin algorithm. That means a request comes in and the load balancer chooses the 'next in line' resource to respond.
This method is often analogized to the line at a bank or the DMV. But that's not entirely accurate. In a true round robin scenario, you aren't directed to the "next available" resource. You are directed to the "next in line" resource—even if that resource is busy. Ironically, the methods of distribution at the DMV and your local bank are more efficient than a true round robin algorithm.
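To make "next in line" concrete, here is a minimal sketch of round robin selection. The class name and the backend names (svc-a and friends) are made up for illustration and don't come from any particular orchestrator.

```python
from itertools import cycle


class RoundRobin:
    """Hands out backends in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, backends):
        # cycle() repeats the list forever: a, b, c, a, b, c, ...
        self._rotation = cycle(backends)

    def pick(self):
        # No load check, no health check: just whoever is next in line.
        return next(self._rotation)


lb = RoundRobin(["svc-a", "svc-b", "svc-c"])  # hypothetical backend names
picks = [lb.pick() for _ in range(5)]
print(picks)  # ['svc-a', 'svc-b', 'svc-c', 'svc-a', 'svc-b']
```

Notice that nothing in pick() asks whether the chosen backend is still busy with its last request. That's the whole difference between "next in line" and "next available."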
I know, right?
The same inefficiency shows up in applications. A service may be cloned, even at the functional level, because every clone serves the same set of requests. But those requests are not always equal in terms of execution because data. That's right, data. The same functional request (or API call) may take more or less time to execute depending on the data being submitted or requested. After all, it takes less time to retrieve and serialize a single customer record than it does to retrieve and serialize ten or a hundred customer records.
And that's where round robin breaks down a bit and introduces variability that can impact performance. Operational axiom #2 still applies to cloud-native and microservices-based architectures: as load increases, performance decreases.
Round robin is like honey badger. It doesn't care if a resource is getting overloaded by requests with significant data sets as responses. Round robin says "you're next" whether you're ready or not. This can result in uneven performance for those users whose requests wind up in a queue on an increasingly burdened resource.
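A little arithmetic shows how lopsided this can get. The sketch below deals a handful of requests out in strict rotation and adds up the work each backend ends up with. The costs are invented for illustration (think of them as the number of customer records each request has to retrieve and serialize), and the function name is just a label for this example.

```python
def round_robin_load(costs, n_backends):
    """Total work assigned to each backend when requests are dealt out in strict rotation."""
    totals = [0] * n_backends
    for i, cost in enumerate(costs):
        # Request i always goes to backend i mod n, whatever it costs.
        totals[i % n_backends] += cost
    return totals


# Hypothetical workload: mostly single-record lookups, plus two 100-record queries.
costs = [1, 100, 1, 1, 100, 1]
totals = round_robin_load(costs, 3)
print(totals)  # [2, 200, 2]
```

Every backend received exactly two requests, so by round robin's own measure the distribution is perfectly fair. One of them is doing a hundred times the work of the others, and any small request that lands behind those big ones waits.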
If you're concerned about performance (and you should be), then a better alternative is, well, just about any of the other standard algorithms, such as least connections or fastest response time. Basically, you want your algorithm to take load and/or speed into consideration rather than blindly foisting requests off on resources that may not be the optimal choice.
Some might think that as we climb the stack from TCP to HTTP to HTTP+, this issue will resolve itself. That's not the case at all. The method of distribution, that is, the load balancing algorithm, is still relevant irrespective of the layer you're basing it on. Round robin doesn't care about the architecture. It cares about resources and makes decisions based on an available pool. Whether that pool is meant to scale a single API call or an entire monolith makes no difference to the algorithm.
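For comparison, here is a minimal sketch of least connections, assuming the balancer can count the requests still in flight on each backend. The class, its method names, and the backend names are illustrative only, not any product's API.

```python
class LeastConnections:
    """Sends each request to the backend with the fewest requests in flight."""

    def __init__(self, backends):
        # Active request count per backend, all starting at zero.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        # min() returns the first backend with the lowest count,
        # so ties are broken by the order the backends were listed in.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the backend finishes a request.
        self.active[backend] -= 1


lb = LeastConnections(["svc-a", "svc-b", "svc-c"])  # hypothetical backend names
first = lb.acquire()   # svc-a
second = lb.acquire()  # svc-b
third = lb.acquire()   # svc-c
lb.release(second)     # svc-b finishes early
fourth = lb.acquire()  # svc-b again: it is the only idle backend
print(first, second, third, fourth)
```

A strict rotation would have sent that fourth request to svc-a, busy or not. Here it goes to the backend that actually has room.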
So, it would be nice if the load balancer were smart enough to recognize when a query would result in "more than average" data before it executes. Web application firewalls like F5 WAF are able to recognize when a result is out of the ordinary—but that's on the response and primarily enables better application security. What we need is for the load balancer to get smart enough to predict an "extra-large" legitimate response.
If the load balancer were capable of that kind of discernment, it could factor that into its decision making and more evenly distribute requests across available resources. What we really want is to not be forced into specifying a rigid algorithm upon which to make decisions. It would be better if the load balancer could make the decision based on business thresholds and technical characteristics such as response times, anticipated execution time, size of data returned, and the load right now on each resource.
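What might such a decision look like? The sketch below is one possibility, under some loud assumptions: the balancer somehow knows each backend's average response time and current in-flight count, it has a predicted cost for the incoming request, and the business threshold is a simple maximum response time. The estimate formula and every name in it are illustrative, not a description of how any real load balancer works.

```python
from dataclasses import dataclass


@dataclass
class BackendStats:
    name: str
    avg_response_ms: float  # observed average response time
    active_requests: int    # load on the backend right now


def choose(backends, predicted_cost_ms, max_response_ms):
    """Pick the backend with the lowest estimated completion time.

    Backends whose estimate fits under the business threshold are preferred;
    if none fit, fall back to the least bad option.
    """

    def estimate(b):
        # Assumed model: work already queued plus this request's anticipated cost.
        return b.active_requests * b.avg_response_ms + predicted_cost_ms

    within = [b for b in backends if estimate(b) <= max_response_ms]
    return min(within or backends, key=estimate)


pool = [
    BackendStats("svc-a", avg_response_ms=20.0, active_requests=10),  # estimate 240
    BackendStats("svc-b", avg_response_ms=50.0, active_requests=2),   # estimate 140
    BackendStats("svc-c", avg_response_ms=30.0, active_requests=5),   # estimate 190
]
best = choose(pool, predicted_cost_ms=40.0, max_response_ms=200.0)
print(best.name)  # svc-b
```

Note that svc-b has the slowest average response time of the three and still wins, because it has the least work queued. Neither metric alone would tell you that.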
Ultimately, this is the kind of intelligence that can only be realized through better visibility and machine learning. If the load balancer can learn through experience to recognize which queries take more time than others, it can then apply that knowledge to better distribute requests such that a consistent, predictable response time can be achieved.
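"Learning through experience" doesn't have to mean anything exotic to start with. As a deliberately simple stand-in for real machine learning, the sketch below keeps an exponentially weighted moving average of how long each kind of request has taken and sends new requests to the backend with the least predicted work outstanding. The class, the request kinds (list_customers, get_customer), and the smoothing factor are all assumptions made for illustration.

```python
class LearnedBalancer:
    """Tracks how long each kind of request takes and balances on predicted work."""

    def __init__(self, backends, alpha=0.5, default_cost=1.0):
        self.pending = {b: 0.0 for b in backends}  # predicted seconds of work in flight
        self.estimate = {}                         # learned cost per request kind
        self.alpha = alpha                         # weight given to the newest observation
        self.default_cost = default_cost           # guess for kinds never seen before

    def predict(self, kind):
        return self.estimate.get(kind, self.default_cost)

    def observe(self, kind, seconds):
        # Exponentially weighted moving average of observed execution time.
        old = self.estimate.get(kind)
        if old is None:
            self.estimate[kind] = seconds
        else:
            self.estimate[kind] = (1 - self.alpha) * old + self.alpha * seconds

    def dispatch(self, kind):
        # Choose the backend with the least predicted work outstanding.
        backend = min(self.pending, key=self.pending.get)
        cost = self.predict(kind)
        self.pending[backend] += cost
        return backend, cost

    def complete(self, backend, kind, predicted, seconds):
        # Release the predicted work and learn from the actual time taken.
        self.pending[backend] -= predicted
        self.observe(kind, seconds)


lb = LearnedBalancer(["svc-a", "svc-b"])  # hypothetical backend names
lb.observe("list_customers", 4.0)         # past experience: a hundred records is slow
lb.observe("get_customer", 0.5)           # a single record is fast
a, _ = lb.dispatch("list_customers")      # svc-a now holds 4.0s of predicted work
b, _ = lb.dispatch("get_customer")        # svc-b holds 0.5s
c, _ = lb.dispatch("get_customer")        # svc-b again: 0.5s is still less than 4.0s
print(a, b, c)
```

A connection counter would have seen one request on each backend and sent the third to svc-a, right behind the big query. The learned estimate keeps the small requests out of that queue.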
Load balancing is not going away. It's our best technical response to scaling everything from the network to applications. But it does need to evolve along with the rest of the infrastructure to be more dynamic, autonomic, and intelligent.