The load balancer supports the following load balancing algorithms:

- `round-robin`
- `consistent-hashing`
- `least-connections`
- `latency`
Note: If using health checks, unhealthy Targets aren't removed from the load balancer and have no impact on the balancer layout when using a hashing algorithm.
Instead, unhealthy Targets are simply skipped.
The `round-robin` algorithm is done in a weighted manner. It provides identical
results to the default DNS-based load balancing, but because it is implemented as an upstream,
the additional features for health checks and circuit breakers are also available.
When choosing this algorithm, consider the following:
- Provides good distribution of requests.
- Remains fairly static, as only DNS updates or Target updates can influence the distribution of traffic.
- Doesn’t improve cache-hit ratios.
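As an illustration of the weighted distribution, the following is a minimal sketch of a smooth weighted round-robin selector (the names and class here are hypothetical, not Kong's internal implementation):

```python
class WeightedRoundRobin:
    """Smooth weighted round-robin: higher-weight targets are picked
    proportionally more often, interleaved rather than in bursts."""

    def __init__(self, targets):
        # targets: mapping of target name -> weight
        self.weights = dict(targets)
        self.current = {name: 0 for name in targets}

    def next(self):
        total = sum(self.weights.values())
        # each target accumulates its weight every round
        for name, weight in self.weights.items():
            self.current[name] += weight
        # pick the target with the highest accumulated weight,
        # then subtract the total so it must "earn" its next turn
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen


lb = WeightedRoundRobin({"a": 5, "b": 1})
picks = [lb.next() for _ in range(6)]
# over one full cycle of 6 picks, "a" is chosen 5 times and "b" once
```

Because the distribution depends only on the configured weights, it stays static until a Target update (or DNS update) changes the weight set, matching the behavior described above.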
With the consistent-hashing algorithm, a configurable client input is used to
calculate a hash value. This hash value is then tied to a specific backend
server.
A common example would be to use the consumer ID
as the hash input. Since this ID is
the same for every request from that user, it ensures that the same user is
handled consistently by the same backend server. This allows for cache
optimizations on the backend, since each of the servers only serves a fixed subset
of the users, and can improve its cache-hit ratio for user-related data.
This algorithm implements the ketama principle to
maximize hashing stability and minimize consistency loss upon changes to the list
of known backends.
The input for the consistent-hashing algorithm can be one of the following options,
determined by the value set in the `hash_on` parameter:

| Option | Description |
|--------|-------------|
| `none` | Doesn't use consistent-hashing; uses round-robin instead (default). Hashing is disabled. |
| `consumer` | Uses the Consumer ID as the hash input. If no Consumer ID is available, it falls back on the Credential ID (for example, in the case of an external authentication mechanism like LDAP). |
| `ip` | Uses the originating IP address as the hash input. Review the configuration settings for determining the real IP when using this option. |
| `header` | Uses a specified header as the hash input. The header name is specified in either the Upstream's `hash_on_header` or `hash_fallback_header` field, depending on whether `header` is the primary or fallback attribute, respectively. |
| `cookie` | Uses a specified cookie with a specified path as the hash input. The cookie name is specified in the Upstream's `hash_on_cookie` field and the path in the `hash_on_cookie_path` field. If the specified cookie is not present in the request, it is set by the response with a random UUID value, which is then preserved in the cookie. The `hash_fallback` setting is invalid and can't be used if `cookie` is the primary hashing mechanism. |
The consistent-hashing algorithm supports a primary and a fallback hashing attribute.
If the primary fails (for example, if the primary is set to `consumer` but no Consumer is authenticated),
the fallback attribute is used. This maximizes upstream cache hits.
The consistent-hashing balancer is designed to work both with a single node as well
as in a cluster.
When choosing this algorithm, consider the following:
- Improves backend cache-hit ratios.
- Requires enough cardinality in the hash inputs to distribute evenly. For example, hashing on a header that only has 2 possible values doesn’t make sense.
- The cookie-based approach works well for browser-based requests, but less so for machine-to-machine (M2M) clients, which will often omit the cookie.
- When using the hashing approach in a Kong Gateway cluster, add Target entities by their IP address, and avoid using hostnames in the balancer.
The balancers will slowly diverge, as the DNS TTL only has second precision, and renewal is determined by when a name is actually requested.
Additionally, some nameservers don’t return all entries, which makes the problem worse.
This problem can be mitigated by balancer rebuilds and higher TTL settings.
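To make the ketama principle concrete, here is a minimal sketch of a consistent-hash ring with virtual nodes (names like `points_per_node` are illustrative; this is not Kong's internal code, which tunes point counts and hashing differently):

```python
import hashlib
from bisect import bisect


class KetamaRing:
    """Minimal ketama-style consistent-hash ring.

    Each backend is placed on the ring at many pseudo-random points
    ("virtual nodes"), so removing one backend only remaps the keys
    that hashed to its own points -- all other keys stay put."""

    def __init__(self, nodes, points_per_node=160):
        self.points_per_node = points_per_node
        self.ring = {}  # hash position -> backend
        self.sorted = []
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.points_per_node):
            self.ring[self._hash(f"{node}#{i}")] = node
        self.sorted = sorted(self.ring)

    def remove(self, node):
        for i in range(self.points_per_node):
            self.ring.pop(self._hash(f"{node}#{i}"), None)
        self.sorted = sorted(self.ring)

    def get(self, key):
        # walk clockwise to the first point at or after the key's hash
        pos = self._hash(key)
        idx = bisect(self.sorted, pos) % len(self.sorted)
        return self.ring[self.sorted[idx]]


ring = KetamaRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
before = {f"consumer-{i}": ring.get(f"consumer-{i}") for i in range(1000)}
ring.remove("10.0.0.3")
# only keys that hashed to the removed backend are remapped
moved = sum(1 for key, node in before.items()
            if node != "10.0.0.3" and ring.get(key) != node)
```

The same hash input always lands on the same backend, which is what enables the per-backend cache optimizations described above; this is also why the sketch uses IP addresses as Target identifiers, in line with the clustering advice.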
The `least-connections` algorithm keeps track of the number of in-flight requests for each backend.
The weights are used to calculate the connection capacity of a backend. Requests are
routed towards the backend with the highest spare capacity.
When choosing this algorithm, consider the following:
- Provides good distribution of traffic.
- Doesn’t improve cache-hit ratios.
- This option is more dynamic, since slower backends will have more connections open, and
new requests will be routed to other backends automatically.
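The capacity calculation described above can be sketched as follows (a simplified model with hypothetical names, not Kong's actual implementation):

```python
class LeastConnections:
    """Route each request to the backend with the most spare capacity,
    where capacity is the configured weight and usage is the number
    of in-flight requests."""

    def __init__(self, weights):
        self.weights = dict(weights)            # name -> weight (capacity)
        self.inflight = {n: 0 for n in weights}  # name -> open requests

    def acquire(self):
        # lowest in-flight/weight ratio == highest spare capacity;
        # ties are broken by iteration order in this sketch
        chosen = min(self.weights,
                     key=lambda n: self.inflight[n] / self.weights[n])
        self.inflight[chosen] += 1
        return chosen

    def release(self, name):
        # called when a request completes
        self.inflight[name] -= 1


lb = LeastConnections({"fast": 2, "slow": 1})
```

A slow backend holds its connections open longer, so its in-flight count stays high and new requests automatically drift toward the other backends, which is what makes this option more dynamic than round-robin.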
The `latency` algorithm is based on peak EWMA (Exponentially Weighted Moving Average),
which ensures that the balancer selects the backend with the lowest latency
(`upstream_response_time`). The latency metric used is the full request cycle, from
TCP connect to body response time. Since it’s a moving average, the metrics will
decay over time.
Target weights aren’t taken into account.
When choosing this algorithm, consider the following:
- Provides good distribution of traffic, provided there is enough base load to keep the metrics alive, since they are always decaying.
- The algorithm is very dynamic, since it will constantly optimize loads.
- Latency-based load balancing works best with low variance in latencies, meaning mostly similar-shaped traffic and even workloads for the backends.
For example, using this algorithm with a GraphQL backend serving both small, fast queries and big, slow ones will result in high variance in the latency metrics, which skews the balancing decisions.
- You must properly set up the backend capacity and ensure proper network latency to prevent resource starvation.
For example, you could use 2 servers: a small capacity server close by (low network latency), and a high capacity server far away (high latency).
Most traffic will be routed to the small one until its latency starts going up.
However, the latency going up means the small server is likely suffering from resource starvation.
In this case, the algorithm will keep the small server in a constant state of resource starvation, which is most likely not efficient.
- This option is not suitable for long-lived connections like websockets or server-sent events (SSE).
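The peak-EWMA behavior can be sketched as follows (an illustrative model: the decay constant and names are assumptions, not Kong's tuning):

```python
import math
import time


class PeakEWMA:
    """Peak-EWMA latency tracker: ordinary samples are blended into a
    decaying moving average, but a latency spike ("peak") is adopted
    immediately, so a suddenly slow backend is avoided right away."""

    def __init__(self, decay_seconds=10.0):
        self.decay = decay_seconds
        self.value = 0.0               # tracked latency in seconds
        self.last = time.monotonic()

    def observe(self, latency, now=None):
        now = time.monotonic() if now is None else now
        dt = max(now - self.last, 0.0)
        self.last = now
        if latency > self.value:
            # peaks are taken at face value
            self.value = latency
        else:
            # lower samples decay in gradually; without fresh traffic
            # the old value's influence fades over time
            alpha = math.exp(-dt / self.decay)
            self.value = self.value * alpha + latency * (1 - alpha)
        return self.value


def pick(backends):
    # choose the backend with the lowest tracked latency
    return min(backends, key=lambda name: backends[name].value)


a, b = PeakEWMA(), PeakEWMA()
a.observe(0.05, now=0.0)   # backend "a" is fast
b.observe(0.20, now=0.0)   # backend "b" is slower
```

Because the average decays, the metrics need a steady base load to stay meaningful, and a single slow request immediately pushes traffic away from that backend, which is what makes the algorithm so dynamic.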