Upstreams

Uses: Kong Gateway Admin API, decK, KIC, Terraform

What is an Upstream?

An Upstream enables load balancing by providing a virtual hostname and collection of Targets, or upstream service instances, to which client requests are forwarded.

You can use Upstreams to health check, circuit break, and load balance incoming requests over multiple Gateway Services. The Upstream entity also supports more advanced load balancing algorithms, such as least-connections, consistent-hashing, and lowest-latency.

Upstream and Gateway Service interaction

You can configure a Service to point to an Upstream instead of a host. For example, if you have a Service called example_service and an Upstream called example_upstream, you can point example_service to example_upstream instead of specifying a host. The example_upstream Upstream can then point to two different Targets: httpbin.konghq.com and httpbun.com. In a real environment, the Upstream points to the same Service running on multiple systems.

This setup allows you to load balance between upstream targets. For example, if an upstream service is deployed across two different servers, or upstream targets, Kong Gateway needs to load balance across both servers. If one of the servers (like httpbin.konghq.com in the previous example) is unavailable, Kong Gateway automatically detects the problem and routes all traffic to the working server (httpbun.com).
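The setup described above can be sketched with the Admin API. This is a minimal, illustrative sequence, assuming the Admin API is reachable at localhost:8001 and using the entity names from the example:

```shell
# Create the Upstream (a virtual hostname for the load balancer)
curl -X POST http://localhost:8001/upstreams \
  --data "name=example_upstream"

# Register two Targets behind the Upstream
curl -X POST http://localhost:8001/upstreams/example_upstream/targets \
  --data "target=httpbin.konghq.com:80"
curl -X POST http://localhost:8001/upstreams/example_upstream/targets \
  --data "target=httpbun.com:80"

# Point the Service's host at the Upstream name instead of a real host
curl -X POST http://localhost:8001/services \
  --data "name=example_service" \
  --data "host=example_upstream"
```

Requests proxied through example_service are then load balanced across both Targets.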

The following diagram shows how Upstreams interact with other Kong Gateway entities:

 
flowchart LR

  A("Request")
  B("`Route 
  (/mock)`")
  C("`Service
  (example_service)`")
  D("Target:
  httpbin.konghq.com")
  E("Target:
  httpbun.com")
  F("Upstream service:
  httpbin.konghq.com")
  G("Upstream service:
  httpbun.com")

  A --> B
  subgraph id1 ["`**KONG GATEWAY**`"]
    B --> C --> D & E
  subgraph id3 ["`**Upstream** (load balancer)`"]
  
    D & E
  end

  end

  subgraph id2 ["`**Target upstream services**`"]
    D --> F
    E --> G

  end

  style id2 stroke:none!important
  

Use cases for Upstreams

The following are examples of common use cases for Upstreams:

  • Load balance: When an Upstream points to multiple upstream targets, you can configure the Upstream entity to load balance traffic between the targets. If you don’t need to load balance, we recommend using the host header on a Route as the preferred method for routing a request and proxying traffic.
  • Health check: Configure Upstreams to dynamically mark a target as healthy or unhealthy. This is an active check, where a specific HTTP or HTTPS endpoint in the target is periodically requested and the health of the target is determined based on its response.
  • Circuit break: Configure Upstreams to allow Kong Gateway to passively analyze the ongoing traffic being proxied and determine the health of targets based on their behavior responding to requests. This feature is not supported in Konnect or hybrid mode.
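As a sketch of the active health check use case, assuming the Admin API is reachable at localhost:8001, an Upstream named example_upstream already exists, and the targets expose a /status endpoint (all values here are illustrative, not defaults):

```shell
# Enable active health checks: probe /status on each Target periodically
curl -X PATCH http://localhost:8001/upstreams/example_upstream \
  --data "healthchecks.active.http_path=/status" \
  --data "healthchecks.active.healthy.interval=5" \
  --data "healthchecks.active.healthy.successes=2" \
  --data "healthchecks.active.unhealthy.interval=5" \
  --data "healthchecks.active.unhealthy.http_failures=3"
```

With this configuration, a Target is marked unhealthy after 3 failed probes and healthy again after 2 successful ones.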

Load balancing algorithms

The load balancer supports the following load balancing algorithms:

  • round-robin
  • consistent-hashing
  • least-connections
  • latency

Note: If using health checks, unhealthy Targets won’t be removed from the load balancer, and won’t have any impact on the balancer layout when using a hashing algorithm. Instead, unhealthy Targets will just be skipped.

Round-robin

The round-robin algorithm is weighted. It provides results identical to the default DNS-based load balancing, but because it runs through an Upstream, the additional features for health checks and circuit breakers are also available.

When choosing this algorithm, consider the following:

  • Provides good distribution of requests.
  • Remains fairly static, as only DNS updates or Target updates can influence the distribution of traffic.
  • Doesn’t improve cache-hit ratios.

Consistent-hashing

With the consistent-hashing algorithm, a configurable client input is used to calculate a hash value. This hash value is then tied to a specific backend server.

A common example would be to use the consumer as a hash input. Since this ID is the same for every request from that user, it ensures that the same user is handled consistently by the same backend server. This allows for cache optimizations on the backend, since each of the servers only serves a fixed subset of the users, and can improve its cache-hit ratio for user-related data.

This algorithm implements the ketama principle to maximize hashing stability and minimize consistency loss upon changes to the list of known backends.

The input for the consistent-hashing algorithm can be one of the following options, determined by the value set in the hash_on parameter:

  • none: Doesn’t use consistent-hashing; uses round-robin instead (default). Hashing is disabled.
  • consumer: Uses the Consumer ID as the hash input. If no Consumer ID is available, it falls back on the Credential ID (for example, in the case of an external authentication mechanism like LDAP).
  • ip: Uses the originating IP address as the hash input. Review the configuration settings for determining the real IP when using this option.
  • header: Uses a specified header as the hash input. The header name is specified in either the Upstream’s hash_on_header or hash_fallback_header field, depending on whether header is the primary or fallback attribute, respectively.
  • cookie: Uses a specified cookie with a specified path as the hash input. The cookie name is specified in the Upstream’s hash_on_cookie field and the path in the Upstream’s hash_on_cookie_path field. If the specified cookie is not present in the request, it is set by the response. The generated cookie has a random UUID value, which is then preserved in the cookie.

The hash_fallback setting is invalid and can’t be used if cookie is the primary hashing mechanism.

The consistent-hashing algorithm supports a primary and a fallback hashing attribute. If the primary fails (for example, if the primary is set to consumer, but no Consumer is authenticated), the fallback attribute is used. This maximizes upstream cache hits.
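A primary-plus-fallback configuration can be sketched as follows, assuming the Admin API at localhost:8001 and an existing Upstream named example_upstream:

```shell
# Hash on the authenticated Consumer; fall back to the client IP
# for unauthenticated requests
curl -X PATCH http://localhost:8001/upstreams/example_upstream \
  --data "algorithm=consistent-hashing" \
  --data "hash_on=consumer" \
  --data "hash_fallback=ip"
```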

The consistent-hashing balancer is designed to work both with a single node as well as in a cluster.

When choosing this algorithm, consider the following:

  • Improves backend cache-hit ratios.
  • Requires enough cardinality in the hash inputs to distribute evenly. For example, hashing on a header that only has 2 possible values doesn’t make sense.
  • The cookie-based approach works well for browser-based requests, but less so for machine-to-machine (M2M) clients, which will often omit the cookie.
  • When using the hashing approach in a Kong Gateway cluster, add Target entities by their IP address, and avoid using hostnames in the balancer. The balancers will slowly diverge, as the DNS TTL only has second precision, and renewal is determined by when a name is actually requested. Additionally, some nameservers don’t return all entries, which makes the problem worse. This problem can be mitigated by balancer rebuilds and higher TTL settings.

Least-connections

The least-connections algorithm keeps track of the number of in-flight requests for each backend. The weights are used to calculate the connection capacity of a backend. Requests are routed towards the backend with the highest spare capacity.

When choosing this algorithm, consider the following:

  • Provides good distribution of traffic.
  • Doesn’t improve cache-hit ratios.
  • This option is more dynamic, since slower backends will have more connections open, and new requests will be routed to other backends automatically.
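Switching an existing Upstream to this algorithm is a single field change; a sketch assuming the Admin API at localhost:8001 and an Upstream named example_upstream:

```shell
# Target weights are interpreted as connection capacity;
# requests go to the backend with the most spare capacity
curl -X PATCH http://localhost:8001/upstreams/example_upstream \
  --data "algorithm=least-connections"
```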

Latency

The latency algorithm is based on peak EWMA (Exponentially Weighted Moving Average), which ensures that the balancer selects the backend by the lowest latency (upstream_response_time). The latency metric used is the full request cycle, from TCP connect to body response time. Since it’s a moving average, the metrics will decay over time.

Target weights aren’t taken into account.

When choosing this algorithm, consider the following:

  • Provides good distribution of traffic, provided there is enough base load to keep the metrics alive, since they are always decaying.
  • The algorithm is very dynamic, since it will constantly optimize loads.
  • Latency-based load balancing works best with low variance in latencies, meaning mostly similar-shaped traffic and even workloads for the backends. For example, using this algorithm with a GraphQL backend that serves both small, fast queries and big, slow ones will result in high variance in the latency metrics, which skews them.
  • You must properly set up the backend capacity and ensure proper network latency to prevent resource starvation. For example, you could use 2 servers: a small capacity server close by (low network latency), and a high capacity server far away (high latency). Most traffic will be routed to the small one until its latency starts going up. However, the latency going up means the small server is likely suffering from resource starvation. In this case, the algorithm will keep the small server in a constant state of resource starvation, which is most likely not efficient.
  • This option is not suitable for long-lived connections like websockets or server-sent events (SSE).
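If these considerations fit your workload, enabling the algorithm is again a one-field change; a sketch assuming the Admin API at localhost:8001 and an Upstream named example_upstream:

```shell
# Select backends by lowest peak-EWMA latency; Target weights are ignored
curl -X PATCH http://localhost:8001/upstreams/example_upstream \
  --data "algorithm=latency"
```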
