Custom Kubernetes Scaling via Envoy Metrics

The Descartes Labs Platform runs on Kubernetes and scales from hundreds to tens of thousands of cores in response to customer traffic.

Much of this load comes from our Tasks service, which allows users to scale out analytics models with high-throughput access to petabytes of geospatial data. The bulk of the heavy lifting behind the retrieval, transformation, and delivery of this data is handled by our Raster service.

Scaling Challenges

Raster can be called directly from our Python client or via a RESTful API, and response times can vary significantly depending on the nature of the request.

Our original approach to scaling Raster used a standard horizontal pod autoscaler (HPA) that tracked CPU utilization per pod. Unfortunately, variation in compute characteristics (requests could be I/O- or CPU-bound) made CPU utilization a poor indicator, and we needed a low threshold to stay ahead of the load.

Raster request latency percentiles at a random moment in time

CPU utilization was not a great indicator of load

Variation in both the nature and duration of requests meant that scaling based on request rate was also not ideal.

We have been using Istio for a long time and took note of Istio metrics-based autoscaling, but these higher-level metrics (i.e., labeled request counts, duration, rate) were not an obvious fit for our service.

Tapping into Envoy Metrics

Fortunately for us, Envoy, the sidecar proxy used by Istio, allows us to directly measure the current saturation of our Raster service. From the Envoy docs:

upstream_rq_active (Gauge): Total active requests

By summing the upstream_rq_active across all Raster pods we get an effective measure of how many requests are currently being handled by our service.

To allow Kubernetes to scale on this metric we installed the Zalando kube-metrics-adapter as packaged by Stefan Prodan. We could technically configure the metrics adapter to scrape this metric directly from Envoy using the JSON stats endpoint, but it made more sense to let our existing Prometheus infrastructure handle scraping and aggregation.
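For illustration, here is a minimal Python sketch of what summing upstream_rq_active from Envoy's JSON stats output could look like. It is not what we run (Prometheus does this for us), and the sample payload, the function names, and the cluster prefix filter are assumptions made for the example.

```python
import json

# Hypothetical, heavily trimmed sample of one pod's Envoy admin output
# (/stats?format=json). Real payloads contain many more entries.
SAMPLE_POD_STATS = json.loads("""
{"stats": [
  {"name": "cluster.inbound|8000|http|raster-release.raster.svc.cluster.local.upstream_rq_active", "value": 3},
  {"name": "cluster.inbound|8000|http|raster-release.raster.svc.cluster.local.upstream_rq_total", "value": 912}
]}
""")


def active_requests(pod_stats, cluster_prefix="cluster.inbound"):
    """Sum the upstream_rq_active gauges of inbound clusters for one pod."""
    total = 0
    for stat in pod_stats.get("stats", []):
        # Some entries may carry no "name" key, so default to an empty string.
        name = stat.get("name", "")
        if name.startswith(cluster_prefix) and name.endswith(".upstream_rq_active"):
            total += stat["value"]
    return total


def service_active_requests(all_pod_stats):
    """Sum active requests across the stats payloads of every pod."""
    return sum(active_requests(pod_stats) for pod_stats in all_pod_stats)


if __name__ == "__main__":
    # Pretend four pods each report the same sample payload.
    print(service_active_requests([SAMPLE_POD_STATS] * 4))
```

The service-wide total is the quantity the autoscaler ultimately needs; everything else in the pipeline is about getting that number reliably.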

Update: After upgrading to Istio 1.1.x we found that the specific Envoy metric we leverage is no longer exposed by default. To configure Envoy to provide the metric you must now add the following to the pod template of the deployment you are scaling:

    template:
      metadata:
        annotations:
          sidecar.istio.io/statsInclusionPrefixes: cluster.inbound,cluster_manager,listener_manager,http_mixer_filter,tcp_mixer_filter,server,cluster.xds-grpc

Prometheus Configuration

In Prometheus this metric is surfaced as:

envoy_cluster_upstream_rq_active

(a closely related gauge, envoy_cluster_upstream_cx_active, tracks active connections rather than active requests).

Number of active requests to Raster as reported by Envoy's upstream_rq_active metric

Unfortunately, the Istio-bundled Prometheus configuration scrapes but drops this number. To retain it you must modify or remove the following lines from the Prometheus config:

    - source_labels: [ cluster_name ]
      regex: '(outbound|inbound|prometheus_stats).*'
      action: drop
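To see why this rule discards our metric, the sketch below mimics the match in Python. Prometheus anchors relabel regexes, so the whole label value has to match; re.fullmatch reproduces that behaviour. The helper name and the "keep inbound" variant are assumptions for illustration, not the exact change we made.

```python
import re

# Regex copied from the bundled relabel rule shown above.
DROP_RULE = re.compile(r"(outbound|inbound|prometheus_stats).*")

# One possible modification (an assumption): stop dropping inbound clusters.
KEEP_INBOUND_RULE = re.compile(r"(outbound|prometheus_stats).*")


def is_dropped(cluster_name, rule=DROP_RULE):
    """Return True if a series with this cluster_name label would be dropped."""
    return rule.fullmatch(cluster_name) is not None


if __name__ == "__main__":
    raster = "inbound|8000|http|raster-release.raster.svc.cluster.local"
    print(is_dropped(raster))                     # matched by the bundled rule
    print(is_dropped(raster, KEEP_INBOUND_RULE))  # survives the modified rule
```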

Implementing Our Custom Autoscaler

With the number of active requests per pod being tracked by Prometheus we could then implement our custom HPA:

    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: raster-release
      namespace: raster
      annotations:
        metric-config.object.raster-release-rq-active.prometheus/per-replica: "true"
        metric-config.object.raster-release-rq-active.prometheus/query: |
          sum(max_over_time(envoy_cluster_upstream_rq_active{app="raster",cluster_name="inbound|8000|http|raster-release.raster.svc.cluster.local",namespace="raster",stage="release"}[1m]))
    spec:
      maxReplicas: 1500
      minReplicas: 12
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: raster-release
      metrics:
      - type: Object
        object:
          metricName: raster-release-rq-active
          target:
            apiVersion: v1
            kind: Pod
            name: raster-release # needed for schema consistency
          targetValue: 2

The critical part of this HPA config is the annotation block where we:

Provide a consistent label for the metric: raster-release-rq-active
Tell the metrics adapter to normalize the metric with respect to the number of pods: per-replica: "true"
Provide a PromQL query that returns the sum of the max number of active requests per Raster pod over the last minute.

Raster can handle four concurrent requests per pod, so we set the targetValue to two active requests per pod.
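As a rough worked example of how this target turns into a replica count, the sketch below applies the standard HPA rule, desired = ceil(currentReplicas * currentValue / targetValue), to the per-replica value. It is a simplification under stated assumptions: the function name is hypothetical, and it ignores the HPA's tolerance band and scale-up/scale-down delays.

```python
import math
from fractions import Fraction


def desired_replicas(total_active, current_replicas, target_per_pod=2,
                     min_replicas=12, max_replicas=1500):
    """Approximate the replica count the HPA would ask for.

    total_active:     result of the PromQL sum across all pods (integer)
    current_replicas: pods currently running
    target_per_pod:   the HPA targetValue
    """
    # per-replica: "true" means the adapter divides the total by the pod count.
    # Fraction keeps the arithmetic exact so ceil() is not thrown by rounding.
    per_pod = Fraction(total_active, current_replicas)
    # Standard HPA rule: scale the current size by the value/target ratio.
    desired = math.ceil(current_replicas * per_pod / target_per_pod)
    # The HPA never goes outside minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 20 pods handling 100 active requests -> 5 per pod against a target of 2.
    print(desired_replicas(total_active=100, current_replicas=20))
```

Because the per-replica division and the multiplication by the current size cancel out, the result is effectively the total number of active requests divided by the target, clamped to the configured bounds.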

In our small-scale testing we found that simply taking the sum of envoy_cluster_upstream_rq_active yielded accurate numbers, but when we tested with production traffic (yay for Istio traffic mirroring!) and large numbers of pods, we needed to use a window of at least one minute to get consistent numbers.
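The difference between an instantaneous sum and the windowed query can be sketched with a few made-up samples. The per-pod series below are hypothetical, and the helper names are ours; the point is only that taking each pod's maximum over the window before summing is less sensitive to which instant happens to be sampled.

```python
# Hypothetical samples of envoy_cluster_upstream_rq_active for three pods,
# as scraped at successive points within a one-minute window.
SAMPLES = {
    "raster-a": [0, 4, 1, 0],
    "raster-b": [2, 0, 3, 3],
    "raster-c": [1, 1, 0, 0],
}


def instant_sum(series):
    """Analogue of sum(metric): add up only the latest sample of each pod."""
    return sum(values[-1] for values in series.values())


def windowed_sum(series):
    """Analogue of sum(max_over_time(metric[1m])): per-pod max, then sum."""
    return sum(max(values) for values in series.values())


if __name__ == "__main__":
    print(instant_sum(SAMPLES), windowed_sum(SAMPLES))
```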

Does It Work?

As shown below, our rq_active HPA roughly halved the requested resources. Even accounting for the delay introduced between Prometheus scraping Envoy and the metrics adapter querying Prometheus, we still get more responsive scaling than using CPU utilization, resulting in a lower 503 rate overall. We saw these trends continue once we applied our custom HPA in production.

Requested cores for raster-release (using CPU scaling) vs raster-mirror (using rq_active)

Final Thoughts

Istio and Envoy made collecting telemetry and safely testing with production traffic (via mirroring) simple.
We’re now rolling this methodology out to multiple services throughout our stack.
How often do you get to improve quality of service and reduce costs in the process?
