Vladyslav Ratslav

Cloud Architect · DevOps · MLOps · SRE Consultant

KEDA vs HPA: When to Use Which - Real Production Lessons

Published by Vladyslav Ratslav · Cloud Architect · January 2026

Also published on LinkedIn.

Several years ago, after I had almost single-handedly migrated Cloudbeds from a legacy stack (Ansible + BitBucket + AWX Tower + Jenkins) to a modern GitOps-driven platform (GitHub + Kubernetes on EKS + ArgoCD + Argo Rollouts), I ran into a scaling problem I didn't expect.

In the old EC2 world, I relied heavily on AWS ALB metrics, especially connection count, to autoscale NGINX‑based services. But once the first pilot microservice went to real testing on Kubernetes, I realized something important:

Out of the box, Kubernetes HPA cannot scale on ALB connection counts or most other external metrics.

And that limitation matters a lot when your workloads depend on real traffic behavior, not just CPU or memory.

The Search for a Better Autoscaler

A quick search pointed me to two options:

  • Prometheus Adapter
  • KEDA

My initial instinct was to use Prometheus Adapter. I tested it briefly, and it worked fine for basic custom metrics. But then I remembered: we also needed to scale based on AWS SQS, AWS CloudWatch metrics, and other external signals.

That’s when the choice became obvious.

KEDA was built for this.

So I moved forward with KEDA.
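To make "built for this" concrete: KEDA ships a native SQS scaler, so a queue-driven trigger is just a few lines of metadata. A hedged sketch (queue URL, region, and the per-replica message target are placeholder values):

```yaml
triggers:
- type: aws-sqs-queue        # built-in KEDA scaler
  metadata:
    queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue  # placeholder
    queueLength: "5"         # target messages per replica
    awsRegion: "us-east-1"
    identityOwner: operator  # authenticate with the KEDA operator's IAM identity
```

With Prometheus Adapter, wiring up an AWS-side signal like this would have meant extra exporters and adapter rules; with KEDA it is one trigger.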

Step 1. Porting HPA to KEDA (Ridiculously Easy)

My first step was simply porting the existing HPA configuration to KEDA.

Original HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60

Converted to KEDA

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: cpu
    metadata:
      type: Utilization
      value: "50"
  - type: memory
    metadata:
      type: Utilization
      value: "60"

As you can see - it took minutes. KEDA doesn’t fight HPA; it extends it.
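Under the hood, KEDA creates and manages an HPA object for each ScaledObject, and it even passes HPA tuning straight through via the spec.advanced block. For example, to slow down scale-in (values illustrative):

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # wait 5 min before scaling in
```

This is the same behavior field you would set on an autoscaling/v2 HPA, just managed through KEDA.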

Step 2. Scaling NGINX Based on Real Traffic (Connections / RPS)

This is where things got interesting.

For NGINX, the most important metric is connections / requests per second, not CPU. So I needed to expose those metrics from inside NGINX.

Expose NGINX internals via stub_status

server {
    listen 8080;
    location /stub_status {
        stub_status on;
        allow 127.0.0.1; # Limit access to the exporter
        deny all;
    }
}
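With that in place, you can sanity-check the endpoint from inside the pod. stub_status returns a small plain-text report (the numbers below are illustrative):

```text
$ curl -s http://127.0.0.1:8080/stub_status
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
```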
          

Add NGINX Prometheus Exporter as a sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-app
spec:
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:1.5.1
        args:
          - "--nginx.scrape-uri=http://localhost:8080/stub_status"
        ports:
        - containerPort: 9113
          name: metrics

Prometheus automatically scrapes the exporter thanks to the annotations. In Grafana, you can now see all NGINX metrics - and the one I needed was:

nginx_http_requests_total
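One caveat worth knowing: the prometheus.io/* annotations are a community convention, not a Kubernetes feature. They only work if your Prometheus server runs pod service discovery with matching relabeling rules, roughly like the sketch below (the pattern shipped by the community Prometheus Helm chart, trimmed here):

```yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only pods annotated prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Rewrite the scrape address to the port from prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

If you use the Prometheus Operator instead, you would reach for a PodMonitor rather than annotations.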

Step 3. Add KEDA Trigger for NGINX RPS

Here’s the final piece:

triggers:
...
- type: prometheus
  metadata:
    serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
    metricName: nginx_http_requests_total
    threshold: "10"
    query: |
      sum(rate(nginx_http_requests_total{app="nginx"}[1m]))

And that’s it.

How the Scaling Logic Works

KEDA feeds the metric to the HPA controller, which applies a simple formula:

desired pods = ceil(total RPS / threshold)

So:

  • If RPS = 10 → 1 pod
  • If RPS = 60 → 6 pods
  • If RPS = 300 → 30 pods

This is exactly how real traffic behaves - and exactly what HPA cannot do without heavy customization.
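The arithmetic is easy to sketch. A minimal Python illustration of the replica math (the real HPA controller additionally clamps the result to minReplicaCount/maxReplicaCount and applies stabilization windows, which this sketch omits):

```python
import math

def desired_replicas(total_rps: float, threshold: float) -> int:
    # HPA math that KEDA drives: ceil(current metric / target)
    return math.ceil(total_rps / threshold)

# threshold = 10, matching the Prometheus trigger above
for rps in (10, 60, 300):
    print(f"RPS={rps} -> {desired_replicas(rps, 10)} pod(s)")
```

Because of the ceil, replicas track load proportionally and a small overload still rounds up to an extra pod.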

Why Autoscaling Matters More Than People Think

Don’t forget: a well‑tuned autoscaler directly saves you money and protects your business.

  • Why keep extra pods running when traffic is low? That’s literally burning money for no reason.
  • And on the other side: why should your site go down just because all threads are busy and no new pods were started in time? “Servers not responding” often means your autoscaler reacted too late.

Autoscaling isn’t just a technical feature - it’s a financial and reliability safeguard.

Final Thoughts - When to Use HPA vs KEDA

Use HPA when:

  • You only need CPU / memory autoscaling
  • Your workloads are simple
  • You want minimal moving parts

Use KEDA when:

  • You need to scale on real business metrics
  • You rely on Prometheus, CloudWatch, SQS, Kafka, etc.
  • You want scale-to-zero
  • You want autoscaling that reflects actual traffic behavior

In real production environments, especially at scale, KEDA becomes the more powerful and flexible choice.