KEDA vs HPA: When to Use Which - Real Production Lessons
Published by Vladyslav Ratslav · Cloud Architect · January 2026
Also published on LinkedIn.
Several years ago, after I had almost single‑handedly migrated Cloudbeds from a legacy stack (Ansible + Bitbucket + AWX Tower + Jenkins) to a modern GitOps‑driven platform (GitHub + EKS + ArgoCD + Argo Rollouts), I ran into a scaling problem I didn't expect.
In the old EC2 world, I relied heavily on AWS ALB metrics, especially connection count, to autoscale NGINX‑based services. But once the first pilot microservice went to real testing on Kubernetes, I realized something important:
Kubernetes HPA cannot scale based on ALB connections or most external metrics.
And that limitation matters a lot when your workloads depend on real traffic behavior, not just CPU or memory.
The Search for a Better Autoscaler
A quick search pointed me to two options:
- Prometheus Adapter
- KEDA
My initial instinct was to use Prometheus Adapter. I tested it briefly, and it worked fine for basic custom metrics. But then I remembered: we also needed to scale based on AWS SQS, AWS CloudWatch metrics, and other external signals.
That’s when the choice became obvious.
KEDA was built for this.
So I moved forward with KEDA.
Step 1. Porting HPA to KEDA (Ridiculously Easy)
My first step was simply porting the existing HPA configuration to KEDA.
Original HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60
Converted to KEDA
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metadata:
        type: Utilization
        value: "50"
    - type: memory
      metadata:
        type: Utilization
        value: "60"
As you can see - it took minutes. KEDA doesn’t fight HPA; it extends it.
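Under the hood, KEDA generates a regular HPA for each ScaledObject, which is why the port is so painless. You can even keep tuning the underlying HPA through the `advanced` section. A hedged sketch (the field names come from KEDA's ScaledObject spec; the stabilization window value is purely illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaledobject
spec:
  scaleTargetRef:
    name: my-app
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                            # standard autoscaling/v2 behavior block
        scaleDown:
          stabilizationWindowSeconds: 300  # e.g. wait 5 min before scaling down
  triggers:
    - type: cpu
      metadata:
        type: Utilization
        value: "50"
```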
Step 2. Scaling NGINX Based on Real Traffic (Connections / RPS)
This is where things got interesting.
For NGINX, the most important metric is connections / requests per second, not CPU. So I needed to expose those metrics from inside NGINX.
Expose NGINX internals via stub_status
server {
    listen 8080;
    location /stub_status {
        stub_status on;
        allow 127.0.0.1;  # Limit access to the exporter
        deny all;
    }
}
Add NGINX Prometheus Exporter as a sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-app
spec:
  selector:             # required: must match the template labels
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
        - name: nginx-exporter
          image: nginx/nginx-prometheus-exporter:1.5.1
          args:
            - "--nginx.scrape-uri=http://localhost:8080/stub_status"
          ports:
            - containerPort: 9113
              name: metrics
Prometheus automatically scrapes the exporter thanks to the annotations (assuming your Prometheus deployment is configured for annotation-based discovery). In Grafana, you can now see all NGINX metrics - and the one I needed was:
nginx_http_requests_total
Step 3. Add KEDA Trigger for NGINX RPS
Here’s the final piece:
triggers:
  ...
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
      metricName: nginx_http_requests_total
      threshold: "10"
      query: |
        sum(rate(nginx_http_requests_total{app="nginx"}[1m]))
And that’s it.
How the Scaling Logic Works
KEDA hands the metric to the HPA controller, which applies a simple formula:
desired replicas = ceil(total RPS / threshold)
So:
- If RPS = 10 → 1 pod
- If RPS = 60 → 6 pods
- If RPS = 300 → 30 pods
This is exactly how real traffic behaves - and exactly what HPA cannot do without heavy customization.
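The arithmetic above is easy to sanity-check. A minimal Python sketch of the replica calculation (the function name and the clamping to min/max replicas are my own illustration; in reality KEDA delegates the math to the HPA controller):

```python
import math

def desired_replicas(total_rps: float, threshold: float,
                     min_replicas: int = 1, max_replicas: int = 30) -> int:
    """Approximate the HPA formula: ceil(metric / target), clamped to bounds."""
    raw = math.ceil(total_rps / threshold)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(10, 10))                   # 1
print(desired_replicas(60, 10))                   # 6
print(desired_replicas(300, 10))                  # 30
print(desired_replicas(300, 10, max_replicas=10)) # 10 (capped by maxReplicaCount)
```

Note the last call: with the `maxReplicaCount: 10` from the earlier ScaledObject, a 300 RPS spike would be capped at 10 pods, so the bounds matter as much as the threshold.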
Why Autoscaling Matters More Than People Think
Don’t forget: a well‑tuned autoscaler directly saves you money and protects your business.
- Why keep extra pods running when traffic is low? That’s literally burning money for no reason.
- And on the other side: why should your site go down just because all threads are busy and no new pods were started in time? “Servers not responding” often means your autoscaler reacted too late.
Autoscaling isn’t just a technical feature - it’s a financial and reliability safeguard.
Final Thoughts - When to Use HPA vs KEDA
Use HPA when:
- You only need CPU / memory autoscaling
- Your workloads are simple
- You want minimal moving parts
Use KEDA when:
- You need to scale on real business metrics
- You rely on Prometheus, CloudWatch, SQS, Kafka, etc.
- You want scale-to-zero
- You want autoscaling that reflects actual traffic behavior
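The SQS and scale-to-zero cases from the list above combine naturally in a single ScaledObject. A hedged sketch (the queue URL and region are placeholders; the `aws-sqs-queue` trigger and its `queueLength` parameter exist in KEDA, but check the docs for the authentication setup your cluster needs):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-worker-scaledobject
spec:
  scaleTargetRef:
    name: my-worker
  minReplicaCount: 0        # scale to zero when the queue is empty
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
        queueLength: "5"    # target messages per replica
        awsRegion: us-east-1
```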
In real production environments, especially at scale, KEDA becomes the more powerful and flexible choice.