
HPA in Production: What the Docs Don't Tell You

5 min read | Kubernetes Docs | Apr 21, 2026 | Reviewed for accuracy

Practitioner — Hands-on experience recommended

Horizontal Pod Autoscaling (HPA) addresses fluctuating workloads in Kubernetes. As demand rises or falls, the number of Pods serving your application should follow it without manual intervention. HPA does this by adjusting the replica count of a workload resource, such as a Deployment or StatefulSet, based on observed metrics like CPU utilization, so your application stays responsive without running idle Pods.

Kubernetes implements HPA as a control loop that runs intermittently rather than continuously, with a default sync period of 15 seconds. During each cycle, the controller manager compares resource utilization against the metrics defined in each HorizontalPodAutoscaler. It finds the target resource via scaleTargetRef, selects that resource's Pods using its .spec.selector labels, and retrieves metrics from either the resource metrics API (for per-Pod resource metrics) or the custom metrics API (for other metrics). Two kube-controller-manager flags govern this loop: --horizontal-pod-autoscaler-sync-period sets the control loop interval, and --horizontal-pod-autoscaler-downscale-stabilization sets the downscale stabilization window (5 minutes by default), which limits how quickly replicas are removed after a spike in demand.
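To make the "compare against the target" step concrete, here is a minimal Python sketch of the ratio formula the Kubernetes documentation gives for HPA, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), including the default tolerance of 0.1. The function name `desired_replicas` is hypothetical, and the sketch deliberately ignores min/max replica bounds, missing metrics, and Pods that are not yet ready, all of which the real controller handles.

```python
import math

# Default tolerance; on a real cluster this comes from the
# kube-controller-manager flag --horizontal-pod-autoscaler-tolerance.
DEFAULT_TOLERANCE = 0.1


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = DEFAULT_TOLERANCE) -> int:
    """Hypothetical, simplified model of the documented HPA formula.

    Ignores min/max replica bounds, missing metrics and unready Pods.
    """
    ratio = current_metric / target_metric
    # If the ratio is close enough to 1.0, the controller skips scaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally, rounding up.
    return math.ceil(current_replicas * ratio)
```

For example, 4 replicas averaging 90% CPU against a 60% target give ceil(4 * 1.5) = 6 replicas, while 63% against 60% falls inside the tolerance and leaves the count at 4.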

In production, remember that if any container in a Pod lacks a CPU request, CPU utilization for that Pod is undefined and the autoscaler takes no action on the CPU metric. If CPU is the only metric configured, the workload simply stays at its current replica count as load changes. HPA also does not apply to objects that can't be scaled, such as DaemonSets, so plan other scaling approaches for those components. Knowing these limits up front helps you avoid configurations that leave your application under-provisioned during spikes or over-provisioned once demand falls.
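The missing-request behavior follows from how utilization is defined: usage expressed as a percentage of the requested CPU. The sketch below is a simplified, per-Pod model of that calculation (the real controller aggregates across all selected Pods); the helper name `pod_cpu_utilization` and the (usage, request) tuple layout in millicores are hypothetical. It returns None as soon as any container has no request.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: (cpu_usage_millicores, cpu_request_millicores or None).
Container = Tuple[int, Optional[int]]


def pod_cpu_utilization(containers: Sequence[Container]) -> Optional[float]:
    """Return CPU usage as a percentage of the summed CPU requests,
    or None when utilization is undefined."""
    total_usage = 0
    total_request = 0
    for usage, request in containers:
        if request is None:
            # A single missing request makes utilization undefined, so an
            # HPA targeting CPU utilization cannot act on this Pod.
            return None
        total_usage += usage
        total_request += request
    if total_request == 0:
        # Guard against dividing by zero (e.g. an empty container list).
        return None
    return 100.0 * total_usage / total_request
```

With requests of 500m and 200m and usage of 250m and 100m, the Pod sits at 50% utilization; drop the second request and the result becomes None.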

Key takeaways

→Tune responsiveness with --horizontal-pod-autoscaler-sync-period, a kube-controller-manager flag that defaults to 15s.
→Set CPU requests on every container; without them, CPU utilization is undefined and HPA will not scale on it.
→Use --horizontal-pod-autoscaler-downscale-stabilization (default 5m) to control how quickly replicas are removed after demand spikes.
→Avoid using HPA for non-scalable objects like DaemonSets.
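The downscale stabilization takeaway can be pictured as a rolling window over past recommendations. Per the documented stabilization behavior, when metrics suggest scaling down, the controller looks back over the window and uses the highest recommendation it finds, so a short dip in load does not remove Pods immediately, while a higher recommendation takes effect right away. The class below is a hypothetical sketch of that rule only; its 300-second default mirrors the flag's 5m0s default.

```python
from collections import deque
from typing import Deque, Tuple

DEFAULT_WINDOW_SECONDS = 300  # mirrors the 5m0s flag default


class DownscaleStabilizer:
    """Hypothetical model: the highest recommendation inside the window wins."""

    def __init__(self, window_seconds: int = DEFAULT_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        # (timestamp_seconds, recommended_replicas), oldest first.
        self._history: Deque[Tuple[float, int]] = deque()

    def stabilize(self, now: float, recommendation: int) -> int:
        self._history.append((now, recommendation))
        # Forget recommendations that are older than the window.
        while self._history and now - self._history[0][0] > self.window_seconds:
            self._history.popleft()
        # Scale-ups apply immediately; scale-downs wait until every
        # higher recommendation has aged out of the window.
        return max(replicas for _, replicas in self._history)
```

Feeding it a recommendation of 10 at t=0 and 4 at t=15 returns 10 both times; the drop to 4 takes effect only once the recommendation of 10 ages out of the 300-second window.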

Why it matters

In production, a well-configured HPA adds Pods to keep the application responsive during traffic spikes and removes them when demand drops, so you maintain service quality without paying to run peak capacity around the clock.

Code examples

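Both flags are set on kube-controller-manager; the values shown are the defaults.

plaintext

kube-controller-manager --horizontal-pod-autoscaler-sync-period=15s

plaintext

kube-controller-manager --horizontal-pod-autoscaler-downscale-stabilization=5m0s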

When NOT to use this

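Horizontal Pod Autoscaling does not apply to objects that can't be scaled, such as DaemonSets. A DaemonSet runs one Pod on each eligible node, so its Pod count follows the node count rather than load. If your architecture includes such objects, consider alternative scaling strategies for them.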
