HPA in Production: What the Docs Don't Tell You
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pods in a Deployment or StatefulSet based on observed demand. Your application scales out during peak usage and back in when demand drops, which keeps resource usage and cost in line with load.
The HPA controller periodically compares an observed metric, most commonly the average CPU utilization across the pods of a workload, against a target you set. The official walkthrough uses a target of 50% average CPU utilization with a minimum of 1 and a maximum of 10 replicas; treat these as example values rather than defaults, since the target and the maximum have to be chosen per workload. When load pushes utilization above the target, the controller adds replicas up to the maximum; when load drops, it removes replicas down to the minimum. CPU utilization is measured as a percentage of each pod's CPU request, so the target pods must declare CPU requests. You also need the Metrics Server running in the cluster: it collects resource metrics and exposes them through the Kubernetes API.
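The scaling decision described above follows, per the Kubernetes documentation, the ratio desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Below is a minimal shell sketch of that calculation with clamping to the configured minimum and maximum. The function name `desired_replicas` and its argument order are illustrative and not part of any Kubernetes tool, and the sketch deliberately ignores the controller's tolerance band and stabilization window.

```shell
#!/bin/sh
# Illustrative only: approximates the HPA replica calculation
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# and clamps the result to [min, max]. Not the controller's actual code.
desired_replicas() {
  current_replicas=$1   # pods currently running
  current_util=$2       # observed average CPU utilization (% of request)
  target_util=$3        # target average CPU utilization (%)
  min_replicas=$4
  max_replicas=$5

  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for b > 0.
  desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))

  # Clamp to the configured bounds.
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi

  echo "$desired"
}
```

For example, `desired_replicas 1 250 50 1 10` prints 5: one replica running at 250% of its CPU request against a 50% target needs five replicas to bring the average back to the target.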
In production, expect the replica count to take a few minutes to stabilize after a scaling action. If no clients are sending requests, current CPU consumption reads as 0%; that reflects an idle workload, not a broken autoscaler. The official walkthrough requires a Kubernetes server at version 1.23 or later and recommends a cluster with at least two nodes that are not acting as control plane hosts.
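The same settings (50% CPU target, 1 to 10 replicas) can also be written declaratively with the `autoscaling/v2` API, which has been stable since Kubernetes 1.23. The sketch below writes such a manifest to a file; the file name `hpa.yaml` is an assumption, and the `php-apache` Deployment name is taken from the walkthrough's example workload.

```shell
#!/bin/sh
# Write a HorizontalPodAutoscaler manifest equivalent to:
#   kubectl autoscale deployment php-apache --cpu=50% --min=1 --max=10
# The file name hpa.yaml is an arbitrary choice for this sketch.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF

# With cluster access, apply it and watch the replica count settle:
#   kubectl apply -f hpa.yaml
#   kubectl get hpa php-apache --watch
```

Keeping the autoscaler in a manifest makes the target and replica bounds reviewable in version control, which an imperative `kubectl autoscale` call does not.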
Key takeaways
- →Set a target average CPU utilization; the walkthrough uses 50%, which is a starting point rather than a universally optimal value.
- →Set minimum and maximum replicas to control scaling behavior effectively.
- →Enable the Metrics Server to collect and expose resource metrics for HPA.
- →Monitor the stabilization time after scaling actions to manage expectations.
- →Run Kubernetes 1.23 or later, as the official walkthrough requires.
Why it matters
In production, effective autoscaling can reduce cost and improve application performance. By adjusting replica counts to match load, you can absorb traffic spikes without provisioning for peak capacity around the clock.
Code examples
kubectl autoscale deployment php-apache --cpu=50% --min=1 --max=10
kubectl get hpa
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.