Mastering Node-Pressure Eviction in Kubernetes
Node-pressure eviction exists to prevent resource starvation on the nodes in your Kubernetes cluster. When a node runs low on a critical resource such as memory, disk space, or filesystem inodes, the kubelet proactively terminates one or more pods to reclaim that resource. This keeps the node itself stable and the remaining workloads responsive, even under heavy load.
The kubelet continuously monitors resource usage on each node and compares it against eviction signals to decide when to act. If a signal crosses a configured threshold, for example memory.available<1Gi (less than 1Gi of memory still available), the kubelet terminates selected pods and sets their phase to Failed. You can configure both soft and hard eviction thresholds. A soft threshold is paired with a grace period, such as 1m30s: the kubelet evicts only if the condition persists for that long. A hard threshold has no grace period and triggers immediate termination. For example, you might set memory.available<10% as a hard threshold so that pods are evicted as soon as less than 10% of the node's memory remains available.
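The thresholds described above can be expressed in a kubelet configuration file. The fragment below is a minimal sketch, not a recommended production setting: the hard threshold of 10% and the 1m30s grace period come from the examples in this article, while the soft threshold of 15% is an illustrative assumption chosen only so that the soft threshold trips before the hard one.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard threshold: evict immediately once less than 10% of memory is available.
evictionHard:
  memory.available: "10%"
# Soft threshold (illustrative value): start the grace-period timer at 15%.
evictionSoft:
  memory.available: "15%"
# Evict on the soft threshold only if it has been exceeded for this long.
evictionSoftGracePeriod:
  memory.available: "1m30s"
```

Each soft threshold needs a matching grace period entry. Also note that overriding one hard threshold may mean the built-in defaults for the other signals no longer apply, so it is worth checking the kubelet configuration reference and listing every threshold you depend on explicitly.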
In production, a few nuances of node-pressure eviction matter. The kubelet does not respect PodDisruptionBudgets during these evictions, so a budget you rely on for availability will not stop pods from being terminated. On Windows nodes, memory.available is derived from the node's CommitLimit, which can change when the node's page-file size changes, so the headroom behind a memory threshold can shift. Version-specific details apply as well: split image filesystem support, controlled by the KubeletSeparateDiskGC feature gate, is listed as enabled by default as of Kubernetes v1.31, and the v1.35 documentation still requires that gate to be enabled if you want the additional containerfs eviction signals.
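If you need to set the KubeletSeparateDiskGC feature gate explicitly, one way is through the `featureGates` map of the kubelet configuration file. This is a hedged sketch: whether the gate is already on by default depends on your Kubernetes release, so check the feature-gate reference for your version before adding it.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Assumption: only needed when the gate is not already enabled by default
  # in the release you run.
  KubeletSeparateDiskGC: true
```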
Key takeaways
- Configure soft eviction thresholds to manage pod terminations gracefully.
- Set hard eviction thresholds to prevent critical resource exhaustion.
- Monitor `memory.available` to anticipate node-pressure eviction events.
- Understand that PodDisruptionBudgets are ignored during node-pressure evictions.
- Be aware of the impact of CommitLimit changes on resource management.
Why it matters
Node-pressure eviction directly impacts application availability and performance. Properly managing this process can prevent downtime and ensure that your services remain responsive under load.
Code examples
memory.available<1Gi
memory.available<10%
memory.available=1m30s
When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.