OpsCanary

Unlocking Kubernetes v1.36: PSI Metrics for Proactive Resource Management

5 min read · Kubernetes Blog · May 12, 2026

Kubernetes v1.36 promotes Pressure Stall Information (PSI) metrics to general availability, addressing a critical need for proactive resource management. PSI provides high-fidelity signals that help you identify resource saturation before it leads to an outage, which is essential for maintaining application performance and reliability in production environments.

The kubelet now detects OS-level PSI support through the cgroup configuration, ensuring that pressure metrics are collected only on nodes that support them. This means cleaner data for your monitoring and alerting systems. To take advantage of these metrics, ensure your nodes run Linux kernel 4.20 or later with cgroup v2. Additionally, the kernel must be compiled with CONFIG_PSI=y, and the system must not be booted with the psi=0 parameter.
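Before relying on these metrics, you can verify a node's prerequisites directly from a shell on the node. A minimal sketch; the file paths below are standard Linux kernel interfaces, not Kubernetes-specific, and nothing here modifies the system:

```shell
#!/bin/sh
# Check whether this node can report PSI metrics.
# PSI exposes pressure files under /proc/pressure when the kernel is
# built with CONFIG_PSI=y and not booted with psi=0.
if [ -r /proc/pressure/cpu ]; then
  PSI_OK=yes
  echo "PSI available: /proc/pressure/cpu is readable"
else
  PSI_OK=no
  echo "PSI not available on this node"
fi

# cgroup v2 mounts the unified hierarchy as filesystem type cgroup2fs.
if [ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" = "cgroup2fs" ]; then
  echo "cgroup v2 in use"
else
  echo "cgroup v2 not detected (cgroup v1 or non-Linux host)"
fi
```

If either check fails, the kubelet will simply not surface PSI data for that node, so this is worth folding into node validation before you build alerts on these metrics.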

In production, you can query PSI metrics using a simple command. For example, use kubectl get --raw to access the stats summary for a specific container. Be cautious, as proxying to the kubelet is a privileged operation that requires appropriate administrative permissions. As of v1.36, you no longer need to opt in to any feature gate, making it easier to leverage these metrics in your workflows.

Key takeaways

  • Leverage PSI metrics to identify resource saturation before outages occur.
  • Ensure your kernel is compiled with CONFIG_PSI=y to collect accurate PSI data.
  • Use moving averages (10s, 60s, 300s) to differentiate between transient spikes and sustained resource tension.
  • Query PSI metrics with `kubectl get --raw` for real-time insights into container performance.
  • Be aware of security risks when proxying to the kubelet; use appropriate permissions.
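The pressure files report three moving averages per resource (avg10, avg60, avg300), each the percentage of time tasks stalled on that resource over the window. A minimal sketch of thresholding on the 60-second average, using a hard-coded sample line in place of a live read from /proc/pressure/memory; the 10% threshold is an illustrative assumption, not a recommendation from the release:

```shell
#!/bin/sh
# Sample PSI line; on a real node: line=$(head -n1 /proc/pressure/memory)
line="some avg10=1.50 avg60=12.30 avg300=8.00 total=123456"

# Extract the 60-second moving average (percent of time stalled).
avg60=$(echo "$line" | sed -n 's/.*avg60=\([0-9.]*\).*/\1/p')

# avg10 reacts to transient spikes; avg60 and avg300 reveal sustained
# tension that is worth alerting on.
threshold=10
if awk -v a="$avg60" -v t="$threshold" 'BEGIN { exit !(a > t) }'; then
  echo "sustained memory pressure: avg60=${avg60}%"
else
  echo "memory pressure nominal"
fi
```

With the sample line above this prints `sustained memory pressure: avg60=12.30%`; comparing the alert against avg10 as well lets you distinguish a brief burst from a trend.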

Why it matters

In production, PSI metrics can significantly reduce downtime by allowing you to anticipate and address resource issues before they impact users. This proactive approach enhances overall system reliability.

Code examples

Bash
CONTAINER_NAME="example-container"
NODE_NAME="$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')"
kubectl get --raw "/api/v1/nodes/${NODE_NAME}/proxy/stats/summary" \
  | jq --arg name "$CONTAINER_NAME" \
      '.pods[].containers[] | select(.name == $name) | {name, cpu: .cpu.psi, memory: .memory.psi, io: .io.psi}'

When NOT to use this

The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.
