Mastering the Node Readiness Controller: Ensuring Node Health in Kubernetes
The Node Readiness Controller solves a specific problem in Kubernetes: making sure workloads are placed only on nodes that meet your infrastructure requirements. The standard node Ready status can fall short here, especially during bootstrapping, when a node may report Ready before every component it depends on is actually working. The controller strengthens the readiness guarantee by adding and removing taints based on custom health signals, which keeps workloads off nodes that are not yet ready.
At its core, the Node Readiness Controller revolves around the NodeReadinessRule (NRR) API, which lets you declare readiness gates for your nodes. Each rule runs in one of two enforcement modes: 'continuous enforcement' for ongoing checks, or 'bootstrap-only enforcement' for one-time initialization steps. The controller reacts to Node Conditions: it performs no health checks of its own and relies on conditions already reported on the Node to decide whether it is ready. For example, you can create a rule that requires the condition type 'cniplugin.example.net/NetworkReady' to have status 'True'. While that condition is not met, the controller applies a taint such as 'readiness.k8s.io/acme.com/network-unavailable' with effect 'NoSchedule', so new pods that do not tolerate the taint are not scheduled onto the node.
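To make the gating logic concrete, here is a minimal Python simulation of how one rule could be evaluated against one node. It is a sketch under stated assumptions, not the controller's implementation: the Rule and Node classes, the reconcile function and the bootstrapped bookkeeping set are all hypothetical. It assumes that a condition missing from a node counts as unmet, and that a bootstrap-only rule stops acting on a node once its conditions have been satisfied once. The match_labels field mirrors the nodeSelector.matchLabels shown in the YAML example in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical in-memory stand-in for a NodeReadinessRule spec."""
    name: str
    conditions: dict[str, str]            # condition type -> required status
    taint_key: str
    taint_effect: str = "NoSchedule"
    enforcement_mode: str = "continuous"  # or "bootstrap-only"
    match_labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    """Hypothetical view of a Node: labels, condition statuses and taints."""
    name: str
    labels: dict[str, str]
    conditions: dict[str, str]            # condition type -> "True" / "False" / "Unknown"
    taints: set[tuple[str, str]] = field(default_factory=set)
    # Assumed bookkeeping: bootstrap-only rules this node has already passed.
    bootstrapped: set[str] = field(default_factory=set)


def selects(rule: Rule, node: Node) -> bool:
    # matchLabels semantics: every listed label must be present with exactly that value.
    return all(node.labels.get(k) == v for k, v in rule.match_labels.items())


def conditions_met(rule: Rule, node: Node) -> bool:
    # Assumption of this sketch: a condition absent from the node counts as not met.
    return all(node.conditions.get(t) == s for t, s in rule.conditions.items())


def reconcile(rule: Rule, node: Node) -> str:
    """Apply the rule to the node in place.

    Returns "skip" (rule does not act on this node), "taint" (conditions unmet,
    taint ensured present) or "untaint" (conditions met, taint ensured absent).
    """
    if not selects(rule, node):
        return "skip"
    if rule.enforcement_mode == "bootstrap-only" and rule.name in node.bootstrapped:
        return "skip"  # one-time gate already passed; later condition changes are ignored
    taint = (rule.taint_key, rule.taint_effect)
    if conditions_met(rule, node):
        node.taints.discard(taint)
        if rule.enforcement_mode == "bootstrap-only":
            node.bootstrapped.add(rule.name)
        return "untaint"
    node.taints.add(taint)
    return "taint"
```

For example, a worker that starts without the 'cniplugin.example.net/NetworkReady' condition is tainted, loses the taint once the condition turns 'True', and under the bootstrap-only rule is left alone if the condition later flips back to 'False'. With enforcement_mode left at "continuous", that last step would re-taint the node instead, which is the ongoing check continuous enforcement is meant to provide.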
Rolling out a new readiness rule in production carries real risk, especially across a fleet: a condition type that is misspelled or never reported leaves every matching node tainted, so new pods cannot be scheduled on any of them. Be deliberate about the taints a rule applies and make sure its conditions are defined correctly. Dry run mode is valuable here, because it lets you simulate a rule's impact before any taints are applied. The controller is set to be available starting February 3, 2026, so plan your upgrades accordingly.
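The idea behind a dry run can be illustrated with a read-only pass over the fleet: for each node a rule selects, list the conditions that are not met and would therefore lead to a taint. This is a hypothetical sketch, not the controller's dry run mode; it does not show how dry run is enabled on a real NodeReadinessRule or what the controller reports. The rule dictionary reuses field names from the YAML example in this article (conditions, requiredStatus, nodeSelector, matchLabels), and each node is a simplified dictionary holding its name, labels and status conditions.

```python
def dry_run(rule: dict, nodes: list[dict]) -> list[tuple[str, list[str]]]:
    """Return (node name, unmet condition types) for every node the rule would taint.

    Nothing is modified: the function only reads the rule and the node data.
    """
    match_labels = rule.get("nodeSelector", {}).get("matchLabels", {})
    affected = []
    for node in nodes:
        labels = node.get("labels", {})
        if any(labels.get(k) != v for k, v in match_labels.items()):
            continue  # the rule does not select this node
        statuses = {c["type"]: c["status"] for c in node.get("conditions", [])}
        unmet = [
            c["type"]
            for c in rule.get("conditions", [])
            if statuses.get(c["type"]) != c["requiredStatus"]
        ]
        if unmet:
            affected.append((node["name"], unmet))
    return affected


if __name__ == "__main__":
    rule = {
        "conditions": [
            {"type": "cniplugin.example.net/NetworkReady", "requiredStatus": "True"}
        ],
        "nodeSelector": {"matchLabels": {"node-role.kubernetes.io/worker": ""}},
    }
    fleet = [
        {"name": "worker-1", "labels": {"node-role.kubernetes.io/worker": ""},
         "conditions": [{"type": "cniplugin.example.net/NetworkReady", "status": "True"}]},
        {"name": "worker-2", "labels": {"node-role.kubernetes.io/worker": ""},
         "conditions": []},
        {"name": "cp-1", "labels": {"node-role.kubernetes.io/control-plane": ""},
         "conditions": []},
    ]
    for name, unmet in dry_run(rule, fleet):
        print(f"{name}: would be tainted (unmet: {', '.join(unmet)})")
```

An empty result means the rule would not gate any node it currently selects; a result covering most of the fleet is a strong hint to check the condition type for typos before enforcing the rule.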
Key takeaways
- →Define a NodeReadinessRule (NRR) to set custom readiness gates for your nodes.
- →Choose 'continuous enforcement' for ongoing checks or 'bootstrap-only enforcement' for one-time initialization steps.
- →Use dry run mode to simulate a rule's impact before it applies taints to your nodes.
- →Report health as Node Conditions; the controller reacts to them instead of running health checks itself.
- →Roll out new readiness rules cautiously across a fleet.
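Because the controller only reacts to Node Conditions, some other component has to publish them, for example the CNI plugin behind a condition like 'cniplugin.example.net/NetworkReady'. The minimal sketch below shows one way such an agent could build its status update: a strategic merge patch that sets a single condition. The function name condition_patch, its parameters and any reason strings are hypothetical; the condition fields themselves (type, status, reason, message, lastHeartbeatTime, lastTransitionTime) are the standard Kubernetes NodeCondition fields.

```python
from datetime import datetime, timezone
from typing import Optional


def condition_patch(
    cond_type: str,
    ready: bool,
    reason: str,
    message: str,
    previous_status: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build a strategic-merge-patch body that sets one Node condition.

    Node status conditions are merged by their "type" key, so this patch
    touches only the named condition and leaves the others in place.
    """
    now = now or datetime.now(timezone.utc)
    # RFC 3339 timestamp in UTC, the format Kubernetes uses for these fields.
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    status = "True" if ready else "False"
    condition = {
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastHeartbeatTime": timestamp,
    }
    # Only move lastTransitionTime when the status actually changes.
    if status != previous_status:
        condition["lastTransitionTime"] = timestamp
    return {"status": {"conditions": [condition]}}
```

An agent could then send the body to the node's status subresource, for instance with the official Python client's CoreV1Api().patch_node_status(node_name, body). Once the condition reads 'True', a rule like the one in this article would lift its taint on the next reconcile.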
Why it matters
In production, scheduling workloads only on fully prepared nodes avoids pods that start and then fail because a dependency is not ready yet, which reduces downtime and improves application reliability. With continuous enforcement, the Node Readiness Controller maintains this guarantee throughout the node's lifecycle, not only at bootstrap.
Code examples
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness-rule
spec:
  # Node Condition(s) that must hold before the node counts as ready
  conditions:
    - type: "cniplugin.example.net/NetworkReady"
      requiredStatus: "True"
  # Taint kept on the node while the conditions above are not met
  taint:
    key: "readiness.k8s.io/acme.com/network-unavailable"
    effect: "NoSchedule"
    value: "pending"
  # Gate only the bootstrap phase instead of enforcing continuously
  enforcementMode: "bootstrap-only"
  # Apply this rule to worker nodes only
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.