Mastering AKS Upgrades: Strategies for Zero Downtime
Upgrading your Azure Kubernetes Service (AKS) cluster is essential for maintaining security, performance, and access to new features. It can also cause downtime if it is not managed properly, so you need to understand the available upgrade options and settings well enough to keep the cluster healthy and operational throughout the process.
Before an upgrade starts, AKS runs pre-upgrade validations on cluster health, including checks for breaking API changes and for a valid upgrade path. Two node pool settings then shape how the upgrade proceeds: maxSurge defines how many additional (surge) nodes can be created during the upgrade, while maxUnavailable specifies how many existing nodes can be offline at the same time. At the workload level, a Pod Disruption Budget (PDB) limits how many pods can be down during voluntary disruptions such as a node drain, so your applications remain available. You can configure node pool upgrade behavior with az aks nodepool update, using flags such as --max-surge and --max-blocked-nodes.
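To make the maxSurge setting concrete, here is a minimal sketch of the arithmetic, assuming maxSurge is given either as a node count or as a percentage of the node pool size, and that fractional nodes from a percentage are rounded up. The function name surge_nodes is hypothetical and exists only for this illustration; it does not call Azure.

```shell
#!/usr/bin/env bash
# surge_nodes NODE_COUNT MAX_SURGE   (hypothetical helper, illustration only)
# Prints how many surge nodes a max-surge value yields for a node pool.
# MAX_SURGE is either an integer ("2") or a percentage ("33%").
surge_nodes() {
  local nodes="$1" surge="$2"
  if [[ "$surge" == *% ]]; then
    # Strip the trailing percent sign.
    local pct="${surge%?}"
    # Ceiling division: assumes fractional nodes are rounded up.
    echo $(( (nodes * pct + 99) / 100 ))
  else
    # An absolute value is used as-is.
    echo "$surge"
  fi
}

surge_nodes 5 '33%'    # prints 2
surge_nodes 10 '10%'   # prints 1
surge_nodes 3 1        # prints 1
```

A larger surge value finishes the upgrade faster but needs more spare quota; a smaller one is slower and gentler on capacity.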
In production, be cautious with the force upgrade option: it bypasses PDB constraints and can drain all pods simultaneously, which leads to service disruptions. Always verify your PDB settings before opting for a force upgrade. Make sure you are using Azure CLI version 2.79.0 or later to take advantage of these features. The undrainable-node-behavior setting also matters. It defaults to 'Schedule', but you can change it to 'Cordon' to prevent new pods from being scheduled on nodes that can't be drained, which helps keep the cluster stable during upgrades.
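One way to verify PDB settings before an upgrade is to look for budgets that currently allow zero disruptions, since those are the ones most likely to block a normal drain. The sketch below is an assumption-laden illustration: blocking_pdbs is a hypothetical helper, it expects the text output of kubectl get pdb --all-namespaces --no-headers on stdin, and it assumes the fifth column is ALLOWED DISRUPTIONS. The sample rows are made up.

```shell
#!/usr/bin/env bash
# blocking_pdbs (hypothetical helper, illustration only)
# Reads `kubectl get pdb --all-namespaces --no-headers` output on stdin and
# prints namespace/name for every PDB whose allowed disruptions (assumed to be
# column 5) is currently 0.
blocking_pdbs() {
  awk '$5 == 0 { print $1 "/" $2 }'
}

# Intended use against a live cluster (not run here):
#   kubectl get pdb --all-namespaces --no-headers | blocking_pdbs

# Made-up sample rows: NAMESPACE NAME MIN-AVAILABLE MAX-UNAVAILABLE ALLOWED-DISRUPTIONS AGE
printf '%s\n' \
  'default   web-pdb   2     N/A   1   10d' \
  'payments  api-pdb   N/A   0     0   3d' | blocking_pdbs
```

Any PDB this prints deserves a closer look before the upgrade, whether or not you plan to use the force option.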
Key takeaways
- Configure maxSurge to control the number of surge nodes during upgrades.
- Set maxUnavailable to limit the number of nodes that can be offline during the upgrade.
- Use Pod Disruption Budgets (PDB) to manage pod availability during disruptions.
- Beware of the force upgrade option; it can cause service disruptions by ignoring PDB constraints.
- Ensure you have the latest Azure CLI aks-preview extension for advanced features.
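For the PDB takeaway, a minimal sketch of what such a budget can look like is shown here. The helper print_pdb and the names web-pdb and web are hypothetical; the manifest uses the standard policy/v1 PodDisruptionBudget kind with a minAvailable value, which is only one of the ways to express a budget.

```shell
#!/usr/bin/env bash
# print_pdb NAME APP_LABEL MIN_AVAILABLE   (hypothetical helper, illustration only)
# Prints a PodDisruptionBudget manifest that keeps at least MIN_AVAILABLE pods
# with the label app=APP_LABEL running during voluntary disruptions.
print_pdb() {
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: $1
spec:
  minAvailable: $3
  selector:
    matchLabels:
      app: $2
EOF
}

# Print an example manifest for a hypothetical "web" application.
print_pdb web-pdb web 2

# To apply it to a cluster (not run here):
#   print_pdb web-pdb web 2 | kubectl apply -f -
```

With a budget like this in place, a node drain during a regular upgrade waits rather than evicting pods below the configured minimum.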
Why it matters
Properly managing AKS upgrades minimizes downtime and service disruptions, which is critical for maintaining user trust and operational efficiency in production environments.
Code examples
az aks upgrade \
    --name $CLUSTER_NAME \
    --resource-group $RESOURCE_GROUP_NAME \
    --kubernetes-version $KUBERNETES_VERSION \
    --enable-force-upgrade \
    --upgrade-override-until 2023-10-01T13:00:00Z

az aks nodepool update \
    --resource-group <resource-group-name> \
    --cluster-name <cluster-name> \
    --name <node-pool-name> \
    --undrainable-node-behavior Cordon \
    --max-blocked-nodes 2 \
    --drain-timeout 30

az aks nodepool update \
    --cluster-name jizenMC1 \
    --name nodepool1 \
    --resource-group jizenTestMaxBlockedNodesRG \
    --max-surge 1 \
    --undrainable-node-behavior Cordon \
    --max-blocked-nodes 2 \
    --drain-timeout 5

When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.