
Mastering AKS Upgrades: Strategies for Zero Downtime

5 min read · Microsoft Learn · Apr 26, 2026 · Reviewed for accuracy

Practitioner — Hands-on experience recommended

Upgrading your Azure Kubernetes Service (AKS) cluster is essential for security, performance, and access to new features, but a poorly managed upgrade can cause downtime. Understanding the upgrade options and configuration settings lets you keep the cluster healthy and operational throughout the process.

AKS runs pre-upgrade validations that check cluster health, detect breaking API changes, and confirm that the upgrade path is valid. Two node pool settings shape how nodes are replaced: maxSurge sets how many additional (surge) nodes AKS can create during an upgrade, while maxUnavailable sets how many existing nodes can be offline at the same time. A Pod Disruption Budget (PDB) limits how many pods of an application can be down during voluntary disruptions such as node drains, which keeps your applications available. You can configure node pool upgrade settings with az aks nodepool update, using flags such as --max-surge and --max-blocked-nodes (the number of nodes that fail to drain which an upgrade will tolerate).
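
To make the maxSurge arithmetic concrete, here is a minimal shell sketch. It assumes that maxSurge accepts either a whole number of nodes or a percentage of the pool size, and that fractional surge nodes are rounded up. The function name surge_node_count is hypothetical and is not part of the Azure CLI.

```shell
# Hypothetical helper: estimate how many surge nodes an upgrade would add.
# Usage: surge_node_count <current-node-count> <max-surge>
# <max-surge> is either an integer ("2") or a percentage ("33%").
# Assumption: percentages are taken of the current node count and rounded up.
surge_node_count() {
  local nodes surge pct
  nodes=$1
  surge=$2
  case "$surge" in
    *%)
      pct=${surge%\%}
      # Integer ceiling of nodes * pct / 100
      echo $(( (nodes * pct + 99) / 100 ))
      ;;
    *)
      echo "$surge"
      ;;
  esac
}

surge_node_count 10 33%
surge_node_count 5 1
```

A higher surge value generally finishes the upgrade faster but needs more spare quota, so a calculation like this is mainly useful for checking capacity before you change the setting.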

In production, be cautious with the force upgrade option: it bypasses PDB constraints and can drain all pods simultaneously, causing service disruption. Always verify your PDB settings before opting for a force upgrade, and make sure you are running Azure CLI version 2.79.0 or later so that the option is available. The undrainable-node-behavior setting also matters. It defaults to Schedule; changing it to Cordon keeps new pods from being scheduled on nodes that can't be drained, which helps maintain stability during upgrades.
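
One way to verify PDB settings before a force upgrade is to list the budgets that currently allow zero disruptions, since those are the ones a normal drain would respect and a forced drain would override. The sketch below is an illustration, not an official check: the function name blocking_pdbs is hypothetical, and it assumes its input is whitespace-separated lines of namespace, PDB name, and allowed disruptions.

```shell
# Hypothetical helper: print "namespace/name" for every PDB that currently
# allows zero disruptions. Reads lines of "<namespace> <name> <allowed>" on stdin.
blocking_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}

# Example with sample data (no cluster needed):
printf 'default web 0\nkube-system dns 1\n' | blocking_pdbs
```

Against a live cluster, the input could come from kubectl, for example `kubectl get pdb -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed | blocking_pdbs`. Any PDB it prints deserves a look before you decide to force the upgrade.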

Key takeaways

→Configure maxSurge to control the number of surge nodes during upgrades.
→Set maxUnavailable to limit the number of nodes that can be offline during the upgrade.
→Use Pod Disruption Budgets (PDB) to manage pod availability during disruptions.
→Beware of the force upgrade option; it can cause service disruptions by ignoring PDB constraints.
→Keep Azure CLI at version 2.79.0 or later, and install the latest aks-preview extension for advanced features.

Why it matters

Properly managing AKS upgrades minimizes downtime and service disruptions, which is critical for maintaining user trust and operational efficiency in production environments.

Code examples

Bash

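# Force the upgrade; the override stays in effect until the given UTC timestamp.
# The timestamp below is an example value; replace it with a future time.
az aks upgrade \
  --name "$CLUSTER_NAME" \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --kubernetes-version "$KUBERNETES_VERSION" \
  --enable-force-upgrade \
  --upgrade-override-until 2023-10-01T13:00:00Z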
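
Before running a force upgrade like the one above, it can help to confirm two preconditions from the article: the Azure CLI is at least 2.79.0, and the override timestamp is not already in the past. The following is a minimal sketch under stated assumptions: the function names version_ge and override_in_future are hypothetical, and it relies on GNU sort -V and GNU date -d, which may not be available on every platform.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes GNU sort with -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeed if the ISO 8601 timestamp $1 is later than now.
# Assumes GNU date, which accepts -d with values like 2023-10-01T13:00:00Z.
override_in_future() {
  local until_epoch now_epoch
  until_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=$(date -u +%s)
  [ "$until_epoch" -gt "$now_epoch" ]
}

# Example checks with literal values (no Azure login needed):
if version_ge "2.80.0" "2.79.0"; then
  echo "CLI version is new enough"
fi
if ! override_in_future "2023-10-01T13:00:00Z"; then
  echo "override timestamp has already passed"
fi
```

In practice the installed version could be read with `az version --query '"azure-cli"' -o tsv` and passed as the first argument to version_ge.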

Bash

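# Cordon nodes that can't be drained, tolerate up to 2 blocked nodes,
# and wait up to 30 minutes for each node to drain.
az aks nodepool update \
  --resource-group <resource-group-name> \
  --cluster-name <cluster-name> \
  --name <node-pool-name> \
  --undrainable-node-behavior Cordon \
  --max-blocked-nodes 2 \
  --drain-timeout 30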

Bash

# The same settings applied to a concrete node pool, with one surge node
# and a 5-minute drain timeout.
az aks nodepool update \
  --cluster-name jizenMC1 \
  --name nodepool1 \
  --resource-group jizenTestMaxBlockedNodesRG \
  --max-surge 1 \
  --undrainable-node-behavior Cordon \
  --max-blocked-nodes 2 \
  --drain-timeout 5
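
If you apply these settings to several node pools, a small dry-run wrapper can catch typos before anything reaches Azure. The sketch below only prints the command it would run. The function name nodepool_update_cmd is hypothetical, and it assumes Schedule and Cordon are the only accepted values for --undrainable-node-behavior, as described in this article.

```shell
# Hypothetical dry-run helper: validate inputs and print (not run) the
# az aks nodepool update command for the drain-related settings.
# Usage: nodepool_update_cmd <rg> <cluster> <pool> <behavior> <max-blocked> <drain-timeout-min>
nodepool_update_cmd() {
  local rg cluster pool behavior max_blocked drain_timeout
  rg=$1
  cluster=$2
  pool=$3
  behavior=$4
  max_blocked=$5
  drain_timeout=$6
  case "$behavior" in
    Schedule|Cordon) ;;
    *)
      echo "unsupported undrainable-node-behavior: $behavior" >&2
      return 1
      ;;
  esac
  printf 'az aks nodepool update --resource-group %s --cluster-name %s --name %s --undrainable-node-behavior %s --max-blocked-nodes %s --drain-timeout %s\n' \
    "$rg" "$cluster" "$pool" "$behavior" "$max_blocked" "$drain_timeout"
}

nodepool_update_cmd jizenTestMaxBlockedNodesRG jizenMC1 nodepool1 Cordon 2 5
```

Review the printed command, then run it yourself once the values look right.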