Mastering Workload-Aware Scheduling in Kubernetes v1.36
Kubernetes v1.36 enhances scheduling capabilities with the introduction of the Workload API and the PodGroup API. These features address critical issues in workload management, particularly around resource allocation and deployment consistency. The Workload API acts as a static template, while the PodGroup API defines runtime objects, allowing for more granular control over how pods are scheduled. This is essential for preventing partial deployments that can lead to resource wastage and potential deadlocks.
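The template-versus-runtime split can be pictured with a small model. The sketch below is only an illustration in Python, not Kubernetes code: the class names mirror the API kinds, while the `instantiate` helper and the `scheduled` field are hypothetical names introduced here. It assumes a runtime PodGroup copies its gang policy from a named template and records which Workload it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodGroupTemplate:
    name: str
    min_count: int  # mirrors schedulingPolicy.gang.minCount


@dataclass(frozen=True)
class Workload:
    """Static template: describes pod groups but carries no runtime state."""
    name: str
    pod_group_templates: tuple


@dataclass
class PodGroup:
    """Runtime object: references its template and tracks status."""
    name: str
    workload_name: str
    template_name: str
    min_count: int
    scheduled: bool = False  # hypothetical stand-in for a status condition


def instantiate(workload, template_name, pod_group_name):
    """Hypothetical helper: build a PodGroup from a named template."""
    for template in workload.pod_group_templates:
        if template.name == template_name:
            return PodGroup(pod_group_name, workload.name,
                            template.name, template.min_count)
    raise KeyError(f"no template named {template_name!r} in {workload.name!r}")


workload = Workload("training-job-workload", (PodGroupTemplate("workers", 4),))
pg = instantiate(workload, "workers", "training-job-workers-pg")
print(pg.min_count, pg.scheduled)  # 4 False
```

The frozen dataclasses stand in for the idea that the template does not change at runtime, while the PodGroup is mutable because it carries status.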
At the core of this advancement is the PodGroup scheduling cycle, which ensures that the kube-scheduler takes a single snapshot of the cluster state. This prevents race conditions and guarantees consistency when evaluating the entire group. The scheduler uses a dedicated algorithm that filters and scores potential node placements for all pods in the group. The scheduling decision is then applied atomically, meaning that either all pods in the group are scheduled together, or none are. You can configure parameters like minCount, which specifies the minimum number of pods that must be schedulable at once, to ensure your applications meet their resource needs efficiently.
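To make the all-or-nothing behavior concrete, here is a minimal Python sketch of a gang placement pass. It is a toy model under stated assumptions, not the kube-scheduler's algorithm: CPU is the only resource, scoring simply prefers the node with the most free CPU, and `Node`, `Pod`, and `schedule_gang` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # free CPU in millicores


@dataclass
class Pod:
    name: str
    cpu: int  # requested CPU in millicores


def schedule_gang(pods, nodes, min_count):
    """Return {pod name: node name} if at least min_count pods fit, else None.

    Works on a private copy of node capacity (the "snapshot"), so a failed
    attempt never changes the caller's view of the cluster.
    """
    snapshot = {n.name: n.free_cpu for n in nodes}  # one snapshot for the group
    placement = {}
    for pod in pods:
        # Filter: nodes with enough free CPU left in the snapshot.
        feasible = [name for name, free in snapshot.items() if free >= pod.cpu]
        if not feasible:
            continue
        # Score: most free CPU wins (ties broken by node name).
        best = max(feasible, key=lambda name: (snapshot[name], name))
        snapshot[best] -= pod.cpu
        placement[pod.name] = best
    # All-or-nothing: report a placement only if the gang requirement is met.
    if len(placement) < min_count:
        return None
    return placement


nodes = [Node("node-a", 2000), Node("node-b", 2000)]
workers = [Pod(f"worker-{i}", 1000) for i in range(4)]
print(schedule_gang(workers, nodes, min_count=4))      # a placement for all four
print(schedule_gang(workers, nodes[:1], min_count=4))  # None
```

With two nodes, all four workers fit and a placement is returned. With one node, only two fit, so the function returns `None` and no capacity is consumed.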
In production, you need to be aware of several limitations. If new pods are added to a PodGroup after others are scheduled, the existing pods will not be unassigned or evicted, even if the group fails to meet its requirements later. Additionally, for heterogeneous Pod groups or those with inter-Pod dependencies, finding valid placements is not guaranteed. This can lead to challenges in complex deployments, so thorough testing and validation are crucial before rolling out these features in a live environment.
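The caveat about heterogeneous groups can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration, not the scheduler's actual algorithm: it compares a simple one-pod-at-a-time heuristic with an exhaustive search, and the names `greedy_place` and `exhaustive_place` are hypothetical. It shows how an order-dependent heuristic can miss a placement that does exist.

```python
from itertools import product


def greedy_place(pods, nodes):
    """Place pods in order, each on the node with the most free capacity."""
    free = dict(nodes)  # work on a copy
    placement = {}
    for name, request in pods:
        feasible = [n for n, cap in free.items() if cap >= request]
        if not feasible:
            return None  # the whole group fails
        best = max(feasible, key=lambda n: (free[n], n))
        free[best] -= request
        placement[name] = best
    return placement


def exhaustive_place(pods, nodes):
    """Try every pod-to-node assignment; exponential, for illustration only."""
    node_names = list(nodes)
    for choice in product(node_names, repeat=len(pods)):
        used = {n: 0 for n in node_names}
        for (_, request), node in zip(pods, choice):
            used[node] += request
        if all(used[n] <= nodes[n] for n in node_names):
            return {name: node for (name, _), node in zip(pods, choice)}
    return None


nodes = {"node-a": 3, "node-b": 2}   # free CPU cores per node
pods = [("small", 2), ("large", 3)]  # heterogeneous requests

print(greedy_place(pods, nodes))      # None
print(exhaustive_place(pods, nodes))  # {'small': 'node-b', 'large': 'node-a'}
```

The greedy pass puts `small` on the larger node and then has nowhere to put `large`, even though swapping the two would fit. Exhaustive search finds that placement but does not scale, which is one reason to test heterogeneous groups before relying on gang scheduling for them.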
Key takeaways
- Utilize the Workload API as a static template for better workload management.
- Implement the PodGroup API to define runtime objects for your applications.
- Leverage gang scheduling to prevent partial deployments and resource wastage.
- Set the `minCount` parameter to ensure a minimum number of pods are schedulable together.
- Be cautious with heterogeneous Pod groups and inter-Pod dependencies, as valid placements are not guaranteed.
Why it matters
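Gang scheduling keeps a multi-pod workload from being placed only partially: the group either gets the capacity it needs or consumes none. That avoids resources being held by pods that cannot make progress and lowers the risk of scheduling deadlocks, which matters most for workloads like the training job in the examples below.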
Code examples
    apiVersion: scheduling.k8s.io/v1alpha2
    kind: Workload
    metadata:
      name: training-job-workload
      namespace: some-ns
    spec:
      podGroupTemplates:
      - name: workers
        schedulingPolicy:
          gang:
            minCount: 4
    ---
    apiVersion: scheduling.k8s.io/v1alpha2
    kind: PodGroup
    metadata:
      name: training-job-workers-pg
      namespace: some-ns
    spec:
      podGroupTemplateRef:
        workload:
          workloadName: training-job-workload
          podGroupTemplateName: workers
      schedulingPolicy:
        gang:
          minCount: 4
    status:
      conditions:
      - type: PodGroupScheduled
        status: "True"
        lastTransitionTime: 2026-04-03T00:00:00Z
    ---
    apiVersion: scheduling.k8s.io/v1alpha2
    kind: PodGroup
    metadata:
      name: topology-aware-workers-pg
    spec:
      schedulingPolicy:
        gang:
          minCount: 4
      schedulingConstraints:
        topology:
        - key: topology.kubernetes.io/rack

When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.