Kubernetes

AI & GPU Workloads

11 articles from official documentation

kubernetes·ai workloads·Practitioner

Unifying AI Workloads: KubeCon, OpenInfra, and PyTorch Conference in China

KubeCon, OpenInfra Summit, and PyTorch Conference are converging in China, with a shared focus on AI workloads. By integrating Kubernetes orchestration with OpenInfra's infrastructure and PyTorch's AI framework, organizations can build scalable and reliable AI solutions.

→Leverage the integration of OpenInfra for optimized infrastructure.
→Utilize Kubernetes for effective orchestration of AI workloads.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Mastering Geo-Distributed AI Operations with k0smos

Unlock the potential of geo-distributed AI infrastructure with the k0smos stack. This powerful setup leverages k0s and k0smotron to deploy isolated control planes, streamlining operations across multiple clusters.

→Leverage k0s for a lightweight, zero-dependency Kubernetes distribution.
→Utilize k0smotron to deploy isolated, versioned control planes efficiently.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Engineering AI at Scale: Kubernetes for the Next Generation

AI workloads are fundamentally different from traditional microservices, and Kubernetes is evolving to meet these challenges. Discover how the Kubernetes AI Conformance program and Dynamic Resource Allocation can help you scale AI applications effectively.

→Utilize the Kubernetes AI Conformance program to ensure interoperability across environments.
→Implement Dynamic Resource Allocation to efficiently manage specialized hardware for AI workloads.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Achieving 30-Second LLM Cold Starts on Kubernetes with Fluid

Cold starts can cripple application performance, especially for large language models (LLMs). Discover how NetEase Games leveraged Fluid to automate runtime deployment and optimize cache management, achieving impressive 30-second cold starts on Kubernetes.

→Leverage Fluid for automated runtime deployment and lifecycle management.
→Utilize HPA and KEDA for cache elasticity to optimize resource scaling.

3 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Streamline AI Workloads with Kubernetes Dynamic Resource Allocation on AWS

Simplifying AI infrastructure is crucial for efficiency and performance. With Kubernetes Dynamic Resource Allocation (DRA), you can manage AWS Trainium and Elastic Fabric Adapter devices seamlessly. This article dives into how DRA transforms resource management in Kubernetes.

→Utilize ResourceClaimTemplates to define policies for workload patterns.
→Leverage ResourceSlices to advertise available EFA and Neuron devices to the scheduler.

5 min read·AWS Containers Blog

kubernetes·ai workloads·Practitioner

How KubeStellar Achieved 81% PR Acceptance with AI Agents

KubeStellar has integrated AI coding agents into its pull request workflow. By externalizing preferences in CLAUDE.md and measuring acceptance rates with auto-qa-tuning.json, the project has reached an 81% PR acceptance rate. Dive in to see how this model could apply to your Kubernetes projects.

→Utilize CLAUDE.md to externalize pull request conventions.
→Log PR acceptance rates with auto-qa-tuning.json to measure performance.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Cloud Custodian: Governance for the AI Era

As AI agents increasingly manage cloud infrastructure, effective governance becomes critical. Cloud Custodian offers automated guardrails that enforce best practices in real time, ensuring your resources remain efficient and secure.

→Implement automated guardrails to manage AI-generated resources effectively.
→Utilize declarative policies to describe and enforce desired states of cloud resources.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Benchmarking AI Retrieval Strategies for Kubernetes Bug Fixes

Fixing bugs in a codebase as large as Kubernetes can be a daunting task. This article explores how three AI agent retrieval strategies (RAG, Hybrid, and Local Only) affect the effectiveness of bug fixes in a multi-million-line codebase.

→Understand the differences between RAG, Hybrid, and Local Only strategies for bug fixes.
→Leverage RAG's hybrid retrieval, which combines keyword matching with semantic search, to enhance fix accuracy.

5 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Accelerate AI Model Distribution with Dragonfly's P2P Magic

Tired of slow model downloads? Dragonfly’s peer-to-peer acceleration can reduce your origin traffic by 99.5%. Discover how it splits files and shares them across nodes for lightning-fast distribution.

→Leverage P2P to reduce model download times dramatically.
→Configure repository types to optimize your downloads.

4 min read·CNCF Blog

kubernetes·ai workloads·Practitioner

Deploying Generative AI at the Edge with EKS Hybrid Nodes and NVIDIA DGX

Unlock the power of generative AI at the edge with Amazon EKS Hybrid Nodes and NVIDIA DGX. This setup allows you to connect on-premises infrastructure directly to the EKS control plane, ensuring low-latency AI services. Learn how to configure your environment for optimal performance.

→Connect on-premises infrastructure to Amazon EKS using EKS Hybrid Nodes for low-latency AI services.
→Deploy NVIDIA DGX Spark as a hybrid node to optimize edge AI deployment.

5 min read·AWS Containers Blog

kubernetes·ai workloads·Practitioner

Unlocking AI Workloads: The AI Gateway Working Group Explained

The AI Gateway Working Group aims to change how AI workloads are handled in Kubernetes. Its proposals, such as payload processing and egress gateways, address needs that include inspecting and transforming HTTP payloads. Dive in to understand its impact on your infrastructure.

→Understand the AI Gateway as specialized infrastructure for AI workloads.
→Leverage the payload processing proposal to inspect and transform HTTP payloads.

4 min read·Kubernetes Blog
