Building a Cluster-Aware AI Agent with Kubernetes and GitOps

5 min read · CNCF Blog · Jun 25, 2026 · Reviewed for accuracy

Practitioner — Hands-on experience recommended

An AI agent that runs inside a Kubernetes cluster can serve a Large Language Model (LLM) locally, which can reduce latency and removes the dependency on external services. Managing the deployment through GitOps with tools like Argo CD keeps it maintainable and scalable as well as efficient.

The architecture consists of a CI/CD chain and a Kubernetes runtime. On the runtime side, an Ollama pod serves a local Mistral 7B model and exposes a REST API on port 11434. A FastAPI pod provides the agent's HTTP API and chat UI on port 8000, and a PersistentVolumeClaim holds the model weights. The FastAPI pod runs under a dedicated ServiceAccount bound to a ClusterRole that allows only read operations. On the CI/CD side, a push to the application source in Git triggers GitHub Actions, which builds a multi-architecture image tagged with the 7-character commit SHA. Argo CD Image Updater checks Docker Hub every two minutes for new tags and commits the new tag back into the repository's kustomization.yaml; Argo CD then reconciles the cluster to deploy the update.
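The call from the FastAPI pod to the Ollama pod can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the Service name `ollama`, the model name `mistral`, and both function names are assumptions; only port 11434 comes from the text. It uses the standard library so it stays self-contained.

```python
import json
import urllib.request

# Assumption: the Ollama pod is exposed by a Service named "ollama"
# in the same namespace. Port 11434 is the one named in the article.
OLLAMA_URL = "http://ollama:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_model(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read())["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect in a unit test without a running model.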

Two Image Updater settings control which images are rolled out. allowTags takes a regular expression that a tag must match to be considered, and updateStrategy, set to newest-build in this setup (Image Updater's own default is semver), selects the most recently built image among the allowed tags. Together they ensure that only tags in the expected format are deployed. Be cautious about giving your AI agent write access to the cluster: an agent that can delete pods based on its own reasoning is a production incident waiting to happen, because hallucinations combined with write access can lead to disastrous outcomes.
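To make the effect of these two settings concrete, here is a small Python sketch of the selection logic: filter tags by the regular expression, then pick the most recently built one. It only illustrates the idea and is not Image Updater's actual code; the function names and the build timestamps are hypothetical.

```python
import re
from datetime import datetime, timezone

# Same expression as the allowTags setting, without the "regexp:" prefix.
ALLOW_TAGS = re.compile(r"^[0-9a-f]{7}$")


def allowed(tags):
    """Keep only tags that look like 7-character lowercase hex commit SHAs."""
    return [tag for tag in tags if ALLOW_TAGS.fullmatch(tag)]


def newest_build(build_times):
    """Return the allowed tag with the latest build time, or None if none match.

    build_times maps each tag to the datetime its image was built.
    """
    candidates = {tag: build_times[tag] for tag in allowed(build_times)}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical registry contents, for illustration only.
builds = {
    "a1b2c3d": datetime(2026, 6, 1, tzinfo=timezone.utc),
    "0f9e8d7": datetime(2026, 6, 3, tzinfo=timezone.utc),
    "latest": datetime(2026, 6, 4, tzinfo=timezone.utc),
    "v1.2.0": datetime(2026, 6, 5, tzinfo=timezone.utc),
}
selected = newest_build(builds)
```

In this example `latest` and `v1.2.0` are newer but are ignored because they do not match the pattern, so the newest SHA-tagged build is the one selected.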

Key takeaways

→ Use Ollama to serve local LLMs and reduce latency.
→ Use Argo CD Image Updater to deploy new image tags automatically.
→ Configure the `allowTags` regex to control which image tags are permitted.
→ Restrict the FastAPI pod's ServiceAccount to read-only operations.
→ Avoid giving AI agents write access, to prevent accidental deletions.

Why it matters

Running a cluster-aware AI agent next to a local model lets your application interact with an LLM without relying on external cloud services, which can improve responsiveness.

Code examples

YAML

apiVersion: argocd-image-updater.argoproj.io/v1alpha1
kind: ImageUpdater
metadata:
  name: local-k8s-ai-agent
  namespace: argocd
spec:
  writeBackConfig:
    # Commit the new tag to Git rather than patching the Application directly
    method: git
    gitConfig:
      branch: main
      writeBackTarget: "kustomization:."
  applicationRefs:
    - namePattern: "local-k8s-ai-agent"
      images:
        - alias: api
          imageName: marytvk/local-k8s-ai-agent
          commonUpdateSettings:
            # Pick the most recently built image among the allowed tags
            updateStrategy: newest-build
            # Only 7-character lowercase hex tags (short commit SHAs)
            allowTags: "regexp:^[0-9a-f]{7}$"

YAML

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-devops-api-reader
rules:
  # Core API group; read-only verbs, no create, update, patch, or delete
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "services", "configmaps", "namespaces"]
    verbs: ["get", "list"]
  # Workload controllers in the apps group, also read-only
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["get", "list"]
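One way to keep a role like this read-only over time is a small check in CI that fails when a write verb appears. The sketch below is a hypothetical helper, not part of the article's repository: it takes the `rules` section as already-parsed Python data (for example, loaded from the manifest by a YAML parser) and reports any verb outside a read-only allowlist.

```python
# Verbs treated as read-only in this sketch. Anything else, including the
# wildcard "*", is reported as a write verb.
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules):
    """Return the sorted set of non-read-only verbs found in RBAC rules."""
    found = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_ONLY_VERBS:
                found.add(verb)
    return sorted(found)


# The rules of the ClusterRole above, as parsed data.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["pods", "pods/log", "events", "services", "configmaps", "namespaces"],
        "verbs": ["get", "list"],
    },
    {
        "apiGroups": ["apps"],
        "resources": ["deployments", "replicasets", "statefulsets", "daemonsets"],
        "verbs": ["get", "list"],
    },
]

violations = write_verbs(RULES)
```

An empty `violations` list means the role grants no write access; a check like this catches an accidental `delete` or `*` before it reaches the cluster.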

When NOT to use this

Do not extend this pattern to give the agent write access. An agent that can delete pods based on its own reasoning is a production incident waiting to happen. Hallucinations multiplied by write access make a poor combination.
