Building a Cluster-Aware AI Agent with Kubernetes and GitOps
Running an AI agent inside a Kubernetes cluster lets it use a Large Language Model (LLM) served locally, which reduces latency and removes the dependency on external services. Managing the deployment with GitOps tooling such as Argo CD keeps it maintainable and scalable: the desired state lives in Git, and the cluster is reconciled against it.
The architecture consists of a CI/CD chain and a Kubernetes runtime. On the runtime side, an Ollama pod serves a local Mistral 7B model and exposes a REST API on port 11434, with a PersistentVolumeClaim holding the model weights. A FastAPI pod provides the agent's HTTP API and chat UI on port 8000. It runs under a dedicated ServiceAccount whose ClusterRole allows only read operations.
On the CI/CD side, pushing a change to the application source in Git triggers GitHub Actions, which builds a multi-architecture image tagged with the 7-character commit SHA. Argo CD Image Updater checks Docker Hub every two minutes for new tags and commits the new tag back into the repository's kustomization.yaml. Argo CD then reconciles the cluster to deploy the update.
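As a rough illustration of the runtime path, the FastAPI pod could reach the Ollama pod with a plain HTTP call. The sketch below uses only the Python standard library and Ollama's `/api/generate` endpoint. The Service hostname `ollama`, the model tag `mistral`, and both function names are assumptions made for this example; they are not taken from the project's source.

```python
import json
import urllib.request

# Assumption: the Ollama Service is reachable in-cluster under this hypothetical DNS name.
OLLAMA_URL = "http://ollama:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST for Ollama's /api/generate."""
    # "mistral" is an assumed model tag; use whatever tag was pulled into the pod.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False, Ollama answers with a single JSON object
        # whose "response" field holds the generated text.
        return json.load(resp)["response"]
```

Separating request construction from sending keeps the payload easy to inspect without a running model. A real agent would more likely use an async HTTP client inside its FastAPI handlers.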
Two Image Updater settings control which images are rolled out. allowTags takes a regular expression that a tag must match to be considered; here it accepts only 7-character lowercase hexadecimal strings, the format of the commit-SHA tags produced by the build. updateStrategy is set to newest-build, which selects the most recently built image among the allowed tags (Image Updater's default strategy is semver, so newest-build has to be set explicitly). Together these settings ensure that only valid tags are deployed. Be cautious, however, about giving your AI agent write access to the cluster: combined with hallucinations, write access lets the agent act destructively on faulty reasoning.
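To see what the `allowTags` pattern accepts, the same expression can be tried in Python. This is only an approximation for illustration: Image Updater evaluates the pattern with Go's regexp engine, and the helper names `short_sha_tag` and `is_allowed` are hypothetical.

```python
import re

# Same pattern as the allowTags setting, without the "regexp:" prefix.
ALLOWED_TAG = re.compile(r"^[0-9a-f]{7}$")


def short_sha_tag(commit_sha: str) -> str:
    """Derive the 7-character image tag from a full commit SHA (hypothetical helper)."""
    return commit_sha[:7]


def is_allowed(tag: str) -> bool:
    """Return True if the whole tag is exactly seven lowercase hex characters."""
    # fullmatch avoids Python's quirk where "$" also matches before a trailing newline.
    return ALLOWED_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["a1b2c3d", "latest", "v1.2.3", "a1b2c3d4"]:
        print(tag, is_allowed(tag))
```

Tags such as `latest` or `v1.2.3` are rejected, so under this pattern only images built from a commit are candidates for an update.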
Key takeaways
- Use Ollama to serve LLMs locally and reduce latency.
- Use Argo CD Image Updater to roll out new image tags automatically.
- Configure the `allowTags` regex to control which image tags are permitted.
- Restrict the FastAPI pod's ServiceAccount to read-only operations.
- Avoid giving AI agents write access to prevent accidental deletions.
Why it matters
Running the agent and its model inside the cluster keeps inference traffic local, which improves responsiveness and lets the application process data and answer requests in real time without relying on external cloud services.
Code examples
```yaml
apiVersion: argocd-image-updater.argoproj.io/v1alpha1
kind: ImageUpdater
metadata:
  name: local-k8s-ai-agent
  namespace: argocd
spec:
  writeBackConfig:
    method: git
    gitConfig:
      branch: main
      writeBackTarget: "kustomization:."
  applicationRefs:
    - namePattern: "local-k8s-ai-agent"
      images:
        - alias: api
          imageName: marytvk/local-k8s-ai-agent
          commonUpdateSettings:
            updateStrategy: newest-build
            allowTags: "regexp:^[0-9a-f]{7}$"
```

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ai-devops-api-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "services", "configmaps", "namespaces"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["get", "list"]
```
When NOT to use this
An agent that can delete pods based on its own reasoning is a production incident waiting to happen. Hallucinations multiplied by write access is a poor combination.
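One way to keep this guarantee from eroding over time is to check the role's rules automatically, for example in CI. The sketch below is a hypothetical audit helper, not part of the project. It assumes the ClusterRole has already been parsed into Python dictionaries, and it treats `get`, `list`, and `watch` as the only read-only verbs.

```python
# Verbs treated as read-only in this sketch; everything else (including "*") counts as write.
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules: list[dict]) -> set[str]:
    """Return every verb found in the rules that is not read-only."""
    found: set[str] = set()
    for rule in rules:
        found.update(v for v in rule.get("verbs", []) if v not in READ_ONLY_VERBS)
    return found


# Rules mirroring the ai-devops-api-reader ClusterRole shown in the code examples.
reader_rules = [
    {
        "apiGroups": [""],
        "resources": ["pods", "pods/log", "events", "services", "configmaps", "namespaces"],
        "verbs": ["get", "list"],
    },
    {
        "apiGroups": ["apps"],
        "resources": ["deployments", "replicasets", "statefulsets", "daemonsets"],
        "verbs": ["get", "list"],
    },
]

if __name__ == "__main__":
    offending = write_verbs(reader_rules)
    print("read-only" if not offending else f"write verbs present: {sorted(offending)}")
```

A check like this only inspects the role definition. It does not detect additional roles bound to the same ServiceAccount, so it complements a review of the bindings and does not replace one.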