Automate EKS AMI Updates with AI and GitOps
Keeping Amazon EKS worker nodes on current AMIs is critical for security, but manual updates are slow and can lead to downtime and unpatched vulnerabilities. Automating the AMI update lifecycle with an AI-powered, event-driven pipeline reduces that risk and the effort involved. This solution combines Amazon Bedrock for risk analysis with GitOps practices for rolling out the change.
The process unfolds in three phases: Detection, AI Analysis & GitHub PR, and GitOps Deployment. In the Detection Phase, tools like Amazon EventBridge and AWS Lambda identify new AMI releases. Next, the AI Analysis Phase employs Amazon Bedrock to assess risks and generate Pull Requests for human review. Finally, the GitOps Deployment Phase utilizes ArgoCD and Karpenter to orchestrate zero-downtime rolling updates, ensuring your applications remain available during the upgrade process. Key configuration parameters include NotificationEmail for alerts, GitHubAppId for authentication, and EKSVersion to specify your Amazon EKS version.
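The Detection Phase needs some way to learn which AMI is currently recommended. The source does not say how its Lambda function does this. One option, assumed here, is the public SSM parameter that AWS publishes for EKS-optimized AMIs. The sketch below builds that parameter name and reads it with the AWS CLI. The function names are hypothetical, and the AMI family (Amazon Linux 2023) and architecture (x86_64) are assumptions that should match your EC2NodeClass.

```shell
# Hypothetical helper: build the SSM parameter name for the recommended
# EKS-optimized AMI. Assumes Amazon Linux 2023, x86_64, standard variant.
eks_ami_param() {
  eks_version=$1
  printf '/aws/service/eks/optimized-ami/%s/amazon-linux-2023/x86_64/standard/recommended/image_id\n' "$eks_version"
}

# Hypothetical helper: print the AMI ID currently recommended for that version.
# Requires AWS credentials that allow ssm:GetParameter.
latest_eks_ami() {
  aws ssm get-parameter \
    --name "$(eks_ami_param "$1")" \
    --query 'Parameter.Value' \
    --output text
}
```

Running `latest_eks_ami 1.34` prints an AMI ID that can be compared with the one pinned in the EC2NodeClass file to decide whether an update is pending.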
In production, ensure you have an existing Amazon EKS cluster (version 1.34 or later) and that Karpenter is properly installed. Follow the README instructions in your Git repository to set up your environment correctly. A common pitfall is neglecting to deploy the EC2NodeClass configuration in your GitHub repository before running this solution. This setup is crucial for successful automation and deployment.
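The prerequisites above can be checked before deploying anything. This is a minimal sketch and not part of the solution itself: `version_ge` and `check_prereqs` are hypothetical helper names, and the Karpenter check assumes the AWS provider has registered its `ec2nodeclasses.karpenter.k8s.aws` CRD in the cluster.

```shell
# Return success when version $1 (major.minor) is at least version $2.
version_ge() {
  have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
  want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Check the cluster version and the Karpenter CRD for the named cluster.
# Requires the aws and kubectl CLIs, pointed at the right account and cluster.
check_prereqs() {
  cluster_name=$1
  cluster_version=$(aws eks describe-cluster --name "$cluster_name" \
    --query 'cluster.version' --output text) || return 1
  if ! version_ge "$cluster_version" 1.34; then
    echo "EKS $cluster_version is older than 1.34" >&2
    return 1
  fi
  if ! kubectl get crd ec2nodeclasses.karpenter.k8s.aws >/dev/null 2>&1; then
    echo "Karpenter EC2NodeClass CRD not found" >&2
    return 1
  fi
  echo "Prerequisites look satisfied for $cluster_name"
}
```

Call it as `check_prereqs <your-cluster-name>`. It does not verify that the EC2NodeClass file exists in the GitHub repository, which still has to be confirmed separately.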
Key takeaways
- Automate AMI updates using a three-phase approach: Detection, AI Analysis, and GitOps Deployment.
- Utilize Amazon Bedrock for AI-powered risk analysis during the update process.
- Configure essential parameters like GitHubAppId and NotificationEmail for seamless integration.
- Ensure Karpenter is installed and EC2NodeClass is deployed in your GitHub repository.
- Leverage ArgoCD for orchestrating zero-downtime rolling updates in your EKS cluster.
Why it matters
Automating AMI updates minimizes downtime and enhances security, allowing teams to focus on development rather than maintenance. This approach can significantly reduce the risk of vulnerabilities in production environments.
Code examples
aws cloudformation create-stack \
  --stack-name eks-ami-update \
  --template-body file://cloudformation-template.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
    ParameterKey=NotificationEmail,ParameterValue=your-email@example.com \
    ParameterKey=GitHubAppId,ParameterValue=<your-github-appid> \
    ParameterKey=GitHubAppInstallationId,ParameterValue=<your-githubapp-installationid> \
    ParameterKey=GitHubAppPrivateKey,ParameterValue=$(base64 -i your-app.private-key.pem | tr -d '\n') \
    ParameterKey=GitHubRepoOwner,ParameterValue=<your-github-org> \
    ParameterKey=GitHubRepoName,ParameterValue=<your-repo-name> \
    ParameterKey=GitHubFilePath,ParameterValue=karpenter-configs/clusters/your-cluster/nodeclass.yaml \
    ParameterKey=GitHubBranch,ParameterValue=main \
    ParameterKey=EKSVersion,ParameterValue=1.34

aws lambda invoke \
  --function-name eks-ami-detector \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  /tmp/response.json && cat /tmp/response.json

git clone https://github.com/<your-github-username>/<repository-name>.git
cd <repository-name>

When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.