Diagnose EKS Node Issues with AWS DevOps Agent and Custom MCP
Diagnosing node issues in Kubernetes can be time-consuming and complex. The AWS DevOps Agent addresses this by autonomously investigating production incidents, so you can focus on resolving issues rather than spending hours gathering logs. Through the Model Context Protocol (MCP), the agent can also call external diagnostic tools, such as a custom MCP server that collects node logs on demand.
Here's how it works: The AWS DevOps Agent initiates a call to a collect tool using the instance ID of the problematic node. The MCP server then triggers an SSM Automation execution on that node, running the AWS-managed AWSSupport-CollectEKSInstanceLogs runbook. This runbook collects more than 20 log sources, including kubelet, containerd, iptables, and ENI metadata, packages them into an archive, and uploads it to an Amazon S3 bucket with AWS KMS encryption. This automated process saves you from manual log collection and speeds up your troubleshooting efforts.
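The collect step described above can be sketched in Python with boto3. This is a minimal illustration, not the sample repository's implementation: the function names and the instance-ID check are hypothetical, and the runbook parameter names `EKSInstanceId` and `LogDestination` are assumptions that should be confirmed against the runbook reference before use.

```python
import re

RUNBOOK = "AWSSupport-CollectEKSInstanceLogs"

# EC2 instance IDs are "i-" followed by 8 or 17 lowercase hex characters.
_INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def build_collect_request(instance_id: str, log_bucket: str) -> dict:
    """Build keyword arguments for the SSM StartAutomationExecution call."""
    if not _INSTANCE_ID.match(instance_id):
        raise ValueError(f"not an EC2 instance ID: {instance_id!r}")
    return {
        "DocumentName": RUNBOOK,
        # SSM Automation parameters map each name to a list of strings.
        # The two parameter names below are assumptions; verify them
        # against the runbook documentation.
        "Parameters": {
            "EKSInstanceId": [instance_id],
            "LogDestination": [log_bucket],
        },
    }


def collect_node_logs(instance_id: str, log_bucket: str) -> str:
    """Start the runbook and return the automation execution ID."""
    # Imported lazily so the request builder can be used without AWS access.
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(
        **build_collect_request(instance_id, log_bucket)
    )
    return response["AutomationExecutionId"]
```

Keeping the request builder separate from the AWS call makes the input validation easy to exercise without credentials. A caller would pass the returned execution ID to a status check before looking for the archive in S3.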
Before you deploy, confirm that the AWS Systems Manager Agent (SSM Agent) is running on the worker nodes of your Amazon EKS cluster; it is included by default on Amazon EKS optimized AMIs. You also need Node.js v18 or later, AWS CLI v2, and AWS CDK v2 installed, with CDK bootstrapped in your target account and Region. Be careful with commands that disrupt DNS resolution for all pods on a node, such as the iptables rules in the code examples: run them only in a non-production test environment to avoid service disruptions.
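One way to verify the SSM Agent prerequisite for a given node is to ask Systems Manager whether the instance is registered and reporting. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that a `PingStatus` of `Online` in the `DescribeInstanceInformation` response is a sufficient signal that the agent is reachable.

```python
def ssm_agent_online(response: dict, instance_id: str) -> bool:
    """Return True if the response lists the instance with PingStatus 'Online'."""
    for info in response.get("InstanceInformationList", []):
        if info.get("InstanceId") == instance_id:
            return info.get("PingStatus") == "Online"
    # An instance that is missing from the list is not managed by SSM.
    return False


def check_node(instance_id: str) -> bool:
    """Query Systems Manager for one instance and evaluate its agent status."""
    # Imported lazily so ssm_agent_online can be used without AWS access.
    import boto3

    ssm = boto3.client("ssm")
    response = ssm.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    return ssm_agent_online(response, instance_id)
```

If `check_node` returns `False`, an SSM Automation execution against that node is unlikely to succeed, so it is worth checking before starting log collection.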
Key takeaways
- Use the AWS DevOps Agent to autonomously diagnose EKS node issues.
- Leverage the MCP to streamline interactions with external diagnostic tools.
- Collect over 20 log sources automatically with the AWSSupport-CollectEKSInstanceLogs runbook.
- Ensure SSM Agent is running on worker nodes for effective diagnostics.
- Avoid executing disruptive commands on production nodes.
Why it matters
Automating log collection removes a slow manual step from incident response, so teams can begin analysis sooner, resolve EKS node incidents faster, and keep services available.
Code examples
git clone https://github.com/aws-samples/sample-eks-node-diagnostics-mcp.git
cd sample-eks-node-diagnostics-mcp
chmod +x deploy.sh
./deploy.sh

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: demo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

# Block pod traffic to kube-dns ClusterIP; pods run but DNS fails
# Only affects FORWARD chain (pod traffic), not the node's own DNS
sudo iptables -I FORWARD -d 10.100.0.10/32 -p udp --dport 53 -j DROP
sudo iptables -I FORWARD -d 10.100.0.10/32 -p tcp --dport 53 -j DROP
When NOT to use this
The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.