
Diagnose EKS Node Issues with AWS DevOps Agent and Custom MCP

5 min read · AWS DevOps Blog · Jun 11, 2026 · Reviewed for accuracy

Level: Practitioner (hands-on experience recommended)

Diagnosing Kubernetes node issues is often slow and complex, because the evidence is spread across kubelet and container runtime logs, networking state, and instance metadata. The AWS DevOps Agent autonomously investigates production incidents, so you can spend your time resolving the problem instead of gathering logs. The Model Context Protocol (MCP) extends what the agent can do: a custom MCP server exposes diagnostic tools that the agent calls during an investigation.
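To make the MCP step concrete: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request that carries the tool name and its arguments. This minimal sketch only builds such a request for a tool named `collect`. The function name and the `instanceId` argument name are hypothetical placeholders, not necessarily the sample server's schema; a real client discovers each tool's input schema through `tools/list`.

```bash
#!/usr/bin/env bash
# Build an MCP "tools/call" request (JSON-RPC 2.0) for a tool named "collect".
# The "instanceId" argument name is a hypothetical placeholder; a real client
# reads the tool's input schema from the server's tools/list response.
mcp_collect_request() {
  local request_id="$1" instance_id="$2"
  printf '{"jsonrpc":"2.0","id":%d,"method":"tools/call","params":{"name":"collect","arguments":{"instanceId":"%s"}}}\n' \
    "$request_id" "$instance_id"
}

# Example: mcp_collect_request 1 i-0123456789abcdef0
```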

Here's how it works. The AWS DevOps Agent calls the MCP server's collect tool with the instance ID of the problematic node. The MCP server then starts an AWS Systems Manager (SSM) Automation execution against that node, running the AWS-managed AWSSupport-CollectEKSInstanceLogs runbook. The runbook collects more than 20 log sources, including kubelet, containerd, iptables, and ENI metadata, packages them into an archive, and uploads the archive, encrypted with AWS KMS, to an Amazon S3 bucket. Because no one has to log in to the node and gather logs by hand, troubleshooting starts with the evidence already in place.
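To see what the MCP server does on the agent's behalf, you can start the same runbook yourself with the AWS CLI. This is a sketch, not the sample server's code: the helper names and the bucket name are placeholders, and the parameter names `EKSInstanceId` and `LogDestination` follow the runbook's public documentation at the time of writing, so confirm them with `aws ssm describe-document --name AWSSupport-CollectEKSInstanceLogs` before relying on them.

```bash
#!/usr/bin/env bash
# Sketch: run the AWSSupport-CollectEKSInstanceLogs runbook directly.
# Helper names and the bucket name are placeholders.

# Validate an EC2 instance ID (i- plus 8 or 17 hex digits) and build the
# --parameters value for the runbook.
build_collect_params() {
  local instance_id="$1" bucket="$2"
  local re='^i-([0-9a-f]{8}|[0-9a-f]{17})$'
  if [[ ! $instance_id =~ $re ]]; then
    echo "invalid instance ID: $instance_id" >&2
    return 1
  fi
  printf 'EKSInstanceId=%s,LogDestination=%s\n' "$instance_id" "$bucket"
}

# Start the automation and print its execution ID.
start_log_collection() {
  local params
  params="$(build_collect_params "$1" "$2")" || return 1
  aws ssm start-automation-execution \
    --document-name AWSSupport-CollectEKSInstanceLogs \
    --parameters "$params" \
    --query AutomationExecutionId --output text
}

# Poll every 15 seconds until the execution leaves a transient state,
# then print the final status (for example Success or Failed).
wait_for_collection() {
  local exec_id="$1" status
  while :; do
    status="$(aws ssm get-automation-execution \
      --automation-execution-id "$exec_id" \
      --query AutomationExecution.AutomationExecutionStatus \
      --output text)" || return 1
    case "$status" in
      Pending|InProgress|Waiting|Cancelling) sleep 15 ;;
      *) printf '%s\n' "$status"; return 0 ;;
    esac
  done
}

# Usage:
#   exec_id="$(start_log_collection i-0123456789abcdef0 my-log-bucket)"
#   wait_for_collection "$exec_id"
```

Validating the instance ID before calling AWS keeps a malformed ID from turning into a failed automation execution that you then have to inspect.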

Before you start, confirm the prerequisites. The AWS Systems Manager Agent (SSM Agent) must be running on your Amazon EKS worker nodes; it is included by default on Amazon EKS optimized AMIs. You also need Node.js v18 or later, AWS CLI v2, and AWS CDK v2, with CDK bootstrapped in your target account and Region. The fault-injection commands at the end of this post disrupt DNS resolution for every pod on a node, so run them only in a non-production test environment.
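A quick pre-flight check can catch a missing prerequisite before a deployment fails halfway. This sketch is an assumption of how you might script it, not part of the sample repository: it parses the major version from `node --version`, `aws --version`, and `cdk --version`, and queries a worker node's SSM Agent ping status with `aws ssm describe-instance-information`. The function names are hypothetical, and it does not verify CDK bootstrapping.

```bash
#!/usr/bin/env bash
# Sketch of a pre-flight check for the prerequisites (function names are hypothetical).

# Extract the leading major version from strings such as "v18.19.0" (node),
# "aws-cli/2.15.30 Python/3.11.8 ..." (aws) or "2.140.0 (build 46168aa)" (cdk).
major_version() {
  local s="$1"
  s="${s#aws-cli/}"   # drop the AWS CLI prefix
  s="${s#v}"          # drop the Node.js "v" prefix
  printf '%s\n' "${s%%.*}"
}

# Succeed if the major version found in $1 is at least $2.
at_least() {
  local major
  major="$(major_version "$1")"
  [[ $major =~ ^[0-9]+$ ]] && (( major >= $2 ))
}

# Report each missing or outdated local tool; return non-zero if any fail.
check_local_tools() {
  local rc=0
  at_least "$(node --version 2>/dev/null)" 18 || { echo "need Node.js v18+" >&2; rc=1; }
  at_least "$(aws --version 2>&1)" 2 || { echo "need AWS CLI v2" >&2; rc=1; }
  at_least "$(cdk --version 2>/dev/null)" 2 || { echo "need AWS CDK v2" >&2; rc=1; }
  return "$rc"
}

# Print the SSM Agent ping status ("Online" while the agent is reporting)
# for a worker node's instance ID.
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].PingStatus' --output text
}

# Usage: check_local_tools && ssm_ping_status i-0123456789abcdef0
```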

Key takeaways

→Use the AWS DevOps Agent to autonomously diagnose EKS node issues.
→Expose diagnostic tools to the agent through a custom MCP server.
→Collect over 20 log sources automatically with the AWSSupport-CollectEKSInstanceLogs runbook.
→Make sure SSM Agent is running on worker nodes; the Automation runbook depends on it.
→Run fault-injection commands, such as the iptables DNS block, only on non-production nodes.

Why it matters

This approach removes the manual log gathering from diagnosing EKS node issues: instead of someone logging in to a node and collecting logs by hand, the agent receives a complete log archive from a single tool call. Faster evidence gathering shortens investigations, which helps teams resolve incidents sooner and keep services available.

Code examples

Bash

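# Clone the sample repository and run its deployment script
# (requires Node.js v18+, AWS CLI v2, and a bootstrapped AWS CDK v2 environment)
git clone https://github.com/aws-samples/sample-eks-node-diagnostics-mcp.git
cd sample-eks-node-diagnostics-mcp
chmod +x deploy.sh
./deploy.sh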

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: demo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
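The agent's collect tool needs the EC2 instance ID of the node behind a failing pod. One way to find it for this demo workload, sketched here as an assumption rather than the sample's own method, is to read each node's `spec.providerID`, which on EC2-backed nodes has the form `aws:///<availability-zone>/<instance-id>`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Extract the EC2 instance ID from a Kubernetes node providerID,
# e.g. "aws:///us-west-2a/i-0123456789abcdef0" -> "i-0123456789abcdef0".
# Fails for providerIDs whose last segment is not an EC2 instance ID.
instance_id_from_provider_id() {
  local id="${1##*/}"
  [[ $id == i-* ]] || return 1
  printf '%s\n' "$id"
}

# Print "<pod> <node> <instance-id>" for each web-frontend pod in demo-app
# (requires kubectl access to the cluster).
demo_pod_instances() {
  local pod node provider_id
  kubectl get pods -n demo-app -l app=web-frontend \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r pod node; do
    provider_id="$(kubectl get node "$node" -o jsonpath='{.spec.providerID}')"
    printf '%s %s %s\n' "$pod" "$node" "$(instance_id_from_provider_id "$provider_id")"
  done
}
```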

Bash

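# Block pod traffic to the kube-dns ClusterIP; pods run, but DNS fails.
# Only affects the FORWARD chain (pod traffic), not the node's own DNS.
# 10.100.0.10 assumes a 10.100.0.0/16 service CIDR; check your cluster's value with:
#   kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
# Test environments only: this breaks DNS for every pod on the node.
sudo iptables -I FORWARD -d 10.100.0.10/32 -p udp --dport 53 -j DROP
sudo iptables -I FORWARD -d 10.100.0.10/32 -p tcp --dport 53 -j DROP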
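The DROP rules stay in place until you remove them. A minimal cleanup sketch, assuming you run it on the same node with the same ClusterIP you blocked: `iptables -C` succeeds while a matching rule exists and `iptables -D` deletes one copy, so the loop also handles a rule that was inserted more than once. The function names are hypothetical, and the kube-dns lookup needs kubectl access to the cluster.

```bash
#!/usr/bin/env bash
# Sketch: remove the DNS-blocking rules (test environments only).

# Look up the kube-dns ClusterIP (needs kubectl access to the cluster).
kube_dns_ip() {
  kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
}

# Print the two rule specs (chain, match, and target, without -I/-D) that block DNS to $1.
dns_block_rules() {
  local ip="$1" proto
  for proto in udp tcp; do
    printf 'FORWARD -d %s/32 -p %s --dport 53 -j DROP\n' "$ip" "$proto"
  done
}

# Delete every copy of each rule; stop with an error if a delete fails.
remove_dns_block() {
  local ip="$1"
  local -a rule
  while read -r -a rule; do
    while sudo iptables -C "${rule[@]}" 2>/dev/null; do
      sudo iptables -D "${rule[@]}" || return 1
    done
  done < <(dns_block_rules "$ip")
}

# Usage on the affected node: remove_dns_block 10.100.0.10
```

Generating the rule specs in one place keeps the insert and delete sides from drifting apart, and `dns_block_rules` doubles as a dry run that prints what would be removed.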