# EKS Chaos Engineering MCP Server

A Model Context Protocol (MCP) server for chaos engineering testing on Amazon EKS clusters.
## Features
- Pod Failure Simulation: Target specific pods or UI pods for resilience testing
- Node Failure Simulation: Stop EC2 instances to test node-level failures
- AZ Failure Simulation: Simulate availability zone failures
- Universal Compatibility: Works with any Kubernetes application deployed on EKS
## Installation

### From PyPI (recommended)

```bash
pip install eks-chaos-mcp
```

### From Source

```bash
git clone https://github.com/sumanthst24/EKS-Chaos-Engineering-MCP-Server.git
cd EKS-Chaos-Engineering-MCP-Server
pip install -e .
```
## Prerequisites
- Python 3.8+
- kubectl configured for your EKS cluster
- AWS CLI configured with appropriate permissions
- EKS cluster with any Kubernetes application deployed
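The tooling checks above can be sketched as a small preflight script (illustrative only; the package does not necessarily ship one, and this verifies only local tooling, not AWS permissions or cluster access):

```python
# Illustrative preflight check for the prerequisites above.
import shutil
import sys

def check_prereqs():
    """Return a list of human-readable problems (empty means all good)."""
    issues = []
    if sys.version_info < (3, 8):
        issues.append("Python 3.8+ required")
    for tool in ("kubectl", "aws"):
        if shutil.which(tool) is None:
            issues.append(f"{tool} not found on PATH")
    return issues

if __name__ == "__main__":
    for problem in check_prereqs():
        print("WARNING:", problem)
```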
## Configuration

1. Copy the example configuration:

   ```bash
   cp mcp.json.example mcp.json
   ```

2. Update `mcp.json` with your environment:

   ```json
   {
     "mcpServers": {
       "eks_chaos_mcp": {
         "command": "eks-chaos-mcp",
         "env": {
           "AWS_REGION": "your-aws-region",
           "AWS_PROFILE": "your-aws-profile"
         }
       }
     }
   }
   ```
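To catch typos early, you can sanity-check the file's shape before starting a client. This small validator is not part of the package; it simply mirrors the structure shown above:

```python
# Illustrative validator for the mcp.json structure shown above.
import json

def validate_mcp_config(text):
    """Parse mcp.json text and return the env block for eks_chaos_mcp."""
    cfg = json.loads(text)
    server = cfg["mcpServers"]["eks_chaos_mcp"]
    if server.get("command") != "eks-chaos-mcp":
        raise ValueError('expected "command": "eks-chaos-mcp"')
    return server["env"]
```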
## Available Tools

- `list_pods_for_failure`: List all pods across all namespaces
- `simulate_pod_failure`: Delete specific pods to test resilience
- `simulate_node_failure`: Stop EC2 instances with running pods
- `simulate_az_failure`: Simulate availability zone failures
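For illustration, the candidate-selection step behind a tool like `simulate_pod_failure` can be reduced to a pure function. The `(name, namespace, phase)` tuple shape here is hypothetical, not the server's actual API:

```python
# Illustrative selection logic a pod-failure tool might use: only
# Running pods in the target namespace are candidates for deletion.
def pick_pods_for_failure(pods, namespace):
    """pods: iterable of (name, namespace, phase) tuples (hypothetical shape)."""
    return [name for name, ns, phase in pods
            if ns == namespace and phase == "Running"]
```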
## Usage

### With MCP Client

Start the MCP server and use it with compatible MCP clients such as Amazon Q Developer.

### Direct Execution

```bash
# Run the server directly
eks-chaos-mcp

# Or as a Python module
python -m eks_chaos_mcp
```
## Development

```bash
# Clone the repository
git clone https://github.com/sumanthst24/EKS-Chaos-Engineering-MCP-Server.git
cd EKS-Chaos-Engineering-MCP-Server

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black src/
isort src/
```
## Security Notes

- Never commit `mcp.json` with real credentials
- Use appropriate AWS IAM permissions
- Test in non-production environments first
- Sanitize logs before sharing
## Testing Results
Here are example results from chaos engineering tests:
### Available MCP Tools

`eks_chaos_mcp` (MCP):

- `list_pods_for_failure`: List all pods across namespaces
- `simulate_pod_failure`: Delete specific pods to test resilience
- `simulate_node_failure`: Stop EC2 instances with running pods
- `simulate_az_failure`: Simulate availability zone failures
### Pod Listing

```text
> List the pods for failure

✅ Successfully listed pods across all namespaces:

### Default Namespace
• nginx-deployment-xxxxxxxxx-xxxxx (Running on ip-xx-xx-xxx-xxx)
• redis-xxxxxxxxx-xxxxx (Running on ip-xx-xx-xxx-xxx)

### Kube-system Namespace
• coredns-xxxxxxxxx-xxxxx (Running on ip-xx-xx-xxx-xxx)
• aws-node-xxxxxxxxx (Running on ip-xx-xx-xxx-xxx)

### [Additional namespaces and pods...]
```
### Pod Failure Simulation

```text
> Simulate pod failure for nginx pod in default namespace

✅ Pod failure simulated: nginx-deployment-xxxxxxxxx-xxxxx in namespace default was terminated.
Kubernetes automatically recreated the pod to maintain desired replicas.
Service continued functioning with remaining healthy pods during recreation.
```
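The recovery shown above is ordinary Kubernetes reconciliation: the Deployment controller compares desired and actual replica counts and creates replacements. A toy model of that loop, with hypothetical pod names (not the server's code):

```python
# Toy reconciliation loop illustrating why a deleted pod comes back:
# the controller keeps creating pods until the actual count matches
# the desired replica count.
def reconcile(desired, pods):
    """Return the pod list after the controller restores the desired count."""
    pods = list(pods)
    i = 0
    while len(pods) < desired:
        pods.append(f"replacement-{i}")  # hypothetical replacement pod name
        i += 1
    return pods
```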
### Node Failure Simulation

```text
> Simulate node failure

✅ Node failure simulated: Stopping instance i-xxxxxxxxxxxxxxxxx (Node: ip-xx-xx-xxx-xxx.region.compute.internal).
Successfully stopped EC2 instance corresponding to Kubernetes node. All pods on the node were affected:
• Application pods: Rescheduled to healthy nodes
• System pods: Automatically recreated on available nodes
• Persistent volumes: Remounted on new pod locations

Kubernetes control plane successfully handled node failure and maintained application availability.
```
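To stop the right EC2 instance, a tool like this has to map a Kubernetes node to its instance ID. On EKS, each node's `spec.providerID` carries this in the form `aws:///<az>/<instance-id>`; a minimal parser might look like:

```python
# Illustrative helper: extract the EC2 instance ID from a Kubernetes
# node's spec.providerID, e.g. "aws:///us-east-1a/i-0123456789abcdef0".
def instance_id_from_provider_id(provider_id):
    if not provider_id.startswith("aws://"):
        raise ValueError("not an AWS providerID")
    return provider_id.rsplit("/", 1)[-1]
```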
### AZ Failure Simulation

```text
> Simulate AZ level failure

✅ AZ failure simulated: Stopped X instances across multiple availability zones.
Multi-AZ failure test completed:
- Affected nodes in zone-a and zone-b
- Forced pod rescheduling to remaining healthy zones
- Tested cluster capacity and autoscaling capabilities
- Verified application resilience across availability zones

Application demonstrated high availability and proper AZ failure resilience.
```
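Choosing which nodes to stop for an AZ-level test typically means grouping nodes by the standard `topology.kubernetes.io/zone` label. A sketch of that grouping (the node data in the test below is hypothetical):

```python
# Illustrative grouping of nodes by availability zone, using the
# well-known topology.kubernetes.io/zone node label.
from collections import defaultdict

def nodes_by_zone(nodes):
    """nodes: iterable of (name, labels) pairs; returns {zone: [node names]}."""
    zones = defaultdict(list)
    for name, labels in nodes:
        zone = labels.get("topology.kubernetes.io/zone", "unknown")
        zones[zone].append(name)
    return dict(zones)
```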
## Project Structure

```text
eks-chaos-mcp/
├── src/eks_chaos_mcp/       # Main package
│   ├── __init__.py          # Package initialization
│   ├── __main__.py          # Python -m execution
│   ├── server.py            # Main MCP server
│   ├── pod_failure.py       # Pod chaos functions
│   ├── node_failure.py      # Node chaos functions
│   ├── az_failure.py        # AZ chaos functions
│   └── get_nodes.py         # Utility functions
├── pyproject.toml           # Modern Python packaging
├── README.md                # Documentation
├── LICENSE                  # MIT license
├── .gitignore               # Git ignore rules
└── mcp.json.example         # Configuration template
```
## License

MIT License - see the LICENSE file for details.