suryansh639/mcp-aws-devops-server
The MCP DevOps Platform is a production-grade, multi-cloud operations server designed to enable AI agents to execute DevOps and CloudOps tasks securely across various cloud environments.
MCP DevOps Platform - Production-Grade Multi-Cloud Operations
A complete, enterprise-ready Model Context Protocol (MCP) server that enables AI agents to safely execute DevOps and CloudOps operations across AWS, Azure, GCP, and Kubernetes environments without exposing credentials.
🎯 Overview
This platform allows AI agents (ChatGPT, Claude, Qwen, custom LLMs) to perform infrastructure operations inside your VPC using a zero-trust security model with IAM role-based authentication.
Key Features
- Multi-Cloud Support: AWS, Azure, GCP, and Kubernetes
- Zero-Trust Security: No credentials stored, IAM role-based access
- AI-Powered Intelligence: Automated troubleshooting, cost optimization, health checks
- Production-Ready: Multiple deployment options (EC2, ECS Fargate, EKS)
- Audit Logging: Complete JSON audit trail
- API Gateway Integration: Secure external access with API key authentication
🏗️ Architecture
┌─────────────────────────────────────────────────────────────┐
│                       AI Agents Layer                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │ ChatGPT  │  │  Claude  │  │   Qwen   │  │  Custom  │     │
│  │   MCP    │  │   MCP    │  │  Agent   │  │  Agent   │     │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘     │
└───────┼─────────────┼─────────────┼─────────────┼───────────┘
        │             │             │             │
        └─────────────┴─────────────┴─────────────┘
                             │
           ┌─────────────────▼─────────────────┐
           │      API Gateway (Optional)       │
           │  - API Key Authentication         │
           │  - Rate Limiting                  │
           │  - Custom Domain                  │
           └─────────────────┬─────────────────┘
                             │
           ┌─────────────────▼─────────────────┐
           │          VPC Link / NLB           │
           └─────────────────┬─────────────────┘
                             │
           ┌─────────────────▼─────────────────┐
           │         MCP DevOps Server         │
           │   ┌───────────────────────────┐   │
           │   │  Security Layer           │   │
           │   │  - JWT/API Key Auth       │   │
           │   │  - Rate Limiting          │   │
           │   │  - Audit Logging          │   │
           │   └───────────────────────────┘   │
           │   ┌───────────────────────────┐   │
           │   │  Tool Router              │   │
           │   └───────────────────────────┘   │
           │   ┌───────────────────────────┐   │
           │   │  Intelligence Layer       │   │
           │   │  - AI Troubleshooting     │   │
           │   │  - Cost Optimization      │   │
           │   │  - Health Checks          │   │
           │   └───────────────────────────┘   │
           └─────────────────┬─────────────────┘
                             │
           ┌─────────────────▼─────────────────┐
           │        Cloud Provider APIs        │
           │  ┌──────┐  ┌──────┐  ┌──────┐     │
           │  │ AWS  │  │Azure │  │ GCP  │ K8s │
           │  └──────┘  └──────┘  └──────┘     │
           └───────────────────────────────────┘
🔐 Zero-Trust Security Model
How It Works
- No Stored Credentials: The MCP server uses IAM roles (AWS), Managed Identities (Azure), or Workload Identity (GCP)
- IAM Role Assumption: The server assumes roles with least-privilege permissions
- Request Authentication: All requests authenticated via API keys or JWT tokens
- Audit Trail: Every action logged with timestamp, user, and parameters
- Resource Tagging: Operations restricted to resources with specific tags (e.g., ManagedBy: MCP)
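As an illustration of the request-authentication step, the sketch below checks a presented API key against a comma-separated key list such as the MCP_API_KEYS value, using a constant-time comparison. The function names load_api_keys and is_valid_api_key are hypothetical and are not taken from the project's code.

```python
import hmac
import os


def load_api_keys(raw=None):
    """Parse a comma-separated key list, e.g. the MCP_API_KEYS variable."""
    if raw is None:
        raw = os.environ.get("MCP_API_KEYS", "")
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_api_key(presented, valid_keys):
    """Return True if the presented key matches any configured key."""
    matched = False
    for key in valid_keys:
        # compare_digest avoids leaking key contents through timing.
        if hmac.compare_digest(presented.encode(), key.encode()):
            matched = True
    return matched
```

Checking every configured key, rather than returning on the first match, keeps the running time independent of which key matched.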
IAM Permission Model
The platform uses three IAM role configurations:
- EC2 Role: For server running on EC2 instances
- ECS Task Role: For Fargate deployments
- EKS Service Account Role: For Kubernetes deployments with IRSA
Each role follows least-privilege principles:
- Read operations allowed on all resources
- Write operations restricted by resource tags
- Destructive operations (terminate) limited to dev environments
- Regional restrictions applied where appropriate
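The rules above can be read as a small decision function. The following sketch is illustrative only: in the platform these checks are enforced by IAM policies, and the Environment tag key, the action groupings, and the function name is_allowed are assumptions made for the example.

```python
READ_ACTIONS = {"list_instances", "describe_instance"}
WRITE_ACTIONS = {"start_instance", "stop_instance", "reboot_instance"}
DESTRUCTIVE_ACTIONS = {"terminate_instance"}


def is_allowed(action, tags):
    """Mirror the least-privilege rules for a resource's tag dict."""
    if action in READ_ACTIONS:
        return True  # reads are allowed on all resources
    managed = tags.get("ManagedBy") == "MCP"
    if action in WRITE_ACTIONS:
        return managed  # writes require the ManagedBy: MCP tag
    if action in DESTRUCTIVE_ACTIONS:
        # "Environment" is an assumed tag key for marking dev resources.
        return managed and tags.get("Environment") == "dev"
    return False  # unknown actions are denied by default
```

Denying unknown actions by default means a newly added tool stays blocked until it is classified explicitly.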
📦 Available Tools
AWS EC2
- list_instances - List all EC2 instances in a region
- describe_instance - Get detailed instance information
- start_instance - Start a stopped instance
- stop_instance - Stop a running instance
- reboot_instance - Reboot an instance
- terminate_instance - Terminate an instance (tag-restricted)
AWS CloudWatch
- get_log_groups - List CloudWatch log groups
- get_log_streams - List log streams in a group
- fetch_logs - Retrieve log events
- metric_query - Query CloudWatch metrics
AWS S3
- list_buckets - List all S3 buckets
- list_objects - List objects in a bucket
- get_object - Download an object
- put_object - Upload an object
AWS Lambda
- list_functions - List Lambda functions
- invoke_function - Invoke a function
- update_function_code - Update function code
AWS EKS
- list_pods - List pods in a cluster
- describe_pod - Get pod details
- get_pod_logs - Retrieve pod logs
- restart_deployment - Restart a deployment
Azure
- azure_list_vms - List virtual machines
- azure_start_vm - Start a VM
- azure_stop_vm - Stop a VM
- azure_list_containers - List storage containers
GCP
- gcp_list_instances - List Compute Engine instances
- gcp_start_instance - Start an instance
- gcp_stop_instance - Stop an instance
- gcp_list_buckets - List Cloud Storage buckets
Kubernetes
- k8s_list_pods - List pods in a namespace
- k8s_delete_pod - Delete a pod
- k8s_restart_deployment - Restart a deployment
Intelligence Tools
- ai_troubleshoot - AI-powered troubleshooting with log analysis and suggestions
- cost_optimization - Find cost savings opportunities (idle resources, untagged volumes)
- health_check - Comprehensive health check across resources
🚀 Installation
Prerequisites
- Python 3.11+
- AWS CLI configured (for AWS operations)
- kubectl configured (for Kubernetes operations)
- Docker (for containerized deployments)
- Terraform (for IaC deployments)
Option 1: Local Development
# Clone the repository
cd mcp-devops
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export MCP_SECRET_KEY="your-secret-key"
export MCP_API_KEYS="key1,key2,key3"
# Run the server
python -m mcp_server.server
Option 2: EC2 Deployment
# Copy files to EC2 instance
scp -r mcp-devops ec2-user@your-instance:/opt/
# SSH to instance
ssh ec2-user@your-instance
# Run installation script
cd /opt/mcp-devops/deployment/ec2
chmod +x install.sh
sudo ./install.sh
# Check status
sudo systemctl status mcp-devops
Option 3: ECS Fargate Deployment
# Build and push Docker image
cd mcp-devops
docker build -t mcp-devops:latest -f docker/Dockerfile .
# Tag and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
docker tag mcp-devops:latest ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/mcp-devops:latest
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/mcp-devops:latest
# Deploy with Terraform
cd terraform
terraform init
terraform plan -var="vpc_id=vpc-xxx" -var="private_subnet_ids=[\"subnet-xxx\",\"subnet-yyy\"]"
terraform apply
Option 4: EKS Deployment
# Create namespace and secrets
kubectl create namespace mcp-devops
kubectl create secret generic mcp-secrets \
--from-literal=secret-key=your-secret-key \
-n mcp-devops
# Update deployment.yaml with your account ID and region
sed -i 's/ACCOUNT_ID/123456789012/g' deployment/eks/deployment.yaml
sed -i 's/REGION/us-east-1/g' deployment/eks/deployment.yaml
# Apply Kubernetes manifests
kubectl apply -f deployment/eks/rbac.yaml
kubectl apply -f deployment/eks/deployment.yaml
# Check deployment
kubectl get pods -n mcp-devops
kubectl logs -f deployment/mcp-devops -n mcp-devops
Option 5: Helm Chart
# Update values.yaml with your configuration
cd helm
# Install the chart
helm install mcp-devops . \
--namespace mcp-devops \
--create-namespace \
--set image.repository=ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/mcp-devops \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/mcp-devops-eks-role
# Check status
helm status mcp-devops -n mcp-devops
🔧 Configuration
Environment Variables
- MCP_SECRET_KEY - Secret key for JWT token generation
- MCP_API_KEYS - Comma-separated list of valid API keys
- AWS_REGION - Default AWS region (optional)
- AZURE_SUBSCRIPTION_ID - Azure subscription ID (for Azure operations)
- GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account key (for GCP operations)
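A minimal sketch of reading these variables at startup is shown below. The Settings class and load_settings function are hypothetical names, and treating MCP_SECRET_KEY as mandatory is an assumption; the server's own configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Settings:
    secret_key: str
    api_keys: List[str]
    aws_region: Optional[str] = None
    azure_subscription_id: Optional[str] = None
    google_application_credentials: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""
    secret_key = env.get("MCP_SECRET_KEY", "")
    if not secret_key:
        # Assumption: the JWT secret is required for the server to start.
        raise ValueError("MCP_SECRET_KEY must be set")
    raw_keys = env.get("MCP_API_KEYS", "")
    api_keys = [k.strip() for k in raw_keys.split(",") if k.strip()]
    return Settings(
        secret_key=secret_key,
        api_keys=api_keys,
        aws_region=env.get("AWS_REGION"),
        azure_subscription_id=env.get("AZURE_SUBSCRIPTION_ID"),
        google_application_credentials=env.get("GOOGLE_APPLICATION_CREDENTIALS"),
    )
```

Passing the environment as a mapping makes the loader easy to exercise in tests without touching the real process environment.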
IAM Role Setup
For EC2 Deployment
# Create IAM role
aws iam create-role --role-name mcp-devops-ec2-role \
--assume-role-policy-document file://config/ec2-trust-policy.json
# Attach policy
aws iam put-role-policy --role-name mcp-devops-ec2-role \
--policy-name mcp-permissions \
--policy-document file://config/iam-ec2-role.json
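# Create an instance profile and add the role to it
# (the CLI does not create one automatically with the role)
aws iam create-instance-profile --instance-profile-name mcp-devops-ec2-role
aws iam add-role-to-instance-profile \
--instance-profile-name mcp-devops-ec2-role \
--role-name mcp-devops-ec2-role
# Attach the instance profile to the instance
aws ec2 associate-iam-instance-profile \
--instance-id i-xxxxx \
--iam-instance-profile Name=mcp-devops-ec2-role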
For ECS Deployment
# Create task role
aws iam create-role --role-name mcp-devops-ecs-task-role \
--assume-role-policy-document file://config/ecs-trust-policy.json
# Attach policy
aws iam put-role-policy --role-name mcp-devops-ecs-task-role \
--policy-name mcp-permissions \
--policy-document file://config/iam-ecs-task-role.json
For EKS Deployment (IRSA)
# Create OIDC provider for your cluster
eksctl utils associate-iam-oidc-provider --cluster=your-cluster --approve
# Create IAM role with trust policy for service account
eksctl create iamserviceaccount \
--name mcp-devops-sa \
--namespace mcp-devops \
--cluster your-cluster \
--attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/mcp-devops-policy \
--approve
💻 Client Configuration
ChatGPT MCP
Add to your ChatGPT MCP configuration:
{
"mcpServers": {
"aws-devops": {
"url": "https://mcp.company.com",
"apiKey": "YOUR_API_KEY_HERE"
}
}
}
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):
{
"mcpServers": {
"aws-devops": {
"command": "python",
"args": ["-m", "mcp_server.server"],
"env": {
"MCP_SECRET_KEY": "your-secret-key",
"MCP_API_KEYS": "key1,key2"
}
}
}
}
Qwen Agent
agent:
name: aws-devops-agent
mcp_servers:
- name: aws-devops
endpoint: https://mcp.company.com
auth:
type: api_key
key: YOUR_API_KEY_HERE
📊 Usage Examples
Example 1: List and Start EC2 Instances
Agent: "List all EC2 instances in us-east-1"
MCP: [Returns list of instances with IDs and states]
Agent: "Start instance i-1234567890abcdef0"
MCP: {"status": "started", "instance_id": "i-1234567890abcdef0"}
Example 2: Troubleshoot Lambda Function
Agent: "Troubleshoot Lambda function my-api-function in us-east-1"
MCP: {
"function_name": "my-api-function",
"recent_errors": [
"ERROR: Timeout after 30 seconds",
"Exception: Unable to connect to database"
],
"suggestions": [
"Check function timeout setting",
"Verify database security group allows Lambda access",
"Review VPC configuration"
],
"root_cause": "Network connectivity issue to RDS database"
}
Example 3: Cost Optimization
Agent: "Find cost optimization opportunities in us-west-2"
MCP: {
"region": "us-west-2",
"findings": [
{
"type": "idle_ec2",
"resource_id": "i-abcdef123456",
"avg_cpu": "2.3%",
"recommendation": "Consider stopping or terminating this instance"
},
{
"type": "unattached_ebs",
"resource_id": "vol-xyz789",
"size": 100,
"recommendation": "Delete if not needed to save $10/month"
}
],
"total_opportunities": 2
}
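To show how a finding such as idle_ec2 might be derived, here is a small sketch that flags instances whose average CPU falls below a cutoff. The 5% threshold, the function name find_idle_instances, and the input shape (a dict of instance ID to average CPU percent) are assumptions; the actual cost_optimization tool may use different criteria.

```python
def find_idle_instances(avg_cpu_by_instance, threshold_pct=5.0):
    """Return idle_ec2 findings for instances below an assumed CPU threshold."""
    findings = []
    for instance_id, avg_cpu in sorted(avg_cpu_by_instance.items()):
        if avg_cpu < threshold_pct:
            findings.append({
                "type": "idle_ec2",
                "resource_id": instance_id,
                "avg_cpu": f"{avg_cpu:.1f}%",
                "recommendation": "Consider stopping or terminating this instance",
            })
    return findings
```

In a real deployment the averages would come from a CloudWatch metric query rather than a hand-built dictionary.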
Example 4: Kubernetes Operations
Agent: "List pods in production namespace"
MCP: {"pods": [{"name": "api-7d8f9c-abc", "status": "Running"}, ...]}
Agent: "Get logs for pod api-7d8f9c-abc"
MCP: [Returns last 100 lines of logs]
Agent: "Restart deployment api-deployment"
MCP: {"status": "restarted", "deployment": "api-deployment"}
🛡️ Security Best Practices
- Use Resource Tags: Tag resources with ManagedBy: MCP to restrict operations
- Rotate API Keys: Regularly rotate API keys and JWT secrets
- Monitor Audit Logs: Review /var/log/mcp-audit.log regularly
- Least Privilege: Use the provided IAM policies as a starting point, further restrict as needed
- Network Isolation: Deploy in private subnets, use VPC endpoints
- Enable CloudTrail: Track all AWS API calls made by the MCP server
- Rate Limiting: Configure rate limits in API Gateway
- Approval Workflows: Implement manual approval for destructive operations
🔍 Troubleshooting
Server Won't Start
# Check logs
journalctl -u mcp-devops -f # For systemd
kubectl logs -f deployment/mcp-devops -n mcp-devops # For Kubernetes
# Verify IAM permissions
aws sts get-caller-identity
# Test connectivity
curl http://localhost:8080/health
Permission Denied Errors
# Verify IAM role is attached
aws ec2 describe-instances --instance-ids i-xxxxx --query 'Reservations[0].Instances[0].IamInstanceProfile'
# Check IAM policy
aws iam get-role-policy --role-name mcp-devops-ec2-role --policy-name mcp-permissions
# Verify resource tags
aws ec2 describe-instances --instance-ids i-xxxxx --query 'Reservations[0].Instances[0].Tags'
Connection Timeouts
# Check security groups
aws ec2 describe-security-groups --group-ids sg-xxxxx
# Verify VPC endpoints (if using)
aws ec2 describe-vpc-endpoints
# Test network connectivity
telnet your-nlb-dns-name 8080
🔄 Adding New Tools
To add a new tool:
- Create the tool implementation in mcp_server/tools/
- Add the tool definition to the TOOLS list in server.py
- Add the handler in the call_tool() function
- Update IAM policies if new permissions are needed
- Update this README with tool documentation
Example:
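# In mcp_server/tools/aws/rds.py
import boto3

async def list_databases(region: str):
    """Return the identifiers of the RDS instances in a region."""
    rds = boto3.client('rds', region_name=region)
    # A single call returns only the first page of results;
    # use a paginator if the account has many instances.
    response = rds.describe_db_instances()
    return {'databases': [db['DBInstanceIdentifier'] for db in response['DBInstances']]}

# In server.py, add the definition to the TOOLS list
Tool(name="list_databases", description="List RDS databases",
     inputSchema={"type": "object", "properties": {"region": {"type": "string"}}, "required": ["region"]})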
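The handler step is not shown in the example above, so here is one way it could look. This is a sketch under assumptions: the HANDLERS registry is hypothetical, the list_databases stand-in avoids calling AWS, and the real call_tool() in server.py may have a different signature and return type.

```python
import asyncio


async def list_databases(region: str):
    # Stand-in for the real implementation, which would call AWS.
    return {"databases": [], "region": region}


# Hypothetical registry mapping tool names to async handlers.
HANDLERS = {"list_databases": list_databases}


async def call_tool(name: str, arguments: dict):
    """Look up the handler for a tool name and await it with its arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(**arguments)


if __name__ == "__main__":
    print(asyncio.run(call_tool("list_databases", {"region": "us-east-1"})))
```

A dictionary lookup keeps the dispatch in one place, so adding a tool means adding one registry entry instead of another branch in call_tool().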
📈 Monitoring and Observability
CloudWatch Metrics
When deployed on AWS, the server automatically reports the following to CloudWatch:
- Request count
- Error rate
- Latency
- Tool invocation counts
Audit Logs
All operations are logged in JSON format to /var/log/mcp-audit.log:
{
"timestamp": "2025-12-03T14:30:00Z",
"user_id": "system",
"action": "start_instance",
"arguments": {"instance_id": "i-xxxxx", "region": "us-east-1"},
"status": "initiated"
}
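A record with these fields could be produced along the following lines. The function names build_audit_record and append_audit_record are hypothetical, and writing one JSON object per line is an assumption about the file layout rather than a description of the server's logger.

```python
import json
from datetime import datetime, timezone


def build_audit_record(user_id, action, arguments, status, now=None):
    """Return one audit entry as a JSON string with the fields shown above."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": user_id,
        "action": action,
        "arguments": arguments,
        "status": status,
    }
    return json.dumps(record)


def append_audit_record(path, line):
    """Append one JSON line to the audit log file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Separating record construction from the file write lets the format be checked without needing write access to /var/log.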
Health Check Endpoint
curl http://localhost:8080/health
# Returns: {"status": "healthy", "version": "1.0.0"}
🔖 Versioning
Current version: 1.0.0
Version format: MAJOR.MINOR.PATCH
- MAJOR: Breaking changes
- MINOR: New features, backward compatible
- PATCH: Bug fixes
📄 License
This is a production-grade enterprise platform. Customize licensing as needed for your organization.
🤝 Support
For issues, questions, or contributions:
- Check the troubleshooting section
- Review audit logs for error details
- Verify IAM permissions
- Check CloudWatch logs
🎯 Roadmap
- Slack approval workflow integration
- Multi-region failover
- GraphQL API support
- Terraform state management tools
- Cost forecasting and budgets
- Automated remediation workflows
- Integration with ServiceNow/Jira
- Advanced RBAC with team-based permissions
Built with ❤️ for Enterprise DevOps Teams