mcp-aws-devops-server

suryansh639/mcp-aws-devops-server

The MCP DevOps Platform is a production-grade, multi-cloud operations server designed to enable AI agents to execute DevOps and CloudOps tasks securely across various cloud environments.

MCP DevOps Platform - Production-Grade Multi-Cloud Operations

A complete, enterprise-ready Model Context Protocol (MCP) server that enables AI agents to safely execute DevOps and CloudOps operations across AWS, Azure, GCP, and Kubernetes environments without exposing credentials.

🎯 Overview

This platform allows AI agents (ChatGPT, Claude, Qwen, custom LLMs) to perform infrastructure operations inside your VPC using a zero-trust security model with IAM role-based authentication.

Key Features

  • Multi-Cloud Support: AWS, Azure, GCP, and Kubernetes
  • Zero-Trust Security: No credentials stored, IAM role-based access
  • AI-Powered Intelligence: Automated troubleshooting, cost optimization, health checks
  • Production-Ready: Multiple deployment options (EC2, ECS Fargate, EKS)
  • Audit Logging: Complete JSON audit trail
  • API Gateway Integration: Secure external access with API key authentication

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                      AI Agents Layer                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ ChatGPT  │  │  Claude  │  │   Qwen   │  │  Custom  │   │
│  │   MCP    │  │   MCP    │  │  Agent   │  │   Agent  │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
└───────┼─────────────┼─────────────┼─────────────┼──────────┘
        │             │             │             │
        └─────────────┴─────────────┴─────────────┘
                          │
        ┌─────────────────▼─────────────────┐
        │      API Gateway (Optional)        │
        │    - API Key Authentication        │
        │    - Rate Limiting                 │
        │    - Custom Domain                 │
        └─────────────────┬─────────────────┘
                          │
        ┌─────────────────▼─────────────────┐
        │         VPC Link / NLB             │
        └─────────────────┬─────────────────┘
                          │
        ┌─────────────────▼─────────────────┐
        │       MCP DevOps Server            │
        │   ┌───────────────────────────┐   │
        │   │  Security Layer           │   │
        │   │  - JWT/API Key Auth       │   │
        │   │  - Rate Limiting          │   │
        │   │  - Audit Logging          │   │
        │   └───────────────────────────┘   │
        │   ┌───────────────────────────┐   │
        │   │  Tool Router              │   │
        │   └───────────────────────────┘   │
        │   ┌───────────────────────────┐   │
        │   │  Intelligence Layer       │   │
        │   │  - AI Troubleshooting     │   │
        │   │  - Cost Optimization      │   │
        │   │  - Health Checks          │   │
        │   └───────────────────────────┘   │
        └─────────────────┬─────────────────┘
                          │
        ┌─────────────────▼─────────────────┐
        │      Cloud Provider APIs           │
        │  ┌──────┐ ┌──────┐ ┌──────┐      │
        │  │ AWS  │ │Azure │ │ GCP  │ K8s  │
        │  └──────┘ └──────┘ └──────┘      │
        └───────────────────────────────────┘

🔐 Zero-Trust Security Model

How It Works

  1. No Stored Credentials: The MCP server uses IAM roles (AWS), Managed Identities (Azure), or Workload Identity (GCP)
  2. IAM Role Assumption: The server assumes roles with least-privilege permissions
  3. Request Authentication: All requests authenticated via API keys or JWT tokens
  4. Audit Trail: Every action logged with timestamp, user, and parameters
  5. Resource Tagging: Operations restricted to resources with specific tags (e.g., ManagedBy: MCP)
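
Step 3 above (request authentication) could be checked along the following lines. This is a minimal sketch rather than the server's actual code: the function name is_valid_api_key is hypothetical, and the key values are the placeholders used elsewhere in this README.

```python
import hmac
from collections.abc import Iterable


def is_valid_api_key(presented: str, valid_keys: Iterable[str]) -> bool:
    """Return True if `presented` matches any configured API key.

    hmac.compare_digest compares in constant time, which avoids leaking
    key contents through response timing.
    """
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # Check every key (no early exit) so timing does not reveal
        # which position in the list matched.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched


# Hypothetical usage with placeholder keys.
configured = ["key1", "key2", "key3"]
print(is_valid_api_key("key2", configured))
print(is_valid_api_key("not-a-key", configured))
```

A plain == comparison would also work functionally; the constant-time comparison is a defensive choice for secrets that arrive over the network.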

IAM Permission Model

The platform uses three IAM role configurations:

  • EC2 Role: For server running on EC2 instances
  • ECS Task Role: For Fargate deployments
  • EKS Service Account Role: For Kubernetes deployments with IRSA

Each role follows least-privilege principles:

  • Read operations allowed on all resources
  • Write operations restricted by resource tags
  • Destructive operations (terminate) limited to dev environments
  • Regional restrictions applied where appropriate
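
To make the rules above concrete, the following sketch builds an illustrative policy document for EC2. It is not one of the policy files shipped in config/: the function name, the statement IDs, and the Environment=dev tag used to mark dev environments are assumptions. The aws:ResourceTag condition key is a standard IAM global condition key.

```python
import json


def build_tag_restricted_policy(tag_key: str = "ManagedBy",
                                tag_value: str = "MCP") -> dict:
    """Build an example policy: read everywhere, write only on tagged resources."""
    managed = {f"aws:ResourceTag/{tag_key}": tag_value}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read operations allowed on all resources
                "Sid": "ReadAll",
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {   # Write operations restricted by resource tag
                "Sid": "WriteTaggedOnly",
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances",
                           "ec2:RebootInstances"],
                "Resource": "*",
                "Condition": {"StringEquals": managed},
            },
            {   # Destructive operations limited to dev (tag name is assumed)
                "Sid": "TerminateDevOnly",
                "Effect": "Allow",
                "Action": ["ec2:TerminateInstances"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {**managed, "aws:ResourceTag/Environment": "dev"}
                },
            },
        ],
    }


print(json.dumps(build_tag_restricted_policy(), indent=2))
```

Multiple keys under one StringEquals block are combined with AND, so the terminate statement requires both tags to be present.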

📦 Available Tools

AWS EC2

  • list_instances - List all EC2 instances in a region
  • describe_instance - Get detailed instance information
  • start_instance - Start a stopped instance
  • stop_instance - Stop a running instance
  • reboot_instance - Reboot an instance
  • terminate_instance - Terminate an instance (tag-restricted)

AWS CloudWatch

  • get_log_groups - List CloudWatch log groups
  • get_log_streams - List log streams in a group
  • fetch_logs - Retrieve log events
  • metric_query - Query CloudWatch metrics

AWS S3

  • list_buckets - List all S3 buckets
  • list_objects - List objects in a bucket
  • get_object - Download an object
  • put_object - Upload an object

AWS Lambda

  • list_functions - List Lambda functions
  • invoke_function - Invoke a function
  • update_function_code - Update function code

AWS EKS

  • list_pods - List pods in a cluster
  • describe_pod - Get pod details
  • get_pod_logs - Retrieve pod logs
  • restart_deployment - Restart a deployment

Azure

  • azure_list_vms - List virtual machines
  • azure_start_vm - Start a VM
  • azure_stop_vm - Stop a VM
  • azure_list_containers - List storage containers

GCP

  • gcp_list_instances - List Compute Engine instances
  • gcp_start_instance - Start an instance
  • gcp_stop_instance - Stop an instance
  • gcp_list_buckets - List Cloud Storage buckets

Kubernetes

  • k8s_list_pods - List pods in a namespace
  • k8s_delete_pod - Delete a pod
  • k8s_restart_deployment - Restart a deployment

Intelligence Tools

  • ai_troubleshoot - AI-powered troubleshooting with log analysis and suggestions
  • cost_optimization - Find cost savings opportunities (idle resources, untagged volumes)
  • health_check - Comprehensive health check across resources

🚀 Installation

Prerequisites

  • Python 3.11+
  • AWS CLI configured (for AWS operations)
  • kubectl configured (for Kubernetes operations)
  • Docker (for containerized deployments)
  • Terraform (for IaC deployments)

Option 1: Local Development

# After cloning the repository, enter the project directory
cd mcp-devops

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export MCP_SECRET_KEY="your-secret-key"
export MCP_API_KEYS="key1,key2,key3"

# Run the server
python -m mcp_server.server

Option 2: EC2 Deployment

# Copy files to EC2 instance
scp -r mcp-devops ec2-user@your-instance:/opt/

# SSH to instance
ssh ec2-user@your-instance

# Run installation script
cd /opt/mcp-devops/deployment/ec2
chmod +x install.sh
sudo ./install.sh

# Check status
sudo systemctl status mcp-devops

Option 3: ECS Fargate Deployment

# Build and push Docker image
cd mcp-devops
docker build -t mcp-devops:latest -f docker/Dockerfile .

# Tag and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
docker tag mcp-devops:latest ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/mcp-devops:latest
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/mcp-devops:latest

# Deploy with Terraform
cd terraform
terraform init
terraform plan -out=tfplan -var="vpc_id=vpc-xxx" -var="private_subnet_ids=[\"subnet-xxx\",\"subnet-yyy\"]"
terraform apply tfplan

Option 4: EKS Deployment

# Create namespace and secrets
kubectl create namespace mcp-devops
kubectl create secret generic mcp-secrets \
  --from-literal=secret-key=your-secret-key \
  -n mcp-devops

# Update deployment.yaml with your account ID and region
sed -i 's/ACCOUNT_ID/123456789012/g' deployment/eks/deployment.yaml
sed -i 's/REGION/us-east-1/g' deployment/eks/deployment.yaml

# Apply Kubernetes manifests
kubectl apply -f deployment/eks/rbac.yaml
kubectl apply -f deployment/eks/deployment.yaml

# Check deployment
kubectl get pods -n mcp-devops
kubectl logs -f deployment/mcp-devops -n mcp-devops

Option 5: Helm Chart

# Update values.yaml with your configuration
cd helm

# Install the chart
helm install mcp-devops . \
  --namespace mcp-devops \
  --create-namespace \
  --set image.repository=ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/mcp-devops \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/mcp-devops-eks-role

# Check status
helm status mcp-devops -n mcp-devops

🔧 Configuration

Environment Variables

  • MCP_SECRET_KEY - Secret key for JWT token generation
  • MCP_API_KEYS - Comma-separated list of valid API keys
  • AWS_REGION - Default AWS region (optional)
  • AZURE_SUBSCRIPTION_ID - Azure subscription ID (for Azure operations)
  • GOOGLE_APPLICATION_CREDENTIALS - Path to GCP service account key (for GCP operations; not needed when Workload Identity supplies credentials)

IAM Role Setup

For EC2 Deployment

# Create IAM role
aws iam create-role --role-name mcp-devops-ec2-role \
  --assume-role-policy-document file://config/ec2-trust-policy.json

# Attach policy
aws iam put-role-policy --role-name mcp-devops-ec2-role \
  --policy-name mcp-permissions \
  --policy-document file://config/iam-ec2-role.json

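# Create an instance profile, add the role to it, and attach it to the instance
aws iam create-instance-profile --instance-profile-name mcp-devops-ec2-role
aws iam add-role-to-instance-profile \
  --instance-profile-name mcp-devops-ec2-role \
  --role-name mcp-devops-ec2-role
aws ec2 associate-iam-instance-profile \
  --instance-id i-xxxxx \
  --iam-instance-profile Name=mcp-devops-ec2-role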

For ECS Deployment

# Create task role
aws iam create-role --role-name mcp-devops-ecs-task-role \
  --assume-role-policy-document file://config/ecs-trust-policy.json

# Attach policy
aws iam put-role-policy --role-name mcp-devops-ecs-task-role \
  --policy-name mcp-permissions \
  --policy-document file://config/iam-ecs-task-role.json

For EKS Deployment (IRSA)

# Create OIDC provider for your cluster
eksctl utils associate-iam-oidc-provider --cluster=your-cluster --approve

# Create IAM role with trust policy for service account
eksctl create iamserviceaccount \
  --name mcp-devops-sa \
  --namespace mcp-devops \
  --cluster your-cluster \
  --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/mcp-devops-policy \
  --approve

💻 Client Configuration

ChatGPT MCP

Add to your ChatGPT MCP configuration:

{
  "mcpServers": {
    "aws-devops": {
      "url": "https://mcp.company.com",
      "apiKey": "YOUR_API_KEY_HERE"
    }
  }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "aws-devops": {
      "command": "python",
      "args": ["-m", "mcp_server.server"],
      "env": {
        "MCP_SECRET_KEY": "your-secret-key",
        "MCP_API_KEYS": "key1,key2"
      }
    }
  }
}

Qwen Agent

agent:
  name: aws-devops-agent
  mcp_servers:
    - name: aws-devops
      endpoint: https://mcp.company.com
      auth:
        type: api_key
        key: YOUR_API_KEY_HERE

📊 Usage Examples

Example 1: List and Start EC2 Instances

Agent: "List all EC2 instances in us-east-1"
MCP: [Returns list of instances with IDs and states]

Agent: "Start instance i-1234567890abcdef0"
MCP: {"status": "started", "instance_id": "i-1234567890abcdef0"}

Example 2: Troubleshoot Lambda Function

Agent: "Troubleshoot Lambda function my-api-function in us-east-1"
MCP: {
  "function_name": "my-api-function",
  "recent_errors": [
    "ERROR: Timeout after 30 seconds",
    "Exception: Unable to connect to database"
  ],
  "suggestions": [
    "Check function timeout setting",
    "Verify database security group allows Lambda access",
    "Review VPC configuration"
  ],
  "root_cause": "Network connectivity issue to RDS database"
}

Example 3: Cost Optimization

Agent: "Find cost optimization opportunities in us-west-2"
MCP: {
  "region": "us-west-2",
  "findings": [
    {
      "type": "idle_ec2",
      "resource_id": "i-abcdef123456",
      "avg_cpu": "2.3%",
      "recommendation": "Consider stopping or terminating this instance"
    },
    {
      "type": "unattached_ebs",
      "resource_id": "vol-xyz789",
      "size": 100,
      "recommendation": "Delete if not needed to save $10/month"
    }
  ],
  "total_opportunities": 2
}
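
One way a finding such as idle_ec2 could be derived is to average CPU samples per instance and flag anything below a threshold. The sketch below is illustrative only: the function name, the input shape, and the 5% threshold are assumptions, and the actual cost_optimization logic is not described in this README.

```python
from statistics import mean


def find_idle_instances(cpu_samples: dict[str, list[float]],
                        threshold_pct: float = 5.0) -> list[dict]:
    """Flag instances whose average CPU utilization is below the threshold."""
    findings = []
    for instance_id, samples in sorted(cpu_samples.items()):
        if not samples:
            continue  # no data points, so no judgment is made
        avg = mean(samples)
        if avg < threshold_pct:
            findings.append({
                "type": "idle_ec2",
                "resource_id": instance_id,
                "avg_cpu": f"{avg:.1f}%",
                "recommendation": "Consider stopping or terminating this instance",
            })
    return findings


# Hypothetical CPU percentages, e.g. gathered from CloudWatch.
samples = {
    "i-abcdef123456": [2.0, 2.5, 2.4],
    "i-0123456789abcdef0": [41.0, 55.5, 38.2],
}
findings = find_idle_instances(samples)
print({"findings": findings, "total_opportunities": len(findings)})
```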

Example 4: Kubernetes Operations

Agent: "List pods in production namespace"
MCP: {"pods": [{"name": "api-7d8f9c-abc", "status": "Running"}, ...]}

Agent: "Get logs for pod api-7d8f9c-abc"
MCP: [Returns last 100 lines of logs]

Agent: "Restart deployment api-deployment"
MCP: {"status": "restarted", "deployment": "api-deployment"}

🛡️ Security Best Practices

  1. Use Resource Tags: Tag resources with ManagedBy: MCP to restrict operations
  2. Rotate API Keys: Regularly rotate API keys and JWT secrets
  3. Monitor Audit Logs: Review /var/log/mcp-audit.log regularly
  4. Least Privilege: Use the provided IAM policies as a starting point, further restrict as needed
  5. Network Isolation: Deploy in private subnets, use VPC endpoints
  6. Enable CloudTrail: Track all AWS API calls made by the MCP server
  7. Rate Limiting: Configure rate limits in API Gateway
  8. Approval Workflows: Implement manual approval for destructive operations
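
Item 7 places rate limiting in API Gateway, and the architecture diagram also lists it in the server's security layer. A server-side limiter could be as simple as a token bucket per API key. The sketch below is an assumption about how that might look, not the server's implementation; the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock            # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# One bucket per API key: 5 requests/second with bursts of up to 10.
buckets: dict[str, TokenBucket] = {}


def request_allowed(api_key: str) -> bool:
    bucket = buckets.setdefault(api_key, TokenBucket(rate_per_sec=5, capacity=10))
    return bucket.allow()
```

Using time.monotonic avoids surprises when the wall clock is adjusted while the server is running.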

🔍 Troubleshooting

Server Won't Start

# Check logs
journalctl -u mcp-devops -f  # For systemd
kubectl logs -f deployment/mcp-devops -n mcp-devops  # For Kubernetes

# Verify which IAM identity the server is running as
aws sts get-caller-identity

# Test connectivity
curl http://localhost:8080/health

Permission Denied Errors

# Verify IAM role is attached
aws ec2 describe-instances --instance-ids i-xxxxx --query 'Reservations[0].Instances[0].IamInstanceProfile'

# Check IAM policy
aws iam get-role-policy --role-name mcp-devops-ec2-role --policy-name mcp-permissions

# Verify resource tags
aws ec2 describe-instances --instance-ids i-xxxxx --query 'Reservations[0].Instances[0].Tags'

Connection Timeouts

# Check security groups
aws ec2 describe-security-groups --group-ids sg-xxxxx

# Verify VPC endpoints (if using)
aws ec2 describe-vpc-endpoints

# Test network connectivity
telnet your-nlb-dns-name 8080

🔄 Adding New Tools

To add a new tool:

  1. Create the tool implementation in mcp_server/tools/
  2. Add the tool definition to server.py TOOLS list
  3. Add the handler in call_tool() function
  4. Update IAM policies if new permissions needed
  5. Update this README with tool documentation

Example:

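# In mcp_server/tools/aws/rds.py
import boto3

async def list_databases(region: str):
    """Return the identifiers of all RDS instances in the given region."""
    rds = boto3.client('rds', region_name=region)
    # describe_db_instances is paginated, so walk every page
    paginator = rds.get_paginator('describe_db_instances')
    names = []
    for page in paginator.paginate():
        names.extend(db['DBInstanceIdentifier'] for db in page['DBInstances'])
    return {'databases': names}

# In server.py, add to the TOOLS list
Tool(name="list_databases", description="List RDS databases",
     inputSchema={"type": "object", "properties": {"region": {"type": "string"}}, "required": ["region"]})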
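
Step 3 (adding the handler in call_tool()) is not shown in the example above. One possible shape for that dispatch is a name-to-handler registry, sketched below. The register decorator and HANDLERS dictionary are hypothetical, the real call_tool() may be structured differently, and the list_databases body here is a stand-in that does not call AWS.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[dict[str, Any]]]
HANDLERS: dict[str, Handler] = {}


def register(name: str) -> Callable[[Handler], Handler]:
    """Decorator that records a coroutine as the handler for a tool name."""
    def decorator(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return decorator


@register("list_databases")
async def list_databases(region: str) -> dict[str, Any]:
    # Stand-in for the boto3-backed implementation.
    return {"databases": [], "region": region}


async def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Look up the handler by tool name and invoke it with the arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    return await handler(**arguments)


print(asyncio.run(call_tool("list_databases", {"region": "us-east-1"})))
```

With a registry, adding a tool touches only the new module and the TOOLS list, rather than a growing if/elif chain.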

📈 Monitoring and Observability

CloudWatch Metrics

When deployed on AWS, the server automatically publishes the following metrics to CloudWatch:

  • Request count
  • Error rate
  • Latency
  • Tool invocation counts

Audit Logs

All operations are logged in JSON format to /var/log/mcp-audit.log:

{
  "timestamp": "2025-12-03T14:30:00Z",
  "user_id": "system",
  "action": "start_instance",
  "arguments": {"instance_id": "i-xxxxx", "region": "us-east-1"},
  "status": "initiated"
}

Health Check Endpoint

curl http://localhost:8080/health
# Returns: {"status": "healthy", "version": "1.0.0"}
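
The same check can be scripted, for example as a readiness probe in a deployment pipeline. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the response body shown above.

```python
import http.client
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only if the body is JSON with status == "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def is_healthy(url: str = "http://localhost:8080/health",
               timeout: float = 3.0) -> bool:
    """Query the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are OSErrors.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```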

🔖 Versioning

Current version: 1.0.0

Version format: MAJOR.MINOR.PATCH

  • MAJOR: Breaking changes
  • MINOR: New features, backward compatible
  • PATCH: Bug fixes
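
The rules above can be expressed in a few lines, which may be useful when a client needs to decide whether a server upgrade is safe. This sketch assumes plain MAJOR.MINOR.PATCH strings without pre-release suffixes; the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Return the next version after a major, minor, or patch change."""
    major, minor, patch = parse_version(version)
    if part == "major":      # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":      # new feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":      # bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def is_breaking_upgrade(current: str, candidate: str) -> bool:
    """True when moving to `candidate` crosses a MAJOR boundary."""
    return parse_version(candidate)[0] > parse_version(current)[0]


print(bump("1.0.0", "minor"))
print(is_breaking_upgrade("1.4.2", "2.0.0"))
```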

📄 License

This is a production-grade enterprise platform. Customize licensing as needed for your organization.

🤝 Support

For issues, questions, or contributions:

  1. Check the troubleshooting section
  2. Review audit logs for error details
  3. Verify IAM permissions
  4. Check CloudWatch logs

🎯 Roadmap

  • Slack approval workflow integration
  • Multi-region failover
  • GraphQL API support
  • Terraform state management tools
  • Cost forecasting and budgets
  • Automated remediation workflows
  • Integration with ServiceNow/Jira
  • Advanced RBAC with team-based permissions

Built with ❤️ for Enterprise DevOps Teams