mcp-scp

repolhomp3/mcp-scp

3.2

If you are the rightful owner of mcp-scp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

An MCP (Model Context Protocol) server designed to automate SRE tasks, runbooks, and interactive playbooks with AWS Bedrock integration for intelligent decision making.

Tools
6
Resources
0
Prompts
0

SRE MCP Server

An MCP (Model Context Protocol) server designed to automate SRE tasks, runbooks, and interactive playbooks with AWS Bedrock integration for intelligent decision making.

Features

  • Automated Runbooks: Execute predefined SRE procedures (pod restarts, health checks, disk cleanup)
  • Interactive Playbooks: Step-by-step guided workflows for incident response and maintenance
  • AWS Bedrock Integration: Intelligent log analysis and decision support
  • Kubernetes Native: Deep integration with EKS clusters
  • Security First: RBAC, non-root containers, read-only filesystems

Quick Start

Prerequisites

  • EKS cluster with kubectl access
  • Docker for building images
  • AWS credentials configured for Bedrock access

Deploy to EKS

# Set your container registry (optional)
export REGISTRY=your-registry.com

# Deploy to cluster
./deploy.sh

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally (requires kubeconfig)
python src/sre_mcp_server.py

MCP Tools Available

Core Operations

  • execute_runbook: Run predefined automation procedures
  • start_interactive_playbook: Begin guided incident response workflows
  • get_cluster_health: Comprehensive cluster status and metrics
  • analyze_logs: AI-powered log analysis using AWS Bedrock
  • scale_deployment: Safe deployment scaling with confirmations
  • create_incident_response: Structured incident management

Example Usage

# Execute a runbook
await execute_runbook(
    runbook_name="pod_restart",
    parameters={"namespace": "production", "selector": "app=web"},
    dry_run=True
)

# Start incident response
await start_interactive_playbook(
    playbook_type="incident_response",
    incident_id="INC-001",
    severity="high"
)

# Analyze logs with AI
await analyze_logs(
    service_name="api-service",
    time_range="1h",
    log_level="error"
)

Available Runbooks

  • pod_restart: Graceful pod restart with health verification
  • service_health_check: Comprehensive service and endpoint validation
  • disk_cleanup: Automated cleanup of temporary files and logs

Interactive Playbooks

  • incident_response: Structured incident management workflow
  • deployment_rollback: Safe rollback procedures with verification
  • maintenance: Guided maintenance operations

Configuration

Environment Variables

  • AWS_REGION: AWS region for Bedrock (default: us-east-1)
  • LOG_LEVEL: Logging level (default: INFO)

Kubernetes RBAC

The server requires cluster-level permissions for:

  • Reading pods, services, deployments
  • Updating/patching deployments for scaling
  • Accessing metrics APIs

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │────│  SRE MCP Server  │────│  AWS Bedrock    │
│  (Q Developer)  │    │                  │    │   (Claude-3)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                │
                       ┌──────────────────┐
                       │   EKS Cluster    │
                       │  (Kubernetes)    │
                       └──────────────────┘

Security Considerations

  • Non-root container execution
  • Read-only root filesystem
  • Minimal RBAC permissions
  • Secure secrets management for AWS credentials
  • Network policies for pod-to-pod communication

Monitoring & Observability

  • Structured JSON logging with correlation IDs
  • Health and readiness probes
  • Prometheus metrics endpoint (planned)
  • Distributed tracing support (planned)

Development

Adding New Runbooks

  1. Add runbook definition to src/runbooks.py
  2. Register in SREMCPServer.setup_tools()
  3. Update documentation

Adding New Playbooks

  1. Create playbook template in src/playbooks.py
  2. Define step-by-step workflow
  3. Add automation hooks where appropriate

Contributing

  1. Follow existing code patterns
  2. Add comprehensive error handling
  3. Include structured logging
  4. Update documentation
  5. Test with dry-run capabilities