repolhomp3/mcp-scp
SRE MCP Server
An MCP (Model Context Protocol) server designed to automate SRE tasks, runbooks, and interactive playbooks with AWS Bedrock integration for intelligent decision making.
Features
- Automated Runbooks: Execute predefined SRE procedures (pod restarts, health checks, disk cleanup)
- Interactive Playbooks: Step-by-step guided workflows for incident response and maintenance
- AWS Bedrock Integration: Intelligent log analysis and decision support
- Kubernetes Native: Deep integration with EKS clusters
- Security First: RBAC, non-root containers, read-only filesystems
Quick Start
Prerequisites
- EKS cluster with kubectl access
- Docker for building images
- AWS credentials configured for Bedrock access
Deploy to EKS
# Set your container registry (optional)
export REGISTRY=your-registry.com
# Deploy to cluster
./deploy.sh
Local Development
# Install dependencies
pip install -r requirements.txt
# Run locally (requires kubeconfig)
python src/sre_mcp_server.py
MCP Tools Available
Core Operations
- execute_runbook: Run predefined automation procedures
- start_interactive_playbook: Begin guided incident response workflows
- get_cluster_health: Comprehensive cluster status and metrics
- analyze_logs: AI-powered log analysis using AWS Bedrock
- scale_deployment: Safe deployment scaling with confirmations
- create_incident_response: Structured incident management
Example Usage
# Execute a runbook
await execute_runbook(
    runbook_name="pod_restart",
    parameters={"namespace": "production", "selector": "app=web"},
    dry_run=True,
)

# Start incident response
await start_interactive_playbook(
    playbook_type="incident_response",
    incident_id="INC-001",
    severity="high",
)

# Analyze logs with AI
await analyze_logs(
    service_name="api-service",
    time_range="1h",
    log_level="error",
)
Available Runbooks
- pod_restart: Graceful pod restart with health verification
- service_health_check: Comprehensive service and endpoint validation
- disk_cleanup: Automated cleanup of temporary files and logs
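The runbooks above are invoked by name with a parameter dictionary and a dry_run flag (see Example Usage). The following is a minimal sketch of how such a dispatcher might be shaped, not the server's actual code. `RUNBOOKS`, `runbook`, `plan_pod_restart`, `execute_runbook_sketch`, and the returned dictionary layout are all hypothetical, and the sketch makes no cluster calls.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical registry mapping runbook names to async handlers.
Handler = Callable[[Dict[str, Any], bool], Awaitable[Dict[str, Any]]]
RUNBOOKS: Dict[str, Handler] = {}


def runbook(name: str) -> Callable[[Handler], Handler]:
    """Register an async handler under a runbook name."""
    def decorator(func: Handler) -> Handler:
        RUNBOOKS[name] = func
        return func
    return decorator


@runbook("pod_restart")
async def plan_pod_restart(parameters: Dict[str, Any], dry_run: bool) -> Dict[str, Any]:
    # Parameter names (namespace, selector) are taken from the usage example.
    action = f"restart pods matching {parameters['selector']} in {parameters['namespace']}"
    if dry_run:
        # Report what would happen without touching the cluster.
        return {"status": "dry_run", "planned": [action]}
    # A real handler would call the Kubernetes API here; this sketch does not.
    return {"status": "not_implemented", "planned": [action]}


async def execute_runbook_sketch(
    runbook_name: str, parameters: Dict[str, Any], dry_run: bool = True
) -> Dict[str, Any]:
    """Look up a runbook by name and run it, defaulting to dry-run."""
    handler = RUNBOOKS.get(runbook_name)
    if handler is None:
        return {"status": "error", "message": f"unknown runbook: {runbook_name}"}
    return await handler(parameters, dry_run)


result = asyncio.run(execute_runbook_sketch(
    "pod_restart",
    {"namespace": "production", "selector": "app=web"},
    dry_run=True,
))
print(result)
```

Defaulting dry_run to True in the sketch means a caller has to opt in explicitly before anything destructive could run, which fits the dry-run testing advice under Contributing.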
Interactive Playbooks
- incident_response: Structured incident management workflow
- deployment_rollback: Safe rollback procedures with verification
- maintenance: Guided maintenance operations
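A guided playbook is essentially an ordered list of steps with a cursor that only advances when the operator confirms. The sketch below illustrates that idea under stated assumptions: `PlaybookSession`, its methods, and the step wording are invented for illustration and are not taken from src/playbooks.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaybookSession:
    """Tracks progress through an ordered list of steps (hypothetical structure)."""
    playbook_type: str
    steps: List[str]
    position: int = 0

    def current_step(self) -> Optional[str]:
        """Return the step awaiting confirmation, or None when finished."""
        if self.position < len(self.steps):
            return self.steps[self.position]
        return None

    def confirm(self) -> Optional[str]:
        """Mark the current step done and return the next one, if any."""
        if self.position < len(self.steps):
            self.position += 1
        return self.current_step()


# Step wording is invented for illustration only.
session = PlaybookSession(
    playbook_type="deployment_rollback",
    steps=["identify last good revision", "roll back deployment", "verify health"],
)
print(session.current_step())
session.confirm()
print(session.current_step())
```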
Configuration
Environment Variables
- AWS_REGION: AWS region for Bedrock (default: us-east-1)
- LOG_LEVEL: Logging level (default: INFO)
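As a rough illustration of how these two variables and their documented defaults could be read at startup, consider the sketch below. `Settings` and `load_settings` are hypothetical names, and upper-casing the log level is an added assumption, not documented server behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    aws_region: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the two documented variables, falling back to the documented defaults."""
    return Settings(
        aws_region=env.get("AWS_REGION", "us-east-1"),
        # Assumption: normalise case so "debug" and "DEBUG" behave the same.
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


print(load_settings({}))
print(load_settings({"AWS_REGION": "eu-west-1", "LOG_LEVEL": "debug"}))
```

Passing the environment in as a mapping keeps the function easy to exercise without mutating the real process environment.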
Kubernetes RBAC
The server requires cluster-level permissions for:
- Reading pods, services, deployments
- Updating/patching deployments for scaling
- Accessing metrics APIs
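The three permission bullets above could translate into a ClusterRole along the following lines. This is a sketch, not the manifest shipped with the project: the role name `sre-mcp-server` is hypothetical, and the exact resource and verb lists are an assumption derived from the bullets. It is written in Python and emitted as JSON, which kubectl accepts in place of YAML.

```python
import json

# Hypothetical role name; resources and verbs are assumptions
# derived from the three permission bullets above.
cluster_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "sre-mcp-server"},
    "rules": [
        {   # Reading pods and services (core API group)
            "apiGroups": [""],
            "resources": ["pods", "services"],
            "verbs": ["get", "list", "watch"],
        },
        {   # Reading and updating/patching deployments for scaling
            "apiGroups": ["apps"],
            "resources": ["deployments", "deployments/scale"],
            "verbs": ["get", "list", "watch", "update", "patch"],
        },
        {   # Accessing the resource metrics API
            "apiGroups": ["metrics.k8s.io"],
            "resources": ["pods", "nodes"],
            "verbs": ["get", "list"],
        },
    ],
}

print(json.dumps(cluster_role, indent=2))
```

Write verbs appear only on the deployments rule, so everything else stays read-only.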
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │────│  SRE MCP Server  │────│   AWS Bedrock   │
│  (Q Developer)  │    │                  │    │   (Claude-3)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                │
                       ┌──────────────────┐
                       │   EKS Cluster    │
                       │   (Kubernetes)   │
                       └──────────────────┘
Security Considerations
- Non-root container execution
- Read-only root filesystem
- Minimal RBAC permissions
- Secure secrets management for AWS credentials
- Network policies for pod-to-pod communication
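The first two items map onto the standard Kubernetes container `securityContext` fields `runAsNonRoot` and `readOnlyRootFilesystem`. A small checker like the one below could flag container specs that miss them. `security_findings` and the sample container specs are hypothetical, and the sketch inspects only the container-level context, although `runAsNonRoot` may also be set at pod level.

```python
from typing import Any, Dict, List


def security_findings(container: Dict[str, Any]) -> List[str]:
    """Return hardening gaps for one container spec taken from a pod manifest."""
    ctx = container.get("securityContext", {})
    findings = []
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not set to true")
    return findings


# Sample specs invented for illustration.
hardened = {
    "name": "sre-mcp-server",
    "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
}
unhardened = {"name": "sre-mcp-server"}

print(security_findings(hardened))
print(security_findings(unhardened))
```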
Monitoring & Observability
- Structured JSON logging with correlation IDs
- Health and readiness probes
- Prometheus metrics endpoint (planned)
- Distributed tracing support (planned)
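Structured JSON logging with a correlation ID can be built on the standard `logging` module alone. The sketch below shows one possible approach. `JsonFormatter`, the logger name `sre_mcp_sketch`, and the field names in the payload are assumptions, not the server's actual log schema.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (field names are assumed)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Supplied by the caller through logging's `extra` argument.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("sre_mcp_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# One ID per request or runbook execution ties related log lines together.
correlation_id = str(uuid.uuid4())
logger.info("runbook started", extra={"correlation_id": correlation_id})
print(stream.getvalue().strip())
```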
Development
Adding New Runbooks
- Add runbook definition to src/runbooks.py
- Register in SREMCPServer.setup_tools()
- Update documentation
Adding New Playbooks
- Create playbook template in src/playbooks.py
- Define step-by-step workflow
- Add automation hooks where appropriate
Contributing
- Follow existing code patterns
- Add comprehensive error handling
- Include structured logging
- Update documentation
- Test with dry-run capabilities