nachtschatt3n/ai-sre
The AI SRE project provides a lightweight MCP server for AI-powered Site Reliability Engineering operations. It exposes Kubernetes, Git, and Flux operations as MCP tools over JSON-RPC (WebSocket or HTTP).
🤖 AI SRE - Intelligent Site Reliability Engineering
A lean, containerized toolbox for AI-powered Site Reliability Engineering operations. This project provides a Model Context Protocol (MCP) compliant server that exposes Kubernetes, Git, and Flux operations through standardized MCP tools, designed to be seamlessly integrated with N8N's MCP Client node.
🚀 Production Ready: Lightweight (< 256MB RAM) with fast startup (< 10s), built for Kubernetes operations with Flux GitOps, and fully compliant with the MCP protocol.
🏗️ Architecture
- Container: Stateless command executor (< 256MB RAM, < 500MB image)
- Orchestration: External (N8N handles intelligence, alert processing, RAG/vector store)
- GitOps: Flux-only approach for simplified operations
- Protocol: Model Context Protocol (MCP) - JSON-RPC over WebSocket/HTTP
- Integration: Direct compatibility with N8N's MCP Client node
🚀 Quick Start
Prerequisites
- Docker & Docker Compose
- Kubernetes cluster (local or remote)
- Flux CLI (for GitOps operations)
- Git repository access
Installation
# Clone and setup
git clone https://github.com/nachtschatt3n/ai-sre.git
cd ai-sre
# Configure environment
cp env.template .env
# Edit .env with your configuration
# Build and run
make init
make build
make run
# Test the MCP server
curl http://localhost:8080/health
# Test MCP protocol
curl -X POST http://localhost:8080/mcp/http \
-H "Content-Type: application/json" \
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}'
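For scripted checks, you can send the same `tools/list` request from Python using only the standard library. This is a minimal sketch. The helper names (`build_request`, `post_mcp`, `list_tool_names`) are hypothetical. The sketch also assumes the server returns a standard MCP `tools/list` result of the form `{"result": {"tools": [...]}}`.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Endpoint taken from the curl example above; adjust host/port for your setup.
MCP_HTTP_URL = "http://localhost:8080/mcp/http"


def build_request(method: str, params: Optional[Dict[str, Any]] = None, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object like the one in the curl example."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}


def post_mcp(payload: Dict[str, Any], url: str = MCP_HTTP_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """POST one JSON-RPC request to the HTTP MCP endpoint and decode the JSON reply."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_tool_names(url: str = MCP_HTTP_URL) -> List[str]:
    """Call tools/list and return the tool names (assumes a standard MCP result shape)."""
    reply = post_mcp(build_request("tools/list"), url)
    return [tool["name"] for tool in reply.get("result", {}).get("tools", [])]
```

With the server running, `print(list_tool_names())` should print the tool names listed under MCP Protocol Tools.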
Git Repository Configuration
The AI SRE server can automatically clone and manage your Kubernetes GitOps repository on startup.
Environment Variables
Create a .env file from the template:
cp env.template .env
nano .env
Required Git Configuration:
# Kubernetes GitOps repository
K8S_GIT_REPO=git@github.com:your-org/your-k8s-repo.git
K8S_GIT_BRANCH=main
K8S_REPO_PATH=/app/k8s-repo
# Git user configuration
GITHUB_USER=AI-SRE
GITHUB_EMAIL=ai-sre@your-org.com
# Authentication (choose one)
GITHUB_TOKEN=ghp_xxxxxxxxxxxx # For HTTPS
# OR mount SSH keys: ${HOME}/.ssh:/home/aisre/.ssh:ro
Authentication Methods
Option 1: SSH Keys (Recommended)
# Mount SSH keys in docker-compose.yaml
volumes:
- ${HOME}/.ssh:/home/aisre/.ssh:ro
# Set repository URL to SSH
K8S_GIT_REPO=git@github.com:your-org/your-k8s-repo.git
Option 2: GitHub Token
# Set repository URL to HTTPS
K8S_GIT_REPO=https://github.com/your-org/your-k8s-repo.git
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
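It can help to validate these settings before starting the container. The sketch below is hypothetical: `check_git_config` is not part of this project. It only encodes the rules stated above:

- all Git variables are set
- `GITHUB_TOKEN` is present for HTTPS URLs
- SSH keys are mounted at `/home/aisre/.ssh` for `git@` URLs

```python
import os
from typing import List, Mapping

# Variables listed under "Required Git Configuration" above.
REQUIRED_VARS = ("K8S_GIT_REPO", "K8S_GIT_BRANCH", "K8S_REPO_PATH", "GITHUB_USER", "GITHUB_EMAIL")


def check_git_config(env: Mapping[str, str], ssh_dir: str = "/home/aisre/.ssh") -> List[str]:
    """Return human-readable problems with the Git settings; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    repo = env.get("K8S_GIT_REPO", "")
    if repo.startswith("https://") and not env.get("GITHUB_TOKEN"):
        # Option 2: HTTPS URLs authenticate with a GitHub token
        problems.append("HTTPS repository URL requires GITHUB_TOKEN")
    if repo.startswith("git@") and not os.path.isdir(ssh_dir):
        # Option 1: SSH URLs need the key directory mounted into the container
        problems.append(f"SSH repository URL but no keys found at {ssh_dir}")
    return problems
```

Inside the container, `check_git_config(os.environ)` returns an empty list when either authentication option is configured correctly.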
Automatic Repository Initialization
On startup, the server will:
- ✅ Clone the repository if it doesn't exist
- ✅ Pull latest changes if repository exists
- ✅ Checkout the specified branch
- ✅ Display repository status and latest commit
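The startup behaviour above amounts to a clone-or-update decision. The following Python sketch expresses it as a list of git commands. It is an illustration under assumptions: `plan_repo_sync` and `sync_repo` are hypothetical names, and the project's actual initialization code may differ, for example in how it updates an existing checkout.

```python
import os
import subprocess
from typing import List


def plan_repo_sync(repo_url: str, branch: str, path: str, exists: bool) -> List[List[str]]:
    """Return the git commands (argv lists) for the startup steps listed above."""
    if exists:
        steps = [
            ["git", "-C", path, "fetch", "origin", branch],              # get remote updates
            ["git", "-C", path, "checkout", branch],                     # switch to the configured branch
            ["git", "-C", path, "pull", "--ff-only", "origin", branch],  # fast-forward only, never merge
        ]
    else:
        steps = [["git", "clone", "--branch", branch, repo_url, path]]   # clone straight onto the branch
    # Report status and the latest commit in both cases
    steps.append(["git", "-C", path, "status", "--short", "--branch"])
    steps.append(["git", "-C", path, "log", "-1", "--oneline"])
    return steps


def sync_repo(repo_url: str, branch: str, path: str) -> None:
    """Run the plan, stopping at the first failing git command."""
    exists = os.path.isdir(os.path.join(path, ".git"))
    for argv in plan_repo_sync(repo_url, branch, path, exists):
        subprocess.run(argv, check=True)
```

With `--ff-only`, a checkout whose history has diverged from the remote fails loudly instead of silently producing a merge commit in the GitOps repository.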
Docker Compose (Recommended)
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f ai-sre
# Stop services
docker-compose down
Kubernetes Deployment
# Deploy to Kubernetes
kubectl apply -f k8s-repo/
# Check deployment
kubectl get pods -n ai-sre
# Port forward for testing
kubectl port-forward svc/ai-sre 8080:8080 -n ai-sre
✨ Key Features
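🎯 AI-Powered Operations
The server itself is a stateless executor. These capabilities come from pairing it with N8N workflows and AI agents, which provide the intelligence layer: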
- Intelligent Incident Response: Automated diagnosis and remediation
- Pattern Recognition: Learns from incidents to prevent future issues
- Predictive Monitoring: Proactive alerting based on learned patterns
- Self-Healing: Automatic recovery from common Kubernetes issues
🔧 MCP Protocol Tools
Kubernetes Operations
- kubectl_get - Get Kubernetes resources (pods, nodes, services, etc.)
- kubectl_describe - Describe Kubernetes resources with detailed information
- kubectl_logs - Retrieve pod logs for troubleshooting
GitOps & Flux
- flux_status - Get Flux GitOps synchronization status and health
Git Operations
- git_status - Check Git repository status
- git_pull - Pull the latest changes from the Kubernetes Git repository
- git_commit - Commit changes to the Kubernetes Git repository
- git_push - Push changes to the Kubernetes Git repository
CLI Tools & Diagnostics
- cli_tool - Execute CLI tools (jq, grep, sed, curl, cat, tree, find, etc.)
- health_check - Check system and service health status
MCP Protocol Support
- WebSocket Connection: Real-time bidirectional communication
- JSON-RPC 2.0: Standardized message format
- Tool Discovery: Automatic tool listing and capability negotiation
- Resource Management: Access to configuration and version information
System & Health
- GET /health - Health check endpoint
- GET /ready - Readiness probe
- WebSocket: ws://host:port/mcp - MCP protocol endpoint
- HTTP MCP: POST /mcp/http - HTTP-based MCP for testing
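Deployment scripts and CI jobs often need to wait for the server to come up before sending MCP requests. The Python sketch below polls `GET /ready` (or `/health`) until it returns a 2xx status. It assumes those endpoints answer a plain HTTP GET with 2xx when healthy. `http_ok` and `wait_until` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if GET url returns a 2xx status; any error or non-2xx counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (non-2xx), connection refused and timeouts all land here
        return False


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example: `wait_until(lambda: http_ok("http://localhost:8080/ready"), timeout=30)`. Inside Kubernetes, the kubelet performs this check itself through a readiness probe, so a helper like this is mainly useful for scripts running outside the cluster.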
📦 Container Specifications
| Specification | Value | Notes |
|---|---|---|
| Image Size | < 500MB | Optimized Alpine-based image |
| Memory Usage | < 256MB | Efficient resource utilization |
| CPU Usage | < 0.2 cores | Lightweight processing |
| Startup Time | < 10 seconds | Fast container initialization |
| Dependencies | Minimal | Python with aiohttp and PyYAML |
📚 Comprehensive Runbook System
- 4,212+ Lines of Content: Extensive knowledge base with templates and best practices
- Static Templates: Incident response, remediation, and postmortem templates
- Dynamic Content: Real incident tracking and pattern recognition
- Learning System: Continuous knowledge base updates and pattern evolution
- Best Practices: Monitoring, alerting, and operational guidelines
- Search & Discovery: Intelligent search index for quick knowledge retrieval
🛡️ Security & Reliability
- RBAC Integration: Full Kubernetes RBAC support
- Secret Management: Secure handling of credentials and tokens
- Audit Logging: Complete operation audit trail
- Error Handling: Graceful failure recovery
- Resource Limits: Built-in resource constraints
💡 Usage Examples
N8N MCP Client Integration
The AI SRE server is fully compatible with N8N's MCP Client node, providing seamless integration for automated incident response and GitOps workflows.
{
"connectionType": "WebSocket",
"serverUrl": "ws://ai-sre.ai-sre.svc.cluster.local:8080/mcp",
"authentication": {
"type": "none"
}
}
Available N8N Integration Resources
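- 🤖 docs/n8n-agent-prompts.md: 12 comprehensive prompts for different scenarios
- 📝 docs/n8n-incident-example.md: Real-world pod crashloop response workflow
- 🧠 docs/n8n-learning-prompts.md: Continuous learning and knowledge updates
- 🔗 docs/N8N_INTEGRATION_GUIDE.md: Complete setup and configuration guide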
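MCP Tool Execution Examples
Each example below shows only the method and params of a tools/call request. To send one, wrap it in a full JSON-RPC 2.0 message by adding "jsonrpc": "2.0" and a unique "id", as in the curl examples under MCP Protocol Testing.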
Get Kubernetes Pods
{
"method": "tools/call",
"params": {
"name": "kubectl_get",
"arguments": {
"resource": "pods",
"namespace": "default",
"output": "json"
}
}
}
Get Pod Logs
{
"method": "tools/call",
"params": {
"name": "kubectl_logs",
"arguments": {
"pod": "my-app-7d4b8c9f-x2k9m",
"namespace": "production",
"lines": 100
}
}
}
Check Flux GitOps Status
{
"method": "tools/call",
"params": {
"name": "flux_status",
"arguments": {
"namespace": "flux-system"
}
}
}
Execute CLI Tools
{
"method": "tools/call",
"params": {
"name": "cli_tool",
"arguments": {
"tool": "jq",
"args": [".status.phase"],
"input": "{{ $json.kubectl_output }}"
}
}
}
Health Check
{
"method": "tools/call",
"params": {
"name": "health_check",
"arguments": {}
}
}
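Each tools/call request returns a JSON-RPC response. This sketch assumes the server follows the MCP specification's tools/call result shape: a `content` array of `{"type": "text", "text": ...}` items, plus an optional `isError` flag. Under that assumption, a client can extract the text and surface both protocol-level and tool-level failures. `tool_text` and `ToolCallError` are hypothetical names.

```python
from typing import Any, Dict


class ToolCallError(RuntimeError):
    """Raised when a tools/call request fails at the protocol or the tool level."""


def tool_text(response: Dict[str, Any]) -> str:
    """Return the text content of a tools/call response, raising on errors."""
    if "error" in response:
        # JSON-RPC level failure, e.g. unknown method or invalid params
        err = response["error"]
        raise ToolCallError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response.get("result", {})
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        # The tool ran but flagged a failure via isError (MCP result convention)
        raise ToolCallError(text or "tool reported an error")
    return text
```

Keeping the two failure kinds separate lets an N8N workflow or script retry transport problems while treating tool failures, such as a missing pod, as data to reason about.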
Git Repository Operations
Pull Latest Changes
{
"method": "tools/call",
"params": {
"name": "git_pull",
"arguments": {
"branch": "main",
"force": false
}
}
}
Commit Changes
{
"method": "tools/call",
"params": {
"name": "git_commit",
"arguments": {
"message": "fix: update resource limits",
"files": ["manifests/deployment.yaml"],
"all": false
}
}
}
Push Changes
{
"method": "tools/call",
"params": {
"name": "git_push",
"arguments": {
"branch": "main",
"force": false
}
}
}
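Flux reconciles the cluster from the Git repository, so a configuration change is a pull, commit, and push sequence, followed by a flux_status check. The sketch below builds the full JSON-RPC requests for that sequence, using the argument names from the examples above. `gitops_change` and `tool_call` are hypothetical helpers. Flux applies the new commit on its own reconcile interval, so an immediate flux_status call may still report the previous revision.

```python
from typing import Any, Dict, List


def tool_call(name: str, arguments: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Wrap one MCP tool invocation in a complete JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def gitops_change(files: List[str], message: str, branch: str = "main") -> List[Dict[str, Any]]:
    """Requests for pull -> commit -> push, then a Flux status check."""
    steps = [
        ("git_pull", {"branch": branch, "force": False}),  # start from the latest remote state
        ("git_commit", {"message": message, "files": files, "all": False}),  # only the listed files
        ("git_push", {"branch": branch, "force": False}),  # never force-push the GitOps branch
        ("flux_status", {"namespace": "flux-system"}),  # inspect reconciliation afterwards
    ]
    return [tool_call(name, args, i) for i, (name, args) in enumerate(steps, start=1)]
```

Send the requests one at a time and stop at the first error response. Committing on top of a failed pull, or pushing after a failed commit, would leave the GitOps repository in an unexpected state.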
MCP Protocol Testing
# Test health endpoint
curl http://localhost:8080/health
# Test MCP initialization
curl -X POST http://localhost:8080/mcp/http \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {"tools": {}},
"clientInfo": {"name": "test-client", "version": "1.0.0"}
}
}'
# List available tools
curl -X POST http://localhost:8080/mcp/http \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list",
"params": {}
}'
# Execute a tool
curl -X POST http://localhost:8080/mcp/http \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "kubectl_get",
"arguments": {
"resource": "pods",
"namespace": "default"
}
}
}'
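The curl examples send each request independently. In the MCP lifecycle, a client first sends `initialize`. After the server responds, the client sends a `notifications/initialized` notification, a JSON-RPC message without an `id`, before regular requests such as `tools/list`. This README does not document whether the server requires that notification on /mcp/http. Treat the following Python sketch as an assumption-based illustration; `McpSession` is a hypothetical class.

```python
import itertools
from typing import Any, Dict, List, Optional

PROTOCOL_VERSION = "2024-11-05"  # matches the curl examples above


class McpSession:
    """Build the JSON-RPC messages for one MCP session with sequential request ids."""

    def __init__(self, name: str = "test-client", version: str = "1.0.0") -> None:
        self._ids = itertools.count(1)
        self.client_info = {"name": name, "version": version}

    def request(self, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "id": next(self._ids), "method": method, "params": params or {}}

    @staticmethod
    def notification(method: str) -> Dict[str, Any]:
        # Notifications have no "id", so the server sends no response to them
        return {"jsonrpc": "2.0", "method": method}

    def opening_messages(self) -> List[Dict[str, Any]]:
        """initialize, then the initialized notification, then tools/list."""
        return [
            self.request("initialize", {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {"tools": {}},
                "clientInfo": self.client_info,
            }),
            self.notification("notifications/initialized"),
            self.request("tools/list"),
        ]
```

Send the first message and wait for its response, checking the `protocolVersion` the server returns, before sending the rest in order.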
WebSocket Connection (JavaScript)
const ws = new WebSocket('ws://localhost:8080/mcp');
ws.onopen = () => {
// Initialize MCP connection
ws.send(JSON.stringify({
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"capabilities": {"tools": {}},
"clientInfo": {"name": "my-client", "version": "1.0.0"}
}
}));
};
ws.onmessage = (event) => {
const response = JSON.parse(event.data);
console.log('MCP Response:', response);
};
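The `onmessage` handler above simply logs every message. When several requests are in flight on one WebSocket, responses can be matched to their requests by JSON-RPC `id`. The Python (asyncio) sketch below shows that bookkeeping; `PendingRequests` is a hypothetical class. It is transport-agnostic: you would feed it messages from whichever WebSocket client you use, for example from aiohttp's `ws.receive_json()` in a reader task.

```python
import asyncio
import itertools
from typing import Any, Dict, Tuple


class PendingRequests:
    """Track in-flight JSON-RPC requests and resolve each one when its response arrives."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._waiting: Dict[int, asyncio.Future] = {}

    def new_request(self, method: str, params: Dict[str, Any]) -> Tuple[Dict[str, Any], asyncio.Future]:
        """Create a request message plus a future that completes with its response."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._waiting[request_id] = future
        message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
        return message, future

    def dispatch(self, message: Dict[str, Any]) -> bool:
        """Route an incoming message; returns False for notifications or unknown ids."""
        future = self._waiting.pop(message.get("id"), None)
        if future is None:
            return False
        if not future.done():  # the caller may already have cancelled it, e.g. on timeout
            future.set_result(message)
        return True
```

Because each caller awaits its own future, responses can arrive in any order without being mixed up.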
📁 Project Structure
ai-sre/
├── 📄 README.md # This comprehensive guide
├── 📄 LICENSE # MIT License
├── 🐳 Dockerfile # Lean container definition
├── 🐳 docker-compose.yaml # Local development setup
├── 🔧 Makefile # Build and management commands
├── ⚙️ env.template # Environment variables template
├── 🧪 test_mcp.py # MCP server testing script
├── 📁 docs/ # Documentation
│ ├── 📋 REQUIREMENTS.md # Comprehensive requirements
│ ├── 🏗️ ARCHITECTURE.md # Architecture overview
│ ├── 📚 RUNBOOK_SYSTEM.md # Runbook system documentation
│ ├── 🚀 QUICKSTART.md # Developer quick start guide
│ ├── 🔗 N8N_INTEGRATION_GUIDE.md # Complete N8N integration guide
│ ├── 🤖 n8n-agent-prompts.md # N8N Agent node prompts
│ ├── 📝 n8n-incident-example.md # Real-world incident example
│ └── 🧠 n8n-learning-prompts.md # Learning and knowledge prompts
├── 📁 runbooks/ # Comprehensive runbook system
│ ├── 📁 static/ # Static runbook templates
│ │ ├── 📝 agent.md # AI knowledge base and capabilities
│ │ ├── 📁 templates/ # Incident response templates
│ │ │ ├── 📋 incident.md # Incident response template
│ │ │ ├── 🔧 remediation.md # Remediation process template
│ │ │ └── 📊 postmortem.md # Postmortem analysis template
│ │ └── 📁 best-practices/ # Best practices and guidelines
│ │ └── 📈 monitoring.md # Monitoring best practices
│ ├── 📁 dynamic/ # Dynamic runbook content
│ │ ├── 📁 incidents/ # Real incident documentation
│ │ │ └── 🚨 2025-10-15-pod-crashloop.md
│ │ ├── 📁 patterns/ # Learned patterns and trends
│ │ │ └── 💾 memory-pressure.md
│ │ └── 📁 resolutions/ # Resolution strategies
│ │ ├── 📁 auto-generated/ # Automated resolutions
│ │ │ └── 🔧 memory-limit-increase.md
│ │ └── 📁 manual/ # Manual resolution procedures
│ └── 📁 cache/ # Runbook system cache
│ ├── 🔍 search-index.json # Search index for runbooks
│ ├── 📊 patterns.json # Pattern recognition cache
│ └── 📈 metrics.json # Usage metrics and analytics
├── 📁 src/
│ └── 🐍 mcp_server_protocol.py # MCP Protocol compliant server
├── 📁 scripts/
│ └── 🚀 entrypoint.sh # Container initialization
├── 📁 config/
│ └── ⚙️ config.yaml # Runtime configuration
├── 📁 k8s-repo/ # Kubernetes deployment manifests
│ ├── 📋 configmap.yaml # Kubernetes ConfigMap & Secrets
│ ├── 📋 storage.yaml # Persistent storage claims
│ └── 📋 deployment.yaml # Complete deployment manifests
├── 📁 logs/ # Application logs
└── 📁 work/ # Working directory
🔧 Development
See docs/QUICKSTART.md for detailed development instructions.
Local Development
# Install dependencies
pip install aiohttp pyyaml
# Run MCP server locally
python src/mcp_server_protocol.py
# Test MCP server
python test_mcp.py
# Run tests
make test-mcp
make test-git-tools
# Lint code
make lint
Testing
# Test MCP protocol
make test-mcp
# Test Git tools
make test-git-tools
# Test kubectl tools
make test-kubectl
# Test Flux tools
make test-flux
# Test with Docker Compose
docker-compose up -d
📖 Documentation
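- 📋 docs/REQUIREMENTS.md - Comprehensive project requirements
- 🏗️ docs/ARCHITECTURE.md - System architecture details
- 📚 docs/RUNBOOK_SYSTEM.md - Comprehensive runbook system documentation
- 🚀 docs/QUICKSTART.md - Developer quick start guide
- 🔗 docs/N8N_INTEGRATION_GUIDE.md - Complete N8N integration guide
- 🤖 docs/n8n-agent-prompts.md - 12 detailed N8N Agent node prompts
- 📝 docs/n8n-incident-example.md - Real-world incident response example
- 🧠 docs/n8n-learning-prompts.md - Continuous learning and knowledge update prompts
- 📝 runbooks/static/agent.md - AI agent capabilities and knowledge base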
🤝 Contributing
We welcome contributions! This project follows a lean architecture principle where:
- Container: Remains stateless and lightweight
- Intelligence: Handled externally (N8N, AI agents)
- GitOps: Flux-only approach for simplicity
- API: MCP tools (JSON-RPC 2.0 over WebSocket/HTTP) for all operations, plus plain HTTP health and readiness endpoints
Contribution Guidelines
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Standards
- Follow PEP 8 for Python code
- Use type hints where appropriate
- Add tests for new features
- Update documentation for API changes
- Keep container size minimal
🆘 Support & Community
- 📧 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📚 Wiki: Project Wiki
- 🐛 Bug Reports: Use GitHub Issues with the bug label
🏆 Acknowledgments
- Flux - GitOps toolkit for Kubernetes
- Kubernetes - Container orchestration platform
- N8N - Workflow automation platform
- MCP - Model Context Protocol specification
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
🚀 Built with ❤️ for the Kubernetes and GitOps community