ai-sre

nachtschatt3n/ai-sre



The AI SRE project provides a lightweight MCP server for AI-powered Site Reliability Engineering operations, exposing Kubernetes, Git, and Flux operations as MCP tools over WebSocket and HTTP.

🤖 AI SRE - Intelligent Site Reliability Engineering

License: MIT Python 3.9+ Docker Kubernetes Flux


A lean, containerized toolbox for AI-powered Site Reliability Engineering operations. This project provides a Model Context Protocol (MCP)-compliant server that exposes Kubernetes, Git, and Flux operations as standardized MCP tools, designed for direct integration with N8N's MCP Client node.

🚀 Production Ready: lightweight (< 256MB RAM), fast startup (< 10s), and built for Kubernetes operations with Flux GitOps, with full MCP protocol support.

🏗️ Architecture

  • Container: Stateless command executor (< 256MB RAM, < 500MB image)
  • Orchestration: External (N8N handles intelligence, alert processing, RAG/vector store)
  • GitOps: Flux-only approach for simplified operations
  • Protocol: Model Context Protocol (MCP) - JSON-RPC over WebSocket/HTTP
  • Integration: Direct compatibility with N8N's MCP Client node

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Kubernetes cluster (local or remote)
  • Flux CLI (for GitOps operations)
  • Git repository access

Installation

# Clone and setup
git clone https://github.com/nachtschatt3n/ai-sre.git
cd ai-sre

# Configure environment
cp env.template .env
# Edit .env with your configuration

# Build and run
make init
make build
make run

# Test the MCP server
curl http://localhost:8080/health

# Test MCP protocol
curl -X POST http://localhost:8080/mcp/http \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}'
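
For scripted smoke tests, the same `tools/list` request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server listens at `http://localhost:8080/mcp/http` as in the curl example above; `build_request` and `post_mcp` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# Assumed default endpoint, taken from the curl example
MCP_HTTP_URL = "http://localhost:8080/mcp/http"


def build_request(method: str, params: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request envelope for an MCP method (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}


def post_mcp(payload: dict, url: str = MCP_HTTP_URL,
             opener: Callable[..., Any] = urllib.request.urlopen,
             timeout: float = 10.0) -> dict:
    """POST a JSON-RPC payload to the HTTP MCP endpoint and decode the JSON reply.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    request can be inspected without a running server.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `post_mcp(build_request("tools/list"))` returns the decoded response; per the MCP specification the tool descriptions sit under `result.tools`.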

Git Repository Configuration

The AI SRE server can automatically clone and manage your Kubernetes GitOps repository on startup.

Environment Variables

Create a .env file from the template:

cp env.template .env
nano .env

Required Git Configuration:

# Kubernetes GitOps repository
K8S_GIT_REPO=git@github.com:your-org/your-k8s-repo.git
K8S_GIT_BRANCH=main
K8S_REPO_PATH=/app/k8s-repo

# Git user configuration
GITHUB_USER=AI-SRE
GITHUB_EMAIL=ai-sre@your-org.com

# Authentication (choose one)
GITHUB_TOKEN=ghp_xxxxxxxxxxxx  # For HTTPS
# OR mount SSH keys: ${HOME}/.ssh:/home/aisre/.ssh:ro

Authentication Methods

Option 1: SSH Keys (Recommended)

# Mount SSH keys in docker-compose.yaml
volumes:
  - ${HOME}/.ssh:/home/aisre/.ssh:ro

# Set repository URL to SSH
K8S_GIT_REPO=git@github.com:your-org/your-k8s-repo.git

Option 2: GitHub Token

# Set repository URL to HTTPS
K8S_GIT_REPO=https://github.com/your-org/your-k8s-repo.git
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
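
Because the two methods differ only in the URL scheme and whether `GITHUB_TOKEN` is set, a quick pre-flight check can catch a mismatch before the container starts. The sketch below is a hypothetical helper, not shipped with the project: it assumes the variable names from `env.template` shown above, parses only simple `KEY=value` lines, and does not handle quoting or multi-line values.

```python
from typing import Dict, List

# Variable names taken from the "Required Git Configuration" example above
REQUIRED = ("K8S_GIT_REPO", "K8S_GIT_BRANCH", "K8S_REPO_PATH", "GITHUB_USER", "GITHUB_EMAIL")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and full-line comments.

    Inline comments written as `value  # note` are stripped; quoting is not supported.
    """
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def check_git_config(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"missing {key}" for key in REQUIRED if not env.get(key)]
    repo = env.get("K8S_GIT_REPO", "")
    if repo.startswith("https://") and not env.get("GITHUB_TOKEN"):
        problems.append("HTTPS repository URL but GITHUB_TOKEN is empty")
    return problems
```

An SSH URL (`git@...`) is not checked for keys here, since the keys are mounted into the container rather than set in `.env`.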

Automatic Repository Initialization

On startup, the server will:

  1. ✅ Clone the repository if it doesn't exist
  2. ✅ Pull latest changes if repository exists
  3. ✅ Checkout the specified branch
  4. ✅ Display repository status and latest commit
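
The four startup steps map onto ordinary git commands. The following is a minimal sketch of that logic, not the project's `entrypoint.sh`; it assumes the `git` CLI is on `PATH` and takes the `K8S_GIT_REPO`, `K8S_GIT_BRANCH`, and `K8S_REPO_PATH` values described above. For an existing checkout it switches to the branch before pulling, so the pull updates the intended branch. The `run` parameter is a hypothetical hook that lets the command sequence be inspected without a real repository.

```python
import os
import subprocess
from typing import Callable, List


def _run_git(cmd: List[str]) -> str:
    """Run a git command and return its stdout, raising CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def init_repo(repo_url: str, branch: str, path: str,
              run: Callable[[List[str]], str] = _run_git) -> List[List[str]]:
    """Clone or update the GitOps repo, check out `branch`, and report the latest commit.

    Returns the commands issued, in order (sketch only, not the project's code).
    """
    issued: List[List[str]] = []

    def git(*args: str) -> str:
        cmd = ["git", *args]
        issued.append(cmd)
        return run(cmd)

    if os.path.isdir(os.path.join(path, ".git")):
        # Existing checkout: switch to the branch first, then fast-forward it
        git("-C", path, "checkout", branch)
        git("-C", path, "pull", "--ff-only", "origin", branch)
    else:
        # No repository yet: clone directly onto the requested branch
        git("clone", "--branch", branch, repo_url, path)
    # Display repository status and the latest commit
    print(git("-C", path, "status", "--short", "--branch"))
    print(git("-C", path, "log", "-1", "--oneline"))
    return issued
```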

Docker Compose (Recommended)

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f ai-sre

# Stop services
docker-compose down

Kubernetes Deployment

# Deploy to Kubernetes
kubectl apply -f k8s-repo/

# Check deployment
kubectl get pods -n ai-sre

# Port forward for testing
kubectl port-forward svc/ai-sre 8080:8080 -n ai-sre

✨ Key Features

🎯 AI-Powered Operations

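  • Intelligent Incident Response: Automated diagnosis and remediation
  • Pattern Recognition: Learns from incidents to prevent future issues
  • Predictive Monitoring: Proactive alerting based on learned patterns
  • Self-Healing: Automatic recovery from common Kubernetes issues

These capabilities come from pairing the server with an external orchestrator such as N8N (see Architecture); the container itself only executes commands.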

🔧 MCP Protocol Tools

Kubernetes Operations
  • kubectl_get - Get Kubernetes resources (pods, nodes, services, etc.)
  • kubectl_describe - Describe Kubernetes resources with detailed information
  • kubectl_logs - Retrieve pod logs for troubleshooting

GitOps & Flux
  • flux_status - Get Flux GitOps synchronization status and health

Git Operations
  • git_status - Check git repository status
  • git_pull - Pull latest changes from the Kubernetes Git repository
  • git_commit - Commit changes to the Kubernetes Git repository
  • git_push - Push changes to the Kubernetes Git repository

CLI Tools & Diagnostics
  • cli_tool - Execute CLI tools (jq, grep, sed, curl, cat, tree, find, etc.)
  • health_check - Check system and service health status

MCP Protocol Support
  • WebSocket Connection: Real-time bidirectional communication
  • JSON-RPC 2.0: Standardized message format
  • Tool Discovery: Automatic tool listing and capability negotiation
  • Resource Management: Access to configuration and version information

System & Health
  • GET /health - Health check endpoint
  • GET /ready - Readiness probe
  • WebSocket: ws://host:port/mcp - MCP protocol endpoint
  • HTTP MCP: POST /mcp/http - HTTP-based MCP for testing
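
Since `cli_tool` runs commands on behalf of an AI agent, an allowlist is the natural guard for a stateless executor. The sketch below is hypothetical (the README does not document how `cli_tool` validates its input): it builds the allowlist from the tools named above, borrows the `tool`/`args`/`input` argument idea from the `cli_tool` usage example (renamed `input_text` here to avoid shadowing Python's built-in), and runs the binary without a shell so arguments cannot inject extra commands.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical allowlist: the tools named in the cli_tool description
ALLOWED_TOOLS = {"jq", "grep", "sed", "curl", "cat", "tree", "find"}


def build_command(tool: str, args: Optional[List[str]] = None) -> List[str]:
    """Validate the tool name and return an argv list (no shell is involved)."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {tool!r}")
    path = shutil.which(tool)
    if path is None:
        raise FileNotFoundError(f"{tool} is not installed in this container")
    return [path, *(args or [])]


def run_cli_tool(tool: str, args: Optional[List[str]] = None,
                 input_text: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Run an allowlisted tool, feeding `input_text` on stdin, and capture its output."""
    proc = subprocess.run(
        build_command(tool, args),
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
```

An allowlist limits which binaries run, not what they do: `curl` can still reach the network and `find` can still read the filesystem, so container isolation and RBAC remain necessary.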

📦 Container Specifications

Specification | Value | Notes
--- | --- | ---
Image Size | < 500MB | Optimized Alpine-based image
Memory Usage | < 256MB | Efficient resource utilization
CPU Usage | < 0.2 cores | Lightweight processing
Startup Time | < 10 seconds | Fast container initialization
Dependencies | Minimal | Python with aiohttp and PyYAML

📚 Comprehensive Runbook System

  • 4,212+ Lines of Content: Extensive knowledge base with templates and best practices
  • Static Templates: Incident response, remediation, and postmortem templates
  • Dynamic Content: Real incident tracking and pattern recognition
  • Learning System: Continuous knowledge base updates and pattern evolution
  • Best Practices: Monitoring, alerting, and operational guidelines
  • Search & Discovery: Intelligent search index for quick knowledge retrieval
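
The README does not describe the format of `runbooks/cache/search-index.json`, so the following is only an illustration of the idea: a minimal inverted index over runbook text that ranks files by how many query terms they contain. `build_index` and `search` are hypothetical names; the index is a plain dict of lists, so it could be written to JSON with `json.dump`.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Word-like tokens: lower-case letters/digits, allowing inner hyphens
WORD = re.compile(r"[a-z0-9][a-z0-9-]*")


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of word-like tokens."""
    return set(WORD.findall(text.lower()))


def build_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each token to the sorted list of runbook paths that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in docs.items():
        for token in tokenize(text):
            index[token].add(path)
    return {token: sorted(paths) for token, paths in index.items()}


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Rank paths by the number of query tokens they match (ties broken by path)."""
    scores: Dict[str, int] = defaultdict(int)
    for token in tokenize(query):
        for path in index.get(token, []):
            scores[path] += 1
    return sorted(scores, key=lambda p: (-scores[p], p))
```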

🛡️ Security & Reliability

  • RBAC Integration: Full Kubernetes RBAC support
  • Secret Management: Secure handling of credentials and tokens
  • Audit Logging: Complete operation audit trail
  • Error Handling: Graceful failure recovery
  • Resource Limits: Built-in resource constraints

💡 Usage Examples

N8N MCP Client Integration

The AI SRE server is fully compatible with N8N's MCP Client node, providing seamless integration for automated incident response and GitOps workflows.

{
  "connectionType": "WebSocket",
  "serverUrl": "ws://ai-sre.ai.svc.cluster.local:8080/mcp",
  "authentication": {
    "type": "none"
  }
}

Available N8N Integration Resources
  • 🤖 docs/n8n-agent-prompts.md: 12 comprehensive prompts for different scenarios
  • 📝 docs/n8n-incident-example.md: Real-world pod crashloop response workflow
  • 🧠 docs/n8n-learning-prompts.md: Continuous learning and knowledge updates
  • 🔗 docs/N8N_INTEGRATION_GUIDE.md: Complete setup and configuration guide

MCP Tool Execution Examples

Get Kubernetes Pods
{
  "method": "tools/call",
  "params": {
    "name": "kubectl_get",
    "arguments": {
      "resource": "pods",
      "namespace": "default",
      "output": "json"
    }
  }
}

Get Pod Logs
{
  "method": "tools/call",
  "params": {
    "name": "kubectl_logs",
    "arguments": {
      "pod": "my-app-7d4b8c9f-x2k9m",
      "namespace": "production",
      "lines": 100
    }
  }
}

Check Flux GitOps Status
{
  "method": "tools/call",
  "params": {
    "name": "flux_status",
    "arguments": {
      "namespace": "flux-system"
    }
  }
}

Execute CLI Tools
{
  "method": "tools/call",
  "params": {
    "name": "cli_tool",
    "arguments": {
      "tool": "jq",
      "args": [".status.phase"],
      "input": "{{ $json.kubectl_output }}"
    }
  }
}
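
One caveat on the filter above: `kubectl get pods -o json` returns a `List` object whose pods sit under `items`, so `.status.phase` only yields a value when the input is a single pod; for a list, the jq equivalent is `.items[].status.phase`. The same extraction in Python, as a small illustrative helper (`pod_phases` is a hypothetical name), handles both shapes:

```python
import json
from typing import Dict, Optional


def pod_phases(kubectl_json: str) -> Dict[str, Optional[str]]:
    """Return {pod name: phase} from `kubectl get pod(s) -o json` output.

    Accepts both a single Pod object and a List with an `items` array;
    a pod without a reported phase maps to None.
    """
    doc = json.loads(kubectl_json)
    pods = doc["items"] if "items" in doc else [doc]
    return {
        pod["metadata"]["name"]: pod.get("status", {}).get("phase")
        for pod in pods
    }
```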

Health Check
{
  "method": "tools/call",
  "params": {
    "name": "health_check",
    "arguments": {}
  }
}

Git Repository Operations
{
  "method": "tools/call",
  "params": {
    "name": "git_pull",
    "arguments": {
      "branch": "main",
      "force": false
    }
  }
}

Commit Changes
{
  "method": "tools/call",
  "params": {
    "name": "git_commit",
    "arguments": {
      "message": "fix: update resource limits",
      "files": ["manifests/deployment.yaml"],
      "all": false
    }
  }
}

Push Changes
{
  "method": "tools/call",
  "params": {
    "name": "git_push",
    "arguments": {
      "branch": "main",
      "force": false
    }
  }
}
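
In a GitOps fix the three Git tools are typically used together: pull, commit the edited manifests, then push so Flux can reconcile. The examples above show only `method` and `params`; a raw JSON-RPC client also needs `jsonrpc` and a unique `id` per request. This sketch builds the full sequence; the function name `gitops_change_requests` is hypothetical, and the argument names are copied from the examples above.

```python
from typing import Dict, List


def gitops_change_requests(files: List[str], message: str, branch: str = "main",
                           first_id: int = 1) -> List[Dict]:
    """Return git_pull, git_commit, git_push tool calls with consecutive JSON-RPC ids."""
    calls = [
        ("git_pull", {"branch": branch, "force": False}),
        ("git_commit", {"message": message, "files": files, "all": False}),
        ("git_push", {"branch": branch, "force": False}),
    ]
    return [
        {
            "jsonrpc": "2.0",
            "id": first_id + offset,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
        for offset, (name, arguments) in enumerate(calls)
    ]
```

Sending the requests one at a time and stopping at the first error keeps a failed pull from being followed by a commit on stale manifests.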

MCP Protocol Testing

# Test health endpoint
curl http://localhost:8080/health

# Test MCP initialization
curl -X POST http://localhost:8080/mcp/http \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {"tools": {}},
      "clientInfo": {"name": "test-client", "version": "1.0.0"}
    }
  }'

# List available tools
curl -X POST http://localhost:8080/mcp/http \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }'

# Execute a tool
curl -X POST http://localhost:8080/mcp/http \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "kubectl_get",
      "arguments": {
        "resource": "pods",
        "namespace": "default"
      }
    }
  }'
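
Each reply from `/mcp/http` is a JSON-RPC 2.0 response carrying either `result` or `error`. Per the MCP specification, a successful `tools/call` result holds a `content` array (text items have `"type": "text"`) and may set `isError` when the tool itself failed. The sketch below checks both layers; it assumes this server follows the specification's response shape, which the README does not show explicitly, and `MCPError` and `tool_text` are hypothetical names.

```python
from typing import Any, Dict


class MCPError(RuntimeError):
    """Raised for a JSON-RPC error or a tool result flagged with isError."""


def tool_text(response: Dict[str, Any], expected_id: int) -> str:
    """Validate a tools/call response and join its text content items."""
    if response.get("jsonrpc") != "2.0":
        raise MCPError("not a JSON-RPC 2.0 response")
    if response.get("id") != expected_id:
        raise MCPError(f"response id {response.get('id')!r} != request id {expected_id}")
    if "error" in response:
        err = response["error"]
        raise MCPError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response.get("result", {})
    text = "\n".join(
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    )
    if result.get("isError"):
        # The protocol call succeeded, but the tool (e.g. kubectl) reported a failure
        raise MCPError(text or "tool reported an error")
    return text
```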

WebSocket Connection (JavaScript)

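// Runs in a browser or any runtime with a global WebSocket (e.g. Node.js 22+)
const ws = new WebSocket('ws://localhost:8080/mcp');

ws.onopen = () => {
  // Step 1: initialize the MCP session
  ws.send(JSON.stringify({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {"tools": {}},
      "clientInfo": {"name": "my-client", "version": "1.0.0"}
    }
  }));
};

ws.onmessage = (event) => {
  const response = JSON.parse(event.data);
  console.log('MCP Response:', response);

  // Step 2: after a successful initialize reply, send the
  // notifications/initialized notification (no id), then list tools
  if (response.id === 1 && !response.error) {
    ws.send(JSON.stringify({"jsonrpc": "2.0", "method": "notifications/initialized"}));
    ws.send(JSON.stringify({"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}));
  }
};

ws.onerror = (err) => console.error('WebSocket error:', err);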

📁 Project Structure

ai-sre/
├── 📄 README.md                    # This comprehensive guide
├── 📄 LICENSE                      # MIT License
├── 🐳 Dockerfile                   # Lean container definition
├── 🐳 docker-compose.yaml          # Local development setup
├── 🔧 Makefile                     # Build and management commands
├── ⚙️  env.template                 # Environment variables template
├── 🧪 test_mcp.py                  # MCP server testing script
├── 📁 docs/                        # Documentation
│   ├── 📋 REQUIREMENTS.md          # Comprehensive requirements
│   ├── 🏗️  ARCHITECTURE.md          # Architecture overview
│   ├── 📚 RUNBOOK_SYSTEM.md        # Runbook system documentation
│   ├── 🚀 QUICKSTART.md            # Developer quick start guide
│   ├── 🔗 N8N_INTEGRATION_GUIDE.md # Complete N8N integration guide
│   ├── 🤖 n8n-agent-prompts.md     # N8N Agent node prompts
│   ├── 📝 n8n-incident-example.md  # Real-world incident example
│   └── 🧠 n8n-learning-prompts.md  # Learning and knowledge prompts
├── 📁 runbooks/                    # Comprehensive runbook system
│   ├── 📁 static/                  # Static runbook templates
│   │   ├── 📝 agent.md             # AI knowledge base and capabilities
│   │   ├── 📁 templates/           # Incident response templates
│   │   │   ├── 📋 incident.md      # Incident response template
│   │   │   ├── 🔧 remediation.md   # Remediation process template
│   │   │   └── 📊 postmortem.md    # Postmortem analysis template
│   │   └── 📁 best-practices/      # Best practices and guidelines
│   │       └── 📈 monitoring.md    # Monitoring best practices
│   ├── 📁 dynamic/                 # Dynamic runbook content
│   │   ├── 📁 incidents/           # Real incident documentation
│   │   │   └── 🚨 2025-10-15-pod-crashloop.md
│   │   ├── 📁 patterns/            # Learned patterns and trends
│   │   │   └── 💾 memory-pressure.md
│   │   └── 📁 resolutions/         # Resolution strategies
│   │       ├── 📁 auto-generated/  # Automated resolutions
│   │       │   └── 🔧 memory-limit-increase.md
│   │       └── 📁 manual/          # Manual resolution procedures
│   └── 📁 cache/                   # Runbook system cache
│       ├── 🔍 search-index.json    # Search index for runbooks
│       ├── 📊 patterns.json        # Pattern recognition cache
│       └── 📈 metrics.json         # Usage metrics and analytics
├── 📁 src/
│   └── 🐍 mcp_server_protocol.py   # MCP Protocol compliant server
├── 📁 scripts/
│   └── 🚀 entrypoint.sh            # Container initialization
├── 📁 config/
│   └── ⚙️  config.yaml              # Runtime configuration
├── 📁 k8s-repo/                    # Kubernetes deployment manifests
│   ├── 📋 configmap.yaml           # Kubernetes ConfigMap & Secrets
│   ├── 📋 storage.yaml             # Persistent storage claims
│   └── 📋 deployment.yaml          # Complete deployment manifests
├── 📁 logs/                        # Application logs
└── 📁 work/                        # Working directory

🔧 Development

See docs/QUICKSTART.md for detailed development instructions.

Local Development

# Install dependencies
pip install aiohttp pyyaml

# Run MCP server locally
python src/mcp_server_protocol.py

# Test MCP server
python test_mcp.py

# Run tests
make test-mcp
make test-git-tools

# Lint code
make lint

Testing

# Test MCP protocol
make test-mcp

# Test Git tools
make test-git-tools

# Test kubectl tools
make test-kubectl

# Test Flux tools
make test-flux

# Test with Docker Compose
docker-compose up -d

📖 Documentation

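  • 📋 docs/REQUIREMENTS.md - Comprehensive project requirements
  • 🏗️ docs/ARCHITECTURE.md - System architecture details
  • 📚 docs/RUNBOOK_SYSTEM.md - Comprehensive runbook system documentation
  • 🚀 docs/QUICKSTART.md - Developer quick start guide
  • 🔗 docs/N8N_INTEGRATION_GUIDE.md - Complete N8N integration guide
  • 🤖 docs/n8n-agent-prompts.md - 12 detailed N8N Agent node prompts
  • 📝 docs/n8n-incident-example.md - Real-world incident response example
  • 🧠 docs/n8n-learning-prompts.md - Continuous learning and knowledge update prompts
  • 📝 runbooks/static/agent.md - AI agent capabilities and knowledge base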

🤝 Contributing

We welcome contributions! This project follows a lean architecture principle where:

  • Container: Remains stateless and lightweight
  • Intelligence: Handled externally (N8N, AI agents)
  • GitOps: Flux-only approach for simplicity
  • API: Standardized MCP tools (JSON-RPC over WebSocket/HTTP), plus simple REST health and readiness endpoints

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Standards

  • Follow PEP 8 for Python code
  • Use type hints where appropriate
  • Add tests for new features
  • Update documentation for API changes
  • Keep container size minimal


🏆 Acknowledgments

  • Flux - GitOps toolkit for Kubernetes
  • Kubernetes - Container orchestration platform
  • N8N - Workflow automation platform
  • MCP - Model Context Protocol specification


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


🚀 Built with ❤️ for the Kubernetes and GitOps community
