shaowenchen/ops-mcp-server
Ops MCP Server is a modular server for operational data querying using the Model Context Protocol (MCP), built with Go.
Ops MCP Server
Enterprise-Grade AI Operations Assistant: a comprehensive Model Context Protocol (MCP) server that bridges AI assistants with your entire observability stack.
Project Vision
Transform AI assistants into intelligent operations engineers by providing seamless access to your Kubernetes clusters, monitoring systems, logging infrastructure, and operational procedures through a unified, secure, and scalable MCP interface.
Project Overview
Ops MCP Server is a production-ready, Go-based MCP server that empowers AI assistants (Claude, VS Code, Cursor, etc.) to become your intelligent operations co-pilot. By implementing the Model Context Protocol standard, it provides a secure, unified gateway to your entire observability ecosystem:
- Real-time Kubernetes Operations: Monitor pods, deployments, nodes, and cluster events
- Advanced Metrics Intelligence: Query Prometheus metrics with natural language
- Comprehensive Log Analysis: Search and analyze logs across Elasticsearch clusters
- Automated Operations: Execute standard operating procedures (SOPS) safely
- Distributed Tracing: Investigate performance issues with Jaeger integration
System Architecture
graph TB
subgraph "AI Assistant Layer"
Claude["Claude Desktop"]
VSCode["VS Code"]
Cursor["Cursor IDE"]
Other["Other MCP Clients"]
end
subgraph "MCP Protocol Layer"
MCP["Model Context Protocol<br/>(JSON-RPC 2.0)"]
end
subgraph "Ops MCP Server"
subgraph "Transport Layer"
HTTP["HTTP/SSE Server<br/>:80"]
STDIO["STDIO Interface"]
end
subgraph "Core Engine"
Router["Request Router"]
Auth["Authentication<br/>& Authorization"]
Cache["Response Cache"]
Logger["Structured Logging"]
end
subgraph "Plugin Modules"
SOPS["SOPS Module<br/>Standard Operations"]
Events["Events Module<br/>Kubernetes Events"]
Metrics["Metrics Module<br/>Prometheus Queries"]
Logs["Logs Module<br/>Elasticsearch Search"]
Traces["Traces Module<br/>Jaeger Integration"]
end
subgraph "Configuration"
Config["config.yaml<br/>Module Settings"]
Env["Environment Variables<br/>Credentials & Secrets"]
end
end
subgraph "Observability Stack"
subgraph "Kubernetes Cluster"
K8sAPI["Kubernetes API Server"]
Pods["Pods & Deployments"]
Nodes["Cluster Nodes"]
end
Prometheus["Prometheus<br/>Metrics & Alerting"]
ES["Elasticsearch<br/>Log Aggregation"]
Jaeger["Jaeger<br/>Distributed Tracing"]
OpsAPI["Ops API<br/>SOPS Execution"]
end
%% Client connections
Claude --> MCP
VSCode --> MCP
Cursor --> MCP
Other --> MCP
%% MCP to server
MCP --> HTTP
MCP --> STDIO
%% Server internal flow
HTTP --> Router
STDIO --> Router
Router --> Auth
Auth --> Cache
Cache --> Logger
%% Module routing
Router --> SOPS
Router --> Events
Router --> Metrics
Router --> Logs
Router --> Traces
%% External connections
Events -->|HTTPS/TLS| K8sAPI
Metrics -->|HTTPS/TLS| Prometheus
Logs -->|HTTPS/TLS| ES
Traces -->|HTTPS/TLS| Jaeger
SOPS -->|HTTPS/TLS| OpsAPI
%% Configuration
Config --> SOPS
Config --> Events
Config --> Metrics
Config --> Logs
Config --> Traces
Env --> SOPS
Env --> Events
Env --> Metrics
Env --> Logs
Env --> Traces
%% Data flow
K8sAPI --> Pods
K8sAPI --> Nodes
classDef client fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef server fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef module fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
classDef external fill:#fff3e0,stroke:#e65100,stroke-width:2px
class Claude,VSCode,Cursor,Other client
class HTTP,STDIO,Router,Auth,Cache,Logger server
class SOPS,Events,Metrics,Logs,Traces module
class K8sAPI,Prometheus,ES,Jaeger,OpsAPI external
Key Features
Enterprise-Ready Architecture
- Security-First Design: Role-based access control, secure credential management, and audit logging
- High Performance: Built with Go 1.24.2, optimized for low latency and high throughput
- Multi-Protocol Support: HTTP/SSE and stdio modes for flexible deployment scenarios
- Scalable & Reliable: Production-tested with comprehensive error handling and retry mechanisms
- Highly Configurable: YAML-based configuration with environment variable support
Modular Plugin System
- SOPS Module: Execute standardized operational procedures with safety checks and rollback capabilities
- Events Module: Real-time Kubernetes event monitoring with intelligent filtering and correlation
- Metrics Module: Advanced Prometheus querying with natural language processing capabilities
- Logs Module: Powerful Elasticsearch integration with full-text search and log aggregation
- Traces Module: Distributed tracing analysis with Jaeger integration for performance optimization
AI-Native Design
- Intelligent Context: Provides rich metadata and context to AI assistants for better decision-making
- Natural Language Interface: Converts complex queries into simple, conversational interactions
- Real-time Updates: Live data streaming for immediate operational insights
- Smart Aggregation: Intelligent data summarization and trend analysis
Available Tools
The server exposes a comprehensive suite of MCP tools, each designed for specific operational tasks:
Standard Operations (SOPS)
Tool | Description | Use Case |
---|---|---|
execute-sops | Execute standardized operational procedures with safety checks | Infrastructure management, automated deployments, maintenance tasks |
Kubernetes Events
Tool | Description | Use Case |
---|---|---|
get-pod-events | Retrieve real-time events from pods across namespaces/clusters | Pod troubleshooting, deployment monitoring, resource issues |
get-deployment-events | Monitor deployment lifecycle events and status changes | Release tracking, rollback decisions, scaling operations |
get-node-events | Track node-level events and cluster health indicators | Node maintenance, capacity planning, hardware issues |
Prometheus Metrics
Tool | Description | Use Case |
---|---|---|
list-metrics | Discover available metrics and their metadata | Metric exploration, dashboard creation, alerting setup |
query-metrics | Execute instant PromQL queries for real-time data | Performance monitoring, SLA tracking, capacity analysis |
query-metrics-range | Run time-series queries for historical analysis | Trend analysis, capacity planning, performance optimization |
Elasticsearch Logs
Tool | Description | Use Case |
---|---|---|
search-logs | Full-text search across all log messages | Error investigation, security analysis, compliance auditing |
list-log-indices | Discover available log indices and their structure | Log management, retention policies, index optimization |
get-pod-logs | Retrieve logs from specific Kubernetes pods | Application debugging, performance analysis, error tracking |
Distributed Tracing
Tool | Description | Use Case |
---|---|---|
get-services | List all available services in the tracing system | Service discovery, architecture analysis, dependency mapping |
get-operations | Retrieve available operations with span kinds | API analysis, performance profiling, service optimization |
get-trace | Fetch complete trace details by trace ID | Root cause analysis, performance bottleneck identification |
find-traces | Search traces based on various criteria | Error investigation, performance analysis, user journey tracking |
Tool Naming Convention
Tools use a consistent naming convention with hyphens as separators:
- Format: {prefix}{verb-noun-context}{suffix}
- Examples: get-pod-events, list-metrics, search-logs
- Configurable: Prefix and suffix can be customized for each module
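The naming rule above amounts to plain string concatenation. A minimal sketch, assuming the prefix and suffix are applied verbatim with no extra separator (the `toolName` helper is hypothetical and not taken from the project source):

```go
package main

import "fmt"

// toolName composes the exposed tool name from a module's configured
// prefix and suffix around the base name. Hypothetical helper: it assumes
// no separator is inserted automatically, so a suffix such as "-from-ops"
// must carry its own hyphen.
func toolName(prefix, base, suffix string) string {
	return prefix + base + suffix
}

func main() {
	fmt.Println(toolName("", "get-pod-events", "-from-ops"))       // suffix only
	fmt.Println(toolName("", "list-metrics", ""))                  // default name
	fmt.Println(toolName("prod-", "search-logs", "-from-elasticsearch")) // both set
}
```

Under this assumption, an empty prefix and suffix leave the base name unchanged.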
Configuration
Configure the server using a YAML file (default path: configs/config.yaml):
# Logging configuration
log:
  level: info # Log level: debug, info, warn, error
# Server configuration
server:
  host: 0.0.0.0 # Server binding address
  port: 80 # Server port
  mode: sse # Server mode: stdio or sse
# SOPS module configuration
sops:
  enabled: true # Whether to enable SOPS module
  tools:
    prefix: "" # Tool name prefix
    suffix: "-from-ops" # Tool name suffix
  ops:
    endpoint: "https://ops-server.your-company.com" # SOPS API endpoint
    token: "${SOPS_TOKEN}" # API token (supports environment variables)
# Events module configuration
events:
  enabled: true # Whether to enable events module
  tools:
    prefix: "" # Tool name prefix
    suffix: "-from-ops" # Tool name suffix
  ops:
    endpoint: "https://ops-server.your-company.com" # Events API endpoint
    token: "${EVENTS_OPS_TOKEN}" # API token (supports environment variables)
# Metrics module configuration
metrics:
  enabled: true # Whether to enable metrics module
  tools:
    prefix: "" # Tool name prefix
    suffix: "-from-prometheus" # Tool name suffix
  prometheus:
    endpoint: "https://prometheus.your-company.com" # Prometheus endpoint
    timeout: 30 # Request timeout (seconds)
# Logs module configuration
logs:
  enabled: true # Whether to enable logs module
  tools:
    prefix: "" # Tool name prefix
    suffix: "-from-elasticsearch" # Tool name suffix
  elasticsearch:
    endpoint: "https://elasticsearch.your-company.com" # Elasticsearch endpoint
    username: "${LOGS_ELASTICSEARCH_USERNAME}" # Username (supports environment variables)
    password: "${LOGS_ELASTICSEARCH_PASSWORD}" # Password (supports environment variables)
    timeout: 30 # Request timeout (seconds)
# Traces module configuration
traces:
  enabled: true # Whether to enable traces module
  tools:
    prefix: "" # Tool name prefix
    suffix: "-from-jaeger" # Tool name suffix
  jaeger:
    endpoint: "https://jaeger.your-company.com" # Jaeger API endpoint
    timeout: 30 # Request timeout (seconds)
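Values such as "${EVENTS_OPS_TOKEN}" in the configuration refer to environment variables. The sketch below shows one way such placeholders can be resolved in Go using the standard library's `os.Expand`; whether the server uses this exact mechanism is an assumption, and `expandValue` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// expandValue replaces ${VAR} (and $VAR) references in a config value with
// whatever lookup returns for VAR. Unknown variables expand to "" when
// lookup returns an empty string. Hypothetical helper for illustration.
func expandValue(value string, lookup func(string) string) string {
	return os.Expand(value, lookup)
}

func main() {
	// Example value only; in production the variable would come from the
	// container or shell environment.
	os.Setenv("EVENTS_OPS_TOKEN", "example-token")

	token := expandValue("${EVENTS_OPS_TOKEN}", os.Getenv)
	fmt.Println(token == "example-token")
}
```

With this approach a placeholder whose variable is unset silently becomes an empty string, so it can be worth validating required credentials after expansion.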
Environment Variables
Set the following environment variables in production:
Note:
- Command line flags take precedence over environment variables. If both are specified, the command line flag value will be used.
- By default, all modules are disabled. You need to explicitly enable modules using either command line flags or environment variables.
# Module enablement (default: all disabled)
export SOPS_ENABLED="true"
export EVENTS_ENABLED="true"
export METRICS_ENABLED="true"
export LOGS_ENABLED="true"
export TRACES_ENABLED="true"
# SOPS API configuration
export SOPS_OPS_ENDPOINT="https://ops-server.your-company.com"
export SOPS_OPS_TOKEN="your-sops-api-token"
# Events API configuration
export EVENTS_OPS_ENDPOINT="https://ops-server.your-company.com"
export EVENTS_OPS_TOKEN="your-events-api-token"
# Traces configuration
export TRACES_JAEGER_ENDPOINT="https://jaeger.your-company.com"
export TRACES_JAEGER_TIMEOUT="30"
# Elasticsearch configuration
export LOGS_ELASTICSEARCH_USERNAME="elastic"
export LOGS_ELASTICSEARCH_PASSWORD="your-elasticsearch-password"
# Optional: Prometheus authentication
# export PROMETHEUS_TOKEN="your-prometheus-token"
# Server configuration
export SERVER_HOST="0.0.0.0"
export SERVER_PORT="80"
export LOG_LEVEL="info"
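The precedence rule in the note above (command line flag over environment variable, with modules disabled by default) can be sketched with Go's standard `flag` package by using the environment value as the flag's default. This is an illustration under that assumption, not the project's actual startup code; `resolveEventsEnabled` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

// resolveEventsEnabled decides whether the events module is enabled.
// The environment variable supplies the default; an explicit flag on the
// command line overrides it. An unset or unparsable variable means disabled.
func resolveEventsEnabled(args []string, getenv func(string) string) (bool, error) {
	fs := flag.NewFlagSet("ops-mcp-server", flag.ContinueOnError)

	def, _ := strconv.ParseBool(getenv("EVENTS_ENABLED")) // error -> false
	enabled := fs.Bool("enable-events", def, "enable the events module")

	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}

func main() {
	env := map[string]string{"EVENTS_ENABLED": "true"}
	getenv := func(k string) string { return env[k] }

	// No flag given: the environment variable decides.
	fromEnv, _ := resolveEventsEnabled(nil, getenv)
	// Flag given: it wins over the environment variable.
	fromFlag, _ := resolveEventsEnabled([]string{"--enable-events=false"}, getenv)

	fmt.Println(fromEnv, fromFlag)
}
```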
Tool Name Configuration Examples
With the above configuration, the actual tool names will be:
SOPS Tools
execute-sops-from-ops
Events Tools
get-pod-events-from-ops
get-deployment-events-from-ops
get-node-events-from-ops
Metrics Tools
list-metrics-from-prometheus
query-metrics-from-prometheus
query-metrics-range-from-prometheus
Logs Tools
search-logs-from-elasticsearch
list-log-indices-from-elasticsearch
get-pod-logs-from-elasticsearch
Traces Tools
get-services-from-jaeger
get-operations-from-jaeger
get-trace-from-jaeger
find-traces-from-jaeger
To use default tool names (no prefix/suffix), set both prefix and suffix to empty strings "".
Usage Guide
Tool Call Examples
You can call tools with parameters (using the actual configured tool names):
// Execute SOPS procedure
const sopsResult = await mcpClient.callTool("execute-sops-from-ops", {
sops_id: "deploy-application",
parameters: '{"app_name": "my-app", "version": "v1.2.3"}',
timeout: "30m",
dry_run: "false",
});
// Get Pod events
const events = await mcpClient.callTool("get-pod-events-from-ops", {
cluster: "production",
namespace: "ai-nlp-fcheck",
limit: "20",
});
// Execute metrics query
const result = await mcpClient.callTool("query-metrics-from-prometheus", {
query: "count by (cluster) (up)",
});
// Search logs
const logs = await mcpClient.callTool("search-logs-from-elasticsearch", {
search_term: "error",
limit: "50",
time_range: "1h",
});
// Query traces
const traces = await mcpClient.callTool("find-traces-from-jaeger", {
service: "my-service",
operation: "my-operation",
limit: "10",
});
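At the protocol level, each of the calls above becomes a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. A minimal sketch of building that request body in Go follows; the type and function names are hypothetical, and a real client would also perform the MCP initialization handshake before sending it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams holds the MCP tools/call parameters: tool name and arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// buildToolCall encodes a tools/call request for the given tool.
// Hypothetical helper for illustration only.
func buildToolCall(id int, name string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: name, Arguments: args},
	})
}

func main() {
	body, err := buildToolCall(1, "query-metrics-from-prometheus", map[string]any{
		"query": "count by (cluster) (up)",
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(body))
}
```

The tool name must match the configured prefix and suffix, so a server with empty affixes would expect "query-metrics" instead.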
Claude Desktop Integration Example
Using this MCP server in Claude Desktop:
{
"mcpServers": {
"ops-mcp-server": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"--env",
"SOPS_ENABLED=true",
"--env",
"SOPS_OPS_ENDPOINT=https://ops-server.your-company.com",
"--env",
"SOPS_OPS_TOKEN=your-sops-token",
"--env",
"EVENTS_ENABLED=true",
"--env",
"EVENTS_OPS_ENDPOINT=https://ops-server.your-company.com",
"--env",
"EVENTS_OPS_TOKEN=your-events-token",
"--env",
"METRICS_ENABLED=true",
"--env",
"LOGS_ENABLED=true",
"--env",
"LOGS_ELASTICSEARCH_USERNAME=elastic",
"--env",
"LOGS_ELASTICSEARCH_PASSWORD=your-password",
"--env",
"TRACES_ENABLED=true",
"--env",
"TRACES_JAEGER_ENDPOINT=https://jaeger.your-company.com",
"--env",
"TRACES_JAEGER_TIMEOUT=30",
"shaowenchen/ops-mcp-server:latest",
"--enable-sops",
"--enable-events",
"--enable-metrics",
"--enable-logs",
"--enable-traces"
]
}
}
}
Running the Server
Docker Container (Recommended)
Docker Quick Start
# Run with default configuration
docker run -d \
--name ops-mcp-server \
-p 80:80 \
-e SOPS_ENABLED="true" \
-e SOPS_OPS_ENDPOINT="https://ops-server.your-company.com" \
-e SOPS_OPS_TOKEN="your-sops-api-token" \
-e EVENTS_ENABLED="true" \
-e EVENTS_OPS_ENDPOINT="https://ops-server.your-company.com" \
-e EVENTS_OPS_TOKEN="your-events-api-token" \
-e METRICS_ENABLED="true" \
-e LOGS_ENABLED="true" \
-e LOGS_ELASTICSEARCH_USERNAME="elastic" \
-e LOGS_ELASTICSEARCH_PASSWORD="your-elasticsearch-password" \
-e TRACES_ENABLED="true" \
-e TRACES_JAEGER_ENDPOINT="https://jaeger.your-company.com" \
-e TRACES_JAEGER_TIMEOUT="30" \
shaowenchen/ops-mcp-server:latest \
--mode=sse --enable-sops --enable-events --enable-metrics --enable-logs --enable-traces
Docker with Custom Configuration
# Run with custom configuration file
docker run -d \
--name ops-mcp-server \
-p 80:80 \
-v $(pwd)/configs/config.yaml:/runtime/configs/config.yaml \
-e SOPS_ENABLED="true" \
-e SOPS_OPS_ENDPOINT="https://ops-server.your-company.com" \
-e SOPS_OPS_TOKEN="your-sops-api-token" \
-e EVENTS_ENABLED="true" \
-e EVENTS_OPS_ENDPOINT="https://ops-server.your-company.com" \
-e EVENTS_OPS_TOKEN="your-events-api-token" \
-e METRICS_ENABLED="true" \
-e LOGS_ENABLED="true" \
-e LOGS_ELASTICSEARCH_USERNAME="elastic" \
-e LOGS_ELASTICSEARCH_PASSWORD="your-elasticsearch-password" \
-e TRACES_ENABLED="true" \
-e TRACES_JAEGER_ENDPOINT="https://jaeger.your-company.com" \
-e TRACES_JAEGER_TIMEOUT="30" \
shaowenchen/ops-mcp-server:latest \
--config=./configs/config.yaml --mode=sse
Docker Compose Deployment
version: "3.8"
services:
  ops-mcp-server:
    image: shaowenchen/ops-mcp-server:latest
    ports:
      - "80:80"
    environment:
      - OPS_MCP_ENV=production
      - OPS_MCP_LOG_LEVEL=info
      - SOPS_ENABLED=${SOPS_ENABLED}
      - SOPS_OPS_ENDPOINT=${SOPS_OPS_ENDPOINT}
      - SOPS_OPS_TOKEN=${SOPS_OPS_TOKEN}
      - EVENTS_ENABLED=${EVENTS_ENABLED}
      - EVENTS_OPS_ENDPOINT=${EVENTS_OPS_ENDPOINT}
      - EVENTS_OPS_TOKEN=${EVENTS_OPS_TOKEN}
      - METRICS_ENABLED=${METRICS_ENABLED}
      - LOGS_ENABLED=${LOGS_ENABLED}
      - LOGS_ELASTICSEARCH_USERNAME=${LOGS_ELASTICSEARCH_USERNAME}
      - LOGS_ELASTICSEARCH_PASSWORD=${LOGS_ELASTICSEARCH_PASSWORD}
      - TRACES_ENABLED=${TRACES_ENABLED}
      - TRACES_JAEGER_ENDPOINT=${TRACES_JAEGER_ENDPOINT}
      - TRACES_JAEGER_TIMEOUT=${TRACES_JAEGER_TIMEOUT}
    command:
      [
        "--mode=sse",
        "--enable-sops",
        "--enable-events",
        "--enable-metrics",
        "--enable-logs",
        "--enable-traces",
      ]
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "--no-verbose",
          "--tries=1",
          "--spider",
          "http://localhost:80/healthz",
        ]
      interval: 30s
      timeout: 3s
      retries: 3
    restart: unless-stopped
    volumes:
      - ./configs:/runtime/configs:ro # Mount configuration files (optional)
Local Development
Build from Source
# Clone the project
git clone https://github.com/shaowenchen/ops-mcp-server.git
cd ops-mcp-server
# Install dependencies
make dev-setup
# Build the project
make build
# Run server (stdio mode, for MCP clients)
./bin/ops-mcp-server --enable-sops --enable-events --enable-metrics --enable-logs --enable-traces
# Run server (SSE mode, for HTTP API)
./bin/ops-mcp-server --mode=sse --enable-sops --enable-events --enable-metrics --enable-logs --enable-traces
Using Makefile
# Quick development cycle
make quick # Format, check, test, build
# Run specific modules
make run-sops # Run SOPS module only
make run-events # Run events module only
make run-metrics # Run metrics module only
make run-logs # Run logs module only
make run-traces # Run traces module only
make run-all # Run all modules
# Test MCP functionality
make test-mcp
Kubernetes Deployment
Quick Deployment
# Build and deploy to Kubernetes
make k8s-build-deploy
# Or execute step by step
make docker-build docker-push k8s-deploy
Check Deployment Status
# Check deployment status
make k8s-status
# View application logs
make k8s-logs
# Clean up resources
make k8s-cleanup
Server Modes
SSE Mode (Server-Sent Events)
SSE mode is suitable for web-based clients and HTTP API access:
# Access server: http://localhost:80
# Health check endpoint: http://localhost:80/healthz
# MCP endpoint: http://localhost:80/mcp
Health check response example:
{
"status": "ok",
"service": "ops-mcp-server",
"version": "1.0.0",
"timestamp": "2024-01-20T10:30:00Z",
"mode": "sse",
"modules": {
"sops": true,
"events": true,
"metrics": true,
"logs": true,
"traces": true
},
"tools_count": 14
}
STDIO Mode
STDIO mode is suitable for direct MCP client integration (such as Claude Desktop):
./ops-mcp-server --enable-sops --enable-events --enable-metrics --enable-logs --enable-traces
Command Line Options
# Basic options
--mode # Server mode (stdio|sse, default: stdio)
--config # Configuration file path (default: configs/config.yaml)
--host # Server host (default: 0.0.0.0)
--port # Server port (default: 80)
--log-level # Log level (debug|info|warn|error, default: info)
# Module switches
--enable-sops # Enable SOPS module
--enable-events # Enable events module
--enable-metrics # Enable metrics module
--enable-logs # Enable logs module
--enable-traces # Enable traces module
# Usage example
./ops-mcp-server --mode=sse --enable-sops --enable-events --enable-metrics --enable-logs --enable-traces --port=8080 --log-level=debug
Development Guide
Project Structure
ops-mcp-server/
├── cmd/server/ # Main program entry
├── pkg/
│   ├── config/ # Configuration structure definitions
│   ├── modules/ # Business modules
│   │   ├── sops/ # SOPS module
│   │   ├── events/ # Events module
│   │   ├── metrics/ # Metrics module
│   │   ├── logs/ # Logs module
│   │   └── traces/ # Traces module
│   └── server/ # Server configuration
├── configs/ # Configuration files
├── deploy/ # Deployment configuration
├── bin/ # Build output
└── vendor/ # Go dependencies
Build and Test
# Complete test and build
make all
# Multi-platform build
make build-all
# Run tests
make test
# Generate test coverage report
make test-coverage
# Code check and format
make lint fmt
Troubleshooting
Common Issues
- Tool not found: Check if the module is enabled and if the tool name in the configuration file is correct
- Connection timeout: Verify network connectivity to external services (Prometheus, Elasticsearch)
- Authentication failure: Confirm that authentication information in environment variables is correct
- Port conflict: Use the --port parameter to specify another port
Debug Mode
# Enable debug logging
./ops-mcp-server --log-level=debug --enable-events --enable-metrics --enable-logs --enable-sops
# View detailed request logs
export LOG_LEVEL=debug
docker run -e LOG_LEVEL=debug shaowenchen/ops-mcp-server:latest
Contributing
- Fork the project
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Environment Setup
# Set up development environment
make dev-setup
# Run pre-commit checks
make quick
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support and Feedback
- Report Bugs
- Feature Requests
- Documentation
- Email: mail@chenshaowen.com
Why Choose Ops MCP Server?
Immediate Value
- 10x Faster Troubleshooting: Get instant answers to complex operational questions
- AI-Powered Insights: Leverage AI's pattern recognition for proactive issue detection
- 24/7 Operations: Never miss critical alerts with intelligent monitoring
- Unified View: Single interface for all your observability data
Enterprise Benefits
- Enterprise Security: Role-based access, audit trails, and secure credential management
- Scalable Architecture: Handles high-volume production workloads with ease
- Easy Integration: Drop-in replacement for existing monitoring workflows
- Cost Effective: Reduce operational overhead and improve team productivity
AI-First Design
- Natural Language: Ask questions in plain English, get structured data back
- Modular: Enable only the modules you need, customize as you grow
- Real-time: Live data streaming for immediate operational insights
- Smart: Intelligent data aggregation and trend analysis
Transform your AI assistant into an intelligent operations engineer today!
Ready to revolutionize your operations workflow? Get started now.