w3sqr/k8s-mcp-and-adk-agent
The GKE Intelligent Monitoring System is a comprehensive solution for monitoring and managing Google Kubernetes Engine (GKE) clusters using AI-driven insights and the Model Context Protocol (MCP).
Kubernetes Management & GKE Monitoring MCP Server
Created for GKE Turns 10 Hackathon #GKETurns10 #GKEHackathon
Project Overview
The GKE Intelligent Monitoring System is a monitoring and management solution that combines Google Kubernetes Engine (GKE) with AI-driven insights, using Google's Agent Development Kit (ADK) and the Model Context Protocol (MCP). It provides intelligent monitoring, automated troubleshooting, and proactive management capabilities for GKE clusters.
Features
- Intelligent Cluster Monitoring
  - Real-time monitoring of GKE cluster health
  - Pod lifecycle management and status tracking
  - Service and deployment monitoring
  - Resource utilization insights
- AI-Powered Troubleshooting
  - Automated problem detection and diagnosis
  - Intelligent remediation suggestions
  - Predictive analytics for potential issues
  - Network connectivity analysis
- Advanced Management Capabilities
  - Dynamic scaling of deployments
  - YAML manifest management
  - Pod execution and log analysis
  - Security context management
- Integration Capabilities
  - Seamless integration with GKE clusters
  - Support for Prometheus metrics
  - Cloud Monitoring integration
  - Custom tooling support through MCP
- Core tools
  - Discoverable tool catalog via GET /tools on the MCP server.
  - Tool proxy endpoints POST /tool/<name> accepting JSON { "args": [...], "kwargs": {...} } and returning { "ok": true, "result": ... } or { "ok": false, "error": "..." }.
- Core tools included:
  - get_cluster_info - Get basic cluster information, node status, and health
  - list_pods - List pods with status, resource usage, and readiness
  - get_pod_logs - Retrieve pod logs for troubleshooting
  - describe_pod - Get detailed pod information and events
  - get_service_status - Check service endpoints and networking
  - get_deployment_status - Monitor deployment health and replica status
  - delete_resource - Delete a Kubernetes resource (deployment, service, pod, etc.)
  - suggest_troubleshooting - AI-powered troubleshooting recommendations
  - automate remediation - Image pull remediation analysis for a pod
  - get_gke_cluster_metrics - GKE-specific performance metrics
  - scale deployment - Scale a deployment to a specific number of replicas
  - exec pod command - Execute a command inside a pod container
  - network_connectivity_test - Test network connectivity and DNS resolution
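The request and response envelopes described under "Core tools" can be exercised with a small client. The following is a minimal sketch using only the standard library, not the project's own client code. The base URL `http://localhost:8080`, the helper names (`build_tool_request`, `unwrap_tool_response`, `call_tool`), and the `ToolError` class are assumptions for illustration; only the `POST /tool/<name>` path and the JSON shapes come from the description above.

```python
import json
import urllib.request
from typing import Any

# Assumed base URL; in a real deployment this would come from MCP_SERVICE_URL.
DEFAULT_BASE_URL = "http://localhost:8080"


class ToolError(RuntimeError):
    """Hypothetical error raised for an {"ok": false, "error": ...} reply."""


def build_tool_request(name: str, *args: Any, **kwargs: Any) -> tuple[str, bytes]:
    """Return the URL path and JSON body for POST /tool/<name>."""
    body = json.dumps({"args": list(args), "kwargs": kwargs}).encode("utf-8")
    return f"/tool/{name}", body


def unwrap_tool_response(raw: bytes) -> Any:
    """Return "result" from a success envelope, or raise ToolError."""
    envelope = json.loads(raw)
    if envelope.get("ok"):
        return envelope.get("result")
    raise ToolError(envelope.get("error", "unknown error"))


def call_tool(name: str, *args: Any, base_url: str = DEFAULT_BASE_URL, **kwargs: Any) -> Any:
    """POST to the tool proxy endpoint and unwrap the response envelope."""
    path, body = build_tool_request(name, *args, **kwargs)
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return unwrap_tool_response(response.read())
```

A call such as `call_tool("list_pods", namespace="default")` would then return the tool's result or raise `ToolError`; the `namespace` keyword is a guess at the tool's signature, so the catalog at `GET /tools` is the place to confirm the actual parameters.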
Technologies Used
- Google Kubernetes Engine (GKE)
- Model Context Protocol (MCP) server via mcp.server.fastmcp
- Google ADK (google.adk.agents.LlmAgent) for the conversational agent
- Python 3.11, kubernetes Python client, httpx, requests
- Docker + Artifact Registry (or GCR) and Cloud Build for CI
- kubectl manifests and RBAC for in-cluster deployment
Data Sources & External Services
- Kubernetes API (in-cluster via ServiceAccount or kubeconfig)
- Google Cloud Project metadata when configured (GCP_PROJECT_ID, cluster name/zone)
- Optional metrics-server for resource metrics (node/pod top)
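The choice between in-cluster ServiceAccount credentials and a local kubeconfig can be made at startup. The sketch below shows one way to detect the mode with the standard library only; it is an illustration, not the project's actual logic. The function name `detect_auth_mode` is hypothetical. The `KUBERNETES_SERVICE_HOST` variable and the token path are the standard ones Kubernetes provides inside a pod.

```python
import os
from pathlib import Path
from typing import Mapping

# Standard location where Kubernetes mounts the ServiceAccount token in a pod.
SERVICE_ACCOUNT_TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def detect_auth_mode(
    env: Mapping[str, str] = os.environ,
    token_path: Path = SERVICE_ACCOUNT_TOKEN,
) -> str:
    """Return "in-cluster" when running inside a pod, else "kubeconfig"."""
    in_pod = "KUBERNETES_SERVICE_HOST" in env and token_path.is_file()
    return "in-cluster" if in_pod else "kubeconfig"


if __name__ == "__main__":
    print(detect_auth_mode())
```

With the kubernetes Python client, the "in-cluster" case corresponds to `config.load_incluster_config()` and the "kubeconfig" case to `config.load_kube_config()`.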
Architecture
The system consists of three main components:
- ADK Agent - Handles AI-driven interactions and decision-making
- MCP Server - Provides tooling and Kubernetes management capabilities
- GKE Integration - Direct interface with GKE clusters and resources
graph TB
subgraph "User Interaction"
UI[User/Client]
end
subgraph "ADK Layer"
ADK[ADK Agent]
FASTAPI[FastAPI Server]
end
subgraph "MCP Layer"
MCP[MCP Server]
Tools[K8s Tools]
Health[Health Check]
Monitor[Monitoring Tools]
end
subgraph "GKE Cluster"
API[Kubernetes API]
Pods[Pods]
Services[Services]
Deploy[Deployments]
end
subgraph "Google Cloud"
Monitoring[Cloud Monitoring]
Logging[Cloud Logging]
end
UI --> FASTAPI
FASTAPI --> ADK
ADK --> MCP
MCP --> Tools
MCP --> Health
MCP --> Monitor
Tools --> API
Health --> API
Monitor --> API
Monitor --> Monitoring
API --> Pods
API --> Services
API --> Deploy
Pods --> Logging
Services --> Logging
Deploy --> Logging
classDef gcp fill:#4285F4,stroke:#4285F4,color:white;
classDef k8s fill:#326CE5,stroke:#326CE5,color:white;
classDef adk fill:#34A853,stroke:#34A853,color:white;
classDef mcp fill:#EA4335,stroke:#EA4335,color:white;
class Monitoring,Logging gcp;
class API,Pods,Services,Deploy k8s;
class ADK,FASTAPI adk;
class MCP,Tools,Health,Monitor mcp;
How to Build & Deploy
- Build images (Cloud Build or docker build/push).
- Apply RBAC and Deployment manifests: kubectl apply -f k8s-manifests/k8s-mcp-rbac.yaml and k8s-manifests/k8s-mcp-deployment.yaml.
- Deploy ADK agent: kubectl apply -f deployment.yaml.
- Monitor logs: kubectl logs -f deployment/k8s-mcp-server and kubectl logs -f deployment/adk-agent.
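The apply steps above can be scripted so the manifests always go in the same order (RBAC first, then the MCP server, then the agent). This is a rough sketch that shells out to `kubectl`, assuming it is on the PATH and already pointed at the target cluster. The helper names `apply_commands` and `deploy` are hypothetical, and `dry_run` here is a flag of this script (it only prints the commands), not kubectl's own `--dry-run` option.

```python
import shlex
import subprocess

# Manifest paths as named in the steps above, applied in order.
MANIFESTS = [
    "k8s-manifests/k8s-mcp-rbac.yaml",
    "k8s-manifests/k8s-mcp-deployment.yaml",
    "deployment.yaml",
]


def apply_commands(manifests=MANIFESTS):
    """Build one `kubectl apply -f <path>` argv list per manifest."""
    return [["kubectl", "apply", "-f", path] for path in manifests]


def deploy(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for argv in apply_commands():
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing apply.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    deploy(dry_run=True)
```

Keeping the default at `dry_run=True` means running the script by accident changes nothing in the cluster.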
Configuration
Environment Variables
Key environment variables:
- GCP_PROJECT_ID - Your Google Cloud project ID
- GKE_CLUSTER_NAME - Target GKE cluster name
- GKE_ZONE - GKE cluster zone/region
- MCP_SERVICE_URL - MCP server endpoint
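These variables can be read once at startup and validated together, so a missing value fails fast instead of surfacing later as a confusing request error. The following is a minimal sketch; the `Settings` dataclass and `load_settings` function are hypothetical names, and treating all four variables as required is an assumption (the project may treat some as optional).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names from the list above; all treated as required in this sketch.
REQUIRED_VARS = ("GCP_PROJECT_ID", "GKE_CLUSTER_NAME", "GKE_ZONE", "MCP_SERVICE_URL")


@dataclass(frozen=True)
class Settings:
    gcp_project_id: str
    gke_cluster_name: str
    gke_zone: str
    mcp_service_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the configuration, raising ValueError listing any missing names."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return Settings(
        gcp_project_id=env["GCP_PROJECT_ID"],
        gke_cluster_name=env["GKE_CLUSTER_NAME"],
        gke_zone=env["GKE_ZONE"],
        # Drop a trailing slash so paths like /tools can be appended directly.
        mcp_service_url=env["MCP_SERVICE_URL"].rstrip("/"),
    )
```

Passing the environment as a mapping keeps the loader easy to test with a plain dictionary.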
AI Model Configuration
The project uses Google's ADK LlmAgent, which internally uses the Gemini 2.0 Flash model through Vertex AI. Important points about the AI configuration:
- Vertex AI Authentication: The LlmAgent requires a Vertex AI API key, which is NOT set through environment variables for security reasons. Instead, it should be configured through Google Cloud's secret management system.
Deployment
The system uses Kubernetes manifests for deployment, including:
- Deployment configurations
- RBAC settings
- Service accounts
- Network policies
Findings and Learnings
During the development of this project for the GKE Turns 10 Hackathon, several key insights were gained:
- GKE's robust API and integration capabilities make it an ideal platform for building intelligent monitoring solutions
- Combining ADK with Kubernetes operations enables sophisticated automation and decision-making
- MCP provides a flexible framework for extending monitoring capabilities
- Real-time monitoring with AI-driven insights can significantly improve cluster management
Future Enhancements
- Enhanced predictive analytics for resource scaling
- Machine learning models for anomaly detection
- Extended automation capabilities
- Integration with additional Google Cloud services
Contribution & Hackathon Note
This project was created and adapted for entry in the GKE Turns 10 Hackathon. Contributions and improvements are welcome. #GKEHackathon
License
MIT License