Druid MCP
This repository provides an MCP (Model Context Protocol) server for comprehensive read-only access to one or more Apache Druid clusters for ad-hoc data analysis, monitoring, troubleshooting, and comparison.
For example, if you have multiple environments, each with its own Druid cluster, all ingesting data from the same origin, then you could use a simple prompt like this to run a complex cross-cluster comparison:
Check and compare the datasource for SumUp across all Druid clusters.
The AI agent then lists the Druid clusters, lists their datasources, and explores their schemas, segments, tasks, and data distribution all on its own, ultimately presenting you with a concise summary.
Overview
The Druid MCP server enables AI applications to interact with Apache Druid through the Model Context Protocol (MCP). It provides a standardized interface for querying and exploring Druid datasources, making it easy to integrate Druid data into AI workflows.
Features
- Multi-Cluster Support: Connect to and query multiple Druid clusters simultaneously
- Query Execution: SQL and native JSON queries
- Datasource Management: List, explore, and inspect datasource schemas
- Ingestion Monitoring: Supervisor and task status tracking
- Cluster Operations: Service health monitoring and segment analysis
- Lookup Management: Query and inspect lookup tables
- Smart Analysis Prompts: Pre-built prompts for common data analysis tasks
- Schema Caching: Efficient caching with a 5-minute TTL for better performance (applies to resources only)
Prerequisites
- Python 3.11+
- Access to an Apache Druid cluster
- MCP-compatible client (e.g., Claude Desktop)
Installation
Install dependencies using uv:
uv sync
Or with pip:
pip install -e .
Configuration
Multi-Cluster Support
Configure multiple Druid clusters with whitespace-separated key=value pairs:
# Define available clusters (optional, defaults to "localhost=http://localhost:8088")
export DRUID_CLUSTER_URLS="localhost=http://localhost:8088 dev=https://druid.dev.example.com prod=https://druid.prod.example.com"
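For illustration, here is a minimal sketch of how such a configuration string can be parsed into a name-to-URL mapping; the parsing logic is an assumption about the implementation, not a quote from it:

import os

# Split on whitespace, then on the first "=" of each pair (assumed behavior).
raw = os.environ.get("DRUID_CLUSTER_URLS", "localhost=http://localhost:8088")
clusters = dict(pair.split("=", 1) for pair in raw.split())
# e.g. {"dev": "https://druid.dev.example.com", "prod": "https://druid.prod.example.com"}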
The cluster names can be any valid string and are used in:
- Resource URIs (e.g., druid://dev/v2/datasources)
- Tool parameters (e.g., execute_sql_query("dev", "SELECT ..."))
- Cluster management commands
Usage
Development Mode
Run with auto-reload for development:
mcp dev main.py
Production Mode
Run the server in production:
mcp run main.py
Claude Desktop Integration
Install the server for use with Claude Desktop:
mcp install main.py
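If you prefer to configure Claude Desktop by hand, an entry along the following lines can be added to claude_desktop_config.json; the server name, file path, and environment values are illustrative assumptions about your local setup:

{
  "mcpServers": {
    "druid": {
      "command": "mcp",
      "args": ["run", "/path/to/druid-mcp/main.py"],
      "env": {
        "DRUID_CLUSTER_URLS": "dev=https://druid.dev.example.com prod=https://druid.prod.example.com"
      }
    }
  }
}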
Available Tools
The MCP server exposes 16 tools organized by functionality:
Important: All tools (except cluster management tools) require an explicit cluster parameter as the first argument. This enables efficient multi-cluster operations without state management.
Query Execution
- execute_sql_query(cluster, query, context): Execute SQL queries
- execute_native_query(cluster, query): Execute native JSON queries (timeseries, topN, groupBy, etc.)
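For example, assuming the context parameter accepts a dictionary of standard Druid SQL query context keys, a timezone-aware query could look like this:

# sqlTimeZone is a standard Druid SQL context key; passing it as a dict
# through the context parameter is an assumption about this tool.
execute_sql_query(
    "prod",
    "SELECT TIME_FLOOR(__time, 'PT1H') AS hour, COUNT(*) AS cnt FROM wikipedia GROUP BY 1",
    context={"sqlTimeZone": "Europe/Berlin"},
)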
Datasource Operations
- list_datasources(cluster, include_details): List all available datasources
- get_datasource_schema(cluster, datasource): Get dimensions and metrics for a datasource
Ingestion Monitoring
- list_supervisors(cluster, include_state): List all supervisors with optional state information
- get_supervisor_status(cluster, supervisor_id): Get detailed supervisor status and health
- list_tasks(cluster, ...): List tasks with filtering options
- get_task_status(cluster, task_id): Get the status of specific tasks
Cluster Management
- get_cluster_status(cluster): Overall cluster health status
- list_services(cluster, service_type): List active services by type
- list_segments(cluster, datasource, full): List segments for a datasource
- get_segments_info(cluster, datasource): Get aggregated segment statistics
Lookup Management
- list_lookups(cluster, tier): List lookups by tier
- get_lookup(cluster, lookup_id, tier): Get a specific lookup configuration
- get_lookup_status(cluster, lookup_id, tier): Get lookup loading status across nodes
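For instance (the lookup ID country_names is hypothetical; __default is Druid's standard lookup tier):

# List lookups in the default tier, then inspect one of them.
list_lookups("prod", tier="__default")
get_lookup("prod", "country_names", tier="__default")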
Multi-Cluster Management
- list_clusters(): List all configured Druid cluster names
Analysis Prompts
Five pre-built prompts with smart time-range defaults (last month through now):
- analyze_time_range: Analyze data within time periods
- explore_datasource: Explore datasource structure and content
- monitor_ingestion: Monitor ingestion health and progress
- compare_periods: Compare metrics between time periods
- data_quality_check: Perform comprehensive data quality checks
Examples
Basic Usage
# List datasources from production cluster
list_datasources("prod")
# Execute SQL query on development cluster
execute_sql_query("dev", "SELECT COUNT(*) FROM wikipedia WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY")
# Get datasource schema from staging
get_datasource_schema("stage", "wikipedia")
Multi-Cluster Operations
# Compare data across environments
dev_count = execute_sql_query("dev", "SELECT COUNT(*) FROM orders")
prod_count = execute_sql_query("prod", "SELECT COUNT(*) FROM orders")
# Check cluster health across all environments
for cluster in ["dev", "stage", "prod"]:
    status = get_cluster_status(cluster)
    print(f"{cluster}: {status}")
# Monitor ingestion across clusters
list_supervisors("dev", include_state=True)
list_supervisors("prod", include_state=True)
Advanced Queries
# Native query with explicit cluster
execute_native_query("prod", {
"queryType": "timeseries",
"dataSource": "wikipedia",
"intervals": ["2024-01-01/2024-01-02"],
"granularity": "hour",
"aggregations": [
{"type": "count", "name": "edits"},
{"type": "longSum", "name": "added", "fieldName": "added"}
]
})
# Complex filtering with multiple parameters
list_tasks("prod",
datasource="wikipedia",
state="running",
max_tasks=10
)
Resources
Druid API endpoints are exposed as MCP resources with caching and multi-cluster support:
Resource URI Format
druid://{cluster}/{path}
Available Resources
- druid://localhost/v2/datasources: List all datasources
- druid://localhost/v2/datasources/{name}: Get datasource schema
- druid://localhost/coordinator/v1/datasources/{name}: Coordinator info
- druid://localhost/coordinator/v1/metadata/datasources/{name}/segments: Segments
- druid://localhost/overlord/v1/tasks: List tasks
- druid://localhost/overlord/v1/supervisors: List supervisors
- druid://localhost/status/health: Health check
Replace localhost with any configured cluster name to access different environments.
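Resources can also be read programmatically with the Python MCP client SDK. The following is a minimal sketch, assuming the server is started with mcp run main.py and a cluster named dev is configured:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import AnyUrl

async def main():
    # Spawn the Druid MCP server over stdio (command mirrors "Production Mode" above).
    params = StdioServerParameters(command="mcp", args=["run", "main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.read_resource(AnyUrl("druid://dev/v2/datasources"))
            print(result.contents)

asyncio.run(main())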
Security
This MCP server provides read-only access to your Druid cluster. No write operations are supported or allowed.
License
MIT