druid-mcp

christian-schlichtherle/druid-mcp

3.2

If you are the rightful owner of druid-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

This repository provides an MCP server for read-only access to Apache Druid clusters for data analysis and monitoring.

Tools
  1. execute_sql_query

    Execute SQL queries on a specified cluster.

  2. list_datasources

    List all available datasources in a cluster.

  3. get_datasource_schema

    Retrieve dimensions and metrics for a datasource.

  4. list_supervisors

    List all supervisors with optional state information.

  5. get_cluster_status

    Get the overall health status of a cluster.

Druid MCP

This repository provides an MCP (Model Context Protocol) server for comprehensive read-only access to one or more Apache Druid clusters for ad-hoc data analysis, monitoring, troubleshooting, and comparison.

For example, if you have multiple environments, each with their own Druid cluster, but ingesting data from the same origin, then you could use a simple prompt like this to run a complex cross-cluster comparison:

Check and compare the datasource for SumUp across all Druid clusters.

The AI agent should then list the Druid clusters, list their datasources, explore their schemas, segments, tasks, and their data distribution all by itself, and ultimately present you a nice summary.

Overview

The Druid MCP server enables AI applications to interact with Apache Druid through the Model Context Protocol (MCP). It provides a standardized interface for querying and exploring Druid datasources, making it easy to integrate Druid data into AI workflows.

Features

  • Multi-Cluster Support: Connect to and query multiple Druid clusters simultaneously
  • Query Execution: SQL and native JSON queries
  • Datasource Management: List, explore, and inspect datasource schemas
  • Ingestion Monitoring: Supervisor and task status tracking
  • Cluster Operations: Service health monitoring and segment analysis
  • Lookup Management: Query and inspect lookup tables
  • Smart Analysis Prompts: Pre-built prompts for common data analysis tasks
  • Schema Caching: Efficient caching with 5-minute TTL for better performance in resources only

Prerequisites

  • Python 3.11+
  • Access to an Apache Druid cluster
  • MCP-compatible client (e.g., Claude Desktop)

Installation

Install dependencies using uv:

uv install

Or with pip:

pip install -e .

Configuration

Multi-Cluster Support

Configure multiple Druid clusters with whitespace-separated key=value pairs:

# Define available clusters (optional, defaults to "localhost=http://localhost:8088")
export DRUID_CLUSTER_URLS="localhost=http://localhost:8088 dev=https://druid.dev.example.com prod=https://druid.prod.example.com"

The cluster names can be any valid string and are used in:

  • Resource URIs (e.g., druid://dev/v2/datasources)
  • Tool parameters (e.g., execute_sql_query("dev", "SELECT ..."))
  • Cluster management commands

Usage

Development Mode

Run with auto-reload for development:

mcp dev main.py

Production Mode

Run the server in production:

mcp run main.py

Claude Desktop Integration

Install the server for use with Claude Desktop:

mcp install main.py

Available Tools

The MCP server exposes 16 tools organized by functionality:

Important: All tools (except cluster management tools) require an explicit cluster parameter as the first argument. This enables efficient multi-cluster operations without state management.

Query Execution

  • execute_sql_query(cluster, query, context): Execute SQL queries
  • execute_native_query(cluster, query): Execute native JSON queries (timeseries, topN, groupBy, etc.)

Datasource Operations

  • list_datasources(cluster, include_details): List all available datasources
  • get_datasource_schema(cluster, datasource): Get dimensions and metrics for a datasource

Ingestion Monitoring

  • list_supervisors(cluster, include_state): List all supervisors with optional state information
  • get_supervisor_status(cluster, supervisor_id): Get detailed supervisor status and health
  • list_tasks(cluster, ...): List tasks with filtering options
  • get_task_status(cluster, task_id): Get status of specific tasks

Cluster Management

  • get_cluster_status(cluster): Overall cluster health status
  • list_services(cluster, service_type): List active services by type
  • list_segments(cluster, datasource, full): List segments for a datasource
  • get_segments_info(cluster, datasource): Get aggregated segment statistics

Lookup Management

  • list_lookups(cluster, tier): List lookups by tier
  • get_lookup(cluster, lookup_id, tier): Get specific lookup configuration
  • get_lookup_status(cluster, lookup_id, tier): Get lookup loading status across nodes

Multi-Cluster Management

  • list_clusters(): List all configured Druid cluster names

Analysis Prompts

5 pre-built prompts with smart defaults (last month to now):

  • analyze_time_range: Analyze data within time periods
  • explore_datasource: Explore datasource structure and content
  • monitor_ingestion: Monitor ingestion health and progress
  • compare_periods: Compare metrics between time periods
  • data_quality_check: Perform comprehensive data quality checks

Examples

Basic Usage

# List datasources from production cluster
list_datasources("prod")

# Execute SQL query on development cluster
execute_sql_query("dev", "SELECT COUNT(*) FROM wikipedia WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY")

# Get datasource schema from staging
get_datasource_schema("stage", "wikipedia")

Multi-Cluster Operations

# Compare data across environments
dev_count = execute_sql_query("dev", "SELECT COUNT(*) FROM orders")
prod_count = execute_sql_query("prod", "SELECT COUNT(*) FROM orders")

# Check cluster health across all environments
for cluster in ["dev", "stage", "prod"]:
    status = get_cluster_status(cluster)
    print(f"{cluster}: {status}")

# Monitor ingestion across clusters
list_supervisors("dev", include_state=True)
list_supervisors("prod", include_state=True)

Advanced Queries

# Native query with explicit cluster
execute_native_query("prod", {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "intervals": ["2024-01-01/2024-01-02"],
    "granularity": "hour",
    "aggregations": [
        {"type": "count", "name": "edits"},
        {"type": "longSum", "name": "added", "fieldName": "added"}
    ]
})

# Complex filtering with multiple parameters
list_tasks("prod", 
    datasource="wikipedia",
    state="running",
    max_tasks=10
)

Resources

Druid API endpoints are exposed as MCP resources with caching and multi-cluster support:

Resource URI Format

druid://{cluster}/{path}

Available Resources

  • druid://localhost/v2/datasources - List all datasources
  • druid://localhost/v2/datasources/{name} - Get datasource schema
  • druid://localhost/coordinator/v1/datasources/{name} - Coordinator info
  • druid://localhost/coordinator/v1/metadata/datasources/{name}/segments - Segments
  • druid://localhost/overlord/v1/tasks - List tasks
  • druid://localhost/overlord/v1/supervisors - List supervisors
  • druid://localhost/status/health - Health check

Replace localhost with any configured cluster name to access different environments.

Security

This MCP server provides read-only access to your Druid cluster. No write operations are supported or allowed.

License

MIT