Databricks MCP Server

A Docker-based Model Context Protocol (MCP) server that provides comprehensive Databricks integration by combining both the Databricks CLI and SDK. This server enables AI assistants to interact with Databricks workspaces through a unified interface that intelligently chooses between CLI commands and SDK operations based on the task requirements.

🚀 Key Features

🐳 Docker-First Architecture

  • Containerized deployment for consistent environments
  • Pre-configured with Databricks CLI and SDK
  • Easy deployment to any container platform
  • Isolated dependencies and secure execution

🔄 Dual-Mode Operations

  • CLI Tools: Direct command-line interface for file operations, authentication, and workspace management
  • SDK Tools: Programmatic access for complex queries, data operations, and automation
  • Hybrid Tools: Intelligent selection between CLI and SDK based on operation type

🖥️ Cluster Management

  • List, start, stop, and restart clusters
  • Get detailed cluster status and configuration
  • Automatic cluster lifecycle management
  • Real-time cluster monitoring

📝 Workspace Operations

  • Create, read, update, and delete notebooks and files
  • Execute code on Databricks clusters
  • Support for Python, SQL, Scala, and R notebooks
  • Workspace file system navigation

🗄️ Unity Catalog Integration

  • Browse catalogs, schemas, and tables
  • Execute SQL queries on SQL warehouses
  • Get detailed table metadata and schema information
  • Data lineage and governance operations

🤖 ML & MLflow Support

  • List experiments and registered models
  • Get model versions and serving endpoints
  • Access MLflow tracking and model registry
  • Model deployment and monitoring

⚡ Job & Pipeline Management

  • Create and run notebooks as jobs
  • Monitor job execution and get results
  • Delta Live Tables pipeline operations
  • Workflow orchestration

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Databricks workspace access
  • Personal Access Token or Service Principal credentials

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd databricks-mcp-server
    
  2. Create virtual environment:

    python -m venv .venv
    .venv\Scripts\activate  # Windows
    # source .venv/bin/activate  # Linux/macOS
    
  3. Install dependencies:

    pip install -e .
    
  4. Configure credentials: Set your credentials in the MCP client configuration (see Usage section below)

Usage with MCP Clients

Claude Desktop Configuration

After installing the package (pip install -e .), add this to your Claude Desktop MCP servers configuration:

{
  "mcpServers": {
    "databricks": {
      "command": "databricks-mcp-server",
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-databricks-token-here",
        "DATABRICKS_MCP_PORT": "4040",
        "DATABRICKS_MCP_HOST": "0.0.0.0"
      }
    }
  }
}

Replace the placeholder values:

  • your-workspace.cloud.databricks.com → Your actual Databricks workspace URL
  • your-databricks-token-here → Your actual Databricks Personal Access Token
Direct Command Line Usage
# After package installation
databricks-mcp-server

# Alternative
python -m databricks_mcp_server

⚙️ Configuration

MCP Client Configuration

The server reads its configuration from environment variables set in your MCP client configuration. No .env file is needed; all configuration is done through the MCP client JSON. A sketch of how these variables might be consumed follows the lists below.

Required Environment Variables:
  • DATABRICKS_HOST - Your Databricks workspace URL
  • DATABRICKS_TOKEN - Your Databricks Personal Access Token
Optional Environment Variables:
  • DATABRICKS_MCP_PORT - Server port (default: 3000)
  • DATABRICKS_MCP_HOST - Server host (default: 0.0.0.0)
  • DATABRICKS_MCP_LOG_LEVEL - Log level (default: INFO)
  • DATABRICKS_CLUSTER_ID - Default cluster for execution
  • DATABRICKS_WAREHOUSE_ID - Default SQL warehouse
Alternative Authentication (Optional):
  • DATABRICKS_CLIENT_ID - OAuth client ID
  • DATABRICKS_CLIENT_SECRET - OAuth client secret
  • DATABRICKS_AZURE_TENANT_ID - Azure AD tenant ID
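
For reference, the sketch below shows one plausible way config.py could assemble these variables. It is illustrative only, assuming a plain os.environ lookup with the defaults listed above; the real loader may differ.

# Minimal sketch of environment-based configuration loading (illustrative;
# not the verbatim config.py implementation)
import os

def load_config() -> dict:
    return {
        "host": os.environ["DATABRICKS_HOST"],    # required; KeyError if unset
        "token": os.environ["DATABRICKS_TOKEN"],  # required; KeyError if unset
        "port": int(os.getenv("DATABRICKS_MCP_PORT", "3000")),
        "bind_host": os.getenv("DATABRICKS_MCP_HOST", "0.0.0.0"),
        "log_level": os.getenv("DATABRICKS_MCP_LOG_LEVEL", "INFO"),
        "cluster_id": os.getenv("DATABRICKS_CLUSTER_ID"),      # optional
        "warehouse_id": os.getenv("DATABRICKS_WAREHOUSE_ID"),  # optional
    }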

MCP Client Configuration Examples

For Claude Desktop (Docker)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "databricks": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--env-file", "/path/to/.env",
        "databricks-mcp-server:latest"
      ]
    }
  }
}
For VS Code with MCP Extension (Docker)

Add to your MCP configuration:

{
  "mcpServers": {
    "databricks": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "DATABRICKS_HOST=https://your-workspace.cloud.databricks.com",
        "-e", "DATABRICKS_TOKEN=your-pat-token",
        "databricks-mcp-server:latest"
      ]
    }
  }
}
For Development (Local Python)
{
  "mcpServers": {
    "databricks": {
      "command": "uvx",
      "args": ["databricks-mcp-server"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-pat-token"
      }
    }
  }
}

📖 Usage Examples

Cluster Management

# List all clusters (uses SDK for detailed info)
clusters = await cluster_list()

# Get cluster status (hybrid: CLI for quick status, SDK for details)
status = await cluster_get_status(cluster_id="1234-567890-abc123")

# Start a cluster (SDK for programmatic control)
result = await cluster_start(cluster_id="1234-567890-abc123", wait=True)

# Monitor cluster events (SDK streaming)
events = await cluster_get_events(cluster_id="1234-567890-abc123")

Workspace Operations

# List workspace files (CLI for file system operations)
files = await workspace_list(path="/Users/your-email@company.com")

# Create a new notebook (CLI for file operations)
await workspace_create_notebook(
    path="/Users/your-email@company.com/my-notebook",
    content="print('Hello, Databricks!')",
    language="PYTHON"
)

# Execute code on a cluster (SDK for execution control)
result = await cluster_execute_code(
    cluster_id="1234-567890-abc123",
    code="spark.sql('SELECT 1').show()",
    language="python"
)

# Upload files to workspace (CLI for file uploads)
await workspace_upload_file(
    local_path="./data.csv",
    workspace_path="/FileStore/shared_uploads/data.csv"
)

Job Management

# Create and run a notebook job (hybrid approach)
job_run = await job_run_notebook(
    notebook_path="/Users/your-email@company.com/my-notebook",
    cluster_id="1234-567890-abc123",
    parameters={"param1": "value1"},
    wait=True
)

# Monitor job progress (SDK for real-time updates)
status = await job_get_run_status(run_id=job_run["run_id"])

# Get job output (CLI for log retrieval)
output = await job_get_run_output(run_id=job_run["run_id"])

Unity Catalog

# List catalogs (SDK for metadata operations)
catalogs = await catalog_list_catalogs()

# Browse schema and tables (SDK for structured queries)
tables = await catalog_list_tables(catalog_name="main", schema_name="default")

# Execute SQL query (SDK for query execution)
result = await catalog_execute_sql(
    warehouse_id="abc123def456",
    query="SELECT COUNT(*) FROM main.default.my_table"
)

# Get table lineage (SDK for governance features)
lineage = await catalog_get_table_lineage(
    catalog_name="main",
    schema_name="default",
    table_name="my_table"
)

ML & MLflow Operations

# List MLflow experiments (SDK for ML operations)
experiments = await ml_list_experiments()

# Create and log to experiment (SDK for tracking)
run = await ml_create_run(
    experiment_id="123456",
    run_name="my-experiment-run"
)

# Register model (SDK for model registry)
model = await ml_register_model(
    name="my-model",
    source="runs:/run-id/model"
)

# Deploy model to serving endpoint (SDK for deployment)
endpoint = await ml_create_serving_endpoint(
    name="my-model-endpoint",
    model_name="my-model",
    model_version="1"
)

🛠️ Available Tools

🖥️ Cluster Management Tools

  • cluster_list - List all clusters with detailed information (SDK)
  • cluster_get_status - Get real-time cluster status (Hybrid)
  • cluster_start - Start a cluster and wait for ready state (SDK)
  • cluster_restart - Restart a cluster with optional configuration updates (SDK)
  • cluster_terminate - Terminate a cluster (SDK)
  • cluster_get_events - Get cluster event logs (SDK)
  • cluster_resize - Resize cluster (add/remove nodes) (SDK)

📁 Workspace Management Tools

  • workspace_list - List files and folders in workspace (CLI)
  • workspace_get_content - Get file/notebook content (CLI)
  • workspace_create_notebook - Create new notebook (CLI)
  • workspace_create_folder - Create workspace folder (CLI)
  • workspace_upload_file - Upload local file to workspace (CLI)
  • workspace_download_file - Download workspace file locally (CLI)
  • workspace_delete - Delete workspace file/folder (CLI)
  • workspace_move - Move/rename workspace items (CLI)

🔧 Code Execution Tools

  • cluster_execute_code - Execute code on cluster interactively (SDK)
  • cluster_execute_notebook - Execute entire notebook on cluster (SDK)
  • cluster_get_execution_context - Get or create execution context (SDK)
  • cluster_cancel_execution - Cancel running code execution (SDK)

⚡ Job Management Tools

  • job_list - List all jobs in workspace (SDK)
  • job_get - Get job configuration and details (SDK)
  • job_create - Create new job definition (SDK)
  • job_run_notebook - Run notebook as one-time job (SDK)
  • job_run_now - Trigger existing job run (SDK)
  • job_get_run_status - Get job run status and progress (SDK)
  • job_get_run_output - Get job run logs and output (CLI)
  • job_list_runs - List recent job runs (SDK)
  • job_cancel_run - Cancel running job (SDK)

🗄️ Unity Catalog Tools

  • catalog_list_catalogs - List available catalogs (SDK)
  • catalog_list_schemas - List schemas in catalog (SDK)
  • catalog_list_tables - List tables in schema (SDK)
  • catalog_get_table - Get table metadata and schema (SDK)
  • catalog_execute_sql - Execute SQL query on warehouse (SDK)
  • catalog_list_warehouses - List SQL warehouses (SDK)
  • catalog_get_table_lineage - Get data lineage information (SDK)
  • catalog_create_table - Create new table (SDK)
  • catalog_grant_permissions - Grant table/schema permissions (CLI)

🤖 ML & MLflow Tools

  • ml_list_experiments - List MLflow experiments (SDK)
  • ml_get_experiment - Get experiment details and runs (SDK)
  • ml_create_experiment - Create new experiment (SDK)
  • ml_create_run - Create new experiment run (SDK)
  • ml_log_metrics - Log metrics to run (SDK)
  • ml_log_artifacts - Log artifacts to run (SDK)
  • ml_list_models - List registered models (SDK)
  • ml_get_model - Get model details and versions (SDK)
  • ml_register_model - Register model from run (SDK)
  • ml_list_model_versions - List model versions (SDK)
  • ml_list_serving_endpoints - List model serving endpoints (SDK)
  • ml_create_serving_endpoint - Create serving endpoint (SDK)
  • ml_get_serving_endpoint - Get endpoint status and config (SDK)

🔍 Monitoring & Logging Tools

  • logs_get_cluster_logs - Get cluster driver/executor logs (CLI)
  • logs_get_job_logs - Get job run logs (CLI)
  • metrics_get_cluster_metrics - Get cluster performance metrics (SDK)
  • metrics_get_job_metrics - Get job execution metrics (SDK)

🏗️ Architecture & Development

Project Structure

databricks-mcp-server/
├── README.md                    # This file
├── pyproject.toml              # Python package configuration
├── Dockerfile                  # Docker container definition
├── docker-compose.yml          # Docker Compose configuration
├── .env.example                # Environment variables template
├── .gitignore                  # Git ignore patterns
├── databricks_mcp_server/      # Main package
│   ├── __init__.py             # Package initialization
│   ├── server.py               # FastMCP server implementation
│   ├── config.py               # Configuration management
│   ├── models.py               # Pydantic data models
│   └── tools/                  # Tool implementations
│       ├── __init__.py         # Tools package init
│       └── sdk_tools.py        # Reliable Databricks SDK tools
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── test_sdk_tools.py
│   └── test_integration.py
└── scripts/                    # Utility scripts
    ├── setup.sh                # Development setup
    └── test.sh                 # Test runner

Tool Architecture

The server implements a reliable SDK-based architecture:

SDK Tools (sdk_tools.py)

  • Native Python SDK integration using the official databricks-sdk
  • Best for: All Databricks operations - reliable, fast, and production-ready
  • Advantages:
    • Type safety and structured responses
    • Async operations for better performance
    • No CLI dependencies or timeout issues
    • Consistent JSON responses with proper error handling
    • Direct API access for real-time data
  • Uses: databricks-sdk with async/await patterns and proper error handling

Note: CLI and hybrid tools have been removed due to timeout issues and function calling bugs. The SDK approach provides all necessary functionality with better reliability.
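As a hedged illustration of this pattern, a tool such as cluster_list might wrap the official SDK roughly as follows. This is a sketch, not the verbatim sdk_tools.py; the thread offload is one way to keep the synchronous SDK client from blocking the async event loop.

# Illustrative sketch of an SDK tool (assumes databricks-sdk is installed)
import asyncio
from databricks.sdk import WorkspaceClient

async def cluster_list() -> list[dict]:
    # WorkspaceClient reads DATABRICKS_HOST / DATABRICKS_TOKEN from the environment
    w = WorkspaceClient()
    # The SDK client is synchronous, so run the call in a worker thread
    clusters = await asyncio.to_thread(lambda: list(w.clusters.list()))
    return [
        {
            "cluster_id": c.cluster_id,
            "cluster_name": c.cluster_name,
            "state": c.state.value if c.state else None,
        }
        for c in clusters
    ]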

Development Setup

Local Development
# Clone repository
git clone <repository-url>
cd databricks-mcp-server

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Set up pre-commit hooks
pre-commit install
Docker Development
# Build development image
docker build -t databricks-mcp-server:dev .

# Run with development mount
docker run -it --rm \
  -v $(pwd):/app \
  -e DATABRICKS_HOST="your-host" \
  -e DATABRICKS_TOKEN="your-token" \
  databricks-mcp-server:dev

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=databricks_mcp_server

# Run specific test categories
pytest tests/test_sdk_tools.py
pytest tests/test_integration.py

# Run tests in Docker
docker run --rm \
  -e DATABRICKS_HOST="test-host" \
  -e DATABRICKS_TOKEN="test-token" \
  databricks-mcp-server:latest pytest
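
Unit tests can mock the SDK client so no live workspace is required. The sketch below is illustrative and assumes pytest-asyncio plus a cluster_list export in sdk_tools; adapt the names to the actual module.

# Illustrative test sketch (assumes pytest-asyncio; names may differ)
from unittest.mock import MagicMock, patch

import pytest

from databricks_mcp_server.tools import sdk_tools  # assumed import path

@pytest.mark.asyncio
async def test_cluster_list_maps_sdk_objects():
    fake = MagicMock(cluster_id="1234-567890-abc123", cluster_name="dev")
    with patch.object(sdk_tools, "WorkspaceClient") as wc:
        wc.return_value.clusters.list.return_value = [fake]
        result = await sdk_tools.cluster_list()
    assert result[0]["cluster_id"] == "1234-567890-abc123"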

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

# Type checking
mypy databricks_mcp_server/

# Security scanning
bandit -r databricks_mcp_server/

🔧 Troubleshooting

Common Issues

Authentication Problems
# Verify connection
docker run --rm \
  -e DATABRICKS_HOST="your-host" \
  -e DATABRICKS_TOKEN="your-token" \
  databricks-mcp-server:latest \
  databricks auth describe

# Test with CLI
databricks clusters list
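
Alternatively, verify credentials from Python with the official SDK (assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment):

# Python connectivity check using the official databricks-sdk
from databricks.sdk import WorkspaceClient

me = WorkspaceClient().current_user.me()
print(f"Authenticated as {me.user_name}")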
Docker Issues
# Check container logs
docker logs databricks-mcp

# Debug container
docker run -it --rm \
  -e DATABRICKS_HOST="your-host" \
  -e DATABRICKS_TOKEN="your-token" \
  databricks-mcp-server:latest bash
Common Error Solutions
  1. Authentication Error:

    • Verify DATABRICKS_HOST format (include https://)
    • Check PAT token permissions and expiration
    • Ensure workspace access
  2. Cluster Not Found:

    • Verify cluster ID exists: databricks clusters list
    • Check cluster permissions
    • Ensure cluster is in correct workspace
  3. Permission Denied:

    • Review PAT token scopes
    • Check workspace admin settings
    • Verify Unity Catalog permissions
  4. Timeout Errors:

    • Increase timeout values in environment variables
    • Check network connectivity
    • Verify cluster startup time
  5. Docker Container Issues:

    • Ensure proper environment variable passing
    • Check Docker daemon status
    • Verify image build completed successfully

Debug Mode

Enable detailed logging:

# Environment variable
export DATABRICKS_MCP_LOG_LEVEL=DEBUG

# Docker run with debug
docker run --rm \
  -e DATABRICKS_MCP_LOG_LEVEL=DEBUG \
  -e DATABRICKS_HOST="your-host" \
  -e DATABRICKS_TOKEN="your-token" \
  databricks-mcp-server:latest

Health Checks

# Test MCP server health
curl http://localhost:3000/health

# Test Databricks connectivity
docker exec databricks-mcp databricks auth describe

🤝 Contributing

We welcome contributions! Please follow these steps:

Development Workflow

  1. Fork and Clone

    git clone https://github.com/your-username/databricks-mcp-server.git
    cd databricks-mcp-server
    
  2. Create Feature Branch

    git checkout -b feature/your-feature-name
    
  3. Set Up Development Environment

    python -m venv venv
    source venv/bin/activate
    pip install -e ".[dev]"
    pre-commit install
    
  4. Make Changes

    • Add new tools in databricks_mcp_server/tools/sdk_tools.py (the CLI and hybrid tool modules have been removed; see Tool Architecture above)
    • Follow existing patterns and naming conventions
    • Add comprehensive docstrings and type hints
  5. Test Your Changes

    pytest
    ruff check .
    mypy databricks_mcp_server/
    
  6. Submit Pull Request

    • Ensure all tests pass
    • Update documentation if needed
    • Provide clear description of changes

Contribution Guidelines

  • Code Style: Follow PEP 8, use ruff for formatting
  • Testing: Add tests for new functionality
  • Documentation: Update README and docstrings
  • Type Hints: Use type annotations throughout
  • Error Handling: Implement proper error handling and logging

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments