keruru-amuri/databricks-mcp-server
Databricks MCP Server
A Docker-based Model Context Protocol (MCP) server that provides comprehensive Databricks integration by combining both the Databricks CLI and SDK. This server enables AI assistants to interact with Databricks workspaces through a unified interface that intelligently chooses between CLI commands and SDK operations based on the task requirements.
🚀 Key Features
🐳 Docker-First Architecture
- Containerized deployment for consistent environments
- Pre-configured with Databricks CLI and SDK
- Easy deployment to any container platform
- Isolated dependencies and secure execution
🔄 Dual-Mode Operations
- CLI Tools: Direct command-line interface for file operations, authentication, and workspace management
- SDK Tools: Programmatic access for complex queries, data operations, and automation
- Hybrid Tools: Intelligent selection between CLI and SDK based on operation type
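The routing idea behind hybrid tools can be sketched as a simple dispatch table. This is an illustrative sketch only; `Backend`, `choose_backend`, and the operation set are hypothetical names, not the server's actual API:

```python
from enum import Enum

class Backend(Enum):
    CLI = "cli"
    SDK = "sdk"

# File-system style operations tend to map well to CLI commands, while
# structured queries and long-running operations favor the SDK.
CLI_OPERATIONS = {
    "workspace_list",
    "workspace_upload_file",
    "job_get_run_output",
}

def choose_backend(tool_name: str) -> Backend:
    """Pick a backend for a tool call based on its operation type."""
    return Backend.CLI if tool_name in CLI_OPERATIONS else Backend.SDK
```

For example, `choose_backend("workspace_list")` routes to the CLI, while `choose_backend("cluster_start")` falls through to the SDK.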
🖥️ Cluster Management
- List, start, stop, and restart clusters
- Get detailed cluster status and configuration
- Automatic cluster lifecycle management
- Real-time cluster monitoring
📝 Workspace Operations
- Create, read, update, and delete notebooks and files
- Execute code on Databricks clusters
- Support for Python, SQL, Scala, and R notebooks
- Workspace file system navigation
🗄️ Unity Catalog Integration
- Browse catalogs, schemas, and tables
- Execute SQL queries on SQL warehouses
- Get detailed table metadata and schema information
- Data lineage and governance operations
🤖 ML & MLflow Support
- List experiments and registered models
- Get model versions and serving endpoints
- Access MLflow tracking and model registry
- Model deployment and monitoring
⚡ Job & Pipeline Management
- Create and run notebooks as jobs
- Monitor job execution and get results
- Delta Live Tables pipeline operations
- Workflow orchestration
🚀 Quick Start
Prerequisites
- Python 3.11+
- Databricks workspace access
- Personal Access Token or Service Principal credentials
Installation
1. Clone the repository:

git clone <repository-url>
cd databricks-mcp-server

2. Create a virtual environment:

python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # Linux/macOS

3. Install dependencies:

pip install -e .

4. Configure credentials: set your credentials in the MCP client configuration (see the Usage section below).
Usage with MCP Clients
Claude Desktop Configuration
After installing the package (pip install -e .), add this to your Claude Desktop MCP servers configuration:
{
"mcpServers": {
"databricks": {
"command": "databricks-mcp-server",
"env": {
"DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "your-databricks-token-here",
"DATABRICKS_MCP_PORT": "4040",
"DATABRICKS_MCP_HOST": "0.0.0.0"
}
}
}
}
Replace the placeholder values:
- your-workspace.cloud.databricks.com → Your actual Databricks workspace URL
- your-databricks-token-here → Your actual Databricks Personal Access Token
Direct Command Line Usage
# After package installation
databricks-mcp-server
# Alternative
python -m databricks_mcp_server
⚙️ Configuration
MCP Client Configuration
The server gets configuration from environment variables set in your MCP client configuration. No .env file is needed - all configuration is done through the MCP client JSON.
Required Environment Variables:
- DATABRICKS_HOST - Your Databricks workspace URL
- DATABRICKS_TOKEN - Your Databricks Personal Access Token
Optional Environment Variables:
- DATABRICKS_MCP_PORT - Server port (default: 3000)
- DATABRICKS_MCP_HOST - Server host (default: 0.0.0.0)
- DATABRICKS_MCP_LOG_LEVEL - Log level (default: INFO)
- DATABRICKS_CLUSTER_ID - Default cluster for execution
- DATABRICKS_WAREHOUSE_ID - Default SQL warehouse
Alternative Authentication (Optional):
- DATABRICKS_CLIENT_ID - OAuth client ID
- DATABRICKS_CLIENT_SECRET - OAuth client secret
- DATABRICKS_AZURE_TENANT_ID - Azure AD tenant ID
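The environment-variable scheme above can be modeled as a small settings object. This is a hedged sketch of what a `config.py` along these lines might look like; the actual implementation (which the project structure says uses Pydantic) may differ:

```python
import os
from dataclasses import dataclass

@dataclass
class ServerConfig:
    """Server settings resolved from environment variables."""
    host: str
    token: str
    port: int = 3000
    log_level: str = "INFO"

    @classmethod
    def from_env(cls) -> "ServerConfig":
        # Required variables raise KeyError if missing; optional ones
        # fall back to the documented defaults.
        return cls(
            host=os.environ["DATABRICKS_HOST"],
            token=os.environ["DATABRICKS_TOKEN"],
            port=int(os.environ.get("DATABRICKS_MCP_PORT", "3000")),
            log_level=os.environ.get("DATABRICKS_MCP_LOG_LEVEL", "INFO"),
        )
```

Failing fast on the two required variables while defaulting the rest matches the "no .env file needed" model: the MCP client JSON is the single source of configuration.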
MCP Client Configuration Examples
For Claude Desktop (Docker)
Add to your claude_desktop_config.json:
{
"mcpServers": {
"databricks": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--env-file", "/path/to/.env",
"databricks-mcp-server:latest"
]
}
}
}
For VS Code with MCP Extension (Docker)
Add to your MCP configuration:
{
"mcpServers": {
"databricks": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"-e", "DATABRICKS_HOST=https://your-workspace.cloud.databricks.com",
"-e", "DATABRICKS_TOKEN=your-pat-token",
"databricks-mcp-server:latest"
]
}
}
}
For Development (Local Python)
{
"mcpServers": {
"databricks": {
"command": "uvx",
"args": ["databricks-mcp-server"],
"env": {
"DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
"DATABRICKS_TOKEN": "your-pat-token"
}
}
}
}
📖 Usage Examples
Cluster Management
# List all clusters (uses SDK for detailed info)
clusters = await cluster_list()
# Get cluster status (hybrid: CLI for quick status, SDK for details)
status = await cluster_get_status(cluster_id="1234-567890-abc123")
# Start a cluster (SDK for programmatic control)
result = await cluster_start(cluster_id="1234-567890-abc123", wait=True)
# Monitor cluster events (SDK streaming)
events = await cluster_get_events(cluster_id="1234-567890-abc123")
Workspace Operations
# List workspace files (CLI for file system operations)
files = await workspace_list(path="/Users/your-email@company.com")
# Create a new notebook (CLI for file operations)
await workspace_create_notebook(
path="/Users/your-email@company.com/my-notebook",
content="print('Hello, Databricks!')",
language="PYTHON"
)
# Execute code on a cluster (SDK for execution control)
result = await cluster_execute_code(
cluster_id="1234-567890-abc123",
code="spark.sql('SELECT 1').show()",
language="python"
)
# Upload files to workspace (CLI for file uploads)
await workspace_upload_file(
local_path="./data.csv",
workspace_path="/FileStore/shared_uploads/data.csv"
)
Job Management
# Create and run a notebook job (hybrid approach)
job_run = await job_run_notebook(
notebook_path="/Users/your-email@company.com/my-notebook",
cluster_id="1234-567890-abc123",
parameters={"param1": "value1"},
wait=True
)
# Monitor job progress (SDK for real-time updates)
status = await job_get_run_status(run_id=job_run["run_id"])
# Get job output (CLI for log retrieval)
output = await job_get_run_output(run_id=job_run["run_id"])
Unity Catalog
# List catalogs (SDK for metadata operations)
catalogs = await catalog_list_catalogs()
# Browse schema and tables (SDK for structured queries)
tables = await catalog_list_tables(catalog_name="main", schema_name="default")
# Execute SQL query (SDK for query execution)
result = await catalog_execute_sql(
warehouse_id="abc123def456",
query="SELECT COUNT(*) FROM main.default.my_table"
)
# Get table lineage (SDK for governance features)
lineage = await catalog_get_table_lineage(
catalog_name="main",
schema_name="default",
table_name="my_table"
)
ML & MLflow Operations
# List MLflow experiments (SDK for ML operations)
experiments = await ml_list_experiments()
# Create and log to experiment (SDK for tracking)
run = await ml_create_run(
experiment_id="123456",
run_name="my-experiment-run"
)
# Register model (SDK for model registry)
model = await ml_register_model(
name="my-model",
source="runs:/run-id/model"
)
# Deploy model to serving endpoint (SDK for deployment)
endpoint = await ml_create_serving_endpoint(
name="my-model-endpoint",
model_name="my-model",
model_version="1"
)
🛠️ Available Tools
🖥️ Cluster Management Tools
- cluster_list - List all clusters with detailed information (SDK)
- cluster_get_status - Get real-time cluster status (Hybrid)
- cluster_start - Start a cluster and wait for ready state (SDK)
- cluster_restart - Restart a cluster with optional configuration updates (SDK)
- cluster_terminate - Terminate a cluster (SDK)
- cluster_get_events - Get cluster event logs (SDK)
- cluster_resize - Resize cluster (add/remove nodes) (SDK)
📁 Workspace Management Tools
- workspace_list - List files and folders in workspace (CLI)
- workspace_get_content - Get file/notebook content (CLI)
- workspace_create_notebook - Create new notebook (CLI)
- workspace_create_folder - Create workspace folder (CLI)
- workspace_upload_file - Upload local file to workspace (CLI)
- workspace_download_file - Download workspace file locally (CLI)
- workspace_delete - Delete workspace file/folder (CLI)
- workspace_move - Move/rename workspace items (CLI)
🔧 Code Execution Tools
- cluster_execute_code - Execute code on cluster interactively (SDK)
- cluster_execute_notebook - Execute entire notebook on cluster (SDK)
- cluster_get_execution_context - Get or create execution context (SDK)
- cluster_cancel_execution - Cancel running code execution (SDK)
⚡ Job Management Tools
- job_list - List all jobs in workspace (SDK)
- job_get - Get job configuration and details (SDK)
- job_create - Create new job definition (SDK)
- job_run_notebook - Run notebook as one-time job (SDK)
- job_run_now - Trigger existing job run (SDK)
- job_get_run_status - Get job run status and progress (SDK)
- job_get_run_output - Get job run logs and output (CLI)
- job_list_runs - List recent job runs (SDK)
- job_cancel_run - Cancel running job (SDK)
🗄️ Unity Catalog Tools
- catalog_list_catalogs - List available catalogs (SDK)
- catalog_list_schemas - List schemas in catalog (SDK)
- catalog_list_tables - List tables in schema (SDK)
- catalog_get_table - Get table metadata and schema (SDK)
- catalog_execute_sql - Execute SQL query on warehouse (SDK)
- catalog_list_warehouses - List SQL warehouses (SDK)
- catalog_get_table_lineage - Get data lineage information (SDK)
- catalog_create_table - Create new table (SDK)
- catalog_grant_permissions - Grant table/schema permissions (CLI)
🤖 ML & MLflow Tools
- ml_list_experiments - List MLflow experiments (SDK)
- ml_get_experiment - Get experiment details and runs (SDK)
- ml_create_experiment - Create new experiment (SDK)
- ml_create_run - Create new experiment run (SDK)
- ml_log_metrics - Log metrics to run (SDK)
- ml_log_artifacts - Log artifacts to run (SDK)
- ml_list_models - List registered models (SDK)
- ml_get_model - Get model details and versions (SDK)
- ml_register_model - Register model from run (SDK)
- ml_list_model_versions - List model versions (SDK)
- ml_list_serving_endpoints - List model serving endpoints (SDK)
- ml_create_serving_endpoint - Create serving endpoint (SDK)
- ml_get_serving_endpoint - Get endpoint status and config (SDK)
🔍 Monitoring & Logging Tools
- logs_get_cluster_logs - Get cluster driver/executor logs (CLI)
- logs_get_job_logs - Get job run logs (CLI)
- metrics_get_cluster_metrics - Get cluster performance metrics (SDK)
- metrics_get_job_metrics - Get job execution metrics (SDK)
🏗️ Architecture & Development
Project Structure
databricks-mcp-server/
├── README.md # This file
├── pyproject.toml # Python package configuration
├── Dockerfile # Docker container definition
├── docker-compose.yml # Docker Compose configuration
├── .env.example # Environment variables template
├── .gitignore # Git ignore patterns
├── databricks_mcp_server/ # Main package
│ ├── __init__.py # Package initialization
│ ├── server.py # FastMCP server implementation
│ ├── config.py # Configuration management
│ ├── models.py # Pydantic data models
│ └── tools/ # Tool implementations
│ ├── __init__.py # Tools package init
│ └── sdk_tools.py # Reliable Databricks SDK tools
├── tests/ # Test suite
│ ├── __init__.py
│ ├── test_sdk_tools.py
│ └── test_integration.py
└── scripts/ # Utility scripts
├── setup.sh # Development setup
└── test.sh # Test runner
Tool Architecture
The server implements a reliable SDK-based architecture:
SDK Tools (sdk_tools.py)
- Native Python SDK integration using the official databricks-sdk
- Best for: all Databricks operations - reliable, fast, and production-ready
- Advantages:
  - Type safety and structured responses
  - Async operations for better performance
  - No CLI dependencies or timeout issues
  - Consistent JSON responses with proper error handling
  - Direct API access for real-time data
- Uses: databricks-sdk with async/await patterns and proper error handling
Note: CLI and hybrid tools have been removed due to timeout issues and function calling bugs. The SDK approach provides all necessary functionality with better reliability.
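The "consistent JSON responses with proper error handling" pattern can be sketched as a decorator around async tool functions. Everything here is illustrative: `wrap_tool` and the stub tools are hypothetical names, and the stubs stand in for real databricks-sdk calls:

```python
import asyncio
import functools

def wrap_tool(fn):
    """Wrap an async tool so it always returns a JSON-serializable dict."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        try:
            result = await fn(*args, **kwargs)
            return {"ok": True, "result": result}
        except Exception as exc:
            # Surface failures as structured data, not raw tracebacks,
            # so the MCP client always gets a well-formed response.
            return {"ok": False, "error": type(exc).__name__, "message": str(exc)}
    return wrapper

@wrap_tool
async def list_clusters_stub():
    # Stand-in for a real SDK call such as WorkspaceClient().clusters.list()
    return {"clusters": []}

@wrap_tool
async def failing_tool():
    raise ValueError("cluster not found")
```

Because errors become values rather than exceptions, the AI assistant can inspect the `error` field and retry or report instead of receiving an opaque failure.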
Development Setup
Local Development
# Clone repository
git clone <repository-url>
cd databricks-mcp-server
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Set up pre-commit hooks
pre-commit install
Docker Development
# Build development image
docker build -t databricks-mcp-server:dev .
# Run with development mount
docker run -it --rm \
-v $(pwd):/app \
-e DATABRICKS_HOST="your-host" \
-e DATABRICKS_TOKEN="your-token" \
databricks-mcp-server:dev
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=databricks_mcp_server
# Run specific test suites
pytest tests/test_sdk_tools.py
pytest tests/test_integration.py
# Run tests in Docker
docker run --rm \
-e DATABRICKS_HOST="test-host" \
-e DATABRICKS_TOKEN="test-token" \
databricks-mcp-server:latest pytest
Code Quality
# Format code
ruff format .
# Lint code
ruff check .
# Type checking
mypy databricks_mcp_server/
# Security scanning
bandit -r databricks_mcp_server/
🔧 Troubleshooting
Common Issues
Authentication Problems
# Verify connection
docker run --rm \
-e DATABRICKS_HOST="your-host" \
-e DATABRICKS_TOKEN="your-token" \
databricks-mcp-server:latest \
databricks auth describe
# Test with CLI
databricks clusters list
Docker Issues
# Check container logs
docker logs databricks-mcp
# Debug container
docker run -it --rm \
-e DATABRICKS_HOST="your-host" \
-e DATABRICKS_TOKEN="your-token" \
databricks-mcp-server:latest bash
Common Error Solutions
Authentication Error:
- Verify DATABRICKS_HOST format (include https://)
- Check PAT token permissions and expiration
- Ensure workspace access

Cluster Not Found:
- Verify the cluster ID exists: databricks clusters list
- Check cluster permissions
- Ensure the cluster is in the correct workspace

Permission Denied:
- Review PAT token scopes
- Check workspace admin settings
- Verify Unity Catalog permissions

Timeout Errors:
- Increase timeout values in environment variables
- Check network connectivity
- Verify cluster startup time

Docker Container Issues:
- Ensure proper environment variable passing
- Check Docker daemon status
- Verify the image build completed successfully
Debug Mode
Enable detailed logging:
# Environment variable
export DATABRICKS_MCP_LOG_LEVEL=DEBUG
# Docker run with debug
docker run --rm \
-e DATABRICKS_MCP_LOG_LEVEL=DEBUG \
-e DATABRICKS_HOST="your-host" \
-e DATABRICKS_TOKEN="your-token" \
databricks-mcp-server:latest
Health Checks
# Test MCP server health
curl http://localhost:3000/health
# Test Databricks connectivity
docker exec databricks-mcp databricks auth describe
🤝 Contributing
We welcome contributions! Please follow these steps:
Development Workflow
1. Fork and clone:

git clone https://github.com/your-username/databricks-mcp-server.git
cd databricks-mcp-server

2. Create a feature branch:

git checkout -b feature/your-feature-name

3. Set up the development environment:

python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
pre-commit install

4. Make changes:
- Add new tools in sdk_tools.py
- Follow existing patterns and naming conventions
- Add comprehensive docstrings and type hints

5. Test your changes:

pytest
ruff check .
mypy databricks_mcp_server/

6. Submit a pull request:
- Ensure all tests pass
- Update documentation if needed
- Provide a clear description of changes
Contribution Guidelines
- Code Style: Follow PEP 8, use ruff for formatting
- Testing: Add tests for new functionality
- Documentation: Update README and docstrings
- Type Hints: Use type annotations throughout
- Error Handling: Implement proper error handling and logging
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Databricks for the excellent CLI and SDK
- Model Context Protocol for the MCP specification
- FastMCP for the server framework
- The open-source community for inspiration and contributions