databricks-mcp-server

octavioccl/databricks-mcp-server

The FastMCP server for Databricks provides AI agents with a robust interface to interact with Databricks workspaces, enabling efficient data management and automation.

Databricks MCP Server

A comprehensive FastMCP server that provides AI agents with powerful tools to interact with Databricks workspaces. Built with modern MCP best practices using individual @mcp.tool() decorated functions in a single, efficient server.

🚀 Architecture

This project uses a unified FastMCP server architecture with all tools implemented as individual @mcp.tool() decorated functions, providing:

  • 39 MCP Tools across 6 comprehensive categories
  • Single Entry Point: Simplified deployment and management
  • Async/Await Support: With event loop conflict handling for Docker environments
  • JSON Responses: Structured, consistent tool outputs
  • Thread-Safe: Concurrent tool execution support
  • Docker Ready: Optimized for containerized deployment with Poetry
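
To make this pattern concrete, here is a minimal sketch of a FastMCP server with a single decorated tool. It is an illustration only: the stub body is a placeholder, and the real tools live in src/databricks_mcp/core/server_fastmcp.py.

import json

from fastmcp import FastMCP

mcp = FastMCP("databricks-mcp")

@mcp.tool()
async def list_catalogs() -> str:
    """Browse available data catalogs."""
    # The real tool calls the Databricks SDK; stubbed here to show the
    # decorator pattern and the JSON-string response shape.
    return json.dumps({"catalogs": []}, indent=2)

if __name__ == "__main__":
    mcp.run()  # a single entry point serves every registered tool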

✨ Features

🗄️ Catalog Management Tools (6 tools)

  • list_catalogs - Browse available data catalogs
  • list_schemas - Explore schemas within catalogs
  • list_tables - Discover tables and views
  • get_table_info - Get detailed table metadata and schema
  • search_tables - Find tables using pattern matching
  • generate_sql_query - AI-powered SQL generation from natural language

🔍 Advanced Query Execution (2 tools)

  • execute_query - Execute SQL queries with automatic LIMIT handling
  • execute_statement - Advanced SQL execution with parameters, catalogs, schemas, and timeout control
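
To illustrate how a client invokes these tools, here is a hedged sketch using the official mcp Python SDK over stdio. The tool name comes from this README, but the "query" argument name is an assumption; confirm the actual input schema with the MCP Inspector.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server over stdio, the same way Claude Desktop does.
    params = StdioServerParameters(
        command="python",
        args=["src/databricks_mcp/servers/main.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "query" is an assumed argument name; check the tool's schema.
            result = await session.call_tool("execute_query", {"query": "SELECT 1"})
            print(result.content)

asyncio.run(main())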

🖥️ Cluster Management (7 tools)

  • list_clusters - View all workspace clusters
  • get_cluster - Get detailed cluster information
  • create_cluster - Create new clusters with autoscaling (example arguments after this list)
  • start_cluster - Start stopped clusters
  • terminate_cluster - Terminate running clusters
  • restart_cluster - Restart clusters for maintenance
  • resize_cluster - Dynamically resize cluster capacity
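
As a rough sketch, arguments for create_cluster with autoscaling might look like the following. The field names mirror the Databricks Clusters API; whether this tool accepts exactly these names is an assumption.

# Hypothetical create_cluster arguments; field names follow the
# Databricks Clusters API and are not confirmed against this tool.
arguments = {
    "cluster_name": "analytics-autoscale",
    "spark_version": "14.3.x-scala2.12",  # a Databricks runtime version
    "node_type_id": "i3.xlarge",          # cloud-specific node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
}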

⚙️ Job Management (9 tools)

  • list_jobs - Browse all workspace jobs
  • get_job - Get detailed job configuration
  • run_job - Execute jobs with custom parameters
  • create_job - Create new job definitions
  • update_job - Modify existing jobs
  • delete_job - Remove job definitions
  • get_run - Get job run details and status
  • cancel_run - Cancel running job executions
  • list_runs - Browse job execution history

📓 Notebook Operations (7 tools)

  • list_notebooks - Browse workspace notebooks
  • get_notebook - Retrieve notebook metadata
  • export_notebook - Export in multiple formats (SOURCE, HTML, JUPYTER, DBC); see the example after this list
  • import_notebook - Import notebooks with base64 content
  • delete_notebook - Remove notebooks safely
  • create_directory - Create workspace directories
  • get_notebook_status - Check notebook availability
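
The export formats listed above match the Databricks Workspace export API. As a hedged sketch, an export_notebook call might take arguments like these ("path" and "format" are assumed parameter names):

# Hypothetical export_notebook arguments; the format values are the
# ones listed above, but the parameter names are assumptions.
arguments = {
    "path": "/Users/someone@example.com/analysis",
    "format": "JUPYTER",  # one of: SOURCE, HTML, JUPYTER, DBC
}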

📁 DBFS File System (8 tools)

  • list_files - Browse DBFS directories
  • get_file - Download file contents (text/binary)
  • put_file - Upload files with base64 encoding (see the sketch after this list)
  • upload_large_file - Chunked upload for large files
  • delete_file - Remove files and directories
  • get_status - Get file/directory metadata
  • create_directory - Create DBFS directories
  • move_file - Move/rename files and directories
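
Because put_file expects base64-encoded content, a client must encode file bytes before calling it. A minimal sketch, with assumed parameter names:

import base64

# Read local bytes and base64-encode them for the put_file tool.
with open("report.csv", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

# "path" and "content" are assumed names; confirm via the tool's schema.
arguments = {"path": "/tmp/report.csv", "content": encoded}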

🛠️ Installation & Setup

Prerequisites

  • Python 3.8+
  • Databricks workspace access
  • Personal Access Token or Service Principal credentials

Quick Start

  1. Install dependencies:

    pip install fastmcp 'mcp[cli]' databricks-sdk
    
  2. Set environment variables:

    export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
    export DATABRICKS_TOKEN=your-personal-access-token
    export DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id  # optional
    
  3. Run the server:

    # Using the CLI script
    ./bin/databricks-mcp-server
    
    # Or directly with Python
    python src/databricks_mcp/servers/main.py
    

Claude Desktop Configuration

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "databricks": {
      "command": "python",
      "args": ["/path/to/databricks-mcp-server/src/databricks_mcp/servers/main.py"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-token-here"
      }
    }
  }
}

๐Ÿณ Docker Deployment

Using Docker Compose

  1. Copy environment configuration:

    cp config.env.example config.env
    # Edit config.env with your Databricks credentials
    
  2. Build and run:

    docker-compose -f deploy/docker/docker-compose.yml up --build
    

Claude Desktop with Docker

{
  "mcpServers": {
    "databricks": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--env-file", "/path/to/config.env",
        "databricks-mcp-server"
      ]
    }
  }
}

🔧 Configuration

Environment Variables

Required:

DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-personal-access-token

Optional:

DATABRICKS_SQL_WAREHOUSE_ID=your-sql-warehouse-id
DATABRICKS_DEFAULT_CATALOG=main
DATABRICKS_DEFAULT_SCHEMA=default
MCP_SERVER_NAME=databricks-mcp
MCP_LOG_LEVEL=INFO
MCP_ENABLE_QUERY_EXECUTION=true
MCP_ENABLE_NATURAL_LANGUAGE=true

Security Considerations

  • Store tokens securely (environment variables, not in code)
  • Use SQL Warehouse IDs for query execution (recommended)
  • Consider read-only access tokens for production use
  • Validate all SQL queries through built-in query validator
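
The built-in validator itself is not shown in this README. Purely as an illustration of the kind of read-only check such a validator might perform (not the project's actual implementation):

import re

# Reject statements that begin with a mutating keyword. Illustrative
# only; this is nowhere near a complete SQL security check.
_MUTATING = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|TRUNCATE|CREATE|GRANT)\b",
    re.IGNORECASE,
)

def is_read_only(sql: str) -> bool:
    return _MUTATING.match(sql) is None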

🧪 Testing & Development

Test Connection

./bin/databricks-mcp-server --test

Development Mode

# Start with debug logging
./bin/databricks-mcp-server --log DEBUG

# Test with MCP Inspector
npx @modelcontextprotocol/inspector python src/databricks_mcp/servers/main.py

Adding New Tools

  1. Add your tool function to src/databricks_mcp/core/server_fastmcp.py
  2. Use the @mcp.tool() decorator
  3. Follow the established error handling pattern
  4. Test with the MCP Inspector

Example:

# Assumes json, logger, get_databricks_client, and run_sync_in_thread
# are already defined in server_fastmcp.py, where new tools are added.
@mcp.tool()
async def my_new_tool(param: str) -> str:
    """Description of what this tool does."""
    try:
        client = get_databricks_client()
        
        # Try async first, fall back to sync in thread if needed
        try:
            result = await client.some_operation(param)
        except RuntimeError as e:
            if "cannot be called from a running event loop" in str(e):
                logger.warning("Event loop conflict detected, running in separate thread")
                result = run_sync_in_thread(client.some_operation(param))
            else:
                raise
        
        return json.dumps(result, indent=2)
    except Exception as e:
        return json.dumps({
            "status": "error",
            "error": str(e)
        }, indent=2)
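
The run_sync_in_thread helper used above is not shown in this README. Assuming it accepts a coroutine and runs it to completion on a fresh event loop in a worker thread, a minimal sketch could look like this:

import asyncio
from concurrent.futures import ThreadPoolExecutor

def run_sync_in_thread(coro):
    """Run a coroutine on a new event loop in a separate thread.

    Works around "cannot be called from a running event loop" errors:
    the worker thread has no running loop, so asyncio.run() is safe there.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()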

📊 Performance

  • Memory Usage: ~50-100MB per server instance
  • Startup Time: ~2-5 seconds (depending on Databricks connection)
  • Tool Execution: ~100-2000ms per tool (depending on operation)
  • Concurrent Requests: Thread-safe, supports multiple concurrent tool calls
  • Docker Overhead: Minimal, single process architecture

🔍 Troubleshooting

Common Issues

  1. AsyncIO Event Loop Conflicts: The server automatically handles these by running operations in separate threads
  2. Connection Timeouts: Check your DATABRICKS_HOST and DATABRICKS_TOKEN
  3. Permission Errors: Ensure your token has appropriate workspace permissions
  4. Docker Issues: Verify environment variables are properly passed to the container

Debug Mode

./bin/databricks-mcp-server --log DEBUG --test

Logs

The server provides comprehensive logging. Check logs for:

  • Connection status
  • Tool execution details
  • Error messages with suggested fixes
  • Performance metrics

📚 Documentation

The repository includes additional guides covering:

  • Detailed architecture overview
  • Handling async event loops
  • Container deployment
  • Usage examples and demos

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following the established patterns
  4. Test your changes (python -m py_compile src/databricks_mcp/core/server_fastmcp.py)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with ❤️ using FastMCP and the Databricks SDK.

This server provides a comprehensive interface between AI agents and Databricks workspaces, enabling powerful data analysis, job management, and workspace automation through natural language interactions.