# Databricks MCP Server

A comprehensive FastMCP server that provides AI agents with powerful tools to interact with Databricks workspaces. Built with modern MCP best practices, using individual `@mcp.tool()` decorated functions in a single, efficient server.
## Architecture

This project uses a unified FastMCP server architecture with all tools implemented as individual `@mcp.tool()` decorated functions, providing:

- **35+ MCP Tools** across 6 comprehensive categories
- **Single Entry Point**: Simplified deployment and management
- **Async/Await Support**: With event loop conflict handling for Docker environments
- **JSON Responses**: Structured, consistent tool outputs
- **Thread-Safe**: Concurrent tool execution support
- **Docker Ready**: Optimized for containerized deployment with Poetry

A minimal sketch of this pattern appears below.
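As a rough illustration (not the project's actual source), a unified FastMCP server with one decorated tool might look like the following; the server name and the direct use of the Databricks SDK's `WorkspaceClient` are assumptions for the sketch:

```python
# Sketch of the unified-server pattern, assuming the fastmcp and
# databricks-sdk packages installed in the Quick Start below.
import json

from databricks.sdk import WorkspaceClient
from fastmcp import FastMCP

mcp = FastMCP("databricks-mcp")  # hypothetical server name


@mcp.tool()
def list_catalogs() -> str:
    """List the Unity Catalog catalogs visible to the caller."""
    try:
        # WorkspaceClient() picks up DATABRICKS_HOST / DATABRICKS_TOKEN from env.
        w = WorkspaceClient()
        catalogs = [{"name": c.name, "comment": c.comment} for c in w.catalogs.list()]
        return json.dumps({"status": "success", "catalogs": catalogs}, indent=2)
    except Exception as e:
        return json.dumps({"status": "error", "error": str(e)}, indent=2)


if __name__ == "__main__":
    mcp.run()  # single entry point: every tool is registered on this one server
```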
## Features
### Catalog Management Tools (6 tools)

- `list_catalogs` - Browse available data catalogs
- `list_schemas` - Explore schemas within catalogs
- `list_tables` - Discover tables and views
- `get_table_info` - Get detailed table metadata and schema
- `search_tables` - Find tables using pattern matching
- `generate_sql_query` - AI-powered SQL generation from natural language
### Advanced Query Execution (2 tools)

- `execute_query` - Execute SQL queries with automatic LIMIT handling
- `execute_statement` - Advanced SQL execution with parameters, catalogs, schemas, and timeout control (see the sketch below)
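The Databricks SDK installed in the Quick Start exposes a Statement Execution API that a tool like `execute_statement` could plausibly wrap. A hedged sketch (not the server's verified internals; the table and warehouse are placeholders):

```python
# Sketch: SQL execution against a SQL warehouse with databricks-sdk.
# Assumed wrapping, not the project's actual code path.
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

resp = w.statement_execution.execute_statement(
    statement="SELECT * FROM samples.nyctaxi.trips",
    warehouse_id=os.environ["DATABRICKS_SQL_WAREHOUSE_ID"],
    catalog="samples",       # optional default catalog
    schema="nyctaxi",        # optional default schema
    row_limit=100,           # server-side cap, akin to automatic LIMIT handling
    wait_timeout="30s",      # how long to block before returning a pending handle
)

print(resp.status.state)     # e.g. StatementState.SUCCEEDED
if resp.result:
    for row in resp.result.data_array or []:
        print(row)
```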
### Cluster Management (7 tools)

- `list_clusters` - View all workspace clusters
- `get_cluster` - Get detailed cluster information
- `create_cluster` - Create new clusters with autoscaling (sketched below)
- `start_cluster` - Start stopped clusters
- `terminate_cluster` - Terminate running clusters
- `restart_cluster` - Restart clusters for maintenance
- `resize_cluster` - Dynamically resize cluster capacity
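Cluster creation with autoscaling maps naturally onto the Databricks SDK's clusters API; the sketch below shows the kind of call `create_cluster` could delegate to (names, runtime, and node type are placeholders, and this is not the project's verified implementation):

```python
# Sketch: create an autoscaling cluster and wait for it to start.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import AutoScale

w = WorkspaceClient()

cluster = w.clusters.create_and_wait(
    cluster_name="mcp-demo-cluster",           # hypothetical name
    spark_version="15.4.x-scala2.12",          # any supported runtime
    node_type_id="i3.xlarge",                  # cloud-specific node type
    autoscale=AutoScale(min_workers=1, max_workers=4),
    autotermination_minutes=30,                # avoid paying for idle clusters
)
print(cluster.cluster_id, cluster.state)
```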
### Job Management (9 tools)

- `list_jobs` - Browse all workspace jobs
- `get_job` - Get detailed job configuration
- `run_job` - Execute jobs with custom parameters (sketched below)
- `create_job` - Create new job definitions
- `update_job` - Modify existing jobs
- `delete_job` - Remove job definitions
- `get_run` - Get job run details and status
- `cancel_run` - Cancel running job executions
- `list_runs` - Browse job execution history
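Job runs with custom parameters likewise map onto the SDK's jobs API; a minimal sketch of what `run_job` could delegate to (the job ID and parameters are placeholders, not verified internals):

```python
# Sketch: trigger a job with notebook parameters and wait for the result.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

run = w.jobs.run_now_and_wait(
    job_id=123,                                  # hypothetical job ID
    notebook_params={"run_date": "2024-01-01"},  # custom parameters
)
print(run.run_id, run.state.result_state)        # e.g. SUCCESS
```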
### Notebook Operations (7 tools)

- `list_notebooks` - Browse workspace notebooks
- `get_notebook` - Retrieve notebook metadata
- `export_notebook` - Export in multiple formats (SOURCE, HTML, JUPYTER, DBC)
- `import_notebook` - Import notebooks with base64 content
- `delete_notebook` - Remove notebooks safely
- `create_directory` - Create workspace directories
- `get_notebook_status` - Check notebook availability
### DBFS File System (8 tools)

- `list_files` - Browse DBFS directories
- `get_file` - Download file contents (text/binary)
- `put_file` - Upload files with base64 encoding (see the encoding sketch below)
- `upload_large_file` - Chunked upload for large files
- `delete_file` - Remove files and directories
- `get_status` - Get file/directory metadata
- `create_directory` - Create DBFS directories
- `move_file` - Move/rename files and directories
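Because `put_file` (and `import_notebook` above) take base64 content, callers have to encode local bytes first. A minimal sketch; the tool-call shape in the final comment is an assumption about the tool's schema, not a verified signature:

```python
# Sketch: preparing base64 content for a put_file-style tool.
import base64
from pathlib import Path

raw = Path("report.csv").read_bytes()             # any local file
encoded = base64.b64encode(raw).decode("ascii")   # base64 text the tool expects

# An MCP client would then pass something like:
#   put_file(path="dbfs:/tmp/report.csv", content=encoded)
# (argument names are illustrative only)
```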
## Installation & Setup

### Prerequisites

- Python 3.8+
- Databricks workspace access
- Personal Access Token or Service Principal credentials

### Quick Start

1. Install dependencies:

   ```bash
   pip install fastmcp 'mcp[cli]' databricks-sdk
   ```

2. Set environment variables:

   ```bash
   export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
   export DATABRICKS_TOKEN=your-personal-access-token
   export DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id  # optional
   ```

3. Run the server:

   ```bash
   # Using the CLI script
   ./bin/databricks-mcp-server

   # Or directly with Python
   python src/databricks_mcp/servers/main.py
   ```
### Claude Desktop Configuration

Add to your Claude Desktop configuration file:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "databricks": {
      "command": "python",
      "args": ["/path/to/databricks-mcp-server/src/databricks_mcp/servers/main.py"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-token-here"
      }
    }
  }
}
```
## Docker Deployment

### Using Docker Compose

1. Copy the environment configuration:

   ```bash
   cp config.env.example config.env
   # Edit config.env with your Databricks credentials
   ```

2. Build and run:

   ```bash
   docker-compose -f deploy/docker/docker-compose.yml up --build
   ```
### Claude Desktop with Docker

```json
{
  "mcpServers": {
    "databricks": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--env-file", "/path/to/config.env",
        "databricks-mcp-server"
      ]
    }
  }
}
```
## Configuration

### Environment Variables

Required:

```bash
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-personal-access-token
```

Optional:

```bash
DATABRICKS_SQL_WAREHOUSE_ID=your-sql-warehouse-id
DATABRICKS_DEFAULT_CATALOG=main
DATABRICKS_DEFAULT_SCHEMA=default
MCP_SERVER_NAME=databricks-mcp
MCP_LOG_LEVEL=INFO
MCP_ENABLE_QUERY_EXECUTION=true
MCP_ENABLE_NATURAL_LANGUAGE=true
```
### Security Considerations

- Store tokens securely (in environment variables, not in code)
- Use SQL Warehouse IDs for query execution (recommended)
- Consider read-only access tokens for production use
- Validate all SQL queries through the built-in query validator (a sketch of such a check follows)
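The built-in validator's logic isn't shown in this README, but a minimal sketch of the kind of allow-list check such a validator might perform (an assumption for illustration, not the project's actual code):

```python
# Sketch: a conservative read-only SQL gate. A real validator needs proper
# SQL parsing; this only illustrates the idea.
import re

READ_ONLY_PREFIXES = ("select", "show", "describe", "explain", "with")

def validate_query(sql: str) -> None:
    """Raise ValueError unless sql looks like a single read-only statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    first_word = re.split(r"\s+", stripped.lower(), maxsplit=1)[0]
    if first_word not in READ_ONLY_PREFIXES:
        raise ValueError(f"statement type '{first_word}' is not allowed")

validate_query("SELECT * FROM main.default.events")   # passes
# validate_query("DROP TABLE main.default.events")    # would raise ValueError
```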
## Testing & Development

### Test Connection

```bash
./bin/databricks-mcp-server --test
```

### Development Mode

```bash
# Start with debug logging
./bin/databricks-mcp-server --log DEBUG

# Test with MCP Inspector
npx @modelcontextprotocol/inspector python src/databricks_mcp/servers/main.py
```
### Adding New Tools

1. Add your tool function to `src/databricks_mcp/core/server_fastmcp.py`
2. Use the `@mcp.tool()` decorator
3. Follow the established error handling pattern
4. Test with the MCP Inspector

Example:
```python
@mcp.tool()
async def my_new_tool(param: str) -> str:
    """Description of what this tool does."""
    try:
        client = get_databricks_client()
        # Try async first, fall back to sync in a thread if needed
        try:
            result = await client.some_operation(param)
        except RuntimeError as e:
            if "cannot be called from a running event loop" in str(e):
                logger.warning("Event loop conflict detected, running in separate thread")
                result = run_sync_in_thread(client.some_operation(param))
            else:
                raise
        return json.dumps(result, indent=2)
    except Exception as e:
        return json.dumps({
            "status": "error",
            "error": str(e)
        }, indent=2)
```
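Two details of this pattern are worth noting: returning `json.dumps(...)` on both the success and error paths is what keeps tool outputs structured and consistent (the JSON Responses goal above), and the `RuntimeError` fallback is the event-loop conflict handling referenced in the Architecture and Troubleshooting sections. The helpers `get_databricks_client` and `run_sync_in_thread` are presumably defined alongside the existing tools in `server_fastmcp.py`.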
## Performance

- **Memory Usage**: ~50-100MB per server instance
- **Startup Time**: ~2-5 seconds (depending on Databricks connection)
- **Tool Execution**: ~100-2000ms per tool (depending on operation)
- **Concurrent Requests**: Thread-safe, supports multiple concurrent tool calls
- **Docker Overhead**: Minimal, single-process architecture
## Troubleshooting

### Common Issues

- **AsyncIO Event Loop Conflicts**: The server automatically handles these by running operations in separate threads
- **Connection Timeouts**: Check your `DATABRICKS_HOST` and `DATABRICKS_TOKEN`
- **Permission Errors**: Ensure your token has appropriate workspace permissions
- **Docker Issues**: Verify environment variables are properly passed to the container
### Debug Mode

```bash
./bin/databricks-mcp-server --log DEBUG --test
```
### Logs

The server provides comprehensive logging. Check logs for:

- Connection status
- Tool execution details
- Error messages with suggested fixes
- Performance metrics
## Documentation

Additional documents in the repository cover:

- Detailed architecture overview
- Handling async event loops
- Container deployment
- Usage examples and demos
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes following the established patterns
4. Test your changes (`python -m py_compile src/databricks_mcp/core/server_fastmcp.py`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- FastMCP - Modern MCP server framework
- Databricks SDK - Python SDK for Databricks
- Model Context Protocol - Protocol specification
- Anthropic - MCP protocol development

Built with ❤️ using FastMCP and the Databricks SDK

This server provides a comprehensive interface between AI agents and Databricks workspaces, enabling powerful data analysis, job management, and workspace automation through natural language interactions.