Databricks MCP Server
A comprehensive FastMCP server that provides AI agents with powerful tools to interact with Databricks workspaces. Built with modern MCP best practices using individual @mcp.tool() decorated functions in a single, efficient server.
🚀 Architecture
This project uses a unified FastMCP server architecture with all tools implemented as individual @mcp.tool() decorated functions, providing:
- 35+ MCP Tools across 6 comprehensive categories
- Single Entry Point: Simplified deployment and management
- Async/Await Support: With event loop conflict handling for Docker environments
- JSON Responses: Structured, consistent tool outputs
- Thread-Safe: Concurrent tool execution support
- Docker Ready: Optimized for containerized deployment with Poetry
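For orientation, here is a minimal sketch of this unified pattern — illustrative names only, not the project's actual module layout (depending on the FastMCP version installed, the import may instead be `from mcp.server.fastmcp import FastMCP`):

```python
from fastmcp import FastMCP

# One server instance holds every capability.
mcp = FastMCP("databricks-mcp")

@mcp.tool()
async def list_catalogs() -> str:
    """Each capability is an individually decorated async function like this one."""
    return "[]"  # placeholder body for illustration

if __name__ == "__main__":
    mcp.run()  # single entry point; stdio transport by default
```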
✨ Features
🗄️ Catalog Management Tools (6 tools)
- `list_catalogs` - Browse available data catalogs
- `list_schemas` - Explore schemas within catalogs
- `list_tables` - Discover tables and views
- `get_table_info` - Get detailed table metadata and schema
- `search_tables` - Find tables using pattern matching
- `generate_sql_query` - AI-powered SQL generation from natural language
🔍 Advanced Query Execution (2 tools)
- `execute_query` - Execute SQL queries with automatic LIMIT handling
- `execute_statement` - Advanced SQL execution with parameters, catalogs, schemas, and timeout control
🖥️ Cluster Management (7 tools)
- `list_clusters` - View all workspace clusters
- `get_cluster` - Get detailed cluster information
- `create_cluster` - Create new clusters with autoscaling
- `start_cluster` - Start stopped clusters
- `terminate_cluster` - Terminate running clusters
- `restart_cluster` - Restart clusters for maintenance
- `resize_cluster` - Dynamically resize cluster capacity
⚙️ Job Management (9 tools)
- `list_jobs` - Browse all workspace jobs
- `get_job` - Get detailed job configuration
- `run_job` - Execute jobs with custom parameters
- `create_job` - Create new job definitions
- `update_job` - Modify existing jobs
- `delete_job` - Remove job definitions
- `get_run` - Get job run details and status
- `cancel_run` - Cancel running job executions
- `list_runs` - Browse job execution history
📓 Notebook Operations (7 tools)
- `list_notebooks` - Browse workspace notebooks
- `get_notebook` - Retrieve notebook metadata
- `export_notebook` - Export in multiple formats (SOURCE, HTML, JUPYTER, DBC)
- `import_notebook` - Import notebooks with base64 content
- `delete_notebook` - Remove notebooks safely
- `create_directory` - Create workspace directories
- `get_notebook_status` - Check notebook availability
📁 DBFS File System (8 tools)
- `list_files` - Browse DBFS directories
- `get_file` - Download file contents (text/binary)
- `put_file` - Upload files with base64 encoding
- `upload_large_file` - Chunked upload for large files
- `delete_file` - Remove files and directories
- `get_status` - Get file/directory metadata
- `create_directory` - Create DBFS directories
- `move_file` - Move/rename files and directories
🛠️ Installation & Setup
Prerequisites
- Python 3.8+
- Databricks workspace access
- Personal Access Token or Service Principal credentials
Quick Start
1. Install dependencies:

   ```bash
   pip install fastmcp 'mcp[cli]' databricks-sdk
   ```

2. Set environment variables:

   ```bash
   export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
   export DATABRICKS_TOKEN=your-personal-access-token
   export DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id  # optional
   ```

3. Run the server:

   ```bash
   # Using the CLI script
   ./bin/databricks-mcp-server

   # Or directly with Python
   python src/databricks_mcp/servers/main.py
   ```
Claude Desktop Configuration
Add to your Claude Desktop configuration file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "databricks": {
      "command": "python",
      "args": ["/path/to/databricks-mcp-server/src/databricks_mcp/servers/main.py"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "your-token-here"
      }
    }
  }
}
```
🐳 Docker Deployment
Using Docker Compose
1. Copy environment configuration:

   ```bash
   cp config.env.example config.env
   # Edit config.env with your Databricks credentials
   ```

2. Build and run:

   ```bash
   docker-compose -f deploy/docker/docker-compose.yml up --build
   ```
Claude Desktop with Docker
```json
{
  "mcpServers": {
    "databricks": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--env-file", "/path/to/config.env",
        "databricks-mcp-server"
      ]
    }
  }
}
```
🔧 Configuration
Environment Variables
Required:

```bash
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-personal-access-token
```

Optional:

```bash
DATABRICKS_SQL_WAREHOUSE_ID=your-sql-warehouse-id
DATABRICKS_DEFAULT_CATALOG=main
DATABRICKS_DEFAULT_SCHEMA=default
MCP_SERVER_NAME=databricks-mcp
MCP_LOG_LEVEL=INFO
MCP_ENABLE_QUERY_EXECUTION=true
MCP_ENABLE_NATURAL_LANGUAGE=true
```
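As a point of reference, a minimal sketch of how a tool handler could build a Databricks client from these variables using the Databricks SDK. The `WorkspaceClient` constructor and its environment-variable fallback are real SDK behavior; the helper name and the default handling for the optional variables are assumptions about this project rather than its actual code:

```python
import os

from databricks.sdk import WorkspaceClient


def get_databricks_client() -> WorkspaceClient:
    """Illustrative only: build a client from the environment variables above."""
    # WorkspaceClient() also picks up DATABRICKS_HOST/DATABRICKS_TOKEN on its own;
    # passing them explicitly just makes the dependency visible.
    return WorkspaceClient(
        host=os.environ["DATABRICKS_HOST"],
        token=os.environ["DATABRICKS_TOKEN"],
    )


# Optional settings fall back to defaults when unset (assumed behavior).
SQL_WAREHOUSE_ID = os.getenv("DATABRICKS_SQL_WAREHOUSE_ID")
DEFAULT_CATALOG = os.getenv("DATABRICKS_DEFAULT_CATALOG", "main")
DEFAULT_SCHEMA = os.getenv("DATABRICKS_DEFAULT_SCHEMA", "default")
```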
Security Considerations
- Store tokens securely (environment variables, not in code)
- Use SQL Warehouse IDs for query execution (recommended)
- Consider read-only access tokens for production use
- Validate all SQL queries through built-in query validator
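The built-in query validator's exact rules are not documented here; as a rough illustration only (a hypothetical helper, not the project's actual validator), a read-only guard might reject statements that mutate data or state:

```python
import re

# Statements that mutate data or state; a read-only deployment could reject these.
_FORBIDDEN = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Very rough check that a statement is a read-only query."""
    return not _FORBIDDEN.match(sql)


assert is_read_only("SELECT * FROM main.default.events LIMIT 10")
assert not is_read_only("DROP TABLE main.default.events")
```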
🧪 Testing & Development
Test Connection
```bash
./bin/databricks-mcp-server --test
```
Development Mode
```bash
# Start with debug logging
./bin/databricks-mcp-server --log DEBUG

# Test with MCP Inspector
npx @modelcontextprotocol/inspector python src/databricks_mcp/servers/main.py
```
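Beyond the Inspector, individual tools can be exercised programmatically with the official MCP Python client over stdio. A short sketch, noting that the `query` argument name is an assumption about this server's tool schema (check `list_tools()` output for the real one):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server as a subprocess over stdio.
params = StdioServerParameters(
    command="python",
    args=["src/databricks_mcp/servers/main.py"],
)


async def main():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Argument name "query" is assumed; inspect the tool schema above for the real one.
            result = await session.call_tool("execute_query", {"query": "SELECT 1"})
            print(result)


asyncio.run(main())
```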
Adding New Tools
1. Add your tool function to `src/databricks_mcp/core/server_fastmcp.py`
2. Use the `@mcp.tool()` decorator
3. Follow the established error handling pattern
4. Test with the MCP Inspector
Example:
```python
@mcp.tool()
async def my_new_tool(param: str) -> str:
    """Description of what this tool does."""
    try:
        client = get_databricks_client()
        # Try async first, fall back to sync in thread if needed
        try:
            result = await client.some_operation(param)
        except RuntimeError as e:
            if "cannot be called from a running event loop" in str(e):
                logger.warning("Event loop conflict detected, running in separate thread")
                result = run_sync_in_thread(client.some_operation(param))
            else:
                raise
        return json.dumps(result, indent=2)
    except Exception as e:
        return json.dumps({
            "status": "error",
            "error": str(e)
        }, indent=2)
```
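The example assumes a `run_sync_in_thread` helper for the event-loop fallback; its real implementation lives in the server module. A minimal sketch of the idea — running the coroutine to completion on a fresh event loop in a worker thread — might look like this:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_sync_in_thread(coro):
    """Run a coroutine on its own event loop in a separate thread.

    Sidesteps "cannot be called from a running event loop" errors raised when
    nested execution is attempted inside an already-running loop (e.g. in Docker).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```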
📊 Performance
- Memory Usage: ~50-100MB per server instance
- Startup Time: ~2-5 seconds (depending on Databricks connection)
- Tool Execution: ~100-2000ms per tool (depending on operation)
- Concurrent Requests: Thread-safe, supports multiple concurrent tool calls
- Docker Overhead: Minimal, single process architecture
🔍 Troubleshooting
Common Issues
- AsyncIO Event Loop Conflicts: The server automatically handles these by running operations in separate threads
- Connection Timeouts: Check your `DATABRICKS_HOST` and `DATABRICKS_TOKEN`
- Permission Errors: Ensure your token has appropriate workspace permissions
- Docker Issues: Verify environment variables are properly passed to the container
Debug Mode
```bash
./bin/databricks-mcp-server --log DEBUG --test
```
Logs
The server provides comprehensive logging. Check logs for:
- Connection status
- Tool execution details
- Error messages with suggested fixes
- Performance metrics
📚 Documentation
- Detailed architecture overview
- Handling async event loops
- Container deployment
- Usage examples and demos
🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes following the established patterns
4. Test your changes (`python -m py_compile src/databricks_mcp/core/server_fastmcp.py`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- FastMCP - Modern MCP server framework
- Databricks SDK - Python SDK for Databricks
- Model Context Protocol - Protocol specification
- Anthropic - MCP protocol development
Built with ❤️ using FastMCP and the Databricks SDK
This server provides a comprehensive interface between AI agents and Databricks workspaces, enabling powerful data analysis, job management, and workspace automation through natural language interactions.