Databricks MCP Server
A production-ready MCP server that exposes 50+ Databricks workspace operations through the Model Context Protocol to AI agents such as Cursor and Claude Desktop.
⚠️ DISCLAIMER: This is NOT an official Databricks product and is NOT supported by Databricks. This is a community project that uses the Databricks SDK to provide MCP integration. Use at your own risk. For official Databricks support, please contact Databricks directly.
🚀 Quick Start (5 minutes)
1. Set Your Credentials
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..." # Your Personal Access Token
2. Start the Server
cd databricks_cli_mcp
./start_local_mcp.sh
Server runs at: http://localhost:8080/mcp
3. Configure Cursor
Add to Cursor MCP settings (~/.cursor/mcp.json):
{
  "mcpServers": {
    "databricks": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
4. Start Using!
In Cursor, try:
"List my Databricks clusters"
"Show me my SQL warehouses"
"Execute query: SELECT * FROM samples.nyctaxi.trips LIMIT 10"
📦 What You Get
50+ Databricks Tools (54 across nine categories)
- Clusters (7): Create, start, stop, list, manage
- Jobs (6): Create, run, cancel, monitor
- SQL (8): Execute queries, manage warehouses
- Unity Catalog (11): Catalogs, schemas, tables, volumes
- Workspace (6): Files, notebooks, directories
- Notebooks (5): Import, export, run
- Repos (5): Git integration
- Secrets (4): Secret management
- Tasks (2): Async operations
🏗️ Project Structure
databricks_cli_mcp/
├── README.md # This file
├── server.py # Main MCP server (FastMCP)
├── local_mcp_server.py # Local server with PAT auth
├── start_local_mcp.sh # Quick start script
│
├── auth.py # Authentication logic
├── databricks_client.py # Databricks SDK wrapper
├── task_manager.py # Async task management
├── tool_registry.py # Tool schema converter
│
├── tools/ # 50 Databricks tools
│ ├── clusters.py
│ ├── jobs.py
│ ├── sql.py
│ ├── unity_catalog.py
│ ├── workspace.py
│ ├── notebooks.py
│ ├── repos.py
│ └── secrets.py
│
├── app.yaml # Databricks Apps deployment config
├── requirements.txt # Python dependencies
└── pyproject.toml # Package configuration
🔐 Authentication
Option 1: PAT (Personal Access Token) - Recommended
Best for: Development, personal use
# Get your PAT from Databricks UI
# Settings → Developer → Access tokens → Generate new token
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."
./start_local_mcp.sh
Pros: ✅ Simple, ✅ Quick, ✅ Works immediately
Cons: ⚠️ 90-day expiry, ⚠️ Manual renewal
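Once these variables are exported, the Databricks SDK (which this server wraps) picks them up automatically. A minimal sanity check, assuming the environment above:
from databricks.sdk import WorkspaceClient

# With no arguments, WorkspaceClient() reads DATABRICKS_HOST and
# DATABRICKS_TOKEN from the environment.
client = WorkspaceClient()
print(client.current_user.me().user_name)  # confirms the token works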
Option 2: M2M OAuth (Advanced)
Best for: Production, teams, CI/CD
- Create a service principal in Databricks
- Generate an OAuth secret
- Set the environment variables:
export DATABRICKS_APP_URL="https://your-app.databricksapps.com"
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="<service-principal-id>"
export DATABRICKS_CLIENT_SECRET="<oauth-secret>"
./start_oauth_proxy.sh
Pros: ✅ Auto-refresh, ✅ Better security, ✅ Team sharing
Cons: ⚠️ Requires service principal setup
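The SDK supports the same M2M flow directly. A minimal sketch, assuming the service-principal credentials above (the SDK exchanges them for short-lived tokens and refreshes them automatically):
from databricks.sdk import WorkspaceClient

# Explicit M2M OAuth credentials; values are placeholders.
client = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id="<service-principal-id>",
    client_secret="<oauth-secret>",
)
print(len(list(client.clusters.list())), "clusters visible")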
☁️ Deploy to Databricks Apps
1. Authenticate
databricks auth login
2. Deploy
./deploy_correct.sh
3. Get Your App URL
databricks apps get databricks-mcp-server | jq -r .url
🧪 Testing
Test Locally
python test_local.py
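For a quick liveness probe, you can also hit the health endpoint referenced in the Troubleshooting section (assumes the default port 8080):
import urllib.request

# Probe the server's /health endpoint on the default local port.
with urllib.request.urlopen("http://localhost:8080/health") as resp:
    print(resp.status, resp.read().decode())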
Test from Notebook
Upload notebooks/simple_test.py to Databricks and run it.
💡 Example Usage
In Cursor
"Create a cluster named 'test' with 2 workers"
"Start cluster 0730-172948-runts698"
"Execute SQL: SELECT current_database()"
"List tables in catalog main"
"Export notebook /Users/me/analysis"
"Show me my job runs"
Programmatically
from databricks.sdk import WorkspaceClient

client = WorkspaceClient()

# List clusters
clusters = list(client.clusters.list())
print(f"Found {len(clusters)} clusters")

# Execute SQL via the Statement Execution API
result = client.statement_execution.execute_statement(
    warehouse_id="abc123",
    statement="SELECT * FROM samples.nyctaxi.trips LIMIT 10",
)
print(result.result.data_array)  # small results come back inline
🔧 Configuration
Environment Variables
| Variable | Description | Required |
|---|---|---|
| DATABRICKS_HOST | Workspace URL | ✅ Yes |
| DATABRICKS_TOKEN | Personal Access Token | ✅ Yes (PAT mode) |
| DATABRICKS_CLIENT_ID | Service principal ID | For OAuth |
| DATABRICKS_CLIENT_SECRET | OAuth secret | For OAuth |
| DATABRICKS_APP_URL | Deployed app URL | For OAuth |
Server Configuration
The server runs on port 8080 by default. Change in start_local_mcp.sh:
python3 local_mcp_server.py --port 8080
🐛 Troubleshooting
Server won't start
# Check if port is in use
lsof -i :8080
# Kill existing process
kill -9 <PID>
# Restart
./start_local_mcp.sh
Authentication failed
# Verify credentials
echo $DATABRICKS_HOST
echo $DATABRICKS_TOKEN
# Test manually
curl -H "Authorization: Bearer $DATABRICKS_TOKEN" \
$DATABRICKS_HOST/api/2.0/clusters/list
Cursor not connecting
- Check the server is running: curl http://localhost:8080/health
- Verify the Cursor config: cat ~/.cursor/mcp.json
- Restart Cursor
- Check the logs: tail -f mcp_server.log
Permission errors
- Can't create clusters? Ask your admin for permissions
- Can't access catalogs? Check Unity Catalog permissions
- Can't run queries? Verify SQL warehouse access
📚 Key Concepts
MCP (Model Context Protocol)
An open protocol that lets AI assistants connect to external data sources and tools. Think of it as "API for AIs".
FastMCP
A Python framework for building MCP servers. Makes it easy to expose Python functions as tools for AI agents.
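For a feel of the pattern this server follows, here is a minimal FastMCP sketch (the tool name and logic are illustrative, not part of this project):
from fastmcp import FastMCP

mcp = FastMCP("demo")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

if __name__ == "__main__":
    mcp.run()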
Databricks SDK
Official Python library for interacting with Databricks APIs. Powers all the tools in this server.
🔗 Resources
- Databricks API Docs: https://docs.databricks.com/api/
- MCP Protocol: https://modelcontextprotocol.io/
- FastMCP: https://github.com/jlowin/fastmcp
- Databricks SDK: https://github.com/databricks/databricks-sdk-py
📝 Requirements
- Python 3.11+
- Databricks workspace with API access
- Personal Access Token or Service Principal
- Cursor IDE (or any MCP-compatible client)
🎯 Common Tasks
Start a cluster
# In Cursor
"Start cluster <cluster-id>"
# Or directly
from databricks.sdk import WorkspaceClient
client = WorkspaceClient()
client.clusters.start(cluster_id="...")
Run a SQL query
# In Cursor
"Execute query: SELECT * FROM my_table LIMIT 10"
# Or directly
result = client.statement_execution.execute_statement(
    warehouse_id="...",
    statement="SELECT * FROM my_table",
)
Create a job
# In Cursor
"Create a job that runs notebook /path/to/notebook daily"
# Or use the Databricks UI - easier for complex jobs!
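If you do want to script it, here is a sketch using the SDK's Jobs API (the notebook path, cluster ID, and cron expression are placeholders):
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

client = WorkspaceClient()
job = client.jobs.create(
    name="daily-notebook-run",
    tasks=[
        jobs.Task(
            task_key="run_notebook",
            notebook_task=jobs.NotebookTask(notebook_path="/path/to/notebook"),
            existing_cluster_id="<cluster-id>",
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",  # daily at 06:00
        timezone_id="UTC",
    ),
)
print(job.job_id)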
⚙️ Advanced Configuration
Custom Tools
Add your own tools in tools/ directory:
# tools/custom.py
from fastmcp import Context

# `mcp` is the FastMCP instance created in server.py;
# `get_databricks_client` is this project's SDK helper.
@mcp.tool()
async def my_custom_tool(param: str, context: Context) -> dict:
    """My custom Databricks operation."""
    client = get_databricks_client(context)
    # Your logic here
    return {"result": "success"}
Then register in server.py:
from tools import custom
Production Deployment
For production use:
- Use M2M OAuth (not PAT)
- Deploy to Databricks Apps
- Enable monitoring (/metrics endpoint)
- Set up proper logging
- Configure auto-scaling
🎉 Success!
You now have a fully functional Databricks MCP server!
Next steps:
- ✅ Start the server
- ✅ Configure Cursor
- ✅ Try listing your clusters
- ✅ Execute your first SQL query
- 🚀 Automate all the things!
📄 License
MIT
🤝 Contributing
Issues and PRs welcome! This project follows standard GitHub workflow.
💬 Support
- Check the logs: tail -f mcp_server.log
- Review error messages in Cursor
- Verify Databricks permissions
- Test API access directly with curl
Built with ❤️ for Databricks automation