# Databricks MCP Server
A secure, production-ready Model Context Protocol (MCP) server that enables LLM-powered applications to interact with Databricks. Built with comprehensive security hardening, input validation, and data sanitization.
## Features

### Core Capabilities
- MCP Protocol Support: Full implementation of the Model Context Protocol for seamless LLM integration
- Databricks API Integration: Direct access to clusters, jobs, notebooks, DBFS, and SQL execution
- Async Architecture: Built with asyncio for efficient concurrent operations
- Comprehensive Testing: 133+ unit tests with 80%+ code coverage; the full suite runs in ~3-4 seconds
### Security Hardening
- Input Validation: Prevents path traversal, null byte injection, and malicious input patterns
- Data Sanitization: Automatic redaction of sensitive information (tokens, passwords, API keys) from logs
- Environment Validation: Fail-fast startup with clear error messages for configuration issues (see the sketch after this list)
- ID Validation: Strict validation for cluster IDs, job IDs, warehouse IDs, and SQL statements
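As a minimal sketch of the fail-fast environment validation idea (the names `REQUIRED_VARS` and `validate_environment` are illustrative, not the server's actual API):

```python
import os
import sys

# Hypothetical names for illustration; the real checks live under src/core.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")

def validate_environment() -> None:
    """Exit at startup with a clear message if configuration is unusable."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
    if not os.environ["DATABRICKS_HOST"].startswith("https://"):
        sys.exit("DATABRICKS_HOST must include the https:// scheme")
```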
## Available Tools
| Tool | Description | Parameters |
|---|---|---|
| `list_clusters` | List all Databricks clusters | None |
| `create_cluster` | Create a new cluster | `cluster_name`, `spark_version`, `node_type_id`, `num_workers`, `autotermination_minutes` |
| `terminate_cluster` | Terminate a cluster | `cluster_id` |
| `get_cluster` | Get cluster information | `cluster_id` |
| `start_cluster` | Start a terminated cluster | `cluster_id` |
| `list_jobs` | List all jobs | None |
| `run_job` | Run a job | `job_id`, `notebook_params` |
| `list_notebooks` | List workspace notebooks | `path` |
| `export_notebook` | Export a notebook | `path`, `format` |
| `list_files` | List DBFS files | `dbfs_path` |
| `execute_sql` | Execute SQL statement | `statement`, `warehouse_id`, `catalog`, `schema` |
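To give a feel for how these tools are invoked, here is a sketch of calling `execute_sql` through the official `mcp` Python SDK. The warehouse ID and SQL statement are placeholders, and exact usage may vary by SDK version:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the server as a subprocess and talk to it over stdio.
    params = StdioServerParameters(command="./start_mcp_server.sh")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "execute_sql",
                arguments={"statement": "SELECT 1", "warehouse_id": "your-warehouse-id"},
            )
            print(result)

asyncio.run(main())
```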
## Quick Start

### Prerequisites
- Python 3.10 or higher
- `uv` package manager (recommended)
- Databricks workspace and access token
### Installation
1. Install `uv` (if not already installed):

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. Clone and set up:

   ```bash
   git clone https://github.com/JustTryAI/databricks-mcp-server.git
   cd databricks-mcp-server

   # Create virtual environment and install dependencies
   uv venv
   source .venv/bin/activate
   uv pip install -e ".[dev]"
   ```

3. Configure credentials:

   ```bash
   cp .env.example .env
   # Edit .env with your Databricks credentials:
   # DATABRICKS_HOST=https://your-workspace.azuredatabricks.net
   # DATABRICKS_TOKEN=your-personal-access-token
   ```
### Running the Server

```bash
# Start the MCP server
./start_mcp_server.sh

# Or use the script in the scripts directory
./scripts/start_mcp_server.sh
```
### Quick Commands

```bash
# Run all tests
uv run pytest tests/

# Run tests with coverage
uv run pytest --cov=src tests/ --cov-report=term-missing

# View Databricks resources
uv run python scripts/show_clusters.py
uv run python scripts/show_notebooks.py

# Run integration tests
bash scripts/run_direct_test.sh
```
## Databricks Edition Support
This server works with both Databricks Community Edition (Free) and Premium editions.
Note: Some API endpoints may return 403 Forbidden on Community Edition due to permission restrictions. The server handles these gracefully and reports clear error messages.
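A sketch of what "handles these gracefully" can look like, assuming an `httpx`-based client (illustrative only; the repo's actual error handling may differ):

```python
import httpx

async def get_json(client: httpx.AsyncClient, path: str) -> dict:
    """Turn a 403 into a clear message instead of an unhandled exception."""
    response = await client.get(path)
    if response.status_code == 403:
        return {
            "error": "Permission denied. This API may be restricted on "
                     "Community Edition; check your token's permissions."
        }
    response.raise_for_status()
    return response.json()
```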
Tested on:
- [TESTED] Community Edition (Free) - Limited API access
- [PLANNED] Premium Edition - Full API access (coming soon)
## Development

### Project Structure

```
databricks-mcp-server/
├── src/
│   ├── api/      # Databricks API client modules
│   ├── core/     # Configuration, utilities, validation
│   ├── server/   # MCP server implementation
│   └── cli/      # Command-line interface
├── tests/        # Unit tests (133+ tests)
├── scripts/      # Utility and startup scripts
├── examples/     # Usage examples
├── docs/         # Documentation
└── CLAUDE.md     # AI assistant guidance
```
### Code Standards
- Style: PEP 8 with 100-character line limit
- Type Hints: Required for all production code
- Docstrings: Google-style docstrings for all public APIs
- Testing: Minimum 80% code coverage
- Security: All inputs must be validated using `src/core/utils.py` functions
### Security Requirements
When adding new features:
1. Validate all inputs using these functions:
   - `validate_workspace_path()` - for workspace paths
   - `validate_dbfs_path()` - for DBFS paths
   - `validate_cluster_id()` - for cluster IDs
   - `validate_job_id()` - for job IDs
   - `validate_warehouse_id()` - for warehouse IDs
   - `validate_sql_statement()` - for SQL queries
2. Sanitize logs using `sanitize_for_logging()` to prevent credential leaks
3. Add tests for both valid and invalid inputs (see the test sketch after this list)
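For point 3, a test sketch covering one validator with a valid and an invalid input. It assumes the validators raise `ValueError` on bad input, which is an assumption about the implementation:

```python
import pytest

from src.core.utils import validate_cluster_id

def test_valid_cluster_id_is_accepted():
    # Alphanumerics, hyphens, and underscores are allowed.
    validate_cluster_id("0123-456789-abcde")

def test_traversal_attempt_is_rejected():
    # Assumes validators raise ValueError on malicious input.
    with pytest.raises(ValueError):
        validate_cluster_id("../../../etc/passwd")
```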
### Testing

```bash
# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_api_clusters.py

# Run with verbose output
uv run pytest -v tests/

# Generate coverage report
uv run pytest --cov=src tests/ --cov-report=html
```
### Linting

```bash
uv run pylint src/ tests/
uv run flake8 src/ tests/
uv run mypy src/
```
## Security Features Deep Dive

### Input Validation
All user-provided inputs are validated before API calls:
```text
# Path validation prevents:
- Path traversal: ../../../etc/passwd
- Null byte injection: /path/to/file\0.txt
- Suspicious patterns: //, ~, etc.

# ID validation ensures:
- Valid characters only (alphanumeric, hyphens, underscores)
- Reasonable length limits
- No injection attacks
```
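A condensed sketch of both checks (illustrative, not the exact code in `src/core/utils.py`; the 64-character cap is an assumed limit):

```python
import re

CLUSTER_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_dbfs_path(path: str) -> str:
    """Reject traversal, null bytes, and suspicious patterns before any API call."""
    if "\0" in path:
        raise ValueError("Null byte in path")
    if ".." in path or path.startswith("~") or "//" in path:
        raise ValueError("Suspicious path pattern")
    return path

def validate_cluster_id(cluster_id: str) -> str:
    """Allow only alphanumerics, hyphens, and underscores, with a length cap."""
    if not CLUSTER_ID_PATTERN.fullmatch(cluster_id):
        raise ValueError("Invalid cluster ID")
    return cluster_id
```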
### Data Sanitization
Sensitive data is automatically redacted from logs:
```python
# Before logging:
{"user": "alice", "token": "dapi1234567890abcdef"}

# After sanitization:
{"user": "alice", "token": "**REDACTED**"}
```
Detects sensitive keys (case-insensitive):
- `token`, `password`, `secret`, `api_key`, `apikey`
- `auth`, `credential`, `credentials`, `private_key`
- `access_token`, `refresh_token`
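A sketch of how such redaction can work (the real `sanitize_for_logging()` may also match key substrings or handle nesting differently; this version redacts on exact, case-insensitive key matches):

```python
SENSITIVE_KEYS = {
    "token", "password", "secret", "api_key", "apikey",
    "auth", "credential", "credentials", "private_key",
    "access_token", "refresh_token",
}

def sanitize_for_logging(data):
    """Recursively replace values of sensitive keys with **REDACTED**."""
    if isinstance(data, dict):
        return {
            key: "**REDACTED**" if key.lower() in SENSITIVE_KEYS
            else sanitize_for_logging(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [sanitize_for_logging(item) for item in data]
    return data
```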
## Examples
Check the `examples/` directory for usage demonstrations:

```bash
# Direct API usage
uv run python examples/direct_usage.py

# MCP client usage
uv run python examples/mcp_client_usage.py
```
## Troubleshooting

### Common Issues
403 Forbidden errors:
- Check your Databricks access token has the necessary permissions
- Community Edition has limited API access
- Verify your token scope includes workspace, clusters, and jobs
Connection errors:
- Verify `DATABRICKS_HOST` is set correctly (include `https://`)
- Check your network can reach the Databricks workspace
- Ensure your token hasn't expired
Environment variables not loaded:
- Make sure the `.env` file exists in the project root
- Check file permissions allow reading
- Try exporting variables manually for testing (see the snippet below)
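To check loading manually, a quick sketch using `python-dotenv` (assuming the project loads its config this way, which is an assumption):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
print("DATABRICKS_HOST set:", bool(os.environ.get("DATABRICKS_HOST")))
print("DATABRICKS_TOKEN set:", bool(os.environ.get("DATABRICKS_TOKEN")))
```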
## Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass and coverage remains above 80%
- Follow the code standards
- Submit a pull request
## License
MIT License - see the LICENSE file for details.
Copyright (c) 2025 DDATASERVICES LLC
## Acknowledgments
Built with: