Databricks MCP Server

A secure, production-ready Model Context Protocol (MCP) server that enables LLM-powered applications to interact with Databricks. Built with comprehensive security hardening, input validation, and data sanitization.

Features

Core Capabilities

  • MCP Protocol Support: Full implementation of the Model Context Protocol for seamless LLM integration
  • Databricks API Integration: Direct access to clusters, jobs, notebooks, DBFS, and SQL execution
  • Async Architecture: Built with asyncio for efficient concurrent operations
  • Comprehensive Testing: 133+ unit tests with 80%+ code coverage; the full suite runs in roughly 3-4 seconds

Security Hardening

  • Input Validation: Prevents path traversal, null byte injection, and malicious input patterns
  • Data Sanitization: Automatic redaction of sensitive information (tokens, passwords, API keys) from logs
  • Environment Validation: Fail-fast startup with clear error messages for configuration issues
  • ID Validation: Strict validation for cluster IDs, job IDs, warehouse IDs, and SQL statements

Available Tools

| Tool | Description | Parameters |
| --- | --- | --- |
| list_clusters | List all Databricks clusters | None |
| create_cluster | Create a new cluster | cluster_name, spark_version, node_type_id, num_workers, autotermination_minutes |
| terminate_cluster | Terminate a cluster | cluster_id |
| get_cluster | Get cluster information | cluster_id |
| start_cluster | Start a terminated cluster | cluster_id |
| list_jobs | List all jobs | None |
| run_job | Run a job | job_id, notebook_params |
| list_notebooks | List workspace notebooks | path |
| export_notebook | Export a notebook | path, format |
| list_files | List DBFS files | dbfs_path |
| execute_sql | Execute SQL statement | statement, warehouse_id, catalog, schema |
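
Once the server is running over stdio, any MCP client can invoke these tools by name. Below is a minimal client sketch; it assumes the official mcp Python SDK and launches the server through the startup script described under Quick Start (the repo's own client example lives in examples/mcp_client_usage.py).

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server over stdio via the repo's startup script
    params = StdioServerParameters(command="./start_mcp_server.sh")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

            # Call a tool from the table above
            result = await session.call_tool("list_clusters", arguments={})
            print(result.content)


asyncio.run(main())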

Quick Start

Prerequisites

  • Python 3.10 or higher
  • uv package manager (recommended)
  • Databricks workspace and access token

Installation

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  2. Clone and setup:

    git clone https://github.com/JustTryAI/databricks-mcp-server.git
    cd databricks-mcp-server
    
    # Create virtual environment and install dependencies
    uv venv
    source .venv/bin/activate
    uv pip install -e ".[dev]"
    
  3. Configure credentials:

    cp .env.example .env
    # Edit .env with your Databricks credentials:
    # DATABRICKS_HOST=https://your-workspace.azuredatabricks.net
    # DATABRICKS_TOKEN=your-personal-access-token
    

Running the Server

# Start the MCP server
./start_mcp_server.sh

# Or use the script in the scripts directory
./scripts/start_mcp_server.sh

Quick Commands

# Run all tests
uv run pytest tests/

# Run tests with coverage
uv run pytest --cov=src tests/ --cov-report=term-missing

# View Databricks resources
uv run python scripts/show_clusters.py
uv run python scripts/show_notebooks.py

# Run integration tests
bash scripts/run_direct_test.sh

Databricks Edition Support

This server works with both Databricks Community Edition (Free) and Premium editions.

Note: Some API endpoints may return 403 Forbidden on Community Edition due to permission restrictions. The server handles these gracefully and reports clear error messages.
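
For illustration only (this is not the server's actual code), the pattern looks roughly like the sketch below: an API wrapper translates a 403 into an actionable message instead of surfacing a raw HTTP error. The httpx client and the exception type used here are assumptions.

import httpx


async def list_jobs(client: httpx.AsyncClient) -> dict:
    # `client` is assumed to be configured with the workspace base URL and token
    response = await client.get("/api/2.1/jobs/list")
    if response.status_code == 403:
        raise PermissionError(
            "Databricks returned 403 Forbidden. The endpoint may be unavailable "
            "on Community Edition, or the token lacks the required permissions."
        )
    response.raise_for_status()
    return response.json()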

Edition status:

  • [TESTED] Community Edition (Free) - Limited API access
  • [PLANNED] Premium Edition - Full API access (coming soon)

Development

Project Structure

databricks-mcp-server/
├── src/
│   ├── api/           # Databricks API client modules
│   ├── core/          # Configuration, utilities, validation
│   ├── server/        # MCP server implementation
│   └── cli/           # Command-line interface
├── tests/             # Unit tests (133+ tests)
├── scripts/           # Utility and startup scripts
├── examples/          # Usage examples
├── docs/              # Documentation
└── CLAUDE.md          # AI assistant guidance

Code Standards

  • Style: PEP 8 with 100-character line limit
  • Type Hints: Required for all production code
  • Docstrings: Google-style docstrings for all public APIs
  • Testing: Minimum 80% code coverage
  • Security: All inputs must be validated using src/core/utils.py functions

Security Requirements

When adding new features (a short sketch follows this checklist):

  1. Validate all inputs using these functions:

    • validate_workspace_path() - For workspace paths
    • validate_dbfs_path() - For DBFS paths
    • validate_cluster_id() - For cluster IDs
    • validate_job_id() - For job IDs
    • validate_warehouse_id() - For warehouse IDs
    • validate_sql_statement() - For SQL queries
  2. Sanitize logs using sanitize_for_logging() to prevent credential leaks

  3. Add tests for both valid and invalid inputs
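
A compact sketch of that checklist for a hypothetical new tool is shown below. The validator and sanitizer names come from src/core/utils.py, but their exact signatures and this handler are assumptions made for illustration.

import logging

from src.core.utils import sanitize_for_logging, validate_cluster_id

logger = logging.getLogger(__name__)


async def resize_cluster(cluster_id: str, num_workers: int) -> dict:
    """Hypothetical tool used only to illustrate the checklist above."""
    validate_cluster_id(cluster_id)  # 1. validate every user-supplied input
    payload = {"cluster_id": cluster_id, "num_workers": num_workers}
    logger.info("Resize request: %s", sanitize_for_logging(payload))  # 2. sanitize logs
    # 3. add unit tests covering both valid and invalid cluster IDs
    return payload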

Testing

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_api_clusters.py

# Run with verbose output
uv run pytest -v tests/

# Generate coverage report
uv run pytest --cov=src tests/ --cov-report=html

Linting

uv run pylint src/ tests/
uv run flake8 src/ tests/
uv run mypy src/

Security Features Deep Dive

Input Validation

All user-provided inputs are validated before API calls:

# Path validation prevents:
- Path traversal: ../../../etc/passwd
- Null byte injection: /path/to/file\0.txt
- Suspicious patterns: //, ~, etc.

# ID validation ensures:
- Valid characters only (alphanumeric, hyphens, underscores)
- Reasonable length limits
- No injection attacks
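
A short usage sketch of these validators follows; the exact exception type raised by src/core/utils.py is an assumption here.

from src.core.utils import validate_cluster_id, validate_dbfs_path

validate_dbfs_path("/FileStore/tables/sales.csv")  # passes
validate_cluster_id("0123-456789-abcde123")        # passes

try:
    validate_dbfs_path("/FileStore/../../../etc/passwd")  # path traversal attempt
except ValueError as exc:  # assumed to raise ValueError on invalid input
    print(f"Rejected: {exc}")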

Data Sanitization

Sensitive data is automatically redacted from logs:

# Before logging:
{"user": "alice", "token": "dapi1234567890abcdef"}

# After sanitization:
{"user": "alice", "token": "**REDACTED**"}

Detects sensitive keys (case-insensitive):

  • token, password, secret, api_key, apikey
  • auth, credential, credentials, private_key
  • access_token, refresh_token
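
A usage sketch of the sanitizer is shown below; the function name comes from src/core/utils.py, while the exact signature and output formatting are assumptions.

from src.core.utils import sanitize_for_logging

payload = {"user": "alice", "Token": "dapi1234567890abcdef", "API_KEY": "abc123"}
print(sanitize_for_logging(payload))
# Expected result, with keys matched case-insensitively:
# {'user': 'alice', 'Token': '**REDACTED**', 'API_KEY': '**REDACTED**'}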

Examples

Check the examples/ directory for usage demonstrations:

# Direct API usage
uv run python examples/direct_usage.py

# MCP client usage
uv run python examples/mcp_client_usage.py

Troubleshooting

Common Issues

403 Forbidden errors:

  • Check that your Databricks access token has the necessary permissions (a quick credential check is sketched below)
  • Community Edition has limited API access
  • Verify your token scope includes workspace, clusters, and jobs
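
The following standalone sketch (not part of this repo) can help confirm whether the token itself works: it calls the standard Databricks Clusters API directly and assumes DATABRICKS_HOST and DATABRICKS_TOKEN are exported in your shell.

import json
import os
import urllib.request

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

request = urllib.request.Request(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(request) as response:
    clusters = json.load(response).get("clusters", [])
    print(f"Token accepted; {len(clusters)} cluster(s) visible")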

Connection errors:

  • Verify DATABRICKS_HOST is set correctly (include https://)
  • Check your network can reach the Databricks workspace
  • Ensure your token hasn't expired

Environment variables not loaded:

  • Make sure .env file exists in the project root
  • Check file permissions allow reading
  • Try exporting variables manually for testing (see the quick check below)
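
A quick check that the variables are actually visible to the running process (the names come from .env.example; note that a .env file alone does not populate the shell environment):

import os

for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN"):
    print(name, "is set" if os.getenv(name) else "is MISSING")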

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass and coverage remains above 80%
  5. Follow the code standards
  6. Submit a pull request

License

MIT License - see the LICENSE file for details.

Copyright (c) 2025 DDATASERVICES LLC

Acknowledgments

Built with:

  • FastMCP - MCP server framework
  • uv - Fast Python package manager
  • pytest - Testing framework