Databricks MCP Server

A secure, production-ready Model Context Protocol (MCP) server that enables LLM-powered applications to interact with Databricks. Built with comprehensive security hardening, input validation, and data sanitization.

Features

Core Capabilities

  • MCP Protocol Support: Full implementation of the Model Context Protocol for seamless LLM integration
  • Databricks API Integration: Direct access to clusters, jobs, notebooks, DBFS, and SQL execution
  • Async Architecture: Built with asyncio for efficient concurrent operations
  • Comprehensive Testing: 133+ unit tests with 80%+ code coverage; the full suite runs in roughly 3-4 seconds

Security Hardening

  • Input Validation: Prevents path traversal, null byte injection, and malicious input patterns
  • Data Sanitization: Automatic redaction of sensitive information (tokens, passwords, API keys) from logs
  • Environment Validation: Fail-fast startup with clear error messages for configuration issues
  • ID Validation: Strict validation for cluster IDs, job IDs, warehouse IDs, and SQL statements

Available Tools

| Tool | Description | Parameters |
| --- | --- | --- |
| list_clusters | List all Databricks clusters | None |
| create_cluster | Create a new cluster | cluster_name, spark_version, node_type_id, num_workers, autotermination_minutes |
| terminate_cluster | Terminate a cluster | cluster_id |
| get_cluster | Get cluster information | cluster_id |
| start_cluster | Start a terminated cluster | cluster_id |
| list_jobs | List all jobs | None |
| run_job | Run a job | job_id, notebook_params |
| list_notebooks | List workspace notebooks | path |
| export_notebook | Export a notebook | path, format |
| list_files | List DBFS files | dbfs_path |
| execute_sql | Execute SQL statement | statement, warehouse_id, catalog, schema |
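
Once the server is running over stdio, any MCP client can invoke these tools by name. Below is a minimal client sketch; it assumes the official mcp Python SDK and launches the server through the startup script described under Quick Start (the repo's own client example lives in examples/mcp_client_usage.py).

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server over stdio via the repo's startup script
    params = StdioServerParameters(command="./start_mcp_server.sh")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

            # Call a tool from the table above
            result = await session.call_tool("list_clusters", arguments={})
            print(result.content)


asyncio.run(main())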

Quick Start

Prerequisites

  • Python 3.10 or higher
  • uv package manager (recommended)
  • Databricks workspace and access token

Installation

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  2. Clone and setup:

    git clone https://github.com/JustTryAI/databricks-mcp-server.git
    cd databricks-mcp-server
    
    # Create virtual environment and install dependencies
    uv venv
    source .venv/bin/activate
    uv pip install -e ".[dev]"
    
  3. Configure credentials:

    cp .env.example .env
    # Edit .env with your Databricks credentials:
    # DATABRICKS_HOST=https://your-workspace.azuredatabricks.net
    # DATABRICKS_TOKEN=your-personal-access-token
    

Running the Server

# Start the MCP server
./start_mcp_server.sh

# Or use the script in the scripts directory
./scripts/start_mcp_server.sh

Quick Commands

# Run all tests
uv run pytest tests/

# Run tests with coverage
uv run pytest --cov=src tests/ --cov-report=term-missing

# View Databricks resources
uv run python scripts/show_clusters.py
uv run python scripts/show_notebooks.py

# Run integration tests
bash scripts/run_direct_test.sh

Databricks Edition Support

This server works with both Databricks Community Edition (Free) and Premium editions.

Note: Some API endpoints may return 403 Forbidden on Community Edition due to permission restrictions. The server handles these gracefully and reports clear error messages.
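
For illustration only (this is not the server's actual code), the pattern looks roughly like the sketch below: an API wrapper translates a 403 into an actionable message instead of surfacing a raw HTTP error. The httpx client and the exception type used here are assumptions.

import httpx


async def list_jobs(client: httpx.AsyncClient) -> dict:
    # `client` is assumed to be configured with the workspace base URL and token
    response = await client.get("/api/2.1/jobs/list")
    if response.status_code == 403:
        raise PermissionError(
            "Databricks returned 403 Forbidden. The endpoint may be unavailable "
            "on Community Edition, or the token lacks the required permissions."
        )
    response.raise_for_status()
    return response.json()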

Edition status:

  • [TESTED] Community Edition (Free) - Limited API access
  • [PLANNED] Premium Edition - Full API access (coming soon)

Development

Project Structure

databricks-mcp-server/
├── src/
│   ├── api/           # Databricks API client modules
│   ├── core/          # Configuration, utilities, validation
│   ├── server/        # MCP server implementation
│   └── cli/           # Command-line interface
├── tests/             # Unit tests (133+ tests)
├── scripts/           # Utility and startup scripts
├── examples/          # Usage examples
├── docs/              # Documentation
└── CLAUDE.md          # AI assistant guidance

Code Standards

  • Style: PEP 8 with 100-character line limit
  • Type Hints: Required for all production code
  • Docstrings: Google-style docstrings for all public APIs
  • Testing: Minimum 80% code coverage
  • Security: All inputs must be validated using src/core/utils.py functions

Security Requirements

When adding new features (a short sketch follows this checklist):

  1. Validate all inputs using these functions:

    • validate_workspace_path() - For workspace paths
    • validate_dbfs_path() - For DBFS paths
    • validate_cluster_id() - For cluster IDs
    • validate_job_id() - For job IDs
    • validate_warehouse_id() - For warehouse IDs
    • validate_sql_statement() - For SQL queries
  2. Sanitize logs using sanitize_for_logging() to prevent credential leaks

  3. Add tests for both valid and invalid inputs
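
A compact sketch of that checklist for a hypothetical new tool is shown below. The validator and sanitizer names come from src/core/utils.py, but their exact signatures and this handler are assumptions made for illustration.

import logging

from src.core.utils import sanitize_for_logging, validate_cluster_id

logger = logging.getLogger(__name__)


async def resize_cluster(cluster_id: str, num_workers: int) -> dict:
    """Hypothetical tool used only to illustrate the checklist above."""
    validate_cluster_id(cluster_id)  # 1. validate every user-supplied input
    payload = {"cluster_id": cluster_id, "num_workers": num_workers}
    logger.info("Resize request: %s", sanitize_for_logging(payload))  # 2. sanitize logs
    # 3. add unit tests covering both valid and invalid cluster IDs
    return payload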

Testing

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_api_clusters.py

# Run with verbose output
uv run pytest -v tests/

# Generate coverage report
uv run pytest --cov=src tests/ --cov-report=html

Linting

uv run pylint src/ tests/
uv run flake8 src/ tests/
uv run mypy src/

Security Features Deep Dive

Input Validation

All user-provided inputs are validated before API calls:

# Path validation prevents:
- Path traversal: ../../../etc/passwd
- Null byte injection: /path/to/file\0.txt
- Suspicious patterns: //, ~, etc.

# ID validation ensures:
- Valid characters only (alphanumeric, hyphens, underscores)
- Reasonable length limits
- No injection attacks
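
A short usage sketch of these validators follows; the exact exception type raised by src/core/utils.py is an assumption here.

from src.core.utils import validate_cluster_id, validate_dbfs_path

validate_dbfs_path("/FileStore/tables/sales.csv")  # passes
validate_cluster_id("0123-456789-abcde123")        # passes

try:
    validate_dbfs_path("/FileStore/../../../etc/passwd")  # path traversal attempt
except ValueError as exc:  # assumed to raise ValueError on invalid input
    print(f"Rejected: {exc}")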

Data Sanitization

Sensitive data is automatically redacted from logs:

# Before logging:
{"user": "alice", "token": "dapi1234567890abcdef"}

# After sanitization:
{"user": "alice", "token": "**REDACTED**"}

Detects sensitive keys (case-insensitive):

  • token, password, secret, api_key, apikey
  • auth, credential, credentials, private_key
  • access_token, refresh_token
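
A usage sketch of the sanitizer is shown below; the function name comes from src/core/utils.py, while the exact signature and output formatting are assumptions.

from src.core.utils import sanitize_for_logging

payload = {"user": "alice", "Token": "dapi1234567890abcdef", "API_KEY": "abc123"}
print(sanitize_for_logging(payload))
# Expected result, with keys matched case-insensitively:
# {'user': 'alice', 'Token': '**REDACTED**', 'API_KEY': '**REDACTED**'}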

Examples

Check the examples/ directory for usage demonstrations:

# Direct API usage
uv run python examples/direct_usage.py

# MCP client usage
uv run python examples/mcp_client_usage.py

Troubleshooting

Common Issues

403 Forbidden errors:

  • Check that your Databricks access token has the necessary permissions (a quick credential check is sketched below)
  • Community Edition has limited API access
  • Verify your token scope includes workspace, clusters, and jobs
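
The following standalone sketch (not part of this repo) can help confirm whether the token itself works: it calls the standard Databricks Clusters API directly and assumes DATABRICKS_HOST and DATABRICKS_TOKEN are exported in your shell.

import json
import os
import urllib.request

host = os.environ["DATABRICKS_HOST"].rstrip("/")
token = os.environ["DATABRICKS_TOKEN"]

request = urllib.request.Request(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(request) as response:
    clusters = json.load(response).get("clusters", [])
    print(f"Token accepted; {len(clusters)} cluster(s) visible")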

Connection errors:

  • Verify DATABRICKS_HOST is set correctly (include https://)
  • Check your network can reach the Databricks workspace
  • Ensure your token hasn't expired

Environment variables not loaded:

  • Make sure .env file exists in the project root
  • Check file permissions allow reading
  • Try exporting variables manually for testing (see the quick check below)
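
A quick check that the variables are actually visible to the running process (the names come from .env.example; note that a .env file alone does not populate the shell environment):

import os

for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN"):
    print(name, "is set" if os.getenv(name) else "is MISSING")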

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass and coverage remains above 80%
  5. Follow the code standards
  6. Submit a pull request

License

MIT License - see the LICENSE file for details.

Copyright (c) 2025 DDATASERVICES LLC

Acknowledgments

Built with:

  • FastMCP - MCP server framework
  • uv - Fast Python package manager
  • pytest - Testing framework