alevret/mcp-databricks-demo
Azure OpenAI MCP Integration with Databricks
A Chainlit-based chat interface that connects Azure OpenAI API with Databricks through the Model Context Protocol (MCP). This application provides a natural language interface to interact with your Databricks workspace, execute SQL queries, manage jobs, and monitor infrastructure.
📋 Prerequisites
- Azure OpenAI API: Valid endpoint, API key, and model deployment
- Databricks Workspace: Active workspace with SQL warehouse configured
- Python Environment: Python 3.11 recommended (matches Docker image)
- Network Access: Connectivity to both Azure OpenAI and Databricks services
🚀 Quick Start
Automated Setup
# Clone the repository
git clone <repository-url>
cd mcp_databricks_aoai # adjust if your folder name differs
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create and configure .env file (see configuration section)
# Then run the automated startup (auto-connects the MCP server)
python start.py
⚙️ Configuration
Create a .env file in the project root with the following required variables:
# Databricks Configuration (required)
DATABRICKS_HOST="your_databricks_host"
DATABRICKS_TOKEN="your_access_token"
DATABRICKS_HTTP_PATH="your_warehouse_http_path"
# Azure OpenAI Configuration (required)
AZURE_OPENAI_ENDPOINT="your_endpoint_here"
AZURE_OPENAI_API_KEY="your_api_key_here"
AZURE_OPENAI_MODEL="your_deployment_name_here"
AZURE_OPENAI_API_VERSION="2025-04-01-preview"
# MCP Transport (optional / advanced)
# Leave unset for stdio (local Chainlit). Set to http or sse for remote exposure.
MCP_TRANSPORT="http"
MCP_HOST="0.0.0.0"
MCP_PORT="8000"
MCP_PATH="/mcp"
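For reference, here is a minimal sketch of how these variables can be loaded and checked with python-dotenv; it mirrors the variable names above but is illustrative rather than the exact logic in app.py or start.py.
# minimal sketch: load .env and fail fast if a required variable is missing
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

required = [
    "DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_HTTP_PATH",
    "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_MODEL", "AZURE_OPENAI_API_VERSION",
]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")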
Getting Databricks Credentials
- Host: Your Databricks workspace URL (e.g., adb-123456789.1.azuredatabricks.net)
- Token: Personal Access Token from User Settings → Developer → Access Tokens
- HTTP Path: SQL Warehouse connection details → Advanced options → JDBC/ODBC
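Before starting the app, you can confirm these credentials work with a quick standalone check using the databricks-sql-connector package (not part of the project code; a trivial SELECT 1 is enough):
# standalone connectivity check with databricks-sql-connector
import os
from databricks import sql

connection = sql.connect(
    server_hostname=os.environ["DATABRICKS_HOST"],   # e.g. adb-123456789.1.azuredatabricks.net
    http_path=os.environ["DATABRICKS_HTTP_PATH"],    # SQL warehouse HTTP path
    access_token=os.environ["DATABRICKS_TOKEN"],
)
cursor = connection.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())  # any row back means the warehouse is reachable
cursor.close()
connection.close()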
Azure OpenAI Setup
- Create an Azure OpenAI resource in Azure
- Deploy a model (e.g., gpt-4.1-nano, which this demo uses)
- Get endpoint and API key from the resource overview
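A short sanity check that the deployment responds (sketch using the openai package's AzureOpenAI client; remember that AZURE_OPENAI_MODEL holds the deployment name, not the base model name):
# sanity check for the Azure OpenAI deployment
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
response = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_MODEL"],  # deployment name
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)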
Sample Data Setup (Optional)
To test the application with sample data, you can create a table with building sensor readings:
- Create Sample Table: Execute the SQL commands in sample_data/sample.sql to create a building_sensor_readings table with 54 rows of sample data
- Table Structure: The table includes sensor readings from different floors and rooms with temperature data and timestamps
- Usage: Perfect for testing queries like temperature analysis, sensor monitoring, and data exploration features
-- Execute the contents of sample_data/sample.sql in your Databricks SQL warehouse
-- This creates: default.building_sensor_readings table with sensor data
The sample data includes:
- 4 sensors across 3 floors (Lobby, Conference Rooms, Server Room, Office)
- Temperature readings in Celsius with timestamps
- 54 data points for comprehensive testing scenarios
🛠️ Available Tools & Capabilities
The MCP server provides 8 powerful tools for Databricks interaction:
📊 Data Operations
- run_sql_query(sql) - Execute SELECT, SHOW, or DESCRIBE queries with safety validation
  - Supports markdown table formatting
  - Results limited to 1000 rows for performance
  - Automatic SQL injection protection
- list_databases() - List all available databases/catalogs in the workspace
- describe_table(table_name) - Get comprehensive schema information for any table
- get_schema() - Resource providing the complete warehouse table structure
⚙️ Job Management
- list_jobs() - Display all Databricks jobs with metadata (ID, name, creator)
- get_job_status(job_id) - Detailed run history and status for specific jobs
- get_job_details(job_id) - Complete job configuration and task breakdown
🏗️ Infrastructure & Monitoring
- get_cluster_info() - List all clusters with state and configuration details
- get_interaction_history() - Session interaction tracking for debugging and audit
🔐 Safety Features
- Query Validation: Only SELECT, SHOW, and DESCRIBE statements permitted
- Result Limiting: Automatic pagination and row limits
- Connection Management: Robust connection pooling and cleanup
- Error Handling: Comprehensive error reporting with context
- Session Tracking: Complete audit trail of all interactions
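As a rough illustration of the query validation above (the real checks live in mcp_server.py and may differ in detail), a whitelist-style guard can be as simple as:
# illustrative whitelist validation, not the exact project code
ALLOWED_PREFIXES = ("SELECT", "SHOW", "DESCRIBE")
MAX_QUERY_LENGTH = 10_000  # mirrors the documented 10,000 character limit

def validate_query(statement: str) -> None:
    statement = statement.strip()
    if len(statement) > MAX_QUERY_LENGTH:
        raise ValueError("Query exceeds the 10,000 character limit")
    if not statement.upper().startswith(ALLOWED_PREFIXES):
        raise ValueError("Only SELECT, SHOW, and DESCRIBE statements are permitted")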
🎯 Usage Guide
Automated Startup (Recommended)
The start.py script provides automated environment validation and startup:
python start.py
Features of start.py:
- ✅ Validates all required packages are installed
- ✅ Checks .env file configuration completeness
- ✅ Provides helpful error messages for missing components
- ✅ Automatically starts Chainlit with proper MCP configuration
- ✅ Opens browser to the application interface
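Conceptually, the package check is close to the sketch below (the real start.py may report problems differently; the package list here is an assumption based on requirements.txt):
# sketch of the dependency check performed before launching Chainlit
import importlib.util

required_packages = ["chainlit", "openai", "mcp", "fastmcp", "databricks", "dotenv", "aiohttp", "requests"]
missing = [pkg for pkg in required_packages if importlib.util.find_spec(pkg) is None]
if missing:
    raise SystemExit(f"Missing packages: {', '.join(missing)} - run pip install -r requirements.txt")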
Manual Connection (Alternative)
If using chainlit run app.py directly:
- Start Application: chainlit run app.py -m mcp_config.json
- Connect MCP Server:
  - Click "MCP Servers" button in the Chainlit interface
  - Click "Connect an MCP"
  - Enter configuration:
    - Name: databricks
    - Type: stdio
    - Command: python mcp_server.py
  - Click "Confirm"
- Verify Connection: Look for the welcome message listing 8 available tools
Troubleshooting Connection Issues
- Ensure all environment variables are properly set
- Verify Databricks connectivity with a simple test query
- Check Azure OpenAI API key and model deployment status
- Review terminal output for detailed error messages
💬 Example Interactions
Here are practical examples of how to interact with your Databricks environment:
Database & Schema Exploration
"What databases are available in my workspace?"
"Show me all tables in the sales_analytics database"
"Describe the schema for the customer_data table"
"What columns does the orders table have?"
Data Querying
"Show me the first 10 rows from the sales_data table"
"Count the number of records in customer_transactions"
"What are the unique values in the status column?"
"Show recent orders from the last 7 days"
Job Management & Monitoring
"List all jobs in my workspace"
"What's the status of job 12345?"
"Show me details for the daily_ETL job"
"Which jobs failed in the last run?"
Infrastructure Management
"What clusters are available?"
"Show me cluster information and their current status"
"Which clusters are currently running?"
Session & Debugging
"Show me my interaction history"
"What queries have I run in this session?"
"Debug my last query"
Complex Analytical Queries
"Analyze sales trends by region in the last quarter"
"Compare performance metrics between different product categories"
"Show me the top 5 customers by total purchase amount"
🏗️ Architecture & Components
Core Components
- app.py: Main Chainlit application with Azure OpenAI integration
- mcp_server.py: FastMCP server implementing all Databricks tools
- start.py: Automated startup script with environment validation
- mcp_config.json: MCP server configuration for Chainlit integration
Key Classes & Functions
- ChatClient: Manages Azure OpenAI interactions and message streaming
- get_databricks_connection(): Robust connection management with error handling
- databricks_api_request(): Unified REST API client for Databricks services
- interaction_history: Session-level tracking for debugging and audit
Security & Safety Architecture
- SQL Injection Prevention: Whitelist-based query validation
- Query Type Restriction: Only SELECT, SHOW, DESCRIBE operations
- Result Set Limiting: Automatic pagination and row count limits
- Connection Lifecycle: Proper resource cleanup and error handling
- Error Isolation: Comprehensive exception handling with user-friendly messages
Integration Flow
- User Input → Chainlit Interface
- Message Processing → Azure OpenAI API
- Tool Calls → MCP Server (FastMCP)
- Databricks Interaction → SQL Warehouse / REST API
- Response Formatting → Markdown Tables / Structured Output
- Session Tracking → Interaction History Storage
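In code, the middle of that flow (routing a model-requested tool call to the MCP server and feeding the result back) looks roughly like the sketch below. It is a simplification of app.py's ChatClient, using the async Azure OpenAI client and an already-connected MCP ClientSession; names and error handling are illustrative.
# simplified sketch of tool-call routing between Azure OpenAI and the MCP session
import json

async def handle_message(aoai_client, mcp_session, deployment, messages, tools):
    # 1) ask Azure OpenAI, advertising the MCP tools in function-calling format
    response = await aoai_client.chat.completions.create(
        model=deployment, messages=messages, tools=tools)
    message = response.choices[0].message

    # 2) if the model requested tools, forward each call to the MCP server
    if message.tool_calls:
        messages.append(message)
        for call in message.tool_calls:
            result = await mcp_session.call_tool(
                call.function.name, arguments=json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result.content[0].text})
        # 3) ask again so the model can format the tool output as the final answer
        response = await aoai_client.chat.completions.create(
            model=deployment, messages=messages)
        message = response.choices[0].message

    return message.content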
🐳 Running with Docker (Optional)
A Dockerfile is included to run the MCP server in HTTP mode.
Build:
docker build -t mcp-databricks .
Run:
docker run --rm -p 8000:8000 \
-e DATABRICKS_HOST="$DATABRICKS_HOST" \
-e DATABRICKS_TOKEN="$DATABRICKS_TOKEN" \
-e DATABRICKS_HTTP_PATH="$DATABRICKS_HTTP_PATH" \
-e AZURE_OPENAI_ENDPOINT="$AZURE_OPENAI_ENDPOINT" \
-e AZURE_OPENAI_API_KEY="$AZURE_OPENAI_API_KEY" \
-e AZURE_OPENAI_MODEL="$AZURE_OPENAI_MODEL" \
-e AZURE_OPENAI_API_VERSION="$AZURE_OPENAI_API_VERSION" \
-e MCP_TRANSPORT="http" \
-e MCP_HOST="0.0.0.0" \
-e MCP_PORT="8000" \
-e MCP_PATH="/mcp" \
mcp-databricks
Then point an MCP-capable client (future Chainlit HTTP support or other tooling) at http://localhost:8000/mcp.
Stdio mode remains the simplest approach for local Chainlit usage.
🔌 MCP Transport Modes
Environment-driven transport selection (falls back to stdio if unset / invalid):
| Variable | Values | Default | Applies To | Notes |
|---|---|---|---|---|
| MCP_TRANSPORT | stdio, http, sse | stdio | All | Primary transport selector |
| MCP_HOST | host/IP | 0.0.0.0 | http/sse | Ignored for stdio |
| MCP_PORT | integer | 8000 | http/sse | Ignored for stdio |
| MCP_PATH | path | /mcp | http/sse | HTTP base path |
get_schema is exposed as an MCP resource (schema://tables), while the rest are tools.
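The transport selection can be pictured with the following FastMCP sketch (simplified; the actual mcp_server.py may wire this differently, and only the schema://tables resource is shown):
# sketch of env-driven transport selection with FastMCP
import os
from fastmcp import FastMCP

mcp = FastMCP("databricks")

@mcp.resource("schema://tables")
def get_schema() -> str:
    """Complete warehouse table structure (exposed as a resource, not a tool)."""
    return "...schema text..."

if __name__ == "__main__":
    transport = os.getenv("MCP_TRANSPORT", "stdio").lower()
    host = os.getenv("MCP_HOST", "0.0.0.0")
    port = int(os.getenv("MCP_PORT", "8000"))
    if transport == "http":
        mcp.run(transport="http", host=host, port=port, path=os.getenv("MCP_PATH", "/mcp"))
    elif transport == "sse":
        mcp.run(transport="sse", host=host, port=port)
    else:
        mcp.run()  # stdio fallback, the simplest option for local Chainlit usage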
🔧 Configuration & Customization
Chainlit Configuration
The application supports extensive customization through config.toml:
- Chain of Thought Display: Full CoT mode enabled for transparency
- Custom Styling: CSS and JavaScript injection support
- Header Customization: Custom navigation links and branding
Welcome Screen
Modify chainlit.md to customize the application welcome screen and instructions.
Environment Validation
The start.py script includes comprehensive environment checking:
- Package Dependencies: Validates all required Python packages
- Environment Variables: Ensures all .env variables are present
- Connection Testing: Optional connectivity verification
MCP Server Configuration
The mcp_config.json file defines the MCP server setup (stdio). A .vscode/mcp.json is also included for editor integration.
{
"mcpServers": {
"databricks": {
"command": "python",
"args": ["mcp_server.py"],
"cwd": "."
}
}
}
🚨 Troubleshooting
Common Issues
Connection Errors
- Verify Databricks token and workspace URL
- Check network connectivity to Databricks
- Ensure SQL warehouse is running
Azure OpenAI Issues
- Confirm API key and endpoint are correct
- Verify model deployment exists and is active
- Check API version compatibility
MCP Server Problems
- Ensure Python dependencies are installed
- Check for port conflicts
- Review terminal output for detailed error messages
Query Execution Failures
- Verify table and database names exist
- Check SQL syntax for SELECT/SHOW/DESCRIBE statements
- Review query length limits (10,000 characters max)
Debug Mode
Detailed logs are written to the terminal; review the output when launching with start.py or when starting the application manually.
📚 Dependencies
Core Requirements (from requirements.txt)
chainlit # Web interface framework
python-dotenv # Environment variable management
openai # Azure OpenAI API client
mcp # Model Context Protocol
aiohttp # Async HTTP client
databricks-sql-connector # Databricks SQL connectivity
requests # HTTP requests for REST API
fastmcp==2.10.6 # Fast MCP server implementation (version pinned for reproducibility)
Optional Dependencies
- Debugging Tools: Built-in interaction history tracking
- Development: All dependencies included for local development
🤝 Contributing & Development
Development Setup
- Fork the repository
- Set up development environment with all dependencies
- Configure test environment variables
- Run tests with python test_integration.py
Code Structure
- Follow existing code patterns for new tools
- Maintain comprehensive error handling
- Add interaction history tracking for new functions
- Include safety validation for any database operations
Testing
The project includes integration testing capabilities:
- test_integration.py: Comprehensive testing framework
- debug_mcp.py: MCP server debugging utilities
📄 License & Support
Documentation Resources
- Chainlit Documentation - Web interface framework
- Azure OpenAI Documentation - AI service setup
- Databricks SQL Documentation - SQL warehouse configuration
- Model Context Protocol - MCP specification and tools
Community & Support
- GitHub Issues for bug reports and feature requests
- Documentation updates and improvements welcome
- Community contributions encouraged
Built with ❤️ using Chainlit, Azure OpenAI, and Databricks
✅ Recently Updated Documentation
- Updated sample data path (sample_data/sample.sql)
- Added Docker & transport mode instructions
- Added optional MCP_* environment variables
- Corrected folder name in quick start
- Documented pinned fastmcp version
- Clarified resource vs tool distinction (get_schema)