nl2sql-mcp
A production-ready Model Context Protocol (MCP) server that transforms natural language into safe, executable SQL queries. Built for LLMs with comprehensive type safety, multi-database support, and intelligent schema analysis.
Database Support
This project provides an "all drivers" container image that can connect to the major databases without post-build steps. Drivers are modeled as optional dependencies and the Dockerfile installs the minimal OS libraries required at runtime.
- PostgreSQL: Python driver `psycopg[binary]` (v3); OS libs: `libpq5`.
- MySQL/MariaDB: Python driver `mysqlclient`; OS libs: `libmariadb3`.
- SQL Server: Python driver `pyodbc`; OS libs: `unixodbc`, Microsoft `msodbcsql18` (+ optional `mssql-tools18`).
- SQLite: stdlib `sqlite3` (no extra OS libs).
Local dev installs:
- All drivers: `uv sync --extra drivers-all`
- Specific backends: `uv sync --extra postgres --extra mysql` (as needed)
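After installing a subset of extras, a quick way to confirm which drivers are importable is sketched below. The module names are the conventional import names for the drivers listed above (for example, `mysqlclient` installs `MySQLdb`); this check is illustrative and not part of nl2sql-mcp.

```python
# Illustrative check of which optional database drivers are importable.
# The module names are conventional import names, not defined by nl2sql-mcp.
import importlib.util

DRIVER_MODULES = {
    "PostgreSQL": "psycopg",      # provided by psycopg[binary] (v3)
    "MySQL/MariaDB": "MySQLdb",   # provided by mysqlclient
    "SQL Server": "pyodbc",
    "SQLite": "sqlite3",          # stdlib, always present
}

for backend, module in DRIVER_MODULES.items():
    status = "available" if importlib.util.find_spec(module) else "missing"
    print(f"{backend:<14} ({module}): {status}")
```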
Build the all-drivers image:
docker build -t nl2sql-mcp:drivers-all .
docker run --rm -p 8000:8000 nl2sql-mcp:drivers-all
Notes:
- Debian slim base maximizes prebuilt wheel compatibility. Avoid Alpine for DB stacks.
- To switch MySQL to a pure-Python driver, replace `mysqlclient` with `mysql-connector-python` in `pyproject.toml` and re-lock with `uv lock --extra drivers-all`.
Key Features
LLM-Optimized Intelligence
- Intent-first query planning with structured schema context and join recommendations
- Semantic schema analysis with automatic table classification (fact, dimension, bridge, reference)
- Multi-modal table discovery using lexical matching, embeddings, and graph traversal
- Column role detection (key, date, metric, category) with semantic tagging
Production-Ready Safety
- SELECT-only execution with comprehensive SQL validation
- Dialect normalization and transpilation via SQLGlot
- Row limits and cell truncation to prevent resource exhaustion
- Comprehensive error handling with actionable feedback
Multi-Database Support
- SQLAlchemy-powered compatibility with PostgreSQL, MySQL, SQL Server, SQLite, Oracle, Snowflake
- Dialect-aware SQL generation and optimization
- Cross-platform schema reflection and analysis
Enterprise Architecture
- Type-safe Pydantic models throughout
- Dependency injection for testability
- Background initialization with graceful degradation
- Zero hardcoded assumptions - adapts to any database schema
Quick Start
Prerequisites
- Python 3.13+
- uv package manager
- Database with appropriate drivers installed
Installation
# Clone the repository
git clone https://github.com/jb3cloud/nl2sql-mcp.git
cd nl2sql-mcp
# Install dependencies
uv sync
# Configure your database
cp .env.example .env
# Edit .env with your database connection details
Basic Usage
# Start the MCP server
uv run nl2sql-mcp
# Or run directly
uv run python -m nl2sql_mcp.server
# Or run using uvx
uvx --from git+https://github.com/jb3cloud/nl2sql-mcp nl2sql-mcp
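To exercise the server programmatically, the sketch below uses the official `mcp` Python SDK to launch it over stdio and call a tool. It assumes a stdio transport when started with `uv run nl2sql-mcp`; if your deployment exposes an HTTP endpoint instead (as the Docker image does on port 8000), use the SDK's corresponding client transport. The SQLite URL is only a placeholder.

```python
# Illustrative MCP client session (package: mcp). Assumes stdio transport;
# the database URL below is a placeholder.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(
        command="uv",
        args=["run", "nl2sql-mcp"],
        env={"NL2SQL_MCP_DATABASE_URL": "sqlite:///example.db"},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            status = await session.call_tool("get_init_status", {})
            print(status)


asyncio.run(main())
```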
Makefile Workflow
The repository includes a Makefile that standardizes common tasks. All targets invoke tooling via `uv run` to ensure a consistent environment.
# Discover available targets and variables
make help
# Run the full Quality Gauntlet (format → lint → typecheck → test)
make quality
# Individual steps (as needed)
make format # Ruff format
make lint # Ruff check --fix
make typecheck # basedpyright (strict)
make test # pytest
make clean # Remove caches and build artifacts
Deployment helpers for Azure Container Apps are also provided:
# Create docker container
make docker
# Run docker container
make docker-run
# Verify Azure CLI login and (optionally) set subscription
make preflight AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000"
# Build from source and deploy
make publish \
APP_NAME=my-nl2sql-mcp \
RESOURCE_GROUP=my-rg \
ENVIRONMENT=my-aca-env \
LOCATION=eastus \
AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000" \
INGRESS=external \
TARGET_PORT=8000 \
ENV_ARGS="NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@host:5432/db"
Supported variables (can be provided via environment or CLI): `APP_NAME`, `RESOURCE_GROUP`, `LOCATION`, `ENVIRONMENT`, `AZ_SUBSCRIPTION`, `INGRESS` (default `external`), `TARGET_PORT` (default `8000`).
Environment Configuration
Create a `.env` file with your database connection:
# Required: Database connection
NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@localhost:5432/dbname
# Optional: Result limits and debugging
NL2SQL_MCP_ROW_LIMIT=1000
NL2SQL_MCP_MAX_CELL_CHARS=500
NL2SQL_MCP_DEBUG_TOOLS=1 # Enable find_tables/find_columns tools
Architecture
graph TD
A[LLM Client] -->|MCP Protocol| B[FastMCP Server]
subgraph "Core Services"
B --> C[Schema Service]
B --> D[SQLGlot Service]
B --> E[Execution Engine]
end
subgraph "Intelligence Layer"
C --> F[Schema Explorer]
C --> G[Query Engine]
F --> H[Reflection & Profiling]
F --> I[Graph Analysis]
G --> J[Multi-Modal Retrieval]
G --> K[Graph Expansion]
end
subgraph "Data Layer"
E --> L[SQLAlchemy Engine]
H --> L
L --> M[(Your Database)]
end
style A fill:#e1f5fe
style B fill:#f3e5f5
style M fill:#e8f5e8
The system follows a two-phase architecture:
- Schema Building Phase: Comprehensive database analysis, relationship mapping, and semantic understanding
- Query Processing Phase: Real-time natural language to SQL conversion with context-aware planning
MCP Tools API
Core Workflow Tools
get_init_status()
Check server readiness and initialization progress.
// Response
{
"phase": "READY",
"attempts": 1,
"started_at": "2024-01-15T10:30:00Z",
"completed_at": "2024-01-15T10:30:45Z"
}
get_database_overview(req: DatabaseOverviewRequest)
High-level database summary with subject areas.
// Request
{
"include_subject_areas": true,
"area_limit": 8
}
// Response
{
"total_tables": 45,
"total_schemas": 3,
"subject_areas": [
{
"name": "Sales Analytics",
"tables": ["orders", "customers", "products"],
"summary": "Customer orders and product sales data"
}
]
}
plan_query_for_intent(req: PlanQueryRequest)
Intent-first SQL planning with structured guidance.
// Request
{
"request": "Show monthly revenue by region for 2024",
"constraints": {
"time_range": "2024-01-01..2024-12-31",
"metric": "revenue"
},
"detail_level": "standard"
}
// Response
{
"relevant_tables": [
{
"table_key": "sales.orders",
"relevance_score": 0.95,
"why_relevant": "Contains revenue data and date columns"
}
],
"join_plan": [
["sales.orders.customer_id", "customers.id"]
],
"main_table": "sales.orders",
"draft_sql": "SELECT DATE_TRUNC('month', order_date) as month...",
"confidence": 0.92
}
execute_query(req: ExecuteQueryRequest)
Safe SQL execution with validation and results.
// Request
{
"sql": "SELECT region, SUM(amount) as revenue FROM sales.orders WHERE order_date >= '2024-01-01' GROUP BY region"
}
// Response
{
"success": true,
"rows": [
{"region": "North", "revenue": 125000.50},
{"region": "South", "revenue": 98750.25}
],
"row_count": 2,
"columns": [
{"name": "region", "type": "VARCHAR"},
{"name": "revenue", "type": "DECIMAL"}
],
"execution_time_ms": 45,
"next_action": "success"
}
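For context, the SELECT-only guarantee boils down to a statement-type check before anything reaches the database. A minimal sketch of that idea using SQLGlot (the server's actual validation is more comprehensive):

```python
# Minimal sketch of a SELECT-only guard; illustrative, not the server's code.
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError


def is_single_select(sql: str, dialect: str | None = None) -> bool:
    """Accept exactly one statement, and only if it is a plain SELECT."""
    try:
        statements = sqlglot.parse(sql, read=dialect)
    except ParseError:
        return False
    return len(statements) == 1 and isinstance(statements[0], exp.Select)
```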
Discovery Tools
get_table_info(req: TableInfoRequest)
Detailed table metadata optimized for SQL generation.
// Request
{
"table_key": "sales.orders",
"include_samples": true,
"column_role_filter": ["key", "date", "metric"]
}
find_tables(req: FindTablesRequest) (Debug Mode)
Fast table discovery by natural language intent.
find_columns(req: FindColumnsRequest) (Debug Mode)
Column search for SELECT and WHERE clause building.
SQL Assistance Tools
sql_validate(sql: str)
Validate SQL syntax and structure.
sql_auto_transpile_for_database(sql: str)
Automatically detect and convert SQL dialects.
sql_optimize_for_database(sql: str)
Optimize SQL for your database engine.
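These tools are backed by SQLGlot. As a standalone illustration of what validation and transpilation mean here (independent of the MCP layer), the snippet below parses a T-SQL query and rewrites it for PostgreSQL; the dialects are just examples.

```python
# Standalone SQLGlot illustration (not the server's internal code).
import sqlglot
from sqlglot.errors import ParseError

sql = "SELECT TOP 10 name FROM customers"

# Validation: parsing succeeds only for syntactically valid SQL.
try:
    sqlglot.parse_one(sql, read="tsql")
except ParseError as err:
    raise SystemExit(f"Invalid SQL: {err}")

# Transpilation: T-SQL's TOP becomes a LIMIT clause in PostgreSQL.
print(sqlglot.transpile(sql, read="tsql", write="postgres")[0])
# -> SELECT name FROM customers LIMIT 10
```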
Configuration
Database Support
Database | SQLAlchemy Driver | Connection String Example |
---|---|---|
PostgreSQL | psycopg2 | postgresql://user:pass@host:5432/db |
MySQL | pymysql | mysql+pymysql://user:pass@host:3306/db |
SQL Server | pyodbc | mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17 |
SQLite | Built-in | sqlite:///path/to/database.db |
Oracle | cx_oracle | oracle://user:pass@host:1521/service |
Snowflake | snowflake-sqlalchemy | snowflake://user:pass@account/db/schema |
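Whichever backend you target, it helps to confirm the connection URL and driver before starting the server. A minimal check with SQLAlchemy (assuming the relevant driver from the table is installed):

```python
# Minimal connectivity check for NL2SQL_MCP_DATABASE_URL (illustrative).
import os

from sqlalchemy import create_engine, text

url = os.environ["NL2SQL_MCP_DATABASE_URL"]
engine = create_engine(url, pool_pre_ping=True)
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
print("Database connection OK")
```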
Schema Configuration
Control schema analysis behavior:
# Via environment or configuration
NL2SQL_MCP_INCLUDE_SCHEMAS=public,analytics
NL2SQL_MCP_EXCLUDE_SCHEMAS=temp,audit
NL2SQL_MCP_SAMPLE_SIZE=100 # Rows per table for profiling
Performance Tuning
# Memory and processing limits
NL2SQL_MCP_ROW_LIMIT=1000 # Max rows returned
NL2SQL_MCP_MAX_CELL_CHARS=500 # Truncate long text
NL2SQL_MCP_SAMPLE_TIMEOUT=5 # Sampling timeout (seconds)
NL2SQL_MCP_MAX_COLS_FOR_EMBEDDINGS=20 # Column embedding limit
NL2SQL_MCP_REFLECT_TIMEOUT=15 # Reflection timeout per statement (seconds)
NL2SQL_MCP_ENABLE_LIGHTWEIGHT_NER=1 # Toggle NER enrichment during profiling (0 disables)
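To make the row and cell limits concrete, the sketch below shows how settings of this kind are typically applied to a result set. It illustrates the intended semantics only; the server's actual implementation may differ.

```python
# Illustration of row-limit and cell-truncation semantics (not the server's code).
import os

ROW_LIMIT = int(os.environ.get("NL2SQL_MCP_ROW_LIMIT", "1000"))
MAX_CELL_CHARS = int(os.environ.get("NL2SQL_MCP_MAX_CELL_CHARS", "500"))


def cap_results(rows: list[dict]) -> list[dict]:
    """Drop rows beyond the limit and truncate oversized text cells."""
    return [
        {
            key: value[:MAX_CELL_CHARS] if isinstance(value, str) else value
            for key, value in row.items()
        }
        for row in rows[:ROW_LIMIT]
    ]
```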
Testing and Development
Local Development Setup
# Install development dependencies
uv sync --dev
# Run type checking
uv run basedpyright
# Format and lint code
uv run ruff format .
uv run ruff check --fix .
# Run tests
uv run pytest -v
Testing with Live Database
# Schema intelligence testing
uv run python scripts/test_intelligence_harness.py "show sales by region"
# SQL tools testing
uv run python scripts/test_sqlglot_harness.py "SELECT TOP 10 * FROM customers"
Quality Assurance
- 100% type coverage with basedpyright strict mode
- Comprehensive test suite with pytest
- Zero linting violations with ruff
- Dependency injection for testability
- Pure functions where possible
Advanced Usage
Custom Schema Analysis
from nl2sql_mcp.services import ConfigService, SchemaServiceManager

# Initialize with custom configuration
config = ConfigService()
manager = SchemaServiceManager.get_instance()

# Access schema service after initialization
schema_service = manager.get_schema_service()
result = schema_service.analyze_query_schema(
    "Find customers with high lifetime value",
    max_tables=10,
    expand_strategy="fk_following"
)
Multi-Agent Integration
The MCP server integrates seamlessly with multi-agent frameworks:
# Example with your LLM framework; `mcp_client` stands in for whatever
# MCP client handle your framework provides.
async def query_database(natural_language_query: str):
    # 1. Plan the query
    plan_result = await mcp_client.call_tool(
        "plan_query_for_intent",
        {"request": natural_language_query}
    )
    # 2. Execute the draft SQL if planning produced one
    if plan_result.draft_sql:
        execution_result = await mcp_client.call_tool(
            "execute_query",
            {"sql": plan_result.draft_sql}
        )
        return execution_result.rows
    return None
Code Standards
- Python 3.13+ with strict type checking
- Pydantic models for all data structures
- Pure functions and dependency injection for testability
- Comprehensive docstrings for public APIs
- Zero tolerance for type errors, lint violations, or test failures
Production Deployment
Environment Variables
# Required
NL2SQL_MCP_DATABASE_URL=postgresql://...
# Optional performance tuning
NL2SQL_MCP_ROW_LIMIT=5000
NL2SQL_MCP_MAX_CELL_CHARS=1000
NL2SQL_MCP_SAMPLE_SIZE=200
Health Monitoring
Monitor server health via the `get_init_status()` tool:
- `READY`: Server fully operational
- `STARTING`: Initialization in progress
- `FAILED`: Initialization failed; check logs
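Because schema analysis runs in the background, clients should wait for `READY` before issuing queries. A small polling helper, written against the same hypothetical `mcp_client` handle used in the integration example above:

```python
# Poll get_init_status until READY. `mcp_client` is hypothetical, standing in
# for whatever MCP client handle your framework provides.
import asyncio


async def wait_until_ready(mcp_client, timeout_s: float = 120.0, poll_s: float = 2.0) -> bool:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        status = await mcp_client.call_tool("get_init_status", {})
        phase = status.get("phase") if isinstance(status, dict) else getattr(status, "phase", None)
        if phase == "READY":
            return True
        if phase == "FAILED":
            raise RuntimeError("nl2sql-mcp initialization failed; check server logs")
        await asyncio.sleep(poll_s)
    return False
```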
Related Research
This implementation incorporates state-of-the-art research in text-to-SQL:
- Multi-agent architectures for complex query decomposition
- Schema linking with bidirectional context understanding
- Error correction frameworks with multi-turn refinement
- Semantic understanding via lightweight NER and role classification
For detailed research context, see .
Acknowledgments
- FastMCP for the excellent MCP server framework
- SQLGlot for multi-dialect SQL parsing and transpilation
- SQLAlchemy for robust database abstraction
- The text-to-SQL research community for advancing the field