nl2sql-mcp

Python 3.13+ · MCP Protocol · Type Checked · Code Style: Ruff

A production-ready Model Context Protocol (MCP) server that transforms natural language into safe, executable SQL queries. Built for LLMs with comprehensive type safety, multi-database support, and intelligent schema analysis.

Database Support

This project provides an "all drivers" container image that can connect to the major databases without post-build steps. Drivers are modeled as optional dependencies and the Dockerfile installs the minimal OS libraries required at runtime.

  • PostgreSQL: Python driver psycopg[binary] (v3); OS libs: libpq5.
  • MySQL/MariaDB: Python driver mysqlclient; OS libs: libmariadb3.
  • SQL Server: Python driver pyodbc; OS libs: unixodbc, Microsoft msodbcsql18 (+ optional mssql-tools18).
  • SQLite: stdlib sqlite3 (no extra OS libs).

Local dev installs:

  • All drivers: uv sync --extra drivers-all
  • Specific backends: uv sync --extra postgres --extra mysql (as needed)

Build the all-drivers image:

docker build -t nl2sql-mcp:drivers-all .
docker run --rm -p 8000:8000 nl2sql-mcp:drivers-all

Notes:

  • A Debian slim base maximizes prebuilt wheel compatibility; avoid Alpine for database driver stacks, since most prebuilt wheels target glibc rather than musl.
  • To switch MySQL to a pure-Python driver, replace mysqlclient with mysql-connector-python in pyproject.toml and re-lock with uv lock --extra drivers-all.

🚀 Key Features

🎯 LLM-Optimized Intelligence

  • Intent-first query planning with structured schema context and join recommendations
  • Semantic schema analysis with automatic table classification (fact, dimension, bridge, reference)
  • Multi-modal table discovery using lexical matching, embeddings, and graph traversal
  • Column role detection (key, date, metric, category) with semantic tagging

🛡️ Production-Ready Safety

  • SELECT-only execution with comprehensive SQL validation (see the sketch after this list)
  • Dialect normalization and transpilation via SQLGlot
  • Row limits and cell truncation to prevent resource exhaustion
  • Comprehensive error handling with actionable feedback
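
A minimal sketch of the SELECT-only guard, assuming it is built on SQLGlot as the bullets above describe; the function name, dialect default, and limit handling are illustrative rather than the server's actual implementation:

import sqlglot
from sqlglot import exp

def ensure_safe_select(sql: str, dialect: str = "postgres", row_limit: int = 1000) -> str:
    """Reject anything that is not a single SELECT statement, then cap the row count."""
    statements = sqlglot.parse(sql, read=dialect)
    if len(statements) != 1:
        raise ValueError("exactly one statement is allowed")
    tree = statements[0]
    if not isinstance(tree, exp.Select):
        raise ValueError(f"only SELECT is permitted, got {type(tree).__name__}")
    # Attach a LIMIT so a runaway query cannot exhaust resources
    # (a real guard would keep the smaller of any existing and configured limit).
    return tree.limit(row_limit).sql(dialect=dialect)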

🔧 Multi-Database Support

  • SQLAlchemy-powered compatibility with PostgreSQL, MySQL, SQL Server, SQLite, Oracle, Snowflake
  • Dialect-aware SQL generation and optimization
  • Cross-platform schema reflection and analysis

๐Ÿ—๏ธ Enterprise Architecture

  • Type-safe Pydantic models throughout
  • Dependency injection for testability
  • Background initialization with graceful degradation
  • Zero hardcoded assumptions; adapts to any database schema

📦 Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager
  • Database with appropriate drivers installed

Installation

# Clone the repository
git clone https://github.com/jb3cloud/nl2sql-mcp.git
cd nl2sql-mcp

# Install dependencies
uv sync

# Configure your database
cp .env.example .env
# Edit .env with your database connection details

Basic Usage

# Start the MCP server
uv run nl2sql-mcp

# Or run directly
uv run python -m nl2sql_mcp.server

# Or run using uvx
uvx --from git+https://github.com/jb3cloud/nl2sql-mcp nl2sql-mcp

🛠️ Makefile Workflow

The repository includes a Makefile that standardizes common tasks. All targets invoke tooling via uv run to ensure a consistent environment.

# Discover available targets and variables
make help

# Run the full Quality Gauntlet (format → lint → typecheck → test)
make quality

# Individual steps (as needed)
make format     # Ruff format
make lint       # Ruff check --fix
make typecheck  # basedpyright (strict)
make test       # pytest
make clean      # Remove caches and build artifacts

Deployment helpers for Azure Container Apps are also provided:

# Create docker container
make docker

# Run docker container
make docker-run

# Verify Azure CLI login and (optionally) set subscription
make preflight AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000"

# Build from source and deploy
make publish \
  APP_NAME=my-nl2sql-mcp \
  RESOURCE_GROUP=my-rg \
  ENVIRONMENT=my-aca-env \
  LOCATION=eastus \
  AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000" \
  INGRESS=external \
  TARGET_PORT=8000 \
  ENV_ARGS="NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@host:5432/db"

Supported variables (provided via environment or CLI): APP_NAME, RESOURCE_GROUP, LOCATION, ENVIRONMENT, AZ_SUBSCRIPTION, INGRESS (default external), TARGET_PORT (default 8000), and ENV_ARGS for application environment variables passed to the container.

Environment Configuration

Create a .env file with your database connection:

# Required: Database connection
NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@localhost:5432/dbname

# Optional: Result limits and debugging
NL2SQL_MCP_ROW_LIMIT=1000
NL2SQL_MCP_MAX_CELL_CHARS=500
NL2SQL_MCP_DEBUG_TOOLS=1  # Enable find_tables/find_columns tools
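
Since every setting shares the NL2SQL_MCP_ prefix, the variables map naturally onto a prefixed settings model. A hedged sketch using pydantic-settings, in keeping with the project's Pydantic-first design (the class and field names are assumptions inferred from the variable names above):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Nl2SqlSettings(BaseSettings):
    """Loads NL2SQL_MCP_* variables from the environment or a .env file."""

    model_config = SettingsConfigDict(env_prefix="NL2SQL_MCP_", env_file=".env")

    database_url: str          # NL2SQL_MCP_DATABASE_URL (required)
    row_limit: int = 1000      # NL2SQL_MCP_ROW_LIMIT
    max_cell_chars: int = 500  # NL2SQL_MCP_MAX_CELL_CHARS
    debug_tools: bool = False  # NL2SQL_MCP_DEBUG_TOOLS

settings = Nl2SqlSettings()  # raises a validation error if the database URL is missing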

๐Ÿ›๏ธ Architecture

graph TD
    A[LLM Client] -->|MCP Protocol| B[FastMCP Server]

    subgraph "Core Services"
        B --> C[Schema Service]
        B --> D[SQLGlot Service]
        B --> E[Execution Engine]
    end

    subgraph "Intelligence Layer"
        C --> F[Schema Explorer]
        C --> G[Query Engine]
        F --> H[Reflection & Profiling]
        F --> I[Graph Analysis]
        G --> J[Multi-Modal Retrieval]
        G --> K[Graph Expansion]
    end

    subgraph "Data Layer"
        E --> L[SQLAlchemy Engine]
        H --> L
        L --> M[(Your Database)]
    end

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style M fill:#e8f5e8

The system follows a two-phase architecture, sketched below:

  1. Schema Building Phase: Comprehensive database analysis, relationship mapping, and semantic understanding
  2. Query Processing Phase: Real-time natural language to SQL conversion with context-aware planning
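
A hedged sketch of how these phases can be sequenced, using the background initialization and graceful degradation noted under Enterprise Architecture; the class and method names here are illustrative, not the server's internals:

import asyncio
from enum import Enum

class Phase(Enum):
    STARTING = "STARTING"
    READY = "READY"
    FAILED = "FAILED"

class SchemaManager:
    """Builds schema intelligence in the background; tools check the phase first."""

    def __init__(self) -> None:
        self.phase = Phase.STARTING
        self.schema_card = None

    async def initialize(self) -> None:
        try:
            # Phase 1: reflection, profiling, and relationship mapping (placeholder)
            self.schema_card = await asyncio.to_thread(self._build_schema_card)
            self.phase = Phase.READY
        except Exception:
            self.phase = Phase.FAILED  # degrade gracefully; tools report the failure

    def _build_schema_card(self):
        ...  # expensive, one-time database analysis goes here

    def require_ready(self):
        # Phase 2 entry point: queries are only served once the schema is built
        if self.phase is not Phase.READY:
            raise RuntimeError(f"server not ready (phase={self.phase.value})")
        return self.schema_card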

🔌 MCP Tools API

Core Workflow Tools

get_init_status()

Check server readiness and initialization progress.

// Response
{
  "phase": "READY",
  "attempts": 1,
  "started_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:45Z"
}

get_database_overview(req: DatabaseOverviewRequest)

High-level database summary with subject areas.

// Request
{
  "include_subject_areas": true,
  "area_limit": 8
}

// Response
{
  "total_tables": 45,
  "total_schemas": 3,
  "subject_areas": [
    {
      "name": "Sales Analytics",
      "tables": ["orders", "customers", "products"],
      "summary": "Customer orders and product sales data"
    }
  ]
}

plan_query_for_intent(req: PlanQueryRequest)

Intent-first SQL planning with structured guidance.

// Request
{
  "request": "Show monthly revenue by region for 2024",
  "constraints": {
    "time_range": "2024-01-01..2024-12-31",
    "metric": "revenue"
  },
  "detail_level": "standard"
}

// Response
{
  "relevant_tables": [
    {
      "table_key": "sales.orders",
      "relevance_score": 0.95,
      "why_relevant": "Contains revenue data and date columns"
    }
  ],
  "join_plan": [
    ["sales.orders.customer_id", "customers.id"]
  ],
  "main_table": "sales.orders",
  "draft_sql": "SELECT DATE_TRUNC('month', order_date) as month...",
  "confidence": 0.92
}

execute_query(req: ExecuteQueryRequest)

Safe SQL execution with validation and results.

// Request
{
  "sql": "SELECT region, SUM(amount) as revenue FROM sales.orders WHERE order_date >= '2024-01-01' GROUP BY region"
}

// Response
{
  "success": true,
  "rows": [
    {"region": "North", "revenue": 125000.50},
    {"region": "South", "revenue": 98750.25}
  ],
  "row_count": 2,
  "columns": [
    {"name": "region", "type": "VARCHAR"},
    {"name": "revenue", "type": "DECIMAL"}
  ],
  "execution_time_ms": 45,
  "next_action": "success"
}

Discovery Tools

get_table_info(req: TableInfoRequest)

Detailed table metadata optimized for SQL generation.

// Request
{
  "table_key": "sales.orders",
  "include_samples": true,
  "column_role_filter": ["key", "date", "metric"]
}

find_tables(req: FindTablesRequest) (Debug Mode)

Fast table discovery by natural language intent.

find_columns(req: FindColumnsRequest) (Debug Mode)

Column search for SELECT and WHERE clause building.

SQL Assistance Tools

sql_validate(sql: str)

Validate SQL syntax and structure.

sql_auto_transpile_for_database(sql: str)

Automatically detect and convert SQL dialects.

sql_optimize_for_database(sql: str)

Optimize SQL for your database engine.
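
These tools correspond to SQLGlot operations; a minimal sketch of the equivalent library calls (the dialects shown are examples, and the tools' exact behavior may differ):

import sqlglot
from sqlglot.optimizer import optimize

# Validate: parsing raises sqlglot.errors.ParseError on bad syntax
sqlglot.parse_one("SELECT TOP 10 * FROM customers", read="tsql")

# Transpile: T-SQL's TOP becomes LIMIT in PostgreSQL
print(sqlglot.transpile("SELECT TOP 10 * FROM customers", read="tsql", write="postgres")[0])

# Optimize: qualify columns and simplify against a schema mapping
tree = sqlglot.parse_one("SELECT o.amount FROM orders o WHERE 1 = 1")
print(optimize(tree, schema={"orders": {"amount": "DECIMAL"}}).sql(dialect="postgres"))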

⚙️ Configuration

Database Support

Database      SQLAlchemy Driver      Connection String Example
PostgreSQL    psycopg2               postgresql://user:pass@host:5432/db
MySQL         pymysql                mysql+pymysql://user:pass@host:3306/db
SQL Server    pyodbc                 mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17
SQLite        Built-in               sqlite:///path/to/database.db
Oracle        cx_oracle              oracle://user:pass@host:1521/service
Snowflake     snowflake-sqlalchemy   snowflake://user:pass@account/db/schema
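
Each row above is a standard SQLAlchemy URL, so a connection can be sanity-checked before pointing the server at it. A quick hedged sketch, independent of the server itself:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost:5432/dbname")
with engine.connect() as conn:
    # SELECT 1 confirms that the driver, host, and credentials all work
    print(conn.execute(text("SELECT 1")).scalar())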

Schema Configuration

Control schema analysis behavior:

# Via environment or configuration
NL2SQL_MCP_INCLUDE_SCHEMAS=public,analytics
NL2SQL_MCP_EXCLUDE_SCHEMAS=temp,audit
NL2SQL_MCP_SAMPLE_SIZE=100  # Rows per table for profiling
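
A hedged sketch of what include/exclude filtering amounts to during reflection, using SQLAlchemy's inspector (the server's internal filtering logic may differ):

from sqlalchemy import create_engine, inspect

def schemas_to_analyze(url: str, include: set[str], exclude: set[str]) -> list[str]:
    """Apply NL2SQL_MCP_INCLUDE_SCHEMAS / NL2SQL_MCP_EXCLUDE_SCHEMAS semantics."""
    names = inspect(create_engine(url)).get_schema_names()
    if include:
        names = [n for n in names if n in include]
    return [n for n in names if n not in exclude]

print(schemas_to_analyze("postgresql://user:pass@localhost:5432/dbname",
                         include={"public", "analytics"}, exclude={"temp", "audit"}))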

Performance Tuning

# Memory and processing limits
NL2SQL_MCP_ROW_LIMIT=1000           # Max rows returned
NL2SQL_MCP_MAX_CELL_CHARS=500       # Truncate long text
NL2SQL_MCP_SAMPLE_TIMEOUT=5         # Sampling timeout (seconds)
NL2SQL_MCP_MAX_COLS_FOR_EMBEDDINGS=20  # Column embedding limit
NL2SQL_MCP_REFLECT_TIMEOUT=15       # Reflection timeout per statement (seconds)
NL2SQL_MCP_ENABLE_LIGHTWEIGHT_NER=1 # Toggle NER enrichment during profiling (0 disables)

🧪 Testing and Development

Local Development Setup

# Install development dependencies
uv sync --dev

# Run type checking
uv run basedpyright

# Format and lint code
uv run ruff format .
uv run ruff check --fix .

# Run tests
uv run pytest -v

Testing with Live Database

# Schema intelligence testing
uv run python scripts/test_intelligence_harness.py "show sales by region"

# SQL tools testing
uv run python scripts/test_sqlglot_harness.py "SELECT TOP 10 * FROM customers"

Quality Assurance

  • 100% type coverage with basedpyright strict mode
  • Comprehensive test suite with pytest
  • Zero linting violations with ruff
  • Dependency injection for testability
  • Pure functions where possible

📚 Advanced Usage

Custom Schema Analysis

from nl2sql_mcp.services import ConfigService, SchemaServiceManager

# Initialize with custom configuration
config = ConfigService()
manager = SchemaServiceManager.get_instance()

# Access schema service after initialization
schema_service = manager.get_schema_service()
result = schema_service.analyze_query_schema(
    "Find customers with high lifetime value",
    max_tables=10,
    expand_strategy="fk_following"
)

Multi-Agent Integration

The MCP server integrates seamlessly with multi-agent frameworks:

# Example with your LLM framework; mcp_client is assumed to be an
# initialized MCP client session provided by that framework
async def query_database(natural_language_query: str):
    # 1. Plan the query
    plan_result = await mcp_client.call_tool(
        "plan_query_for_intent",
        {"request": natural_language_query}
    )

    # 2. Execute the draft SQL, if the planner produced one
    if plan_result.draft_sql:
        execution_result = await mcp_client.call_tool(
            "execute_query",
            {"sql": plan_result.draft_sql}
        )
        return execution_result.rows
    return None  # no draft SQL; fall back to re-planning or clarification

Code Standards

  • Python 3.13+ with strict type checking
  • Pydantic models for all data structures
  • Pure functions and dependency injection for testability
  • Comprehensive docstrings for public APIs
  • Zero tolerance for type errors, lint violations, or test failures

๐Ÿข Production Deployment

Environment Variables

# Required
NL2SQL_MCP_DATABASE_URL=postgresql://...

# Optional performance tuning
NL2SQL_MCP_ROW_LIMIT=5000
NL2SQL_MCP_MAX_CELL_CHARS=1000
NL2SQL_MCP_SAMPLE_SIZE=200

Health Monitoring

Monitor server health via the get_init_status() tool; a polling sketch follows the status list below:

  • READY: Server fully operational
  • STARTING: Initialization in progress
  • FAILED: Initialization failed, check logs
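
For automated deployments, readiness can be polled until the phase settles. A hedged sketch, reusing the mcp_client convention from the multi-agent example above:

import asyncio

async def wait_until_ready(mcp_client, timeout_s: float = 120.0) -> None:
    """Poll get_init_status until READY; fail fast on FAILED or timeout."""
    deadline = asyncio.get_running_loop().time() + timeout_s
    while True:
        status = await mcp_client.call_tool("get_init_status", {})
        if status.phase == "READY":
            return
        if status.phase == "FAILED":
            raise RuntimeError("initialization failed; check server logs")
        if asyncio.get_running_loop().time() > deadline:
            raise TimeoutError("server did not become READY in time")
        await asyncio.sleep(2)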

📖 Related Research

This implementation incorporates state-of-the-art research in text-to-SQL:

  • Multi-agent architectures for complex query decomposition
  • Schema linking with bidirectional context understanding
  • Error correction frameworks with multi-turn refinement
  • Semantic understanding via lightweight NER and role classification

For detailed research context, see .

🙏 Acknowledgments

  • FastMCP for the excellent MCP server framework
  • SQLGlot for multi-dialect SQL parsing and transpilation
  • SQLAlchemy for robust database abstraction
  • The text-to-SQL research community for advancing the field