Database Schema MCP Server

A Model Context Protocol (MCP) server that extracts, enriches, and indexes database schemas using OpenAI's LLM and vector embeddings. This server provides natural language querying capabilities across multiple database types (MySQL, PostgreSQL, SQL Server).

Features

  • Multi-Database Support: Connect to MySQL, PostgreSQL, and SQL Server databases
  • Schema Extraction: Automatically extract table and column metadata
  • LLM Enrichment: Use OpenAI to generate descriptions, examples, and hints
  • Vector Indexing: Store schema metadata in PostgreSQL with pgvector for semantic search
  • Natural Language Queries: Ask questions in plain English and get SQL results
  • MCP Integration: Expose tools through the Model Context Protocol
  • API Authentication: Secure all endpoints with API key authentication
  • Incremental Updates: Smart CRC-based change detection to avoid unnecessary LLM calls (see the sketch after this list)
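
The change detection is simple in principle: fingerprint each table's serialized definition and re-enrich only tables whose fingerprint changed. A minimal TypeScript sketch of the idea (names are illustrative, and a SHA-1 digest stands in for the CRC; any stable checksum behaves the same way):

import { createHash } from "node:crypto";

interface TableSchema {
  name: string;
  columns: { name: string; type: string }[];
}

// Fingerprint a table by hashing its serialized definition.
// The real server uses a CRC; any stable digest gives the same behavior.
function fingerprint(table: TableSchema): string {
  return createHash("sha1").update(JSON.stringify(table)).digest("hex");
}

// Return only tables whose definition changed since the last run,
// so unchanged tables never trigger an LLM enrichment call.
function changedTables(
  tables: TableSchema[],
  previous: Map<string, string> // table name -> last stored fingerprint
): TableSchema[] {
  return tables.filter((t) => previous.get(t.name) !== fingerprint(t));
}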

Prerequisites

  • Node.js 18+
  • PostgreSQL with pgvector extension
  • OpenAI API key
  • API key for authentication
  • At least one source database (MySQL, PostgreSQL, or SQL Server)

Installation

  1. Clone the repository:
git clone <repository-url>
cd hubeet-mysql-llmasisted-mcpserver
  2. Install dependencies:
npm install
  3. Set up PostgreSQL with pgvector:

Option A: Using Docker Compose (Recommended)

# Start all required databases
docker-compose up -d

# Wait for databases to be ready
docker-compose ps

Option B: Manual Setup

# Run the setup script
./scripts/setup.sh

# Or manually run the SQL script
psql -U postgres -f scripts/setup-database.sql
  4. Configure environment variables:
cp env.example .env

Edit .env with your configuration:

If using Docker Compose:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database URLs (using Docker Compose defaults)
POSTGRES_URL=postgres://postgres:postgres123@localhost:5433/source_db
MYSQL_URL=mysql://mysql_user:mysql123@localhost:3306/source_db
SQLSERVER_URL=mssql://sa:SqlServer123!@localhost:1433/source_db

# Vector Database (PostgreSQL with pgvector)
VECTOR_DB_URL=postgres://postgres:postgres123@localhost:5432/vector_db

# Server Configuration
PORT=3000
NODE_ENV=development

If using manual setup:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database URLs (configure at least one source database)
POSTGRES_URL=postgres://user:password@localhost:5432/source_db
MYSQL_URL=mysql://user:password@localhost:3306/source_db
SQLSERVER_URL=mssql://user:password@localhost:1433/source_db

# Vector Database (PostgreSQL with pgvector)
VECTOR_DB_URL=postgres://user:password@localhost:5432/vector_db

# Server Configuration
PORT=3000
NODE_ENV=development

Authentication

The server requires API key authentication for all endpoints except the health check. See the authentication documentation for detailed instructions.

Quick Start with Authentication

  1. Set your API key in the .env file:
API_KEY=your_secure_api_key_here
  2. Use the API with authentication:
# Using the Authorization header
curl -X POST http://localhost:3002/api/query \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"question": "show me all countries"}'

# Using the X-API-Key header
curl -X POST http://localhost:3002/api/query \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"question": "show me all countries"}'
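
API key checks like the ones above are usually implemented as a small middleware in front of the routes. A minimal Express-style sketch, assuming the key lives in the API_KEY environment variable and the health check stays open (illustrative only, not the server's exact implementation):

import express from "express";

const app = express();
app.use(express.json());

// Accept the key from "Authorization: Bearer <key>" or "X-API-Key".
app.use((req, res, next) => {
  if (req.path === "/health") return next(); // health check is exempt

  const bearer = req.headers.authorization?.replace(/^Bearer\s+/i, "");
  const provided = bearer ?? req.headers["x-api-key"];

  if (!process.env.API_KEY || provided !== process.env.API_KEY) {
    return res.status(401).json({ error: "Invalid or missing API key" });
  }
  next();
});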

Usage

Starting the Server

# Development mode
npm run dev

# Production mode
npm run build
npm start

Available MCP Tools

1. extractSchema

Extracts and enriches the database schema from all connected databases.

Input: None

Output: JSON containing enriched schema with descriptions, examples, and hints.

Example:

{
  "tables": [
    {
      "name": "users",
      "description": "Table storing user information and authentication data",
      "columns": [
        {
          "name": "id",
          "type": "int",
          "description": "Unique identifier for each user",
          "isPrimaryKey": true,
          "isForeignKey": false
        },
        {
          "name": "email",
          "type": "varchar",
          "description": "User's email address, must be unique",
          "isPrimaryKey": false,
          "isForeignKey": false
        }
      ],
      "examples": [
        {"id": 1, "email": "john@example.com"},
        {"id": 2, "email": "jane@example.com"}
      ],
      "hints": "Contains user profile data and authentication information"
    }
  ],
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "version": "1.0.0"
}

2. reindexSchema

Reindexes the current schema into the vector database.

Input: None

Output: Success message with number of tables indexed.

3. reindexAll

Clears and reindexes all schema data in the vector database.

Input: None

Output: Success message with number of tables reindexed.

4. queryNaturalLanguage

Processes a natural language question and returns relevant schema elements, proposed SQL, and execution results.

Input:

{
  "question": "Show me all users with their email addresses"
}

Output:

{
  "relevantElements": {
    "tables": [
      {
        "name": "users",
        "description": "Table storing user information",
        "columnCount": 3,
        "hints": "Contains user profile data"
      }
    ],
    "columns": [
      {
        "name": "email",
        "description": "User's email address",
        "type": "varchar",
        "hints": "Must be unique"
      }
    ]
  },
  "proposedSQL": "SELECT id, email FROM users",
  "executionResults": [
    {"id": 1, "email": "john@example.com"},
    {"id": 2, "email": "jane@example.com"}
  ],
  "executionTime": 150
}
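
Under the hood, a pipeline like this typically embeds the question, retrieves the nearest schema entries from pgvector, and asks the LLM to draft SQL against them. A simplified sketch using the openai and pg packages (the model names, the schema_index table, and the prompt wording are assumptions, not the server's exact implementation):

import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const vectorDb = new Pool({ connectionString: process.env.VECTOR_DB_URL });

async function answer(question: string): Promise<string | null> {
  // 1. Embed the question.
  const emb = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model
    input: question,
  });
  const vector = JSON.stringify(emb.data[0].embedding);

  // 2. Retrieve the closest schema entries (<=> is pgvector's cosine distance).
  const { rows } = await vectorDb.query(
    "SELECT content FROM schema_index ORDER BY embedding <=> $1::vector LIMIT 5",
    [vector]
  );

  // 3. Ask the LLM to propose SQL grounded in the retrieved schema.
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "Relevant schema:\n" + rows.map((r) => r.content).join("\n") },
      { role: "user", content: "Write a single SELECT statement answering: " + question },
    ],
  });
  return completion.choices[0].message.content; // proposed SQL
}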

Testing

Run the test suite:

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

API Examples

Using curl to interact with the MCP server

# Extract schema
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "extractSchema",
      "arguments": {}
    }
  }'

# Query natural language
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "queryNaturalLanguage",
      "arguments": {
        "question": "What tables contain user information?"
      }
    }
  }'

# Reindex all data
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "reindexAll",
      "arguments": {}
    }
  }'
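
The same calls work from Node 18+ without curl, since fetch is built in. A small helper mirroring the JSON-RPC payloads above:

// Call an MCP tool over JSON-RPC (endpoint and payloads match the curl examples).
async function callTool(name: string, args: Record<string, unknown> = {}) {
  const res = await fetch("http://localhost:3000/mcp", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name, arguments: args },
    }),
  });
  return res.json();
}

async function main() {
  console.log(await callTool("extractSchema"));
  console.log(await callTool("queryNaturalLanguage", {
    question: "What tables contain user information?",
  }));
}

main();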

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Source DBs    │    │   OpenAI API    │    │  Vector DB      │
│                 │    │                 │    │  (PostgreSQL    │
│ • PostgreSQL    │    │ • GPT-4         │    │   + pgvector)   │
│ • MySQL         │    │ • Embeddings    │    │                 │
│ • SQL Server    │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MCP Server                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐            │
│  │   Schema    │  │     LLM     │  │   Vector    │            │
│  │  Service    │  │   Service   │  │   Service   │            │
│  └─────────────┘  └─────────────┘  └─────────────┘            │
│  ┌─────────────┐  ┌─────────────┐                             │
│  │   Query     │  │   MCP       │                             │
│  │  Service    │  │  Server     │                             │
│  └─────────────┘  └─────────────┘                             │
└─────────────────────────────────────────────────────────────────┘

Security Considerations

  • SQL Injection Protection: All generated SQL is validated for dangerous operations (see the sketch after this list)
  • Input Validation: All inputs are validated and sanitized
  • Error Handling: Comprehensive error handling prevents system crashes
  • Connection Security: Database connections use SSL when configured
  • API Key Security: OpenAI API key is loaded from environment variables
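
To make the first point concrete: validating generated SQL usually means allowing only a single read-only statement and rejecting anything that can mutate state. A deliberately simple sketch (the server's real checks may be stricter):

// Reject anything that is not a single read-only SELECT statement.
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|create|grant|exec)\b/i;

function validateGeneratedSql(sql: string): void {
  const trimmed = sql.trim();
  if (!/^select\b/i.test(trimmed)) {
    throw new Error("Only SELECT statements are allowed");
  }
  const semi = trimmed.indexOf(";");
  if (semi !== -1 && semi !== trimmed.length - 1) {
    throw new Error("Multiple statements are not allowed");
  }
  if (FORBIDDEN.test(trimmed)) {
    throw new Error("Statement contains a forbidden keyword");
  }
}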

Performance Optimization

  • Connection Pooling: Database connections are pooled for efficiency (see the sketch after this list)
  • Vector Indexing: pgvector indexes are created for fast similarity search
  • Caching: Schema metadata is cached in the vector database
  • Parallel Processing: Multiple database connections are processed in parallel
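
For the pooling and indexing points, a typical setup looks like this: a bounded pg Pool, plus an ivfflat index on the embedding column so similarity search avoids a full scan. The pool size, index parameters, and schema_index table name are illustrative:

import { Pool } from "pg";

// Bounded pool: connections are reused instead of opened per query.
const vectorDb = new Pool({
  connectionString: process.env.VECTOR_DB_URL,
  max: 10,                   // cap on concurrent connections
  idleTimeoutMillis: 30_000, // recycle idle connections
});

// One-time index so pgvector answers similarity queries without a full scan.
async function ensureIndexes(): Promise<void> {
  await vectorDb.query(`
    CREATE INDEX IF NOT EXISTS schema_index_embedding_idx
    ON schema_index USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
  `);
}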

Troubleshooting

Common Issues

  1. pgvector extension not found

    CREATE EXTENSION IF NOT EXISTS vector;
    
  2. Database connection failures

    • Check database URLs in .env
    • Verify database credentials
    • Ensure databases are running and accessible
  3. OpenAI API errors

    • Verify API key is correct
    • Check API quota and billing
    • Ensure network connectivity
  4. Vector indexing failures

    • Check vector database permissions
    • Verify pgvector extension is installed
    • Check available disk space

Logs

The server logs important events and errors. Check the console output for:

  • Database connection status
  • Schema extraction progress
  • LLM enrichment results
  • Vector indexing status
  • Query processing results

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the test files for usage examples
  3. Open an issue on GitHub
  4. Contact the development team