Database Schema MCP Server

A Model Context Protocol (MCP) server that extracts, enriches, and indexes database schemas using OpenAI's LLM and vector embeddings. This server provides natural language querying capabilities across multiple database types (MySQL, PostgreSQL, SQL Server).

Features

  • Multi-Database Support: Connect to MySQL, PostgreSQL, and SQL Server databases
  • Schema Extraction: Automatically extract table and column metadata
  • LLM Enrichment: Use OpenAI to generate descriptions, examples, and hints
  • Vector Indexing: Store schema metadata in PostgreSQL with pgvector for semantic search
  • Natural Language Queries: Ask questions in plain English and get SQL results
  • MCP Integration: Expose tools through the Model Context Protocol
  • API Authentication: Secure all endpoints with API key authentication
  • Incremental Updates: Smart CRC-based change detection to avoid unnecessary LLM calls (see the sketch after this list)
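
The change detection is simple in principle: fingerprint each table's serialized definition and re-enrich only tables whose fingerprint changed. A minimal TypeScript sketch of the idea (names are illustrative, and a SHA-1 digest stands in for the CRC; any stable checksum behaves the same way):

import { createHash } from "node:crypto";

interface TableSchema {
  name: string;
  columns: { name: string; type: string }[];
}

// Fingerprint a table by hashing its serialized definition.
// The real server uses a CRC; any stable digest gives the same behavior.
function fingerprint(table: TableSchema): string {
  return createHash("sha1").update(JSON.stringify(table)).digest("hex");
}

// Return only tables whose definition changed since the last run,
// so unchanged tables never trigger an LLM enrichment call.
function changedTables(
  tables: TableSchema[],
  previous: Map<string, string> // table name -> last stored fingerprint
): TableSchema[] {
  return tables.filter((t) => previous.get(t.name) !== fingerprint(t));
}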

Prerequisites

  • Node.js 18+
  • PostgreSQL with pgvector extension
  • OpenAI API key
  • API key for authentication
  • At least one source database (MySQL, PostgreSQL, or SQL Server)

Installation

  1. Clone the repository:
git clone <repository-url>
cd hubeet-mysql-llmasisted-mcpserver
  2. Install dependencies:
npm install
  3. Set up PostgreSQL with pgvector:

Option A: Using Docker Compose (Recommended)

# Start all required databases
docker-compose up -d

# Wait for databases to be ready
docker-compose ps

Option B: Manual Setup

# Run the setup script
./scripts/setup.sh

# Or manually run the SQL script
psql -U postgres -f scripts/setup-database.sql
  4. Configure environment variables:
cp env.example .env

Edit .env with your configuration:

If using Docker Compose:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database URLs (using Docker Compose defaults)
POSTGRES_URL=postgres://postgres:postgres123@localhost:5433/source_db
MYSQL_URL=mysql://mysql_user:mysql123@localhost:3306/source_db
SQLSERVER_URL=mssql://sa:SqlServer123!@localhost:1433/source_db

# Vector Database (PostgreSQL with pgvector)
VECTOR_DB_URL=postgres://postgres:postgres123@localhost:5432/vector_db

# Server Configuration
PORT=3000
NODE_ENV=development

If using manual setup:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database URLs (configure at least one source database)
POSTGRES_URL=postgres://user:password@localhost:5432/source_db
MYSQL_URL=mysql://user:password@localhost:3306/source_db
SQLSERVER_URL=mssql://user:password@localhost:1433/source_db

# Vector Database (PostgreSQL with pgvector)
VECTOR_DB_URL=postgres://user:password@localhost:5432/vector_db

# Server Configuration
PORT=3000
NODE_ENV=development

Authentication

The server requires API key authentication for all endpoints except the health check. See the authentication documentation for detailed instructions.

Quick Start with Authentication

  1. Set your API key in the .env file:
API_KEY=your_secure_api_key_here
  2. Use the API with authentication:
# Using the Authorization header
curl -X POST http://localhost:3002/api/query \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"question": "show me all countries"}'

# Using the X-API-Key header
curl -X POST http://localhost:3002/api/query \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"question": "show me all countries"}'
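
API key checks like the ones above are usually implemented as a small middleware in front of the routes. A minimal Express-style sketch, assuming the key lives in the API_KEY environment variable and the health check stays open (illustrative only, not the server's exact implementation):

import express from "express";

const app = express();
app.use(express.json());

// Accept the key from "Authorization: Bearer <key>" or "X-API-Key".
app.use((req, res, next) => {
  if (req.path === "/health") return next(); // health check is exempt

  const bearer = req.headers.authorization?.replace(/^Bearer\s+/i, "");
  const provided = bearer ?? req.headers["x-api-key"];

  if (!process.env.API_KEY || provided !== process.env.API_KEY) {
    return res.status(401).json({ error: "Invalid or missing API key" });
  }
  next();
});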

Usage

Starting the Server

# Development mode
npm run dev

# Production mode
npm run build
npm start

Available MCP Tools

1. extractSchema

Extracts and enriches the database schema from all connected databases.

Input: None

Output: JSON containing enriched schema with descriptions, examples, and hints.

Example:

{
  "tables": [
    {
      "name": "users",
      "description": "Table storing user information and authentication data",
      "columns": [
        {
          "name": "id",
          "type": "int",
          "description": "Unique identifier for each user",
          "isPrimaryKey": true,
          "isForeignKey": false
        },
        {
          "name": "email",
          "type": "varchar",
          "description": "User's email address, must be unique",
          "isPrimaryKey": false,
          "isForeignKey": false
        }
      ],
      "examples": [
        {"id": 1, "email": "john@example.com"},
        {"id": 2, "email": "jane@example.com"}
      ],
      "hints": "Contains user profile data and authentication information"
    }
  ],
  "extractedAt": "2024-01-15T10:30:00.000Z",
  "version": "1.0.0"
}

2. reindexSchema

Reindexes the current schema into the vector database.

Input: None

Output: Success message with number of tables indexed.

3. reindexAll

Clears and reindexes all schema data in the vector database.

Input: None

Output: Success message with number of tables reindexed.

4. queryNaturalLanguage

Processes a natural language question and returns relevant schema elements, proposed SQL, and execution results.

Input:

{
  "question": "Show me all users with their email addresses"
}

Output:

{
  "relevantElements": {
    "tables": [
      {
        "name": "users",
        "description": "Table storing user information",
        "columnCount": 3,
        "hints": "Contains user profile data"
      }
    ],
    "columns": [
      {
        "name": "email",
        "description": "User's email address",
        "type": "varchar",
        "hints": "Must be unique"
      }
    ]
  },
  "proposedSQL": "SELECT id, email FROM users",
  "executionResults": [
    {"id": 1, "email": "john@example.com"},
    {"id": 2, "email": "jane@example.com"}
  ],
  "executionTime": 150
}
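
Under the hood, a pipeline like this typically embeds the question, retrieves the nearest schema entries from pgvector, and asks the LLM to draft SQL against them. A simplified sketch using the openai and pg packages (the model names, the schema_index table, and the prompt wording are assumptions, not the server's exact implementation):

import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const vectorDb = new Pool({ connectionString: process.env.VECTOR_DB_URL });

async function answer(question: string): Promise<string | null> {
  // 1. Embed the question.
  const emb = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model
    input: question,
  });
  const vector = JSON.stringify(emb.data[0].embedding);

  // 2. Retrieve the closest schema entries (<=> is pgvector's cosine distance).
  const { rows } = await vectorDb.query(
    "SELECT content FROM schema_index ORDER BY embedding <=> $1::vector LIMIT 5",
    [vector]
  );

  // 3. Ask the LLM to propose SQL grounded in the retrieved schema.
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "Relevant schema:\n" + rows.map((r) => r.content).join("\n") },
      { role: "user", content: "Write a single SELECT statement answering: " + question },
    ],
  });
  return completion.choices[0].message.content; // proposed SQL
}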

Testing

Run the test suite:

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

API Examples

Using curl to interact with the MCP server

# Extract schema
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "extractSchema",
      "arguments": {}
    }
  }'

# Query natural language
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "queryNaturalLanguage",
      "arguments": {
        "question": "What tables contain user information?"
      }
    }
  }'

# Reindex all data
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "reindexAll",
      "arguments": {}
    }
  }'
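
The same calls work from Node 18+ without curl, since fetch is built in. A small helper mirroring the JSON-RPC payloads above:

// Call an MCP tool over JSON-RPC (endpoint and payloads match the curl examples).
async function callTool(name: string, args: Record<string, unknown> = {}) {
  const res = await fetch("http://localhost:3000/mcp", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name, arguments: args },
    }),
  });
  return res.json();
}

async function main() {
  console.log(await callTool("extractSchema"));
  console.log(await callTool("queryNaturalLanguage", {
    question: "What tables contain user information?",
  }));
}

main();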

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Source DBs    │    │   OpenAI API    │    │  Vector DB      │
│                 │    │                 │    │  (PostgreSQL    │
│ • PostgreSQL    │    │ • GPT-4         │    │   + pgvector)   │
│ • MySQL         │    │ • Embeddings    │    │                 │
│ • SQL Server    │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MCP Server                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐            │
│  │   Schema    │  │     LLM     │  │   Vector    │            │
│  │  Service    │  │   Service   │  │   Service   │            │
│  └─────────────┘  └─────────────┘  └─────────────┘            │
│  ┌─────────────┐  ┌─────────────┐                             │
│  │   Query     │  │   MCP       │                             │
│  │  Service    │  │  Server     │                             │
│  └─────────────┘  └─────────────┘                             │
└─────────────────────────────────────────────────────────────────┘

Security Considerations

  • SQL Injection Protection: All generated SQL is validated for dangerous operations (see the sketch after this list)
  • Input Validation: All inputs are validated and sanitized
  • Error Handling: Comprehensive error handling prevents system crashes
  • Connection Security: Database connections use SSL when configured
  • API Key Security: OpenAI API key is loaded from environment variables
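
To make the first point concrete: validating generated SQL usually means allowing only a single read-only statement and rejecting anything that can mutate state. A deliberately simple sketch (the server's real checks may be stricter):

// Reject anything that is not a single read-only SELECT statement.
const FORBIDDEN = /\b(insert|update|delete|drop|alter|truncate|create|grant|exec)\b/i;

function validateGeneratedSql(sql: string): void {
  const trimmed = sql.trim();
  if (!/^select\b/i.test(trimmed)) {
    throw new Error("Only SELECT statements are allowed");
  }
  const semi = trimmed.indexOf(";");
  if (semi !== -1 && semi !== trimmed.length - 1) {
    throw new Error("Multiple statements are not allowed");
  }
  if (FORBIDDEN.test(trimmed)) {
    throw new Error("Statement contains a forbidden keyword");
  }
}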

Performance Optimization

  • Connection Pooling: Database connections are pooled for efficiency (see the sketch after this list)
  • Vector Indexing: pgvector indexes are created for fast similarity search
  • Caching: Schema metadata is cached in the vector database
  • Parallel Processing: Multiple database connections are processed in parallel
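
For the pooling and indexing points, a typical setup looks like this: a bounded pg Pool, plus an ivfflat index on the embedding column so similarity search avoids a full scan. The pool size, index parameters, and schema_index table name are illustrative:

import { Pool } from "pg";

// Bounded pool: connections are reused instead of opened per query.
const vectorDb = new Pool({
  connectionString: process.env.VECTOR_DB_URL,
  max: 10,                   // cap on concurrent connections
  idleTimeoutMillis: 30_000, // recycle idle connections
});

// One-time index so pgvector answers similarity queries without a full scan.
async function ensureIndexes(): Promise<void> {
  await vectorDb.query(`
    CREATE INDEX IF NOT EXISTS schema_index_embedding_idx
    ON schema_index USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
  `);
}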

Troubleshooting

Common Issues

  1. pgvector extension not found

    CREATE EXTENSION IF NOT EXISTS vector;
    
  2. Database connection failures

    • Check database URLs in .env
    • Verify database credentials
    • Ensure databases are running and accessible
  3. OpenAI API errors

    • Verify API key is correct
    • Check API quota and billing
    • Ensure network connectivity
  4. Vector indexing failures

    • Check vector database permissions
    • Verify pgvector extension is installed
    • Check available disk space

Logs

The server logs important events and errors. Check the console output for:

  • Database connection status
  • Schema extraction progress
  • LLM enrichment results
  • Vector indexing status
  • Query processing results

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Ensure all tests pass
  6. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the test files for usage examples
  3. Open an issue on GitHub
  4. Contact the development team