sql-context by johnhnguyen97 - MCP Server

SQL-Context: Semantic T-SQL Analysis & MCP Server

Intelligent semantic search and analysis for T-SQL stored procedures using hybrid search (BM25 + vector embeddings). Built as a Model Context Protocol (MCP) server for AI assistant integration.

🎯 Project Goal

Return only relevant context, not entire procedures - inspired by mem0ai's "90% lower token usage" through selective retrieval. Returns metadata + top 5 most relevant chunks per query.

✨ Features

Currently Working

✅ T-SQL Parser - Extracts procedure metadata (name, parameters, variables)
✅ Table Reference Detection - Identifies all tables used
✅ Linked Server Detection - Finds OPENQUERY calls
✅ Complexity Metrics - Line count, variable count, complexity scoring
✅ Categorized Variables - Groups by UTC, date/time, dynamic SQL, formatting
✅ Comprehensive Analysis Report - Beautiful CLI output with chalk formatting

In Progress (Fixed but Testing)

🔧 Dynamic SQL Extraction - Reconstructs @SQL variables (fixed infinite loop bug)
🔧 CTE Extraction - Parses Common Table Expressions
🔧 Template Placeholder Detection - Finds /*__PLACEHOLDER__*/ patterns

Planned Enhancements

📋 Control Flow Analysis - IF/ELSE, WHILE, BEGIN/END nesting depth
📋 Error Handling Detection - TRY/CATCH blocks, RAISERROR
📋 Transaction Analysis - BEGIN TRAN, COMMIT, ROLLBACK tracking
📋 Performance Anti-Patterns - SELECT *, N+1 queries, missing WHERE
📋 Security Vulnerability Detection - SQL injection risks
📋 Data Flow Visualization - Mermaid diagrams (input → processing → output)
📋 Code Quality Scoring - Cyclomatic complexity, maintainability index

🚀 Quick Start

Installation

npm install
npm run build

Run Analysis

node dist/test/analyze-procedure.js

This will analyze Stored_Procedure_Test.sql and generate a comprehensive report showing:

Executive summary of procedure purpose
Input parameters with types and defaults
Declared variables (categorized by function)
Data sources (tables, linked servers)
Special logic (UTC conversion, shift-aware filtering)
Complexity metrics

📊 Example Output

╔════════════════════════════════════════════════════════════════════════════════╗
║                               Executive Summary                                ║
╚════════════════════════════════════════════════════════════════════════════════╝

This stored procedure, [SSRS_Item_Status_Ver5], is a complex reporting query designed
to aggregate item status and production metrics from two primary data sources: a local
SQL Server database and a linked Apriso server (MES).

╔════════════════════════════════════════════════════════════════════════════════╗
║                                Input Parameters                                ║
╚════════════════════════════════════════════════════════════════════════════════╝
  • SubjectDate: DATETIME
  • Available_item: BIT
  • FromTime: TIME (Default: NULL)
  • ToTime: TIME (Default: NULL)
  • GetTimeList: BIT (Default: 0)

🏗️ Architecture

Parser Pipeline

SQL File → Parser → Extractors → Structured Data → Analysis
                      ├─ Procedure Header Extractor
                      ├─ Variable Extractor
                      ├─ Dynamic SQL Extractor
                      └─ CTE Extractor

Key Design Decisions

Mutable ParseContext - Extractors modify shared context instead of returning values
Ordered Extractor Pipeline - Header → Variables → Dynamic SQL → CTEs
Graceful Degradation - Fallback parser if main pipeline fails
ES Modules - Uses import.meta.url for __dirname compatibility
Regex-Based Parsing - No mature T-SQL AST parser exists for Node.js

🐛 Bugs Fixed in This Session

1. `__dirname is not defined` Error

Problem: ES modules don't provide __dirname global Fix: Added ES module pattern:

import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

2. Parser Returning "Unknown" Procedure Name

Problem: PROCEDURE_HEADER_PATTERN only had 1 capture group, but code accessed match[2] and match[3] Fix: Updated pattern to capture full name, procedure name, and parameters separately:

/(?:CREATE|ALTER)\s+PROCEDURE\s+((?:\[?[^\]]+\]?\.)?(\[?[^\s\]]+\]?))([\s\S]*?)(?=AS\s+BEGIN|AS\s*\n\s*BEGIN)/gi

3. `.match()` vs `.exec()` with Global Regex

Problem: Using .match() with /gi flag returns only full matches, no capture groups Fix: Changed to .exec() with lastIndex reset:

PROCEDURE_HEADER_PATTERN.lastIndex = 0;
const headerMatch = PROCEDURE_HEADER_PATTERN.exec(sql);

4. Dynamic SQL Extractor Infinite Loop

Problem: Variable name captured without @ prefix, causing regex mismatch and infinite loop Fix: Added @ prefix when storing variable names:

declaredVariables.add('@' + match[1]);

📂 Project Structure

sql-context/
├── src/
│   ├── parser/
│   │   ├── patterns.ts              # Regex patterns for T-SQL elements
│   │   ├── tsql-parser.ts           # Main parser orchestrator
│   │   ├── chunker.ts               # Semantic chunking logic
│   │   └── extractors/
│   │       ├── procedure-header.ts  # Extract name & parameters
│   │       ├── variable-extractor.ts # Extract DECLARE statements
│   │       ├── dynamic-sql-extractor.ts # Extract @SQL variables
│   │       └── cte-extractor.ts     # Extract CTEs
│   ├── types/
│   │   └── index.ts                 # TypeScript type definitions
│   ├── embeddings/                  # Vector embedding generation
│   ├── storage/                     # SQLite database layer
│   ├── search/                      # Hybrid search (BM25 + vector)
│   └── tools/                       # MCP tool implementations
├── test/
│   └── analyze-procedure.ts         # Comprehensive analysis script
├── Stored_Procedure_Test.sql        # Example 914-line procedure
├── package.json
├── tsconfig.json
└── README.md

🔧 Technical Stack

TypeScript - ES2022, strict mode, Node16 module resolution
ES Modules - "type": "module" in package.json
SQLite - better-sqlite3 (synchronous API)
Embeddings - @xenova/transformers (all-MiniLM-L6-v2, 384 dims)
Search - FTS5 for BM25, JavaScript cosine similarity for vectors
MCP - Model Context Protocol for AI assistant integration
Formatting - Chalk for beautiful CLI output

📝 Configuration

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "Node16",
    "rootDir": ".",
    "outDir": "./dist",
    "strict": true
  }
}

🧪 Testing

The analyze-procedure.ts script provides comprehensive testing:

npm run build
node dist/test/analyze-procedure.js

Tests the parser against Stored_Procedure_Test.sql - a real 914-line procedure with:

5 input parameters
35+ declared variables
8 dynamic SQL variables (@SQL01-@SQL05, @SQL11-12, @SQL15)
Multiple CTEs
OPENQUERY for linked server access
UTC timezone conversion logic
Shift-aware time filtering (cross-midnight shifts)

🎨 Features Roadmap

Phase 1: Core Parsing (Complete ✅)

Phase 2: Advanced Extraction (In Progress 🔧)

Dynamic SQL reconstruction (fixed)
CTE extraction (implemented, needs testing)
Template placeholder detection
Control flow analysis
Error handling detection

Phase 3: Analysis & Insights (Planned 📋)

Phase 4: MCP Server Integration (Planned 📋)

📚 Documentation

CLAUDE.md - Project guidelines for AI assistants
ARCHITECTURE.md - Technical implementation details (planned)
MEM0-INSIGHTS.md - Design decisions from mem0ai research (planned)

🤝 Contributing

This project was developed through AI-assisted pair programming with Claude Code and Sage MCP, using:

Gemini CLI - Parser and chunker implementation
Claude CLI - Embeddings and search system
Debug workflows - Systematic troubleshooting

📄 License

MIT

🙏 Acknowledgments

Inspired by mem0ai's selective retrieval approach
Built with Sage MCP for multi-model orchestration
Uses Context7 for up-to-date library documentation