sql-context

johnhnguyen97/sql-context

3.1

If you are the rightful owner of sql-context and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

The SQL-Context: Semantic T-SQL Analysis & MCP Server provides intelligent semantic search and analysis for T-SQL stored procedures using a hybrid search approach.

SQL-Context: Semantic T-SQL Analysis & MCP Server

Intelligent semantic search and analysis for T-SQL stored procedures using hybrid search (BM25 + vector embeddings). Built as a Model Context Protocol (MCP) server for AI assistant integration.

🎯 Project Goal

Return only relevant context, not entire procedures - inspired by mem0ai's "90% lower token usage" through selective retrieval. Returns metadata + top 5 most relevant chunks per query.

✨ Features

Currently Working

  • T-SQL Parser - Extracts procedure metadata (name, parameters, variables)
  • Table Reference Detection - Identifies all tables used
  • Linked Server Detection - Finds OPENQUERY calls
  • Complexity Metrics - Line count, variable count, complexity scoring
  • Categorized Variables - Groups by UTC, date/time, dynamic SQL, formatting
  • Comprehensive Analysis Report - Beautiful CLI output with chalk formatting

In Progress (Fixed but Testing)

  • 🔧 Dynamic SQL Extraction - Reconstructs @SQL variables (fixed infinite loop bug)
  • 🔧 CTE Extraction - Parses Common Table Expressions
  • 🔧 Template Placeholder Detection - Finds /*__PLACEHOLDER__*/ patterns

Planned Enhancements

  • 📋 Control Flow Analysis - IF/ELSE, WHILE, BEGIN/END nesting depth
  • 📋 Error Handling Detection - TRY/CATCH blocks, RAISERROR
  • 📋 Transaction Analysis - BEGIN TRAN, COMMIT, ROLLBACK tracking
  • 📋 Performance Anti-Patterns - SELECT *, N+1 queries, missing WHERE
  • 📋 Security Vulnerability Detection - SQL injection risks
  • 📋 Data Flow Visualization - Mermaid diagrams (input → processing → output)
  • 📋 Code Quality Scoring - Cyclomatic complexity, maintainability index

🚀 Quick Start

Installation

npm install
npm run build

Run Analysis

node dist/test/analyze-procedure.js

This will analyze Stored_Procedure_Test.sql and generate a comprehensive report showing:

  • Executive summary of procedure purpose
  • Input parameters with types and defaults
  • Declared variables (categorized by function)
  • Data sources (tables, linked servers)
  • Special logic (UTC conversion, shift-aware filtering)
  • Complexity metrics

📊 Example Output

╔════════════════════════════════════════════════════════════════════════════════╗
║                               Executive Summary                                ║
╚════════════════════════════════════════════════════════════════════════════════╝

This stored procedure, [SSRS_Item_Status_Ver5], is a complex reporting query designed
to aggregate item status and production metrics from two primary data sources: a local
SQL Server database and a linked Apriso server (MES).

╔════════════════════════════════════════════════════════════════════════════════╗
║                                Input Parameters                                ║
╚════════════════════════════════════════════════════════════════════════════════╝
  • SubjectDate: DATETIME
  • Available_item: BIT
  • FromTime: TIME (Default: NULL)
  • ToTime: TIME (Default: NULL)
  • GetTimeList: BIT (Default: 0)

🏗️ Architecture

Parser Pipeline

SQL File → Parser → Extractors → Structured Data → Analysis
                      ├─ Procedure Header Extractor
                      ├─ Variable Extractor
                      ├─ Dynamic SQL Extractor
                      └─ CTE Extractor

Key Design Decisions

  1. Mutable ParseContext - Extractors modify shared context instead of returning values
  2. Ordered Extractor Pipeline - Header → Variables → Dynamic SQL → CTEs
  3. Graceful Degradation - Fallback parser if main pipeline fails
  4. ES Modules - Uses import.meta.url for __dirname compatibility
  5. Regex-Based Parsing - No mature T-SQL AST parser exists for Node.js

🐛 Bugs Fixed in This Session

1. __dirname is not defined Error

Problem: ES modules don't provide __dirname global Fix: Added ES module pattern:

import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

2. Parser Returning "Unknown" Procedure Name

Problem: PROCEDURE_HEADER_PATTERN only had 1 capture group, but code accessed match[2] and match[3] Fix: Updated pattern to capture full name, procedure name, and parameters separately:

/(?:CREATE|ALTER)\s+PROCEDURE\s+((?:\[?[^\]]+\]?\.)?(\[?[^\s\]]+\]?))([\s\S]*?)(?=AS\s+BEGIN|AS\s*\n\s*BEGIN)/gi

3. .match() vs .exec() with Global Regex

Problem: Using .match() with /gi flag returns only full matches, no capture groups Fix: Changed to .exec() with lastIndex reset:

PROCEDURE_HEADER_PATTERN.lastIndex = 0;
const headerMatch = PROCEDURE_HEADER_PATTERN.exec(sql);

4. Dynamic SQL Extractor Infinite Loop

Problem: Variable name captured without @ prefix, causing regex mismatch and infinite loop Fix: Added @ prefix when storing variable names:

declaredVariables.add('@' + match[1]);

📂 Project Structure

sql-context/
├── src/
│   ├── parser/
│   │   ├── patterns.ts              # Regex patterns for T-SQL elements
│   │   ├── tsql-parser.ts           # Main parser orchestrator
│   │   ├── chunker.ts               # Semantic chunking logic
│   │   └── extractors/
│   │       ├── procedure-header.ts  # Extract name & parameters
│   │       ├── variable-extractor.ts # Extract DECLARE statements
│   │       ├── dynamic-sql-extractor.ts # Extract @SQL variables
│   │       └── cte-extractor.ts     # Extract CTEs
│   ├── types/
│   │   └── index.ts                 # TypeScript type definitions
│   ├── embeddings/                  # Vector embedding generation
│   ├── storage/                     # SQLite database layer
│   ├── search/                      # Hybrid search (BM25 + vector)
│   └── tools/                       # MCP tool implementations
├── test/
│   └── analyze-procedure.ts         # Comprehensive analysis script
├── Stored_Procedure_Test.sql        # Example 914-line procedure
├── package.json
├── tsconfig.json
└── README.md

🔧 Technical Stack

  • TypeScript - ES2022, strict mode, Node16 module resolution
  • ES Modules - "type": "module" in package.json
  • SQLite - better-sqlite3 (synchronous API)
  • Embeddings - @xenova/transformers (all-MiniLM-L6-v2, 384 dims)
  • Search - FTS5 for BM25, JavaScript cosine similarity for vectors
  • MCP - Model Context Protocol for AI assistant integration
  • Formatting - Chalk for beautiful CLI output

📝 Configuration

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "Node16",
    "rootDir": ".",
    "outDir": "./dist",
    "strict": true
  }
}

🧪 Testing

The analyze-procedure.ts script provides comprehensive testing:

npm run build
node dist/test/analyze-procedure.js

Tests the parser against Stored_Procedure_Test.sql - a real 914-line procedure with:

  • 5 input parameters
  • 35+ declared variables
  • 8 dynamic SQL variables (@SQL01-@SQL05, @SQL11-12, @SQL15)
  • Multiple CTEs
  • OPENQUERY for linked server access
  • UTC timezone conversion logic
  • Shift-aware time filtering (cross-midnight shifts)

🎨 Features Roadmap

Phase 1: Core Parsing (Complete ✅)

  • Procedure header extraction
  • Parameter parsing
  • Variable extraction
  • Table reference detection
  • OPENQUERY detection

Phase 2: Advanced Extraction (In Progress 🔧)

  • Dynamic SQL reconstruction (fixed)
  • CTE extraction (implemented, needs testing)
  • Template placeholder detection
  • Control flow analysis
  • Error handling detection

Phase 3: Analysis & Insights (Planned 📋)

  • Performance anti-pattern detection
  • Security vulnerability scanning
  • Code quality metrics
  • Dependency graph visualization
  • Refactoring suggestions

Phase 4: MCP Server Integration (Planned 📋)

  • index_sql_files tool
  • search_procedures tool
  • get_procedure tool
  • analyze_dependencies tool
  • refresh_index tool

📚 Documentation

  • CLAUDE.md - Project guidelines for AI assistants
  • ARCHITECTURE.md - Technical implementation details (planned)
  • MEM0-INSIGHTS.md - Design decisions from mem0ai research (planned)

🤝 Contributing

This project was developed through AI-assisted pair programming with Claude Code and Sage MCP, using:

  • Gemini CLI - Parser and chunker implementation
  • Claude CLI - Embeddings and search system
  • Debug workflows - Systematic troubleshooting

📄 License

MIT

🙏 Acknowledgments

  • Inspired by mem0ai's selective retrieval approach
  • Built with Sage MCP for multi-model orchestration
  • Uses Context7 for up-to-date library documentation