johnhnguyen97/sql-context
If you are the rightful owner of sql-context and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The SQL-Context: Semantic T-SQL Analysis & MCP Server provides intelligent semantic search and analysis for T-SQL stored procedures using a hybrid search approach.
SQL-Context: Semantic T-SQL Analysis & MCP Server
Intelligent semantic search and analysis for T-SQL stored procedures using hybrid search (BM25 + vector embeddings). Built as a Model Context Protocol (MCP) server for AI assistant integration.
🎯 Project Goal
Return only relevant context, not entire procedures - inspired by mem0ai's "90% lower token usage" through selective retrieval. Returns metadata + top 5 most relevant chunks per query.
✨ Features
Currently Working
- ✅ T-SQL Parser - Extracts procedure metadata (name, parameters, variables)
- ✅ Table Reference Detection - Identifies all tables used
- ✅ Linked Server Detection - Finds OPENQUERY calls
- ✅ Complexity Metrics - Line count, variable count, complexity scoring
- ✅ Categorized Variables - Groups by UTC, date/time, dynamic SQL, formatting
- ✅ Comprehensive Analysis Report - Beautiful CLI output with chalk formatting
In Progress (Fixed but Testing)
- 🔧 Dynamic SQL Extraction - Reconstructs @SQL variables (fixed infinite loop bug)
- 🔧 CTE Extraction - Parses Common Table Expressions
- 🔧 Template Placeholder Detection - Finds
/*__PLACEHOLDER__*/patterns
Planned Enhancements
- 📋 Control Flow Analysis - IF/ELSE, WHILE, BEGIN/END nesting depth
- 📋 Error Handling Detection - TRY/CATCH blocks, RAISERROR
- 📋 Transaction Analysis - BEGIN TRAN, COMMIT, ROLLBACK tracking
- 📋 Performance Anti-Patterns - SELECT *, N+1 queries, missing WHERE
- 📋 Security Vulnerability Detection - SQL injection risks
- 📋 Data Flow Visualization - Mermaid diagrams (input → processing → output)
- 📋 Code Quality Scoring - Cyclomatic complexity, maintainability index
🚀 Quick Start
Installation
npm install
npm run build
Run Analysis
node dist/test/analyze-procedure.js
This will analyze Stored_Procedure_Test.sql and generate a comprehensive report showing:
- Executive summary of procedure purpose
- Input parameters with types and defaults
- Declared variables (categorized by function)
- Data sources (tables, linked servers)
- Special logic (UTC conversion, shift-aware filtering)
- Complexity metrics
📊 Example Output
╔════════════════════════════════════════════════════════════════════════════════╗
║ Executive Summary ║
╚════════════════════════════════════════════════════════════════════════════════╝
This stored procedure, [SSRS_Item_Status_Ver5], is a complex reporting query designed
to aggregate item status and production metrics from two primary data sources: a local
SQL Server database and a linked Apriso server (MES).
╔════════════════════════════════════════════════════════════════════════════════╗
║ Input Parameters ║
╚════════════════════════════════════════════════════════════════════════════════╝
• SubjectDate: DATETIME
• Available_item: BIT
• FromTime: TIME (Default: NULL)
• ToTime: TIME (Default: NULL)
• GetTimeList: BIT (Default: 0)
🏗️ Architecture
Parser Pipeline
SQL File → Parser → Extractors → Structured Data → Analysis
├─ Procedure Header Extractor
├─ Variable Extractor
├─ Dynamic SQL Extractor
└─ CTE Extractor
Key Design Decisions
- Mutable ParseContext - Extractors modify shared context instead of returning values
- Ordered Extractor Pipeline - Header → Variables → Dynamic SQL → CTEs
- Graceful Degradation - Fallback parser if main pipeline fails
- ES Modules - Uses
import.meta.urlfor__dirnamecompatibility - Regex-Based Parsing - No mature T-SQL AST parser exists for Node.js
🐛 Bugs Fixed in This Session
1. __dirname is not defined Error
Problem: ES modules don't provide __dirname global
Fix: Added ES module pattern:
import { fileURLToPath } from 'url';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
2. Parser Returning "Unknown" Procedure Name
Problem: PROCEDURE_HEADER_PATTERN only had 1 capture group, but code accessed match[2] and match[3]
Fix: Updated pattern to capture full name, procedure name, and parameters separately:
/(?:CREATE|ALTER)\s+PROCEDURE\s+((?:\[?[^\]]+\]?\.)?(\[?[^\s\]]+\]?))([\s\S]*?)(?=AS\s+BEGIN|AS\s*\n\s*BEGIN)/gi
3. .match() vs .exec() with Global Regex
Problem: Using .match() with /gi flag returns only full matches, no capture groups
Fix: Changed to .exec() with lastIndex reset:
PROCEDURE_HEADER_PATTERN.lastIndex = 0;
const headerMatch = PROCEDURE_HEADER_PATTERN.exec(sql);
4. Dynamic SQL Extractor Infinite Loop
Problem: Variable name captured without @ prefix, causing regex mismatch and infinite loop
Fix: Added @ prefix when storing variable names:
declaredVariables.add('@' + match[1]);
📂 Project Structure
sql-context/
├── src/
│ ├── parser/
│ │ ├── patterns.ts # Regex patterns for T-SQL elements
│ │ ├── tsql-parser.ts # Main parser orchestrator
│ │ ├── chunker.ts # Semantic chunking logic
│ │ └── extractors/
│ │ ├── procedure-header.ts # Extract name & parameters
│ │ ├── variable-extractor.ts # Extract DECLARE statements
│ │ ├── dynamic-sql-extractor.ts # Extract @SQL variables
│ │ └── cte-extractor.ts # Extract CTEs
│ ├── types/
│ │ └── index.ts # TypeScript type definitions
│ ├── embeddings/ # Vector embedding generation
│ ├── storage/ # SQLite database layer
│ ├── search/ # Hybrid search (BM25 + vector)
│ └── tools/ # MCP tool implementations
├── test/
│ └── analyze-procedure.ts # Comprehensive analysis script
├── Stored_Procedure_Test.sql # Example 914-line procedure
├── package.json
├── tsconfig.json
└── README.md
🔧 Technical Stack
- TypeScript - ES2022, strict mode, Node16 module resolution
- ES Modules -
"type": "module"in package.json - SQLite - better-sqlite3 (synchronous API)
- Embeddings - @xenova/transformers (all-MiniLM-L6-v2, 384 dims)
- Search - FTS5 for BM25, JavaScript cosine similarity for vectors
- MCP - Model Context Protocol for AI assistant integration
- Formatting - Chalk for beautiful CLI output
📝 Configuration
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"moduleResolution": "Node16",
"rootDir": ".",
"outDir": "./dist",
"strict": true
}
}
🧪 Testing
The analyze-procedure.ts script provides comprehensive testing:
npm run build
node dist/test/analyze-procedure.js
Tests the parser against Stored_Procedure_Test.sql - a real 914-line procedure with:
- 5 input parameters
- 35+ declared variables
- 8 dynamic SQL variables (@SQL01-@SQL05, @SQL11-12, @SQL15)
- Multiple CTEs
- OPENQUERY for linked server access
- UTC timezone conversion logic
- Shift-aware time filtering (cross-midnight shifts)
🎨 Features Roadmap
Phase 1: Core Parsing (Complete ✅)
- Procedure header extraction
- Parameter parsing
- Variable extraction
- Table reference detection
- OPENQUERY detection
Phase 2: Advanced Extraction (In Progress 🔧)
- Dynamic SQL reconstruction (fixed)
- CTE extraction (implemented, needs testing)
- Template placeholder detection
- Control flow analysis
- Error handling detection
Phase 3: Analysis & Insights (Planned 📋)
- Performance anti-pattern detection
- Security vulnerability scanning
- Code quality metrics
- Dependency graph visualization
- Refactoring suggestions
Phase 4: MCP Server Integration (Planned 📋)
-
index_sql_filestool -
search_procedurestool -
get_proceduretool -
analyze_dependenciestool -
refresh_indextool
📚 Documentation
CLAUDE.md- Project guidelines for AI assistantsARCHITECTURE.md- Technical implementation details (planned)MEM0-INSIGHTS.md- Design decisions from mem0ai research (planned)
🤝 Contributing
This project was developed through AI-assisted pair programming with Claude Code and Sage MCP, using:
- Gemini CLI - Parser and chunker implementation
- Claude CLI - Embeddings and search system
- Debug workflows - Systematic troubleshooting
📄 License
MIT
🙏 Acknowledgments
- Inspired by mem0ai's selective retrieval approach
- Built with Sage MCP for multi-model orchestration
- Uses Context7 for up-to-date library documentation