timlawrenz/expert-enigma
expert-enigma: A GNN-Powered Model Context Protocol Server for Ruby
A next-generation, local-first Model Context Protocol (MCP) server for Ruby repositories. It uses Graph Neural Networks (GNNs) to provide LLMs and coding agents with a deep, structural understanding of code, far beyond simple text analysis.
The Problem
Modern LLM-based coding agents are powerful, but they often lack a true understanding of a project's architecture. When analyzing dynamic languages like Ruby, they rely on text-based heuristics and miss the rich structural relationships within the code (inheritance, method calls, composition). This leads to shallow, context-poor responses.
The Solution
This project provides a highly intelligent context server that speaks the standard MCP language. Instead of just parsing text, it transforms Ruby code into a graph and uses a Graph Neural Network to create sophisticated embeddings that capture the code's structure and intent.
The core innovation is leveraging the research and models from the jubilant-palm-tree project, which demonstrated that GNNs can learn meaningful representations of Ruby ASTs.
Core Concepts
- AST to Graph Transformation: Ruby files are parsed into Abstract Syntax Trees (ASTs), which are then converted into rich graph structures where nodes represent code entities (classes, methods) and edges represent their relationships (calls, inherits, includes).
- GNN-Powered Embeddings: We use a pre-trained GNN model (in ONNX format) to generate vector embeddings for each code symbol. Unlike text embeddings, these vectors capture the structural similarity and complexity of the code, allowing for powerful semantic search.
- Lightweight & Local-First: The entire engine is designed to run with minimal overhead on a developer's machine. It uses an embedded database (SQLite with the `sqlite-vss` extension for vector search) that requires no external services.
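To make the AST-to-graph idea concrete, here is a minimal, standard-library-only sketch. It uses Ruby's built-in `Ripper` rather than the `parser` gem the project actually uses, and the flat node/edge representation is an illustrative assumption, not the project's real graph format:

```ruby
require 'ripper'

# Illustrative only: the project uses the `parser` gem; Ripper (stdlib)
# produces a comparable s-expression tree that can be walked the same way.
source = <<~RUBY
  class Greeter
    def hello(name)
      puts(name)
    end
  end
RUBY

sexp = Ripper.sexp(source)

# Flatten the tree into a node list plus parent->child edges --
# the shape a GNN consumes (node features plus an adjacency list).
nodes = []
edges = []

walk = lambda do |node, parent_id|
  id = nodes.size
  nodes << node.first # the node type, e.g. :class, :def, :fcall
  edges << [parent_id, id] unless parent_id.nil?
  node.drop(1).each do |child|
    next unless child.is_a?(Array)
    if child.first.is_a?(Symbol)
      walk.call(child, id)
    else
      # Statement lists are plain arrays of nodes.
      child.each { |c| walk.call(c, id) if c.is_a?(Array) && c.first.is_a?(Symbol) }
    end
  end
end

walk.call(sexp, nil)

puts nodes.inspect
puts edges.inspect
```

Each entry in `nodes` would become a feature vector and `edges` the adjacency list fed to the GNN; the project additionally adds typed edges (calls, inherits, includes) on top of the plain parent/child structure.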
Architecture & Implementation Details
The data pipeline is designed for a rich, offline-first experience:
Ruby Files -> AST Parser -> Symbol/Reference Extractor -> GNN Inference (ONNX) -> SQLite DB -> MCP API -> LLM Agent
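Downstream of the GNN inference step, semantic search reduces to a nearest-neighbour lookup over the stored vectors. `sqlite-vss` performs that search natively inside SQLite; purely as an illustration of the underlying idea, here is a plain-Ruby cosine-similarity ranking over hypothetical embeddings (the method names and vectors are made up):

```ruby
# Cosine similarity between two equal-length vectors.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Hypothetical 3-dimensional embeddings (real GNN vectors are much wider).
embeddings = {
  'Greeter#hello'   => [0.9, 0.1, 0.0],
  'Greeter#goodbye' => [0.8, 0.2, 0.1],
  'Mailer#deliver'  => [0.1, 0.8, 0.3]
}

query = [1.0, 0.0, 0.0] # query embedding, assumed precomputed

ranked = embeddings.sort_by { |_, vec| -cosine(query, vec) }.map(&:first)
puts ranked.inspect # most similar first
```

Because the vectors encode structure rather than text, two methods with similar ASTs rank near each other even when their identifiers share no words.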
Key Components
- `scripts/05_build_database.rb`: The main script for indexing a repository. It scans for Ruby files, extracts symbols and references, generates embeddings, and populates the SQLite database.
- `lib/expert_enigma/symbol_extractor.rb`: A class that uses the `parser` gem to traverse the AST of a Ruby file and extract definitions (classes, modules, methods) and references (usages) of symbols.
- `lib/expert_enigma/embedding_generator.rb`: Loads the pre-trained GNN model (in `.onnx` format) and uses the `onnxruntime` gem to generate vector embeddings for method ASTs.
- `lib/expert_enigma/ast_explorer.rb`: A utility class for querying and navigating a file's AST, with methods to find nodes by type or ID and to retrieve a node's ancestors.
- `lib/mcp_server.rb`: A Sinatra-based web server that exposes the MCP API endpoints, querying the SQLite database to answer questions about the codebase.
- `expert_enigma.db`: The SQLite database holding the indexed data for the repository: file ASTs, symbols, references, and vector embeddings for methods.
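As an illustration of the kind of ancestor lookup `lib/expert_enigma/ast_explorer.rb` provides, here is a sketch over a parent-pointer table. The `Node` struct and the data are assumptions for the example, not the project's actual types:

```ruby
# Hypothetical parent-pointer table; the real explorer walks parsed ASTs.
Node = Struct.new(:id, :type, :parent_id)

nodes = [
  Node.new(0, :class, nil),
  Node.new(1, :def, 0),
  Node.new(2, :send, 1)
]
index = nodes.to_h { |n| [n.id, n] }

# Follow parent_id links upward until the root is reached.
def ancestors(node, index)
  chain = []
  while node.parent_id
    node = index[node.parent_id]
    chain << node
  end
  chain
end

p ancestors(index[2], index).map(&:type) # => [:def, :class]
```

The same walk backs the `/get_ancestors` endpoint: given a node ID, the chain of enclosing definitions tells an agent exactly where in the class hierarchy a piece of code lives.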
Progress & Implemented Features
The project has made significant progress and has a functional core.
Completed
- Phase 1: Core Integration
  - Port the graph and embedding generation logic from `jubilant-palm-tree`.
  - Set up the SQLite database schema (for symbols, files, relations).
  - Integrate `sqlite-vss` for vector storage and search.
- Phase 2: Indexer & API
  - Build the main indexer process (full scan).
  - Implement the core MCP API server with the following endpoints:
    - `/list_files`: Lists all indexed files.
    - `/get_symbols`: Returns all symbols (classes, modules, methods) for a given file.
    - `/get_ast`: Returns the full AST for a given file.
    - `/query_nodes`: Finds nodes of a specific type within a file's AST.
    - `/get_node_details`: Retrieves details for a specific node.
    - `/get_ancestors`: Returns the ancestor nodes for a given node.
    - `/find_definition`: Finds the definition of a symbol.
    - `/find_references`: Finds all references to a symbol.
    - `/get_call_hierarchy`: Returns the inbound and outbound calls for a method.
    - `/search`: (Placeholder) Vector-based semantic search for methods.
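The shape of a `/find_definition` response can be sketched against an in-memory stand-in for the symbols table. Everything here (the field names, the fixture rows, the handler) is a hypothetical illustration, not the server's actual schema or implementation:

```ruby
require 'json'

# Hypothetical symbol rows; the real server reads these from expert_enigma.db.
SYMBOLS = [
  { name: 'Greeter',       kind: 'class',  file: 'test/greeter.rb', line: 1 },
  { name: 'Greeter#hello', kind: 'method', file: 'test/greeter.rb', line: 2 }
].freeze

# Sketch of what a /find_definition handler might do:
# exact-name lookup over the symbols table, JSON out.
def find_definition(name)
  sym = SYMBOLS.find { |s| s[:name] == name }
  return { error: 'not found' }.to_json unless sym
  sym.to_json
end

puts find_definition('Greeter#hello')
```

An agent calling the endpoint gets back a file path and line number it can jump to directly, instead of grepping for the identifier.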
Next Steps
- Phase 3: Code Transformation & Real-time Indexing
  - Implement the code transformation endpoints (`/replace_node_text`, etc.).
  - Add a file watcher for real-time, incremental indexing.
- Phase 4: Tooling & DX
  - Create a simple CLI for starting the server and managing the index.
  - Develop a GitHub Actions workflow for CI-based index generation.
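The planned `/replace_node_text` endpoint presumably reduces to splicing a replacement string into the source at a node's character range. A minimal sketch under that assumption (the helper name and range arithmetic are illustrative, not the project's API):

```ruby
# Assumed approach: a node carries an inclusive character range into the
# original source; replacing its text is a splice at that range.
def replace_node_text(source, range, replacement)
  source[0...range.begin] + replacement + source[(range.end + 1)..]
end

src = "def hello\n  puts 'hi'\nend\n"
# 17..20 covers the string literal 'hi' (including quotes) in src above.
new_src = replace_node_text(src, 17..20, "'bye'")
puts new_src
```

After such an edit the index is stale, which is why the file watcher for incremental re-indexing belongs in the same phase.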
Testing Approach
The server's endpoints are tested against a set of controlled Ruby files in the `test/` directory. The testing process is as follows:

- Create Test Files: The `test/` directory contains Ruby files with a known structure of classes, modules, methods, and references.
- Build Test Database: The `scripts/05_build_database.rb` script is configured to scan only the `test/` directory, creating a clean `expert_enigma.db` that contains only the test data.
- Verify with `curl`: The MCP server is started, and `curl` commands are used to systematically exercise each endpoint against the known contents of the test files, verifying the JSON output.
This approach ensures that the core functionality of the server is working as expected before moving on to more complex features.
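A controlled test file of the kind described above might look like the following. The fixture is hypothetical, but it shows the sort of known structure (a module, a class, two methods, one internal reference) that endpoint responses would be checked against:

```ruby
# test/greeter.rb -- hypothetical fixture with a known symbol structure.
module Greetings
  class Greeter
    # Definition: Greeter#hello; contains a reference to Greeter#format_name.
    def hello(name)
      format_name(name)
    end

    # Definition: Greeter#format_name; referenced by Greeter#hello.
    def format_name(name)
      name.capitalize
    end
  end
end
```

With this fixture, `/get_symbols` should report exactly one module, one class, and two methods for the file, and `/find_references` for `format_name` should point at the call site inside `hello`.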
License
This project is licensed under the MIT License. See the `LICENSE` file for details.