mark-hingston/query-tool
If you are the rightful owner of query-tool and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
An MCP server for querying vector stores and knowledge graphs created by the lance-embedder project.
Embeddings Query Server
An MCP (Model Context Protocol) server for querying vector stores and knowledge graphs created by the lance-embedder project.
Features
- Vector Search: Query LanceDB vector stores with semantic search
- Graph Search: Query knowledge graphs built with GraphRAG for relationship-aware retrieval
- Auto Mode: Automatically use graph search when available, fallback to vector search
- Stdio Transport: Compatible with MCP clients like Claude Desktop, Cursor, and Windsurf
Installation
npm install
npm run build
Usage
Required Arguments
--index-path <path>- Path to the embeddings directory created by embedder--base-url <url>- LM Studio base URL for generating query embeddings--model <name>- Embedding model name (must match what was used for indexing)
Optional Arguments
Index Settings:
--table-name <name>- LanceDB table name (default: "embeddings")--dimensions <number>- Embedding dimensions (default: 1024)
Query Settings:
--top-k <number>- Number of results to return (default: 10)
Graph Settings:
--enable-graph- Enable graph search mode (default: disabled for fast startup)--graph-threshold <number>- Similarity threshold for graph queries (default: 0.7)--random-walk-steps <number>- Steps for graph traversal (default: 100)--restart-prob <number>- Restart probability for random walk (default: 0.15)--preload-graph- Start building graph in background at startup (allows immediate queries via vector search)
Performance Settings:
--verbose- Enable verbose logging for debugging and performance monitoring
Examples
Basic vector search (fast startup):
node dist/index.js \
--index-path ./embeddings \
--base-url http://localhost:1234/v1 \
--model text-embedding-model \
--top-k 5
With graph search (lazy loading - builds on first query):
node dist/index.js \
--index-path ./embeddings \
--base-url http://localhost:1234/v1 \
--model text-embedding-model \
--enable-graph \
--top-k 5
With background graph preloading (best for production):
node dist/index.js \
--index-path ./embeddings \
--base-url http://localhost:1234/v1 \
--model text-embedding-model \
--enable-graph \
--preload-graph \
--verbose
For large indexes (10K+ files) - recommended settings:
node dist/index.js \
--index-path ./embeddings \
--base-url http://localhost:1234/v1 \
--model text-embedding-model \
--enable-graph \
--preload-graph \
--top-k 10 \
--verbose
MCP Tool: query_index
The server exposes a single MCP tool called query_index.
Input Parameters
query(string, required) - The search query textmode(enum: "auto" | "vector" | "graph", default: "auto") - Search modeauto: Use graph search if graph data exists, otherwise vector searchvector: Force vector-only search via LanceDBgraph: Force graph search (error if no graph data)
Output
Returns an array of results sorted by score (highest first), each containing:
text- The chunk contentsource- File path where the chunk originatedscore- Similarity score (embedding similarity)chunkIndex- Position of chunk within the source file
Project Structure
query-tool-server/
├── src/
│ ├── index.ts # Main entry point and MCP server setup
│ ├── lib/
│ │ └── graph-store.ts # GraphRAG data loading
│ └── types/
│ └── index.ts # TypeScript type definitions
├── dist/ # Compiled JavaScript output
├── package.json
├── tsconfig.json
└── README.md
Performance Optimizations
The server implements several optimizations for handling large indexes (10K+ files):
1. Lazy Loading (Default)
- Graph construction is deferred until the first query
- Server starts instantly, even with large indexes
- First query triggers graph building (may take 30s-2min for 10K files)
- Subsequent queries use the cached graph
2. Parallel Batch Loading
- All batch files are loaded in parallel using
Promise.all() - Significantly reduces I/O time for large datasets
- Progress logging shows loading status for large indexes
3. Automatic Fallback
- If a query arrives while graph is building, it automatically falls back to vector search
- No query failures due to graph build timing
- Warning logged to console for debugging
4. Persistent Graph Cache
- Graph metadata is cached to disk after first build
- Future optimizations could serialize the full graph structure
- Currently limited by Mastra GraphRAG serialization API
5. Background Preloading (Optional)
- Use
--preload-graphto start building graph at startup - Server remains responsive via vector search during build
- Graph becomes available when ready, no blocking
6. Worker Thread Support
- Graph building can be offloaded to worker threads
- Keeps main thread responsive for queries
- Currently experimental (see
graph-worker.ts)
How It Works
-
Startup:
- Validates configuration and loads LanceDB index
- Loads graph metadata (but not the full graph - lazy loading)
- Optionally starts background graph build if
--preload-graphis set - Server is ready immediately
-
Query Processing: When a query is received:
- Generates query embedding using the specified model
- Chooses search mode (vector or graph) based on the
modeparameter - For vector search: queries LanceDB directly (always fast)
- For graph search:
- Checks graph build state
- If building, falls back to vector search with warning
- If ready, uses graph for relationship-aware search
- If idle, triggers lazy build then uses graph
- Sorts results by score and returns top K
Integration with Embedder
This server is designed to work with indexes created by the project:
-
Use embedder to create an index:
embedder -d ./my-repo -o ./embeddings --enable-graph -
Start this query server pointing to that index:
node dist/index.js \ --index-path ./embeddings \ --base-url http://localhost:1234/v1 \ --model text-embedding-model \ --enable-graph
Migration Guide: Before vs After Optimizations
Before (Slow Startup for Large Indexes)
- Server would build entire graph at startup
- For 10K files: could take 2-5 minutes before server is ready
- No queries could be served during this time
- Graph was always loaded even if not used
After (Fast Startup + Lazy Loading)
- Server starts instantly (< 1 second)
- Graph is built on first query (or in background with
--preload-graph) - Vector search is always immediately available
- Graph search becomes available after build completes
- Parallel loading reduces build time by ~50-70%
Behavioral Changes
Without --enable-graph:
- No change - vector search only, instant startup
With --enable-graph (default - lazy loading):
- Server starts instantly
- First graph query triggers build (30s-2min for large indexes)
- Queries during build automatically fall back to vector search
- Subsequent queries use cached graph
With --enable-graph --preload-graph (recommended for production):
- Server starts instantly
- Graph building starts in background immediately
- All queries use vector search until graph is ready
- No user-facing delays, graph becomes available when ready
Error Handling
The server validates on startup:
- Index path exists
- LanceDB table exists
- If graph search is requested, graph data must be available
All errors are returned as MCP error responses with descriptive messages.
Development
# Run in development mode
npm run dev -- --index-path ./embeddings --base-url http://localhost:1234/v1 --model text-embedding-model
# Build for production
npm run build
# Run built version
npm start -- --index-path ./embeddings --base-url http://localhost:1234/v1 --model text-embedding-model
License
ISC