ali1k/mcp-query-assistant
If you are the rightful owner of mcp-query-assistant and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
The Query Assistant MCP Server is a tool designed to facilitate query generation using semantic search over training examples, providing few-shot learning capabilities for various query languages.
find_similar_queries
Find similar query examples based on a natural language question.
add_training_example
Add a new question-query pair to the training dataset.
list_training_examples
List all training examples in the dataset.
find_duplicates
Find duplicate training examples based on question and query.
remove_duplicates
Remove duplicate training examples, keeping only the first occurrence of each unique question-query pair.
Query Assistant MCP Server
A Model Context Protocol (MCP) server that helps generate queries using semantic search over training examples. This server provides few-shot learning capabilities by finding similar questions and their corresponding queries to guide query generation for various query languages (Cypher, SPARQL, SQL, etc.).
Features
- Semantic Search: Uses OpenAI embeddings to find similar questions in your training dataset
- Few-Shot Learning: Returns relevant examples to help generate accurate queries
- Training Data Management: Add, list, and manage question-query pairs with duplicate detection
- Vector Storage: Efficient similarity search using HNSW (Hierarchical Navigable Small World) algorithm
- Metadata Support: Organize examples by domain, complexity, and tags
- Multi-Language Support: Works with various query languages (Cypher, SPARQL, SQL, etc.)
Installation
-
Prerequisites: Node.js 18+ and npm
-
Build the server:
cd /Users/alkhalili/Documents/Cline/MCP/mcp-query-assistant npm install npm run build
-
Configure OpenAI API Key:
- Get your API key from OpenAI Platform
- Update the MCP settings file at:
/Users/alkhalili/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
- Replace
your_openai_api_key_here
with your actual API key
-
Configure Data Directory (Optional):
- By default, data is stored in the
data/
folder relative to the project - To use a custom data directory, you can:
- Set the
DATA_DIR
environment variable, or - Use the
--data-dir
command line argument
- Set the
- By default, data is stored in the
-
Server Configuration: The server is already configured in your MCP settings as:
"query-assistant": { "command": "npx", "args": ["mcp-query-assistant"], "env": { "OPENAI_API_KEY": "your_openai_api_key_here", "DATA_DIR": "/optional/custom/data/path" }, "disabled": false, "autoApprove": [] }
Alternative with command line arguments:
"query-assistant": { "command": "npx", "args": ["mcp-query-assistant", "--openai-key", "your_openai_api_key_here", "--data-dir", "/custom/data/path"], "disabled": false, "autoApprove": [] }
Available Tools
1. find_similar_queries
Find similar query examples based on a natural language question.
Parameters:
question
(required): The natural language question to find similar examples forlimit
(optional): Maximum number of similar examples to return (default: 3, max: 10)threshold
(optional): Minimum similarity threshold 0-1 (default: 0.7)
Example Usage:
Use the find_similar_queries tool to find examples for: "Give me the list of CDEs in the lineage"
2. add_training_example
Add a new question-query pair to the training dataset.
Parameters:
question
(required): The natural language questionquery
(required): The corresponding query (Cypher, SPARQL, SQL, etc.)metadata
(optional): Additional metadata (domain, complexity, tags)
Example Usage:
Add a training example:
Question: "Find users who bought expensive products"
Query: "MATCH (u:User)-[:PURCHASED]->(p:Product) WHERE p.price > 1000 RETURN u"
Metadata: {"domain": "user_analytics", "complexity": "medium"}
3. list_training_examples
List all training examples in the dataset.
Parameters:
limit
(optional): Maximum number of examples to return (default: 10, max: 100)domain
(optional): Filter by domain
4. find_duplicates
Find duplicate training examples based on question and query.
Parameters: None
Example Usage:
Use find_duplicates to identify duplicate training examples in your dataset.
5. remove_duplicates
Remove duplicate training examples, keeping only the first occurrence of each unique question-query pair.
Parameters:
confirm
(optional): Set to true to confirm removal of duplicates (default: false)
Example Usage:
Use remove_duplicates with confirm=true to clean up duplicate examples.
Default Training Examples
The server comes with 1 default example covering data lineage patterns:
- Data Lineage: "Give me the list of CDEs in the lineage"
- Query:
MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
- Domain: Data Lineage
- Complexity: simple
- Query:
Usage Workflow
- Ask for similar queries: When you need to write a query, use
find_similar_queries
with your natural language question - Get few-shot examples: The server returns similar questions and their queries with similarity scores
- Generate your query: Use the examples as guidance to write your specific query
- Add new examples: Use
add_training_example
to expand your training dataset with new patterns - Manage duplicates: Use
find_duplicates
andremove_duplicates
to keep your dataset clean
Data Storage
- Training Data: Stored in
{DATA_DIR}/training_data.json
(default:data/training_data.json
) - Vector Index: Stored in
{DATA_DIR}/vector_index.bin
(default:data/vector_index.bin
) - Embeddings: Generated using OpenAI's
text-embedding-3-small
model (1536 dimensions) - Data Directory: Configurable via
DATA_DIR
environment variable or--data-dir
command line argument
Example Interaction
User: "Show me all data elements with their descriptions"
Agent: Let me find similar examples for you.
[Uses find_similar_queries tool]
Server Response:
Found 1 similar examples for: "Show me all data elements with their descriptions"
Example 1 (similarity: 0.823):
Question: Give me the list of CDEs in the lineage
Query: MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
Domain: Data Lineage
Complexity: simple
Agent: Based on this example, here's a query for showing data elements with descriptions:
MATCH (cde:CDE)
RETURN cde.name, cde.description
ORDER BY cde.name
Troubleshooting
- "OpenAI API key not configured": Make sure you've set your API key in the MCP settings
- "No similar examples found": Try lowering the similarity threshold or adding more training data
- Server not connecting: Check that the build path is correct and the server compiled successfully
Contributing
To add more training examples or improve the server:
- Use the
add_training_example
tool to add new question-query pairs - Organize examples with appropriate metadata (domain, complexity, tags)
- Test similarity search with various question phrasings
- Consider adding domain-specific examples for your use case
Technical Details
- Vector Database: HNSW (Hierarchical Navigable Small World) for efficient similarity search
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Similarity Metric: Cosine similarity
- Storage Format: JSON for training data, binary for vector index
- Max Capacity: 10,000 training examples (configurable)