RAG Anything MCP Server
An MCP (Model Context Protocol) server that provides comprehensive RAG (Retrieval-Augmented Generation) capabilities for processing and querying directories of documents using the raganything library, with full multimodal support.
Features
- End-to-End Document Processing: Complete document parsing with multimodal content extraction
- Multimodal RAG: Support for images, tables, equations, and text processing
- Batch Processing: Process entire directories with multiple file types
- Advanced Querying: Both pure text and multimodal-enhanced queries
- Multiple Query Modes: hybrid, local, global, naive, mix, and bypass modes
- Vision Processing: Advanced image analysis using GPT-4o
- Persistent Storage: RAG instances maintained per directory for efficient querying
Available Tools
process_directory
Process all files in a directory for comprehensive RAG indexing with multimodal support.
Required Parameters:
- directory_path: Path to the directory containing files to process
- api_key: OpenAI API key for LLM and embedding functions
Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL (for custom endpoints)
- file_extensions: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])
- recursive: Process subdirectories (default: True)
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
- max_workers: Number of concurrent processing workers (default: 4)
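To make the file_extensions and recursive parameters concrete, here is an illustrative sketch (not the server's actual implementation) of how a directory scan with those filters typically behaves — matching case-insensitively on suffixes and descending into subdirectories only when recursive is set:

```python
# Illustrative sketch only: mimics how `file_extensions` and `recursive`
# plausibly filter the files picked up by process_directory.
from pathlib import Path

DEFAULT_EXTENSIONS = [".pdf", ".docx", ".pptx", ".txt", ".md"]

def collect_files(directory_path, file_extensions=None, recursive=True):
    """Return the files under directory_path whose suffix matches."""
    extensions = {e.lower() for e in (file_extensions or DEFAULT_EXTENSIONS)}
    pattern = "**/*" if recursive else "*"  # "**/*" descends into subdirectories
    return sorted(
        p for p in Path(directory_path).glob(pattern)
        if p.is_file() and p.suffix.lower() in extensions
    )
```

With recursive=False, only files directly inside directory_path are considered; passing an explicit file_extensions list overrides the defaults entirely rather than extending them.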
process_single_document
Process a single document with full multimodal analysis.
Required Parameters:
- file_path: Path to the document to process
- api_key: OpenAI API key
Optional Parameters:
- working_dir: Custom working directory for RAG storage
- base_url: OpenAI API base URL
- output_dir: Output directory for parsed content
- parse_method: Document parsing method (default: "auto")
- enable_image_processing: Enable image analysis (default: True)
- enable_table_processing: Enable table extraction (default: True)
- enable_equation_processing: Enable equation processing (default: True)
query_directory
Pure text query against processed documents using LightRAG.
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask about the documents
- mode: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")
query_with_multimodal_content
Enhanced query with additional multimodal content (tables, equations, etc.).
Parameters:
- directory_path: Path to the processed directory
- query: The question to ask
- multimodal_content: List of multimodal content dictionaries
- mode: Query mode (default: "hybrid")
Example multimodal_content:
[
    {
        "type": "table",
        "table_data": "Method,Accuracy\\nRAGAnything,95.2%\\nBaseline,87.3%",
        "table_caption": "Performance comparison"
    },
    {
        "type": "equation",
        "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
        "equation_caption": "Document relevance probability"
    }
]
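Since MCP tool arguments travel as JSON, the same payload can be assembled in Python and serialized before the call. The sketch below mirrors the "table" and "equation" entries above; the field names are taken from that example, and other content types may exist in the raganything library:

```python
# Building the multimodal_content argument in Python. "\n" separates
# CSV rows in table_data; the LaTeX string uses a raw literal so the
# backslashes survive intact.
import json

multimodal_content = [
    {
        "type": "table",
        "table_data": "Method,Accuracy\nRAGAnything,95.2%\nBaseline,87.3%",
        "table_caption": "Performance comparison",
    },
    {
        "type": "equation",
        "latex": r"P(d|q) = \frac{P(q|d) \cdot P(d)}{P(q)}",
        "equation_caption": "Document relevance probability",
    },
]

# Serialize for transport (MCP messages are JSON-RPC, i.e. JSON).
payload = json.dumps(multimodal_content)
```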
list_processed_directories
List all directories that have been processed and are available for querying.
get_rag_info
Get detailed information about the RAG configuration and status for a directory.
Usage Examples
1. Basic Directory Processing
process_directory(
    directory_path="/path/to/documents",
    api_key="your-openai-api-key"
)
2. Advanced Directory Processing
process_directory(
    directory_path="/path/to/research_papers",
    api_key="your-openai-api-key",
    file_extensions=[".pdf", ".docx"],
    enable_image_processing=True,
    enable_table_processing=True,
    max_workers=6
)
3. Pure Text Query
query_directory(
    directory_path="/path/to/documents",
    query="What are the main findings in these research papers?",
    mode="hybrid"
)
4. Multimodal Query with Table Data
query_with_multimodal_content(
    directory_path="/path/to/documents",
    query="Compare these results with the document findings",
    multimodal_content=[{
        "type": "table",
        "table_data": "Method,Accuracy,Speed\\nRAGAnything,95.2%,120ms\\nBaseline,87.3%,180ms",
        "table_caption": "Performance comparison"
    }],
    mode="hybrid"
)
5. Single Document Processing
process_single_document(
    file_path="/path/to/important_paper.pdf",
    api_key="your-openai-api-key",
    enable_image_processing=True
)
Setup Requirements
1. Environment Variables
export OPENAI_API_KEY="your-openai-api-key-here"
2. Install Dependencies
uv sync
3. Run the MCP Server
python main.py
Query Modes Explained
- hybrid: Combines local and global search (recommended for most use cases)
- local: Focuses on local context and entity relationships
- global: Provides broader, document-level insights and summaries
- naive: Simple keyword-based search without graph reasoning
- mix: Combines multiple approaches for comprehensive results
- bypass: Direct access without RAG processing
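Because an unrecognized mode is an easy mistake to make from a client, a small validation helper can catch typos before the request reaches the server. This is an illustrative client-side utility, not part of the server itself; the set of modes is taken from the list above:

```python
# Client-side guard (hypothetical helper, not part of the server):
# validate and normalize the `mode` argument before issuing a query.
VALID_MODES = {"hybrid", "local", "global", "naive", "mix", "bypass"}

def normalize_mode(mode="hybrid"):
    """Lower-case a query mode and reject values the server won't accept."""
    normalized = (mode or "hybrid").lower()
    if normalized not in VALID_MODES:
        raise ValueError(
            f"Unknown query mode {mode!r}; expected one of {sorted(VALID_MODES)}"
        )
    return normalized
```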
Multimodal Content Types
The server supports processing and querying with:
- Images: Automatic caption generation and visual analysis
- Tables: Structure extraction and content analysis
- Equations: LaTeX parsing and mathematical reasoning
- Charts/Graphs: Visual data interpretation
- Mixed Content: Combined analysis of multiple content types
API Configuration
The server uses OpenAI's APIs by default:
- LLM: GPT-4o-mini for text processing
- Vision: GPT-4o for image analysis
- Embeddings: text-embedding-3-large (3072 dimensions)
You can customize the base_url parameter to use:
- Azure OpenAI
- OpenAI-compatible APIs
- Custom model endpoints
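One way to keep endpoint selection tidy on the client side is to build the tool arguments in a helper that only includes base_url when an alternative endpoint is configured. This is a hedged sketch: the helper name and the endpoint URL are placeholders, not part of the server's API:

```python
# Hypothetical client-side helper: assemble process_directory arguments,
# attaching base_url only when a custom endpoint is in use.
def build_processing_args(directory_path, api_key, base_url=None, **extra):
    """Collect tool-call arguments as a plain dict."""
    args = {"directory_path": directory_path, "api_key": api_key}
    if base_url:  # omit entirely to fall back to the default OpenAI API
        args["base_url"] = base_url
    args.update(extra)  # e.g. max_workers, file_extensions
    return args
```

For Azure OpenAI or other OpenAI-compatible gateways, pass the gateway's base URL (for example, something like "https://example.com/v1" — a placeholder here) and an API key valid for that endpoint.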
File Support
Supported file formats include:
- PDF documents
- Microsoft Word (.docx)
- PowerPoint presentations (.pptx)
- Text files (.txt)
- Markdown files (.md)
- And more via the raganything library
Performance Notes
- Concurrent Processing: Use max_workers to control parallel document processing
- Memory Usage: Large documents with many images may require significant memory
- API Costs: Vision processing (GPT-4o) is more expensive than text processing
- Storage: Processed data is stored locally for efficient re-querying