rag-anything-mcp

jesse-merhi/rag-anything-mcp


RAG Anything MCP Server

An MCP (Model Context Protocol) server that provides comprehensive RAG (Retrieval-Augmented Generation) capabilities for processing and querying directories of documents using the raganything library with full multimodal support.

Features

  • End-to-End Document Processing: Complete document parsing with multimodal content extraction
  • Multimodal RAG: Support for images, tables, equations, and text processing
  • Batch Processing: Process entire directories with multiple file types
  • Advanced Querying: Both pure text and multimodal-enhanced queries
  • Multiple Query Modes: hybrid, local, global, naive, mix, and bypass modes
  • Vision Processing: Advanced image analysis using GPT-4o
  • Persistent Storage: RAG instances maintained per directory for efficient querying

Available Tools

process_directory

Process all files in a directory for comprehensive RAG indexing with multimodal support.

Required Parameters:

  • directory_path: Path to the directory containing files to process
  • api_key: OpenAI API key for LLM and embedding functions

Optional Parameters:

  • working_dir: Custom working directory for RAG storage
  • base_url: OpenAI API base URL (for custom endpoints)
  • file_extensions: List of file extensions to process (default: ['.pdf', '.docx', '.pptx', '.txt', '.md'])
  • recursive: Process subdirectories (default: True)
  • enable_image_processing: Enable image analysis (default: True)
  • enable_table_processing: Enable table extraction (default: True)
  • enable_equation_processing: Enable equation processing (default: True)
  • max_workers: Concurrent processing workers (default: 4)
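Conceptually, the file-collection step is a recursive extension filter over the directory. A minimal sketch in plain Python (an illustrative helper, not the server's actual implementation; `collect_files` and `DEFAULT_EXTENSIONS` are assumed names):

```python
from pathlib import Path

# Mirrors the documented default for file_extensions
DEFAULT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt", ".md"}

def collect_files(directory_path, file_extensions=None, recursive=True):
    """Gather files whose extensions are allowed, optionally recursing."""
    exts = {e.lower() for e in (file_extensions or DEFAULT_EXTENSIONS)}
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(directory_path).glob(pattern)
        if p.is_file() and p.suffix.lower() in exts
    )
```

With recursive=False only the top level is scanned, matching the behavior the recursive parameter describes.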

process_single_document

Process a single document with full multimodal analysis.

Required Parameters:

  • file_path: Path to the document to process
  • api_key: OpenAI API key

Optional Parameters:

  • working_dir: Custom working directory for RAG storage
  • base_url: OpenAI API base URL
  • output_dir: Output directory for parsed content
  • parse_method: Document parsing method (default: "auto")
  • enable_image_processing: Enable image analysis (default: True)
  • enable_table_processing: Enable table extraction (default: True)
  • enable_equation_processing: Enable equation processing (default: True)

query_directory

Pure text query against processed documents using LightRAG.

Parameters:

  • directory_path: Path to the processed directory
  • query: The question to ask about the documents
  • mode: Query mode - "hybrid", "local", "global", "naive", "mix", or "bypass" (default: "hybrid")

query_with_multimodal_content

Enhanced query with additional multimodal content (tables, equations, etc.).

Parameters:

  • directory_path: Path to the processed directory
  • query: The question to ask
  • multimodal_content: List of multimodal content dictionaries
  • mode: Query mode (default: "hybrid")

Example multimodal_content:

[
  {
    "type": "table",
    "table_data": "Method,Accuracy\nRAGAnything,95.2%\nBaseline,87.3%",
    "table_caption": "Performance comparison"
  },
  {
    "type": "equation",
    "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
    "equation_caption": "Document relevance probability"
  }
]
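When building this payload programmatically, it is less error-prone to join rows than to hand-escape newlines. A small illustrative helper (`make_table_content` is a hypothetical name, not part of the server's API; it assumes table_data is newline-separated CSV as in the example above):

```python
def make_table_content(rows, caption=""):
    """Build a 'table' entry for multimodal_content from a list of rows."""
    return {
        "type": "table",
        "table_data": "\n".join(",".join(str(c) for c in row) for row in rows),
        "table_caption": caption,
    }

content = [
    make_table_content(
        [["Method", "Accuracy"], ["RAGAnything", "95.2%"], ["Baseline", "87.3%"]],
        caption="Performance comparison",
    )
]
```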

list_processed_directories

List all directories that have been processed and are available for querying.

get_rag_info

Get detailed information about the RAG configuration and status for a directory.

Usage Examples

1. Basic Directory Processing

process_directory(
  directory_path="/path/to/documents",
  api_key="your-openai-api-key"
)

2. Advanced Directory Processing

process_directory(
  directory_path="/path/to/research_papers",
  api_key="your-openai-api-key",
  file_extensions=[".pdf", ".docx"],
  enable_image_processing=True,
  enable_table_processing=True,
  max_workers=6
)

3. Pure Text Query

query_directory(
  directory_path="/path/to/documents",
  query="What are the main findings in these research papers?",
  mode="hybrid"
)

4. Multimodal Query with Table Data

query_with_multimodal_content(
  directory_path="/path/to/documents",
  query="Compare these results with the document findings",
  multimodal_content=[{
    "type": "table",
    "table_data": "Method,Accuracy,Speed\nRAGAnything,95.2%,120ms\nBaseline,87.3%,180ms",
    "table_caption": "Performance comparison"
  }],
  mode="hybrid"
)

5. Single Document Processing

process_single_document(
  file_path="/path/to/important_paper.pdf",
  api_key="your-openai-api-key",
  enable_image_processing=True
)

Setup Requirements

1. Environment Variables

export OPENAI_API_KEY="your-openai-api-key-here"

2. Install Dependencies

uv sync

3. Run the MCP Server

python main.py
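4. Configure an MCP Client

To call the tools from an MCP client such as Claude Desktop, register the server in the client's configuration. A sketch (the server name and path below are placeholders; adjust them to your install):

```json
{
  "mcpServers": {
    "rag-anything": {
      "command": "python",
      "args": ["/path/to/rag-anything-mcp/main.py"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key-here"
      }
    }
  }
}
```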

Query Modes Explained

  • hybrid: Combines local and global search (recommended for most use cases)
  • local: Focuses on local context and entity relationships
  • global: Provides broader, document-level insights and summaries
  • naive: Simple keyword-based search without graph reasoning
  • mix: Combines multiple approaches for comprehensive results
  • bypass: Direct access without RAG processing

Multimodal Content Types

The server supports processing and querying with:

  • Images: Automatic caption generation and visual analysis
  • Tables: Structure extraction and content analysis
  • Equations: LaTeX parsing and mathematical reasoning
  • Charts/Graphs: Visual data interpretation
  • Mixed Content: Combined analysis of multiple content types

API Configuration

The server uses OpenAI's APIs by default:

  • LLM: GPT-4o-mini for text processing
  • Vision: GPT-4o for image analysis
  • Embeddings: text-embedding-3-large (3072 dimensions)

You can customize the base_url parameter to use:

  • Azure OpenAI
  • OpenAI-compatible APIs
  • Custom model endpoints
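Retrieval over the embedding store ultimately reduces to vector similarity. A toy illustration of cosine scoring in plain Python (not LightRAG's internals; the vectors here are 3-dimensional stand-ins for the 3072-dimensional text-embedding-3-large vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy vectors: the query embedding and two candidate document embeddings
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}

# Retrieval picks the document whose embedding is closest to the query
best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
```

Here doc_a scores 1.0 (identical direction) and doc_b scores 0.0 (orthogonal), so doc_a would be retrieved first.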

File Support

Supported file formats include:

  • PDF documents
  • Microsoft Word (.docx)
  • PowerPoint presentations (.pptx)
  • Text files (.txt)
  • Markdown files (.md)
  • And more via the raganything library

Performance Notes

  • Concurrent Processing: Use max_workers to control parallel document processing
  • Memory Usage: Large documents with many images may require significant memory
  • API Costs: Vision processing (GPT-4o) is more expensive than text processing
  • Storage: Processed data is stored locally for efficient re-querying