calibre-rag-mcp-nodejs by ispyridis - MCP Server

Calibre RAG MCP Server

Enhanced Calibre MCP server with RAG (Retrieval-Augmented Generation) capabilities for project-based vector search and contextual conversations.

Features

RAG-Enhanced Search: Vector-based semantic search using FAISS and Transformers
Project-Based Organization: Create isolated vector search projects for different contexts
Multi-Format Support: Process books in various formats (EPUB, PDF, MOBI, etc.)
OCR Capabilities: Extract text from images and scanned PDFs using Tesseract
Advanced Text Processing: Natural language processing for better content understanding
Windows Compatible: Designed specifically for Windows environments
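Before books can be vectorized, their text is usually split into smaller, overlapping chunks so that each embedding covers one focused passage. This README does not describe how server.js chunks text, so the following is a minimal sketch that assumes fixed-size character windows. The chunkText function and its default sizes are hypothetical.

```javascript
// Hypothetical chunker: the README does not document how server.js splits
// book text, so the window and overlap sizes here are illustrative assumptions.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end of the text so the tail is not
    // emitted again as a shorter, fully overlapped chunk.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Usage: 2,500 characters with 1,000-character windows and 200 characters of overlap.
const chunks = chunkText("a".repeat(2500), 1000, 200);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 900 ]
```

The overlap keeps a sentence that straddles a window boundary fully inside at least one chunk, at the cost of embedding some text twice.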

Technologies Used

Vector Search: FAISS for efficient similarity search
Embeddings: Xenova Transformers for local embedding generation
OCR: Tesseract for optical character recognition
PDF Processing: Multiple PDF parsing libraries (pdf-parse, pdf-poppler, pdf2pic)
Image Processing: Sharp for image manipulation
NLP: Natural language processing with multiple libraries
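To illustrate what the vector search layer does, here is a brute-force sketch that ranks stored embeddings by cosine similarity to a query embedding. It does not use FAISS. FAISS's flat (exact) indexes perform the same exhaustive comparison far more efficiently, and its other index types trade some accuracy for speed. The function names and the toy three-dimensional vectors are illustrative only; embeddings produced by a Transformers model typically have hundreds of dimensions.

```javascript
// Brute-force cosine-similarity search: a stand-in for what a FAISS index does,
// written without FAISS so it runs anywhere. Vectors are plain number arrays.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k entries whose embeddings are most similar to the query.
function topK(query, entries, k) {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical chunk embeddings.
const entries = [
  { id: "chunk-1", vector: [1, 0, 0] },
  { id: "chunk-2", vector: [0.7, 0.7, 0] },
  { id: "chunk-3", vector: [0, 0, 1] },
];
console.log(topK([1, 0.1, 0], entries, 2));
```

For this query, chunk-1 ranks first (score about 0.995) and chunk-2 second (about 0.774). chunk-3 is orthogonal to the query and scores 0.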

Prerequisites

Node.js >= 16.0.0
Calibre installed on Windows
ImageMagick (optional, for enhanced image processing)
Tesseract OCR (optional, for text extraction from images)

Installation

Clone this repository:

git clone https://github.com/yourusername/calibre-rag-mcp-nodejs.git
cd calibre-rag-mcp-nodejs

Install dependencies:

npm install

Run setup (Windows):

setup.bat

Configuration

The server automatically detects your Calibre library location. For custom configurations, edit the settings in server.js; CONFIG.md documents the configuration in more detail.
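The README does not say how server.js locates the library. The following is a minimal sketch of one plausible approach, assuming a list of candidate directories is checked in order. It relies on the fact that every Calibre library has a metadata.db file at its root. The CALIBRE_LIBRARY_PATH variable and findCalibreLibrary function are hypothetical, and "Calibre Library" in the home folder is only Calibre's usual default.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical detection order; the README does not say what server.js checks.
// A directory counts as a Calibre library if it contains metadata.db,
// the SQLite database Calibre keeps at the root of every library.
function findCalibreLibrary(candidates) {
  for (const dir of candidates) {
    if (dir && fs.existsSync(path.join(dir, "metadata.db"))) {
      return dir;
    }
  }
  return null; // nothing found; fall back to manual configuration
}

const candidates = [
  process.env.CALIBRE_LIBRARY_PATH, // hypothetical override variable
  path.join(os.homedir(), "Calibre Library"), // Calibre's usual default location
];
console.log(findCalibreLibrary(candidates) ?? "No Calibre library found");
```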

Usage

Starting the Server

npm start

Available Tools

search: Semantic search across your ebook library
fetch: Retrieve specific content from books
list_projects: List all RAG projects
create_project: Create a new RAG project
add_books_to_project: Add books to a project for vectorization
search_project_context: Search within specific projects
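MCP clients invoke the tools listed above with JSON-RPC 2.0 "tools/call" requests. Over the stdio transport, each message is written as a single line of JSON. The sketch below only shows the message shape. The tool name comes from the list above, but the argument names (project_name, query) are hypothetical because this README does not document each tool's input schema; a client can retrieve the real schemas with a "tools/list" request.

```javascript
// Build an MCP "tools/call" request (JSON-RPC 2.0).
let nextId = 1;

function toolCallRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++, // each request needs a unique id to match its response
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// stdio transport: one JSON message per line. JSON.stringify escapes any
// newlines inside strings, so the serialized message stays on one line.
function frame(message) {
  return JSON.stringify(message) + "\n";
}

const request = toolCallRequest("search_project_context", {
  project_name: "my-research", // hypothetical parameter name
  query: "How does the author define entropy?", // hypothetical parameter name
});
process.stdout.write(frame(request));
```

In practice, an MCP client performs the initialize handshake before calling tools and handles this framing for you.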

Example MCP Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "calibre-rag": {
      "command": "node",
      "args": ["path/to/calibre-rag-mcp-nodejs/server.js"]
    }
  }
}
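The "path/to" placeholder must point at server.js in your clone. On Windows, backslashes in a hand-edited JSON path must be doubled, so generating the entry can be less error-prone. The sketch below produces the same structure as the configuration above with an absolute path. The buildMcpConfig helper is hypothetical, and where the output goes depends on your MCP client.

```javascript
const path = require("node:path");

// Build the MCP client entry shown above with an absolute path to server.js.
// JSON.stringify escapes Windows backslashes (C:\ becomes C:\\), which is easy
// to get wrong when editing the JSON by hand.
function buildMcpConfig(repoDir) {
  return {
    mcpServers: {
      "calibre-rag": {
        command: "node",
        args: [path.resolve(repoDir, "server.js")],
      },
    },
  };
}

// Run from the calibre-rag-mcp-nodejs directory.
console.log(JSON.stringify(buildMcpConfig("."), null, 2));
```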

Project Structure

calibre-rag-mcp-nodejs/
├── server.js              # Main MCP server
├── package.json           # Dependencies and scripts
├── setup.bat              # Windows setup script
├── test-*.js              # Various test files
├── projects/              # RAG projects storage
├── CONFIG.md              # Configuration documentation
├── USAGE_EXAMPLES.md      # Usage examples
└── QUICK_TEST.md          # Quick testing guide

Testing

Run the test suite:

npm test

Individual test files:

test-enhanced-server.js - Enhanced server functionality
test-ocr-full.js - OCR capabilities
test-pdf-approaches.js - PDF processing
test-enhanced-auto.js - Automated testing

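Documentation

CONFIG.md - Configuration documentation
USAGE_EXAMPLES.md - Usage examples
QUICK_TEST.md - Quick testing guide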

Requirements

System Requirements

Windows 10/11
Node.js 16+
Calibre installed
At least 4GB RAM (8GB+ recommended for large libraries)

Optional Dependencies

ImageMagick (for enhanced image processing)
Tesseract OCR (for text extraction from scanned documents)

Troubleshooting

Common Issues

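FAISS Installation: If FAISS fails to install, make sure the native toolchain for compiling Node.js add-ons is installed (on Windows, node-gyp requires Python and the Visual Studio C++ build tools)
Tesseract Not Found: Install Tesseract and add its installation directory to your PATH
Memory Issues: Reduce batch sizes when processing large documents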
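Reducing batch size helps because only one batch of work is in flight at a time. The README does not name the server's batch setting, so the following is a generic sketch rather than the server's code. The processInBatches function and the embedBatch callback are hypothetical stand-ins for whatever step (such as embedding text chunks) consumes memory per batch.

```javascript
// Process items in fixed-size batches so only one batch's worth of work
// (for example, text chunks waiting to be embedded) is in flight at a time.
// The batch size and the embedBatch callback are illustrative assumptions,
// not settings this README documents.
async function processInBatches(items, batchSize, embedBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await embedBatch(batch))); // finish this batch before starting the next
  }
  return results;
}

// Usage with a stand-in "embedder" that returns each item's length.
processInBatches(["alpha", "beta", "gamma"], 2, async (batch) =>
  batch.map((s) => s.length)
).then((r) => console.log(r)); // [ 5, 4, 5 ]
```

Smaller batches lower peak memory but add per-batch overhead, so the best value depends on document size and available RAM.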

Debug Mode

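Enable verbose logging by setting the DEBUG environment variable before starting the server. The commands below are for Command Prompt; in PowerShell, use $env:DEBUG = "calibre-rag:*" instead of the set line: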

set DEBUG=calibre-rag:*
npm start
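The calibre-rag:* pattern follows the convention of the widely used "debug" npm package. In that convention, DEBUG holds comma- or space-separated namespace patterns, "*" is a wildcard, and a leading "-" excludes a namespace. Whether server.js uses that package is an assumption. The sketch below is a simplified reimplementation of the matching rules, not the package itself, and the namespaces in it are hypothetical.

```javascript
// Convert a DEBUG pattern such as "calibre-rag:*" into an anchored RegExp.
function patternToRegExp(pattern) {
  // Escape regex metacharacters except "*", then turn each "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Simplified re-implementation of "debug"-style namespace matching.
function isDebugEnabled(namespace, debugEnv = process.env.DEBUG || "") {
  const patterns = debugEnv.split(/[\s,]+/).filter(Boolean);
  // Exclusions ("-pattern") take precedence over inclusions.
  for (const p of patterns) {
    if (p.startsWith("-") && patternToRegExp(p.slice(1)).test(namespace)) return false;
  }
  return patterns.some((p) => !p.startsWith("-") && patternToRegExp(p).test(namespace));
}

console.log(isDebugEnabled("calibre-rag:ocr", "calibre-rag:*")); // true
console.log(isDebugEnabled("calibre-rag:ocr", "calibre-rag:*,-calibre-rag:ocr")); // false
console.log(isDebugEnabled("other:module", "calibre-rag:*")); // false
```

Under this convention, a value such as calibre-rag:*,-calibre-rag:ocr would keep verbose logging on while silencing a single noisy namespace.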

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

License

Licensed under the Apache License 2.0. See the LICENSE file for details.

Support

For issues and questions, please open an issue on GitHub.

Changelog

v1.0.0

Initial release with RAG capabilities
Project-based vector search
Multi-format document support
OCR integration
Windows optimization