shahviransh/Textbook-MCP-Server
Textbook PDF Analysis MCP Server
A Model Context Protocol (MCP) server that provides comprehensive PDF analysis capabilities including OCR, table of contents extraction, summary generation, flashcard creation, and quiz generation.
Purpose
This MCP server provides a secure interface for AI assistants to analyze textbook PDFs with advanced text processing capabilities. It supports both regular PDF text extraction and OCR for scanned documents.
Features
Current Implementation
- extract_toc: Extract table of contents from PDF files with pattern recognition
- chapter_summary: Generate comprehensive summaries for chapter page ranges
- section_summary: Create individual page summaries within a section
- page_summary: Generate detailed summary for a specific page
- flashcards: Create study flashcards from PDF content
- quiz_gen: Generate fill-in-the-blank quiz questions from text
Security & Configuration
- File type validation and sanitization
- Configurable page limits and file size restrictions
- Rate limiting (50 requests per hour per tool)
- Non-root container execution
- Environment-based configuration
Prerequisites
- Docker Desktop with MCP Toolkit enabled
- Docker MCP CLI plugin (docker mcp command)
- Sufficient disk space for PDF processing
- Optional: CUDA support for faster AI processing
Configuration Options
Set these environment variables when running:
- MODEL_PATH: AI model path (default: "microsoft/DialoGPT-medium")
- MAX_PAGES: Maximum pages to process for regular text extraction (default: 500)
- MAX_OCR_PAGES: Maximum pages for forced OCR mode only (default: 20, does NOT apply to auto mode)
- OCR_LANG: OCR language code (default: "eng")
- ALLOWED_UPLOAD_DIR: Upload directory (default: "/app/uploads")
- MAX_FILE_SIZE_MB: Maximum file size in MB (default: 100)
- OCR_TIMEOUT_PER_PAGE: Timeout in seconds for OCR per page (default: 30)
- OCR_DPI: Image resolution for OCR processing (default: 200, lower = faster)
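As a rough illustration of how these variables and their defaults could be read at startup, here is a minimal sketch. The Settings class and the load_settings and _env_int helpers are hypothetical names for this example and are not taken from textbook_server.py; only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass


def _env_int(name: str, default: int) -> int:
    """Read an integer environment variable, falling back to the default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names mirror the documented variables.
    model_path: str
    max_pages: int
    max_ocr_pages: int
    ocr_lang: str
    allowed_upload_dir: str
    max_file_size_mb: int
    ocr_timeout_per_page: int
    ocr_dpi: int


def load_settings() -> Settings:
    """Build settings from the environment using the documented defaults."""
    return Settings(
        model_path=os.environ.get("MODEL_PATH", "microsoft/DialoGPT-medium"),
        max_pages=_env_int("MAX_PAGES", 500),
        max_ocr_pages=_env_int("MAX_OCR_PAGES", 20),
        ocr_lang=os.environ.get("OCR_LANG", "eng"),
        allowed_upload_dir=os.environ.get("ALLOWED_UPLOAD_DIR", "/app/uploads"),
        max_file_size_mb=_env_int("MAX_FILE_SIZE_MB", 100),
        ocr_timeout_per_page=_env_int("OCR_TIMEOUT_PER_PAGE", 30),
        ocr_dpi=_env_int("OCR_DPI", 200),
    )
```

Calling load_settings() with none of the variables set returns the defaults listed above; exporting MAX_PAGES="100" before the call would change only max_pages.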
Installation
See the step-by-step instructions provided with the files.
Usage Examples
In LLM Desktop, you can ask:
- "Extract the table of contents from textbook.pdf" (auto-detects if OCR is needed)
- "Summarize chapter 3 which spans pages 45-67" (automatically OCRs pages with images)
- "Create flashcards from pages 10-20 of the biology textbook"
- "Generate quiz questions from the first 5 pages"
- "Read combined_lectures.pdf" (intelligently OCRs only image pages, fast text extraction for text pages)
NEW: Intelligent OCR - The server automatically detects which pages contain images and applies OCR only to those pages, making it fast for text pages and accurate for scanned/image pages!
Tool Parameters
All tools now support intelligent OCR by default!
OCR Modes (applies to all tools):
- "auto" (default): Automatically detects pages with images and applies OCR only to those pages
- "true": Force OCR on all specified pages (limited to MAX_OCR_PAGES)
- "false": Disable OCR, use fast text extraction only
extract_toc:
- file_path: Name of PDF file in upload directory
- use_ocr: OCR mode (default: "auto")
chapter_summary:
- file_path: Name of PDF file
- chapter_pages: Page range (e.g., "10-25" or "10,15,20-25")
- use_ocr: OCR mode (default: "auto")
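The chapter_pages format shown above ("10-25" or "10,15,20-25") can be expanded into individual page numbers roughly as follows. This is a sketch only: parse_page_range is a hypothetical helper, and the choices to treat pages as 1-based, to deduplicate, and to reject reversed ranges are assumptions rather than documented server behavior.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "10,15,20-25" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"reversed range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    if not pages:
        raise ValueError("empty page specification")
    if min(pages) < 1:
        raise ValueError("page numbers are assumed to start at 1")
    return sorted(pages)
```

With this helper, "10-25" expands to 16 pages and "10,15,20-25" to pages 10, 15, and 20 through 25.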
flashcards:
- file_path: Name of PDF file
- pages: Page range to process
- count: Number of flashcards to generate (max 20)
- use_ocr: OCR mode (default: "auto")
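To make the three use_ocr modes concrete, the sketch below picks which pages would go through OCR given a per-page "has images" flag. The function pages_to_ocr and its inputs are hypothetical. How the server detects image pages is not shown here, and truncating to the first MAX_OCR_PAGES pages in forced mode is an assumption about how the limit is applied.

```python
def pages_to_ocr(page_has_images: list[bool], use_ocr: str = "auto",
                 max_ocr_pages: int = 20) -> list[int]:
    """Return zero-based indexes of the pages that should be sent to OCR."""
    indexes = list(range(len(page_has_images)))
    if use_ocr == "false":
        return []  # fast text extraction only
    if use_ocr == "true":
        # Forced mode: assumed here to keep only the first max_ocr_pages pages.
        return indexes[:max_ocr_pages]
    if use_ocr == "auto":
        # Auto mode: every page flagged as containing images, with no cap.
        return [i for i in indexes if page_has_images[i]]
    raise ValueError(f"unknown OCR mode: {use_ocr!r}")
```

For a 741-page document in which 539 pages contain images, "auto" selects all 539 image pages, "true" selects 20 pages under the default limit, and "false" selects none.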
Architecture
LLM Desktop → MCP Gateway → Textbook MCP Server → PDF Processing
↓
OCR (Tesseract) + NLP Models
(Text extraction, summarization, Q&A generation)
Development
Local Testing
# Set environment variables for testing
export ALLOWED_UPLOAD_DIR="/tmp/pdfs"
export MAX_PAGES="100"
export OCR_LANG="eng"
# Run directly
python textbook_server.py
# Test MCP protocol
echo '{"jsonrpc":"2.0","method":"tools/list","id":1}' | python textbook_server.py
Adding New Tools
- Add the function to textbook_server.py
- Decorate with @mcp.tool()
- Update the catalog entry with the new tool name
- Rebuild the Docker image
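The first two steps might look roughly like the sketch below. It assumes the server is built on FastMCP from the official MCP Python SDK, which the @mcp.tool() decorator suggests but this README does not confirm. The word_count tool, the register_tools helper, and the "textbook" server name are hypothetical examples.

```python
def word_count(text: str) -> str:
    """Count whitespace-separated words in a piece of text."""
    return f"{len(text.split())} words"


def register_tools():
    # Imported lazily so word_count can be tried without the SDK installed.
    # Assumption: the server uses FastMCP from the official MCP Python SDK.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("textbook")  # hypothetical server name
    # Equivalent to writing @mcp.tool() directly above the function definition.
    mcp.tool()(word_count)
    return mcp
```

In textbook_server.py itself the existing mcp object would be reused, so only the decorated function needs to be added.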
Supported Languages
OCR supports multiple languages via Tesseract:
- English (eng) - Default
- French (fra)
- Spanish (spa)
- German (deu)
- Additional languages can be added by modifying the Dockerfile
Troubleshooting
Tools Not Appearing
- Verify Docker image built successfully
- Check catalog and registry files
- Ensure LLM Desktop config includes custom catalog
- Restart LLM Desktop
PDF Processing Errors
- Ensure PDF file is in the upload directory
- Check file permissions and size limits
- For scanned PDFs, use use_ocr: "true"
- Verify sufficient memory for large files
OCR Issues
- Check if Tesseract language packs are installed
- Verify image quality for scanned documents
- Consider preprocessing images for better OCR results
- Intelligent OCR (NEW): By default, the server uses "auto" mode which:
  - Analyzes each page to detect if it contains images or is scanned
  - Only applies OCR to pages that need it (images/scanned pages)
  - OCRs ALL pages with images - no artificial limits!
  - Uses fast text extraction for regular text pages
  - No need to manually specify OCR - it's automatic!
  - Can process large documents efficiently (text pages are fast, only image pages use OCR)
  - Example: 741-page PDF with 539 image pages - all 539 will be OCR'd, remaining 202 use fast text extraction
- OCR Page Limits: Only applies to forced "true" mode (limited to MAX_OCR_PAGES = 20 pages)
  - "auto" mode has NO limit - it will OCR all pages that contain images
  - Use forced mode (use_ocr="true") only when you want to limit OCR to a specific number of pages
- EOF Errors during OCR: If you encounter EOF or timeout errors:
  - Reduce MAX_OCR_PAGES to 10 or fewer for slower systems
  - Increase OCR_TIMEOUT_PER_PAGE (default: 30 seconds)
  - Lower OCR_DPI for faster processing (default: 200, try 150 or 100)
  - Check system resources (memory/CPU)
  - Always specify a page range when using OCR on large files
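A back-of-the-envelope calculation shows why page ranges and DPI matter for OCR timeouts. The sketch assumes pages are OCR'd one after another and that every page takes the full OCR_TIMEOUT_PER_PAGE. It also assumes OCR cost roughly tracks the number of rendered pixels, which grows with the square of the DPI. Both helper names are hypothetical, and the results are rough bounds, not measurements.

```python
def worst_case_ocr_seconds(ocr_pages: int, timeout_per_page: int = 30) -> int:
    """Upper bound on OCR time if every page uses its full timeout."""
    return ocr_pages * timeout_per_page


def relative_pixel_count(dpi: int, baseline_dpi: int = 200) -> float:
    """Pixels rendered per page relative to the baseline DPI."""
    return (dpi / baseline_dpi) ** 2


print(worst_case_ocr_seconds(20))    # forced mode at the default limit
print(worst_case_ocr_seconds(539))   # auto mode on 539 image pages
print(relative_pixel_count(150))     # share of pixels at 150 DPI vs 200
```

Under these assumptions, 20 forced-OCR pages are bounded by 600 seconds, while 539 image pages could take up to 16170 seconds (about 4.5 hours). Dropping from 200 to 150 DPI renders about 56% of the pixels per page, and 100 DPI renders 25%.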
Performance Issues
- Reduce page ranges for large documents
- Lower flashcard/quiz counts for faster processing
- Consider using GPU acceleration for AI models
Security Considerations
- All files processed within container sandbox
- Input sanitization prevents path traversal
- Rate limiting prevents abuse
- No persistent storage of uploaded content
- Running as non-root user
- File type validation
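One common way to implement the path traversal and file type checks listed above is to resolve the requested name against the upload directory and reject anything that lands outside it. The sketch below is illustrative only: resolve_upload is a hypothetical helper, the extension check is a simplification, and the server's actual validation may differ. It needs Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path


def resolve_upload(file_name: str, upload_dir: str = "/app/uploads") -> Path:
    """Map a requested file name to a path inside the upload directory."""
    base = Path(upload_dir).resolve()
    # Joining with an absolute name or "../" segments can escape base,
    # so resolve the result and compare it against base afterwards.
    candidate = (base / file_name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload directory")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError("only .pdf files are accepted")
    return candidate
```

A plain name such as "textbook.pdf" resolves inside the upload directory, while "../../etc/passwd", "/etc/passwd", and "notes.txt" all raise ValueError. A stricter check could also confirm that the file starts with the "%PDF-" header bytes.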
API Rate Limits
Each tool has a rate limit of 50 requests per hour to prevent abuse and ensure fair usage.
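A per-tool limit like this is often implemented as a sliding window over recent call timestamps. The following is a minimal sketch of that technique. ToolRateLimiter is a hypothetical class, and whether the server uses a sliding window, a fixed hourly bucket, or something else is not stated here.

```python
import time
from collections import defaultdict, deque


class ToolRateLimiter:
    """Allow at most `limit` calls per tool within a sliding time window."""

    def __init__(self, limit: int = 50, window_seconds: float = 3600.0,
                 clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._calls = defaultdict(deque)  # tool name -> call timestamps

    def allow(self, tool: str) -> bool:
        """Record a call and return True, or return False if over the limit."""
        now = self.clock()
        calls = self._calls[tool]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True
```

Each tool gets its own queue, so exhausting the limit for flashcards does not block quiz_gen.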
File Format Support
- PDF files (native text and scanned images)
- Maximum file size: 100MB (configurable)
- Maximum pages: 500 (configurable)
License
MIT License