nanonets-mcp

ArneJanning/nanonets-mcp

The Nanonets MCP Server is a Model Context Protocol server that provides advanced OCR capabilities for converting various document formats into structured markdown.

Tools
  1. ocr_image_to_markdown

    Convert an image to structured markdown format.

  2. ocr_pdf_to_markdown

    Convert an entire PDF document to structured markdown format.

  3. process_word_to_markdown

    Convert a Word document (.docx) to structured markdown format.

  4. process_excel_to_markdown

    Convert an Excel file (.xlsx) to structured markdown format.

  5. get_supported_formats

    Get information about supported formats and capabilities.

Nanonets MCP Server

An MCP (Model Context Protocol) server that exposes Nanonets OCR functionality for converting images, PDFs, Word documents, and Excel spreadsheets to structured markdown.

Features

  • Advanced OCR: Convert documents to structured markdown using Nanonets-OCR-s (3.75B parameter model)
  • Multi-format Support: Handles images, PDFs, Word documents, and Excel spreadsheets
    • Images: PNG, JPEG, BMP, TIFF, WEBP
    • Documents: PDF, DOCX, XLSX
  • PDF Processing: Complete multi-page PDF document processing with page-by-page OCR
  • Office Document Processing: Direct text extraction from Word and Excel files
  • Intelligent Recognition: Detects and converts:
    • Text and paragraphs
    • Tables with structure preservation
    • LaTeX equations
    • Images with descriptions
    • Signatures and watermarks
    • Checkboxes
    • Complex layouts
    • Multi-page documents with proper page separation
    • Word document headings and formatting
    • Excel worksheets and data tables

Installation

Option 1: Docker (Recommended with GPU)

# Clone the repository
git clone <repository-url>
cd nanonets_mcp

# Build and run with Docker Compose (requires NVIDIA Docker runtime)
docker-compose up --build

Prerequisites for GPU support: an NVIDIA GPU and the NVIDIA Docker runtime.

Option 2: Local Installation

# Clone the repository
git clone <repository-url>
cd nanonets_mcp

# Install dependencies with uv
uv pip install -e .

Usage

Running the Server

With Docker:

# Start with Docker Compose
docker-compose up

# Or run directly with Docker
docker run --gpus all -p 8000:8000 nanonets-mcp:latest

Local Installation:

# Start the MCP server
nanonets-mcp

# Or run directly
python -m nanonets_mcp.server

Available Tools

ocr_image_to_markdown

Convert an image to structured markdown format.

Parameters:

  • image_data (string): Image data as base64 string, data URL, or file path
  • image_format (optional string): Format hint (png, jpg, etc.)

Returns: Structured markdown representation of the document
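The three accepted input forms (base64 string, data URL, file path) can be told apart with a few string checks. The following is a minimal sketch of one way to do that, not the server's actual code; `load_input_bytes` is a hypothetical helper name, and the order of the checks is an assumption.

```python
import base64
from pathlib import Path


def load_input_bytes(data: str) -> bytes:
    """Return raw bytes for a data URL, a file path, or a bare base64 string.

    Hypothetical helper: the real server may resolve inputs differently.
    """
    if data.startswith("data:"):
        # Data URL: everything after the first comma is the payload.
        header, _, payload = data.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64-encoded data URLs are handled here")
        return base64.b64decode(payload)
    try:
        path = Path(data)
        if path.is_file():
            return path.read_bytes()
    except OSError:
        # Very long base64 strings are not valid file names on some systems.
        pass
    # Fall back to treating the input as bare base64.
    return base64.b64decode(data, validate=True)
```

The same check order would apply to the `pdf_data`, `docx_data`, and `excel_data` parameters, which accept the same three forms.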

ocr_pdf_to_markdown

Convert an entire PDF document to structured markdown format.

Parameters:

  • pdf_data (string): PDF data as base64 string, data URL, or file path

Returns: Structured markdown representation of the entire PDF document with page separators
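To make the page-separator layout concrete, here is a small sketch that assembles per-page markdown into a single document. The layout is copied from the example output in the Examples section of this README; `join_pages` is a hypothetical name, and the server's own assembly code may differ.

```python
def join_pages(pages: list[str]) -> str:
    """Combine per-page markdown into one document with '---' separators.

    Hypothetical helper that mirrors the example output in this README.
    """
    parts = ["# PDF Document", f"*Total pages: {len(pages)}*"]
    for number, content in enumerate(pages, start=1):
        # Blank line, horizontal rule, page heading, then the page content.
        parts.append(f"\n---\n# Page {number}\n{content}")
    return "\n".join(parts)


print(join_pages(["[Content of page 1]", "[Content of page 2]"]))
```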

process_word_to_markdown

Convert a Word document (.docx) to structured markdown format.

Parameters:

  • docx_data (string): Word document data as base64 string, data URL, or file path

Returns: Structured markdown representation of the Word document with headings and tables
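One plausible way to turn Word headings into markdown headings is to map the paragraph style name to a number of `#` characters. The sketch below assumes the built-in English style names ("Heading 1", "Heading 2", ...) that python-docx reports for standard documents; `paragraph_to_markdown` is a hypothetical helper, and this README does not show the mapping the server actually uses.

```python
import re


def paragraph_to_markdown(style_name: str, text: str) -> str:
    """Render one paragraph, prefixing headings with '#' characters.

    Assumes built-in style names such as 'Heading 2'; other styles
    are returned as plain text.
    """
    match = re.fullmatch(r"Heading (\d)", style_name)
    if match:
        level = int(match.group(1))
        return "#" * level + " " + text
    return text


print(paragraph_to_markdown("Heading 2", "Section Header"))
print(paragraph_to_markdown("Normal", "More content here."))
```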

process_excel_to_markdown

Convert an Excel file (.xlsx) to structured markdown format.

Parameters:

  • excel_data (string): Excel file data as base64 string, data URL, or file path

Returns: Structured markdown representation of all worksheets in the Excel workbook
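The worksheet-to-table conversion can be illustrated without any spreadsheet library. This is a sketch under the assumption that the first row of a sheet is the header row, following the example output in this README; `sheet_to_markdown` is a hypothetical name and the rows are passed in as plain Python lists.

```python
def sheet_to_markdown(name: str, rows: list[list[object]]) -> str:
    """Render one worksheet as a '## Sheet:' heading plus a markdown table.

    Assumes the first row holds the column headers.
    """
    lines = [f"## Sheet: {name}", ""]
    if rows:
        header, *body = rows
        lines.append("| " + " | ".join(str(cell) for cell in header) + " |")
        lines.append("| " + " | ".join("---" for _ in header) + " |")
        for row in body:
            lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


print(sheet_to_markdown("Employee Data", [
    ["Name", "Department", "Salary"],
    ["Alice", "Engineering", 75000],
]))
```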

get_supported_formats

Get information about supported formats and capabilities.

Returns: Dictionary with supported formats, input methods, capabilities, and processing options
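Since each format family has its own tool, a client may want to choose the tool from the file extension. The sketch below builds the mapping from the formats listed under Features; `pick_tool` is a hypothetical helper, and the specific extensions (`.jpg`/`.jpeg`, `.tif`/`.tiff`) are assumptions about how those formats are usually named on disk.

```python
from pathlib import Path

# Tool names come from this README; the extension spellings are assumptions.
TOOL_BY_EXTENSION = {
    ".png": "ocr_image_to_markdown",
    ".jpg": "ocr_image_to_markdown",
    ".jpeg": "ocr_image_to_markdown",
    ".bmp": "ocr_image_to_markdown",
    ".tif": "ocr_image_to_markdown",
    ".tiff": "ocr_image_to_markdown",
    ".webp": "ocr_image_to_markdown",
    ".pdf": "ocr_pdf_to_markdown",
    ".docx": "process_word_to_markdown",
    ".xlsx": "process_excel_to_markdown",
}


def pick_tool(path: str) -> str:
    """Return the tool name for a file, based on its extension only."""
    extension = Path(path).suffix.lower()
    if extension not in TOOL_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {extension or path}")
    return TOOL_BY_EXTENSION[extension]


print(pick_tool("/path/to/document.pdf"))
```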

Available Resources

nanonets://model-info

Provides detailed information about the Nanonets OCR model, including capabilities and specifications.

Examples

Basic OCR Usage

Image Processing

import base64

# Using file path
result = await ocr_image_to_markdown("/path/to/document.png")

# Using base64 data
with open("document.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
result = await ocr_image_to_markdown(image_b64)

# Using data URL
data_url = "..."
result = await ocr_image_to_markdown(data_url)

PDF Processing

import base64

# Process entire PDF document
result = await ocr_pdf_to_markdown("/path/to/document.pdf")

# Using base64 PDF data
with open("document.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()
result = await ocr_pdf_to_markdown(pdf_b64)

# Result includes all pages with separators
# Example output:
# # PDF Document
# *Total pages: 3*
# 
# ---
# # Page 1
# [Content of page 1]
# 
# ---
# # Page 2
# [Content of page 2]
# ...

Word Document Processing

import base64

# Process Word document
result = await process_word_to_markdown("/path/to/document.docx")

# Using base64 Word document data
with open("document.docx", "rb") as f:
    docx_b64 = base64.b64encode(f.read()).decode()
result = await process_word_to_markdown(docx_b64)

# Result includes text, headings, and tables
# Example output:
# # Word Document
# 
# # Main Title
# 
# This is a paragraph of text.
# 
# ## Section Header
# 
# More content here.
# 
# | Name | Age | City |
# | --- | --- | --- |
# | John | 30 | NYC |

Excel Spreadsheet Processing

import base64

# Process Excel file
result = await process_excel_to_markdown("/path/to/spreadsheet.xlsx")

# Using base64 Excel data
with open("spreadsheet.xlsx", "rb") as f:
    excel_b64 = base64.b64encode(f.read()).decode()
result = await process_excel_to_markdown(excel_b64)

# Result includes all worksheets as tables
# Example output:
# # Excel Workbook
# 
# ## Sheet: Employee Data
# 
# | Name | Department | Salary |
# | --- | --- | --- |
# | Alice | Engineering | 75000 |
# | Bob | Marketing | 65000 |
# 
# ## Sheet: Financial Data
# 
# | Quarter | Revenue | Expenses |
# | --- | --- | --- |
# | Q1 | 150000 | 120000 |

Integration with Claude Desktop

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "nanonets-ocr": {
      "command": "nanonets-mcp"
    }
  }
}
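If a Claude Desktop configuration already lists other servers, the entry above has to be merged rather than pasted over the file. A minimal sketch of that merge, assuming the configuration has already been loaded into a dictionary; `add_nanonets_server` is a hypothetical helper and the location of the configuration file is deliberately left out.

```python
import json


def add_nanonets_server(config: dict) -> dict:
    """Return a copy of the config with the 'nanonets-ocr' entry added.

    Existing entries under 'mcpServers' are kept; the input is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["nanonets-ocr"] = {"command": "nanonets-mcp"}
    merged["mcpServers"] = servers
    return merged


# "other-server" is a placeholder for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "other-command"}}}
print(json.dumps(add_nanonets_server(existing), indent=2))
```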

Model Information

  • Model: nanonets/Nanonets-OCR-s
  • Parameters: 3.75B (based on Qwen2.5-VL-3B-Instruct)
  • Input: Images up to 2048x2048 pixels (recommended) and PDF documents
  • Output: Structured markdown with semantic tagging
  • PDF Processing: 200 DPI conversion, all pages processed sequentially
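The two figures above (a recommended 2048x2048 limit and 200 DPI rendering) lend themselves to a quick size check before sending an image. The sketch below is arithmetic only; `fit_within` and `page_pixels` are hypothetical helpers, and whether the server rescales oversized inputs itself is not stated in this README.

```python
def fit_within(width: int, height: int, limit: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down to fit a limit x limit box, keeping aspect ratio.

    Images that already fit are returned unchanged.
    """
    scale = min(1.0, limit / width, limit / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def page_pixels(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel size of a page rendered at the given DPI (inches times DPI)."""
    return round(width_in * dpi), round(height_in * dpi)


letter = page_pixels(8.5, 11)   # US Letter at 200 DPI
print(letter)                   # (1700, 2200)
print(fit_within(*letter))      # height capped at 2048, width scaled to match
```

A US Letter page rendered at 200 DPI is 2200 pixels tall, which is slightly above the recommended 2048, so this kind of check may be useful when pre-rendering pages yourself.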

Requirements

Core Dependencies

  • Python ≥3.10
  • PyTorch ≥2.0.0
  • Transformers =4.53.0
  • PIL/Pillow ≥10.0.0
  • MCP ≥1.0.0

Optional Dependencies

  • pdf2image ≥1.16.0 (for PDF support)
  • PyMuPDF ≥1.23.0 (for PDF support)
  • python-docx ≥0.8.11 (for Word document support)
  • openpyxl ≥3.1.0 (for Excel support)
  • pandas ≥2.0.0 (for Excel support)
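Because these packages are optional, it can help to check which ones are importable before calling the PDF, Word, or Excel tools. A minimal sketch using only the standard library; `missing_optional_packages` is a hypothetical helper, the import names (`fitz` for PyMuPDF, `docx` for python-docx) are the usual ones for those distributions, and version constraints are not checked.

```python
from importlib.util import find_spec

# Distribution name -> top-level module name used for the import check.
OPTIONAL_MODULES = {
    "pdf2image": "pdf2image",
    "PyMuPDF": "fitz",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
    "pandas": "pandas",
}


def missing_optional_packages() -> list[str]:
    """List optional distributions whose module cannot be found."""
    return [
        package
        for package, module in OPTIONAL_MODULES.items()
        if find_spec(module) is None
    ]


print("Missing optional packages:", missing_optional_packages() or "none")
```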

Development

Testing

Docker Testing:

# Test Docker build
docker-compose build

# Run health check
docker-compose up -d
docker-compose ps

# View logs
docker-compose logs -f nanonets-mcp

# Stop services
docker-compose down

Local Testing:

# Test with MCP Inspector
mcp dev nanonets_mcp/server.py

# Install for development
uv pip install -e .

Docker Management

# Rebuild image after changes
docker-compose build --no-cache

# View resource usage
docker stats nanonets-mcp-server

# Access container shell
docker-compose exec nanonets-mcp bash

# Clean up volumes and images
docker-compose down -v
docker image prune -f

License

[Add your license information here]