mcp-server-mistral-ocr by svngoku - MCP Server

MarkThat OCR MCP Server

An MCP server that converts images and PDFs to Markdown using multimodal LLMs. It accepts both local files and URLs.

Overview

This MCP server provides OCR capabilities through multimodal language models, converting images and PDFs to well-formatted Markdown. It supports multiple providers including OpenAI, Anthropic, Google Gemini, Mistral, and OpenRouter.

Prerequisites

Install uv (https://docs.astral.sh/uv/getting-started/installation/)

Installation

Clone the repository:

git clone git@github.com:alpic-ai/mcp-server-template-python.git
cd mcp-server-template-python

Install the Python version and dependencies:

uv python install
uv sync --locked

Features

✅ URL Support: Process files directly from URLs without manual downloading
✅ Multi-Provider Support: OpenAI, Anthropic, Gemini, Mistral, OpenRouter
✅ Advanced Figure Extraction: Extract and process figures from PDFs
✅ Image Description Generation: Generate detailed descriptions for accessibility
✅ Async Processing: Fast, non-blocking file processing
✅ Automatic Cleanup: Temporary files are cleaned up automatically
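
As a rough illustration of the automatic cleanup feature, the sketch below writes downloaded bytes to a temporary file and removes it once processing finishes. It is not the server's actual implementation; the helper name `temporary_copy` and its parameters are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_copy(data: bytes, suffix: str = ".pdf") -> Iterator[str]:
    """Write data to a temporary file and delete it when the block exits.

    Hypothetical helper: the real server may manage temp files differently.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on success and on error, so no temp file is left behind.
        if os.path.exists(path):
            os.remove(path)
```

Inside `with temporary_copy(payload) as path:` the file exists and can be passed to an OCR call; after the block it is gone, even if the call raised.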

Usage

Start the server on port 3000:

uv run main.py

Running the Inspector

Requirements

Node.js: ^22.7.5

Quick Start (UI mode)

To get up and running right away with the UI, just execute the following:

npx @modelcontextprotocol/inspector

The inspector server will start up and the UI will be accessible at http://localhost:6274.

You can test your server locally by selecting:

Transport Type: Streamable HTTP
URL: http://127.0.0.1:3000/mcp

Available Tools

1. Convert Image/PDF to Markdown

Converts images or PDFs to Markdown. Supports both local files and URLs.

Example with URL:

{
  "file_path": "https://example.com/document.pdf",
  "model": "gemini-2.5-flash"
}

Example with local file:

{
  "file_path": "/path/to/local/document.pdf",
  "model": "gpt-4o"
}
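
Since `file_path` carries either a URL or a local path, a server has to tell the two apart before reading the file. The following is a minimal sketch of one way to do that, assuming only `http` and `https` URLs count as remote. The function name `is_remote` is hypothetical and not taken from this project.

```python
from urllib.parse import urlparse


def is_remote(file_path: str) -> bool:
    """Return True when file_path looks like an http(s) URL.

    Assumption: anything else (absolute paths, relative paths,
    Windows drive paths) is treated as a local file.
    """
    parsed = urlparse(file_path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With the two examples above, `is_remote("https://example.com/document.pdf")` is true and `is_remote("/path/to/local/document.pdf")` is false.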

2. Advanced OCR with Figure Extraction

Extracts and processes figures from PDF documents.

{
  "file_path": "https://arxiv.org/pdf/2301.00001.pdf",
  "model": "claude-3-5-sonnet-20241022",
  "figure_detector_model": "gemini-2.5-flash",
  "coordinate_model": "gemini-2.5-flash",
  "parsing_model": "gemini-2.5-flash-lite"
}

3. Generate Image Description

Generates detailed descriptions of images for accessibility.

{
  "file_path": "https://example.com/image.jpg",
  "model": "gemini-2.5-flash",
  "additional_instructions": "Focus on colors and composition"
}
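
The JSON objects shown for each tool are the tool arguments. Over the wire, an MCP client wraps them in a JSON-RPC 2.0 `tools/call` request. The sketch below builds that envelope. The tool name `describe_image` is a placeholder, since the registered tool names are not listed here; check the Inspector's tool list for the real ones.

```python
import json
from typing import Any


def build_tool_call(tool_name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "describe_image" is a hypothetical tool name used for illustration only.
request = build_tool_call(
    "describe_image",
    {
        "file_path": "https://example.com/image.jpg",
        "model": "gemini-2.5-flash",
        "additional_instructions": "Focus on colors and composition",
    },
)
payload = json.dumps(request)
```

In practice an MCP client library or the Inspector builds this request for you; the sketch only shows where the arguments end up.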

Environment Variables

Set the appropriate API keys based on the models you plan to use:

export GEMINI_API_KEY="your_gemini_api_key"
export OPENAI_API_KEY="your_openai_api_key"
export ANTHROPIC_API_KEY="your_anthropic_api_key"
export MISTRAL_API_KEY="your_mistral_api_key"
export OPENROUTER_API_KEY="your_openrouter_api_key"
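
Before starting the server, it can help to check that the keys for the providers you intend to use are set. The sketch below is one possible check. The environment variable names come from the list above, while the provider labels and the `missing_api_keys` helper are assumptions made for illustration.

```python
import os
from typing import Iterable, Mapping, Optional

# Provider labels are illustrative; the variable names match the exports above.
PROVIDER_ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def missing_api_keys(
    providers: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    return [
        PROVIDER_ENV_VARS[provider]
        for provider in providers
        if not env.get(PROVIDER_ENV_VARS[provider])
    ]
```

Calling `missing_api_keys(["gemini", "openai"])` returns the names of any unset variables, so a startup script can fail early with a clear message.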

Development

Adding New Tools

To add a new tool, modify main.py:

@mcp.tool(
    title="Your Tool Name",
    description="Tool Description for the LLM",
)
async def new_tool(
    tool_param1: str = Field(description="The description of the param1 for the LLM"),
    tool_param2: float = Field(description="The description of the param2 for the LLM"),
) -> str:
    """The new tool underlying method"""
    # some_api_call is a placeholder; replace it with your own async call
    result = await some_api_call(tool_param1, tool_param2)
    return result

Adding New Resources

To add a new resource, modify main.py:

@mcp.resource(
    uri="your-scheme://{param1}/{param2}",
    description="Description of what this resource provides",
    name="Your Resource Name",
)
def your_resource(param1: str, param2: str) -> str:
    """The resource template implementation"""
    # Your resource logic here
    return f"Resource content for {param1} and {param2}"

The URI template uses {param_name} syntax to define parameters that will be extracted from the resource URI and passed to your function.
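
To make the parameter extraction concrete, here is a minimal sketch of how a `{param_name}` template could be matched against a URI. It only illustrates the idea and is not the MCP SDK's own matching code. The helper names are hypothetical, and it assumes each parameter matches a single path segment.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a `{param}` URI template into a regex with named groups."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    return re.compile(pattern)


def extract_params(template: str, uri: str) -> Optional[dict[str, str]]:
    """Return the extracted parameters, or None when the URI does not match."""
    match = template_to_regex(template).fullmatch(uri)
    return match.groupdict() if match else None
```

For the template above, `extract_params("your-scheme://{param1}/{param2}", "your-scheme://docs/42")` yields `param1="docs"` and `param2="42"`, which would then be passed to `your_resource`.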

Adding New Prompts

To add a new prompt, modify main.py:

@mcp.prompt()
async def your_prompt(
    prompt_param: str = Field(description="The description of the param for the user")
) -> str:
    """Generate a helpful prompt"""

    return f"You are a friendly assistant, help the user and don't forget to {prompt_param}."