mcp-server-mistral-ocr-warp

svngoku/mcp-server-mistral-ocr-warp




Mistral OCR MCP Server

A Model Context Protocol (MCP) server that provides OCR (Optical Character Recognition) capabilities using Mistral's Pixtral vision models. This server enables AI assistants to extract text, analyze documents, and describe images through MCP tools.

Features

  • Text Extraction: Extract all readable text from images while preserving layout and formatting
  • Document Analysis: Extract specific fields from documents (receipts, invoices, forms)
  • Image Description: Generate detailed descriptions of image contents
  • Multiple Input Formats: Support for both image URLs and base64-encoded images
  • Recent Results Cache: Access previously processed images
  • Customizable: Override model selection and provide custom prompts

Requirements

  • Python 3.10 or higher
  • Node.js 18+ (for MCP Inspector)
  • Mistral API key

Installation

  1. Clone or navigate to the repository:

    cd /path/to/mcp-server-mistral-ocr
    
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install dependencies:

    pip install -e .
    
  4. Configure environment:

    cp .env.example .env
    # Edit .env and add your MISTRAL_API_KEY
    

Configuration

Create a .env file based on .env.example:

# Required
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional
PORT=3000
MISTRAL_MODEL=pixtral-12b-latest
RECENT_MAX=50
LOG_LEVEL=info
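
As an illustration of how these variables and their defaults might be read, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from main.py; only the variable names, the defaults, and the "MISTRAL_API_KEY is required" message come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env.example."""
    mistral_api_key: str
    port: int = 3000
    mistral_model: str = "pixtral-12b-latest"
    recent_max: int = 50
    log_level: str = "info"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("MISTRAL_API_KEY", "").strip()
    if not api_key:
        # Same wording as the error listed under Troubleshooting.
        raise ValueError("MISTRAL_API_KEY is required")
    return Settings(
        mistral_api_key=api_key,
        port=int(env.get("PORT", 3000)),
        mistral_model=env.get("MISTRAL_MODEL", "pixtral-12b-latest"),
        recent_max=int(env.get("RECENT_MAX", 50)),
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_settings({"MISTRAL_API_KEY": "k", "PORT": "8080"})`.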

Available Models

  • pixtral-12b-latest (default) - Fast and efficient
  • pixtral-large-latest - Highest accuracy
  • mistral-small-latest - With vision capabilities
  • mistral-medium-latest - With vision capabilities

Usage

Starting the Server

python main.py

The server will start on http://localhost:3000 (or the port specified in .env).

Using MCP Inspector

The easiest way to test the server:

  1. Start the MCP Inspector:

    npx @modelcontextprotocol/inspector
    
  2. Open your browser to http://localhost:6274

  3. Connect to your server:

    • Transport Type: Streamable HTTP
    • URL: http://127.0.0.1:3000/mcp
  4. Try the tools with example payloads (see examples below)

Configuring with MCP Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "mistral-ocr": {
      "command": "python",
      "args": ["/path/to/mcp-server-mistral-ocr/main.py"],
      "env": {
        "MISTRAL_API_KEY": "your_key_here"
      }
    }
  }
}

Cline (VS Code Extension)

In Cline settings, add the MCP server:

{
  "mistral-ocr": {
    "command": "python",
    "args": ["/path/to/mcp-server-mistral-ocr/main.py"],
    "env": {"MISTRAL_API_KEY": "your_key_here"}
  }
}
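
Both client configurations above share the same server entry. If you prefer to generate it rather than paste it, the following sketch builds the entry and merges it into a Claude Desktop style document. The helper names `server_entry` and `add_to_claude_config` are made up for this example, and the path and key are the same placeholders used above.

```python
import json


def server_entry(main_py: str, api_key: str) -> dict:
    """Return the entry used by both client configurations above."""
    return {
        "command": "python",
        "args": [main_py],
        "env": {"MISTRAL_API_KEY": api_key},
    }


def add_to_claude_config(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of a claude_desktop_config.json document with the entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry  # other servers already in the file are preserved
    merged["mcpServers"] = servers
    return merged


entry = server_entry("/path/to/mcp-server-mistral-ocr/main.py", "your_key_here")
print(json.dumps(add_to_claude_config({}, "mistral-ocr", entry), indent=2))
```

The merge copies the input rather than mutating it, so an existing configuration can be loaded, extended, and written back without touching unrelated entries.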

API Reference

Tools

1. extract_text_from_image

Extract text from an image while preserving layout and formatting.

Parameters:

  • image_url (optional): HTTP(S) URL of the image. Exactly one of image_url or image_base64 must be provided.
  • image_base64 (optional): Base64-encoded image data, as an alternative to image_url
  • prompt (optional): Custom extraction prompt
  • model (optional): Model to use (default: pixtral-12b-latest)
  • mime_type (optional): MIME type (e.g., image/png, image/jpeg)

Example:

{
  "image_url": "https://example.com/receipt.jpg",
  "prompt": "Extract all text from this receipt"
}

Response:

{
  "id": "a3f2c1b8d9e7",
  "text": "Store Name\n123 Main St\nTotal: $45.99\n..."
}

2. analyze_document

Extract specific fields from a document.

Parameters:

  • image_url (optional): HTTP(S) URL of the document. Exactly one of image_url or image_base64 must be provided.
  • image_base64 (optional): Base64-encoded document image, as an alternative to image_url
  • fields (optional): List of fields to extract (default: ["title", "date", "total", "address"])
  • model (optional): Model to use
  • mime_type (optional): MIME type

Example:

{
  "image_url": "https://example.com/invoice.pdf",
  "fields": ["invoice_number", "date", "total", "vendor"]
}

Response:

{
  "id": "b4e3d2c1a0f9",
  "data": {
    "invoice_number": "INV-2024-001",
    "date": "2024-01-15",
    "total": "$1,250.00",
    "vendor": "Acme Corp"
  }
}

3. describe_image

Generate a detailed description of image contents.

Parameters:

  • image_url (optional): HTTP(S) URL of the image. Exactly one of image_url or image_base64 must be provided.
  • image_base64 (optional): Base64-encoded image data, as an alternative to image_url
  • model (optional): Model to use
  • mime_type (optional): MIME type

Example:

{
  "image_url": "https://example.com/photo.jpg"
}

Response:

{
  "id": "c5f4e3d2b1a0",
  "description": "The image shows a modern office space with large windows..."
}

Resources

recent://results

List all recent OCR results (up to RECENT_MAX items).

Response:

[
  {
    "id": "a3f2c1b8d9e7",
    "ts": 1736776543000,
    "tool": "extract_text_from_image",
    "model": "pixtral-12b-latest"
  }
]

recent://results/{result_id}

Get full details of a specific result.

Response:

{
  "id": "a3f2c1b8d9e7",
  "ts": 1736776543000,
  "tool": "extract_text_from_image",
  "inputs": {...},
  "output": {...},
  "model": "pixtral-12b-latest"
}
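
To make the two resources concrete, here is one way a bounded results cache could be structured. This is a sketch under assumptions, not the code in main.py: the `RecentResults` class, the 12-character hex ids, and the newest-first ordering are illustrative choices that happen to match the response shapes shown above.

```python
import time
import uuid
from collections import OrderedDict


class RecentResults:
    """Hypothetical in-memory cache holding at most max_items results."""

    def __init__(self, max_items: int = 50):
        self.max_items = max_items
        self._items = OrderedDict()  # insertion order: oldest first

    def add(self, tool: str, inputs: dict, output: dict, model: str) -> str:
        result_id = uuid.uuid4().hex[:12]
        self._items[result_id] = {
            "id": result_id,
            "ts": int(time.time() * 1000),  # milliseconds since the epoch
            "tool": tool,
            "inputs": inputs,
            "output": output,
            "model": model,
        }
        # Evict the oldest entries once the limit (RECENT_MAX) is exceeded.
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)
        return result_id

    def summaries(self) -> list:
        """Shape of recent://results: short entries, newest first."""
        keys = ("id", "ts", "tool", "model")
        return [{k: r[k] for k in keys} for r in reversed(self._items.values())]

    def get(self, result_id: str):
        """Shape of recent://results/{result_id}: full record, or None."""
        return self._items.get(result_id)
```

Because full inputs and outputs are retained per entry, memory use grows with both the limit and the size of each payload, which is why lowering RECENT_MAX helps when processing large images.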

Prompts

ocr_task_guidance

Get OCR task guidance for specific task types.

Parameters:

  • task: Type of OCR task (extract, analyze, describe)

Examples

Extract Text from a Receipt

{
  "tool": "extract_text_from_image",
  "arguments": {
    "image_url": "https://example.com/receipt.jpg"
  }
}

Analyze an Invoice

{
  "tool": "analyze_document",
  "arguments": {
    "image_url": "https://example.com/invoice.pdf",
    "fields": ["invoice_number", "date", "total", "vendor", "due_date"]
  }
}

Extract Text from Base64 Image

{
  "tool": "extract_text_from_image",
  "arguments": {
    "image_base64": "iVBORw0KGgoAAAANSUhEUg...",
    "mime_type": "image/png"
  }
}
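
The image_base64 value in the example above has to be produced from the raw image bytes. A minimal sketch using only the standard library follows; the function name `image_arguments` is hypothetical, and it assumes the file extension identifies the image type.

```python
import base64
import mimetypes
from pathlib import Path


def image_arguments(path: str) -> dict:
    """Build image_base64 and mime_type arguments for a local image file."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image type: {path}")
    # b64encode returns bytes; decode to str so the value is JSON-serialisable.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"image_base64": encoded, "mime_type": mime_type}
```

The returned dict can be used directly as the "arguments" object of an extract_text_from_image, analyze_document, or describe_image call.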

Describe a Complex Image

{
  "tool": "describe_image",
  "arguments": {
    "image_url": "https://example.com/diagram.png",
    "model": "pixtral-large-latest"
  }
}
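
The examples above are written as tool/arguments pairs, which is how MCP Inspector presents them. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method tools/call. The sketch below shows that mapping; `to_tools_call` is an illustrative helper, and a real client must also complete the MCP initialize handshake before sending such a request.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id


def to_tools_call(example: dict) -> dict:
    """Wrap a {"tool": ..., "arguments": ...} example in a JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": example["tool"],
            "arguments": example["arguments"],
        },
    }


request = to_tools_call({
    "tool": "describe_image",
    "arguments": {"image_url": "https://example.com/diagram.png"},
})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show what the examples correspond to at the protocol level.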

Error Handling

The server validates inputs and returns one of the following error messages:

  • Missing image: "Provide either image_url or image_base64"
  • Both inputs: "Provide only one of image_url or image_base64"
  • Invalid URL: "Only http(s) URLs are allowed"
  • API errors: Full error details from the Mistral API

All errors are returned in JSON format:

{
  "error": "Error message here"
}
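
The input rules above can be expressed as a small validation function. This is a sketch of the documented behavior, not the server's actual code: the function name `validate_image_input` and the order of the checks are assumptions, while the messages are copied from the list above.

```python
from urllib.parse import urlparse


def validate_image_input(image_url=None, image_base64=None):
    """Return an error dict in the documented format, or None if the input is valid."""
    if image_url and image_base64:
        return {"error": "Provide only one of image_url or image_base64"}
    if not image_url and not image_base64:
        return {"error": "Provide either image_url or image_base64"}
    if image_url:
        parsed = urlparse(image_url)
        # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return {"error": "Only http(s) URLs are allowed"}
    return None
```

For example, `validate_image_input(image_url="file:///etc/passwd")` returns the "Only http(s) URLs are allowed" error, while a plain https URL returns None.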

Security Considerations

  • Only HTTP(S) URLs are allowed (no file://, ftp://, etc.)
  • API key is never exposed in logs or responses
  • Base64 payloads are validated before processing
  • Consider setting RECENT_MAX based on memory constraints
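
This README does not spell out how base64 payloads are validated. One plausible approach, shown here purely as an assumption, is strict decoding plus a size cap; the function name `decode_image_base64`, the 10 MiB limit, and the error wording are all illustrative.

```python
import base64
import binascii


def decode_image_base64(data: str, max_bytes: int = 10 * 1024 * 1024) -> bytes:
    """Strictly decode a base64 string and enforce a hypothetical size limit."""
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # including whitespace and line breaks.
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_base64 is not valid base64") from exc
    if not raw:
        raise ValueError("image_base64 is empty")
    if len(raw) > max_bytes:
        raise ValueError("image_base64 exceeds the size limit")
    return raw
```

Rejecting oversized payloads before they reach the results cache also limits how much memory a single request can pin.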

Development

Install Development Dependencies

pip install -e ".[dev]"

Run Linter

ruff check .
ruff format .

Run Type Checker

mypy main.py

Run Tests

pytest

Troubleshooting

"MISTRAL_API_KEY is required"

Make sure you've created a .env file with your API key, or set the environment variable:

export MISTRAL_API_KEY=your_key_here
python main.py

"Connection refused" in MCP Inspector

Check that:

  1. The server is running (python main.py)
  2. The port matches (default: 3000)
  3. You're connecting to http://127.0.0.1:3000/mcp
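
The first two checks can be automated with a quick TCP probe. The sketch below only tests whether something is listening on the port; it does not verify that the listener is this server or that the /mcp path responds. `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("listening" if port_is_open() else "nothing listening on 127.0.0.1:3000")
```

If the probe fails while python main.py is running, check whether PORT in .env differs from the port in the Inspector URL.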

"Model not found" errors

Verify the model name is correct. Available vision models:

  • pixtral-12b-latest
  • pixtral-large-latest
  • mistral-small-latest
  • mistral-medium-latest

Memory Issues

If processing many large images, consider:

  • Reducing RECENT_MAX in .env
  • Using image URLs instead of base64 for large files
  • Restarting the server periodically

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linters
  5. Submit a pull request

License

MIT License - see LICENSE file for details
