mcp-server-mistral-ocr-warp by svngoku - MCP Server

Mistral OCR MCP Server

A Model Context Protocol (MCP) server that provides OCR (Optical Character Recognition) capabilities using Mistral's vision-capable models (Pixtral by default). It exposes MCP tools that let AI assistants extract text, pull structured fields from documents, and describe images.

Features

Text Extraction: Extract all readable text from images while preserving layout and formatting
Document Analysis: Extract specific fields from documents (receipts, invoices, forms)
Image Description: Generate detailed descriptions of image contents
Multiple Input Formats: Support for both image URLs and base64-encoded images
Recent Results Cache: Retrieve the results of previously processed images through MCP resources
Customizable: Override model selection and provide custom prompts

Requirements

Python 3.10 or higher
Node.js 18+ (for MCP Inspector)
Mistral API key

Installation

Clone or navigate to the repository:
```
cd /path/to/mcp-server-mistral-ocr
```

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
pip install -e .
```

Configure environment:

cp .env.example .env
# Edit .env and add your MISTRAL_API_KEY

Configuration

Create a .env file based on .env.example:

# Required
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional
PORT=3000
MISTRAL_MODEL=pixtral-12b-latest
RECENT_MAX=50
LOG_LEVEL=info
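
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from the process environment. The `Settings` class and `load_settings` function are hypothetical names, not taken from the server's code, and the sketch does not parse the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented above."""
    api_key: str
    port: int = 3000
    model: str = "pixtral-12b-latest"
    recent_max: int = 50
    log_level: str = "info"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("MISTRAL_API_KEY", "").strip()
    if not api_key:
        # Same wording as the message listed under Troubleshooting.
        raise RuntimeError("MISTRAL_API_KEY is required")
    return Settings(
        api_key=api_key,
        port=int(env.get("PORT", "3000")),
        model=env.get("MISTRAL_MODEL", "pixtral-12b-latest"),
        recent_max=int(env.get("RECENT_MAX", "50")),
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test, for example `load_settings({"MISTRAL_API_KEY": "k"}).port`.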

Available Models

pixtral-12b-latest (default) - Fast and efficient
pixtral-large-latest - Highest accuracy
mistral-small-latest - With vision capabilities
mistral-medium-latest - With vision capabilities

Usage

Starting the Server

python main.py

The server starts on http://localhost:3000, or on the port set via PORT in .env.

Using MCP Inspector

The easiest way to test the server is with MCP Inspector:

Start the MCP Inspector:
```
npx @modelcontextprotocol/inspector
```
Open your browser to http://localhost:6274
Connect to your server:
- Transport Type: Streamable HTTP
- URL: http://127.0.0.1:3000/mcp
Try the tools with example payloads (see examples below)

Configuring with MCP Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "mistral-ocr": {
      "command": "python",
      "args": ["/path/to/mcp-server-mistral-ocr/main.py"],
      "env": {
        "MISTRAL_API_KEY": "your_key_here"
      }
    }
  }
}
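
If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge on an already-parsed config; `add_mcp_server` is a hypothetical helper, and locating and writing claude_desktop_config.json is left out because its path depends on the operating system.

```python
import json
from typing import Any


def add_mcp_server(config: dict[str, Any], name: str, script_path: str, api_key: str) -> dict[str, Any]:
    """Return a copy of config with the server entry added under mcpServers."""
    # Round-trip through JSON to deep-copy the (JSON-compatible) input.
    updated = json.loads(json.dumps(config))
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {
        "command": "python",
        "args": [script_path],
        "env": {"MISTRAL_API_KEY": api_key},
    }
    return updated
```

Existing entries under `mcpServers` are preserved; an entry with the same name is replaced.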

Cline (VS Code Extension)

In Cline settings, add the MCP server:

{
  "mistral-ocr": {
    "command": "python",
    "args": ["/path/to/mcp-server-mistral-ocr/main.py"],
    "env": {"MISTRAL_API_KEY": "your_key_here"}
  }
}

API Reference

Tools

1. `extract_text_from_image`

Extract text from an image while preserving layout and formatting.

Parameters:

image_url (optional): HTTP(S) URL of the image. Exactly one of image_url or image_base64 must be provided.
image_base64 (optional): Base64-encoded image data, used instead of image_url
prompt (optional): Custom extraction prompt
model (optional): Model to use (default: pixtral-12b-latest)
mime_type (optional): MIME type (e.g., image/png, image/jpeg)

Example:

{
  "image_url": "https://example.com/receipt.jpg",
  "prompt": "Extract all text from this receipt"
}

Response:

{
  "id": "a3f2c1b8d9e7",
  "text": "Store Name\n123 Main St\nTotal: $45.99\n..."
}

2. `analyze_document`

Extract specific fields from a document.

Parameters:

image_url (optional): HTTP(S) URL of the document image. Exactly one of image_url or image_base64 must be provided.
image_base64 (optional): Base64-encoded document image, used instead of image_url
fields (optional): List of fields to extract (default: ["title", "date", "total", "address"])
model (optional): Model to use
mime_type (optional): MIME type

Example:

{
  "image_url": "https://example.com/invoice.png",
  "fields": ["invoice_number", "date", "total", "vendor"]
}

Response:

{
  "id": "b4e3d2c1a0f9",
  "data": {
    "invoice_number": "INV-2024-001",
    "date": "2024-01-15",
    "total": "$1,250.00",
    "vendor": "Acme Corp"
  }
}
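
How the server turns the model's reply into the `data` object is not documented here. One plausible approach, shown purely as an assumption, is to ask the model for JSON and then keep only the requested fields, filling in `None` for anything missing. `select_fields` and `DEFAULT_FIELDS` are hypothetical names.

```python
import json
from typing import Any, Optional

# Default field list as documented for analyze_document.
DEFAULT_FIELDS = ["title", "date", "total", "address"]


def select_fields(raw_reply: str, fields: Optional[list[str]] = None) -> dict[str, Any]:
    """Keep only the requested fields from a JSON reply; missing ones become None."""
    wanted = fields or DEFAULT_FIELDS
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        # A JSON array or scalar carries no named fields.
        parsed = {}
    return {name: parsed.get(name) for name in wanted}
```

Returning every requested key, even when empty, gives callers a stable shape to work with.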

3. `describe_image`

Generate a detailed description of image contents.

Parameters:

image_url (optional): HTTP(S) URL of the image. Exactly one of image_url or image_base64 must be provided.
image_base64 (optional): Base64-encoded image data, used instead of image_url
model (optional): Model to use
mime_type (optional): MIME type

Example:

{
  "image_url": "https://example.com/photo.jpg"
}

Response:

{
  "id": "c5f4e3d2b1a0",
  "description": "The image shows a modern office space with large windows..."
}

Resources

`recent://results`

List summaries of recent OCR results (up to RECENT_MAX items). The ts field is a Unix timestamp in milliseconds.

Response:

[
  {
    "id": "a3f2c1b8d9e7",
    "ts": 1736776543000,
    "tool": "extract_text_from_image",
    "model": "pixtral-12b-latest"
  }
]

`recent://results/{result_id}`

Get full details of a specific result.

Response:

{
  "id": "a3f2c1b8d9e7",
  "ts": 1736776543000,
  "tool": "extract_text_from_image",
  "inputs": {...},
  "output": {...},
  "model": "pixtral-12b-latest"
}
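
The two resources suggest a bounded store: summaries for the listing, full records by id, and eviction once RECENT_MAX is exceeded. The sketch below shows one way to get that behavior with an `OrderedDict`. The `RecentResults` class, the oldest-first eviction, and the id format are assumptions for illustration, not the server's actual implementation.

```python
import time
import uuid
from collections import OrderedDict
from typing import Any, Optional


class RecentResults:
    """Hypothetical in-memory store that keeps at most max_items results."""

    def __init__(self, max_items: int = 50) -> None:
        self._max = max_items
        self._items: "OrderedDict[str, dict[str, Any]]" = OrderedDict()

    def add(self, tool: str, model: str, inputs: dict[str, Any], output: dict[str, Any]) -> str:
        """Store a result and return its id, evicting the oldest entries if needed."""
        result_id = uuid.uuid4().hex[:12]  # 12 hex characters, like the ids shown above
        self._items[result_id] = {
            "id": result_id,
            "ts": int(time.time() * 1000),  # Unix time in milliseconds
            "tool": tool,
            "inputs": inputs,
            "output": output,
            "model": model,
        }
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry
        return result_id

    def summaries(self) -> list[dict[str, Any]]:
        """Shape of recent://results: summary fields only."""
        return [{k: item[k] for k in ("id", "ts", "tool", "model")} for item in self._items.values()]

    def get(self, result_id: str) -> Optional[dict[str, Any]]:
        """Shape of recent://results/{result_id}: the full record, or None."""
        return self._items.get(result_id)
```

Keeping inputs and outputs out of the listing keeps that response small even when individual results are large.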

Prompts

`ocr_task_guidance`

Get OCR task guidance for specific task types.

Parameters:

task: Type of OCR task (extract, analyze, describe)

Examples

Extract Text from a Receipt

{
  "tool": "extract_text_from_image",
  "arguments": {
    "image_url": "https://example.com/receipt.jpg"
  }
}

Analyze an Invoice

{
  "tool": "analyze_document",
  "arguments": {
    "image_url": "https://example.com/invoice.png",
    "fields": ["invoice_number", "date", "total", "vendor", "due_date"]
  }
}

Extract Text from Base64 Image

{
  "tool": "extract_text_from_image",
  "arguments": {
    "image_base64": "iVBORw0KGgoAAAANSUhEUg...",
    "mime_type": "image/png"
  }
}
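
To produce the `image_base64` and `mime_type` arguments from a local file, something like the following should work. `build_base64_arguments` is a hypothetical helper; it guesses the MIME type from the file extension rather than inspecting the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def build_base64_arguments(image_path: str) -> dict[str, str]:
    """Read a local image and return the image_base64 and mime_type arguments."""
    path = Path(image_path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"image_base64": encoded, "mime_type": mime_type}
```

The returned dict can be used directly as the `arguments` object in the example above.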

Describe a Complex Image

{
  "tool": "describe_image",
  "arguments": {
    "image_url": "https://example.com/diagram.png",
    "model": "pixtral-large-latest"
  }
}
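
The examples above use a compact `tool` / `arguments` shape. On the wire, MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below converts one shape into the other; `to_tools_call` is a hypothetical helper, and a real client such as MCP Inspector or an MCP SDK also performs the session initialization that must precede any tool call.

```python
import itertools
from typing import Any

# Simple increasing request ids; JSON-RPC only requires them to be unique per session.
_request_ids = itertools.count(1)


def to_tools_call(example: dict[str, Any]) -> dict[str, Any]:
    """Wrap a {"tool": ..., "arguments": ...} example in a JSON-RPC tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": example["tool"],
            "arguments": example.get("arguments", {}),
        },
    }
```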

Error Handling

The server validates inputs and provides helpful error messages:

Missing image: "Provide either image_url or image_base64"
Both inputs: "Provide only one of image_url or image_base64"
Invalid URL: "Only http(s) URLs are allowed"
API errors: Full error details from Mistral API

All errors are returned in JSON format:

{
  "error": "Error message here"
}
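
The input rules above can be expressed as a small check that returns the documented error object, or nothing when the input is acceptable. This is a sketch of the rules as described, not the server's actual code; `validate_image_input` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlparse


def validate_image_input(
    image_url: Optional[str] = None,
    image_base64: Optional[str] = None,
) -> Optional[dict[str, str]]:
    """Return an error object for invalid input, or None when the input is acceptable."""
    if not image_url and not image_base64:
        return {"error": "Provide either image_url or image_base64"}
    if image_url and image_base64:
        return {"error": "Provide only one of image_url or image_base64"}
    if image_url:
        parsed = urlparse(image_url)
        # urlparse lowercases the scheme, so "HTTPS://..." is accepted as well.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return {"error": "Only http(s) URLs are allowed"}
    return None
```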

Security Considerations

Only HTTP(S) URLs are allowed (no file://, ftp://, etc.)
API key is never exposed in logs or responses
Base64 payloads are validated before processing
Consider setting RECENT_MAX based on memory constraints
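
What base64 validation can look like in practice is sketched below, using strict decoding from the standard library. `decode_base64_image` and its `max_bytes` limit are assumptions for illustration; the checks the server actually performs are not specified here.

```python
import base64
import binascii


def decode_base64_image(payload: str, max_bytes: int = 10 * 1024 * 1024) -> bytes:
    """Strictly decode a base64 payload, rejecting malformed, empty, or oversized input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        data = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_base64 is not valid base64") from exc
    if not data:
        raise ValueError("image_base64 is empty")
    if len(data) > max_bytes:
        raise ValueError(f"image_base64 exceeds {max_bytes} bytes after decoding")
    return data
```

Strict decoding also rejects data URLs such as `data:image/png;base64,...`, so a caller would need to strip that prefix first.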

Development

Install Development Dependencies

pip install -e ".[dev]"

Run Linter

ruff check .
ruff format .

Run Type Checker

mypy main.py

Run Tests

pytest

Troubleshooting

"MISTRAL_API_KEY is required"

Make sure you've created a .env file with your API key, or set the environment variable:

export MISTRAL_API_KEY=your_key_here
python main.py

"Connection refused" in MCP Inspector

Check that:

The server is running (python main.py)
The port matches (default: 3000)
You're connecting to http://127.0.0.1:3000/mcp
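
A quick way to confirm the first two points is to test whether anything is listening on the port at all. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, and a successful TCP connection does not prove that the listener is this MCP server.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If this returns False for the default port, check whether PORT in .env points somewhere else.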

"Model not found" errors

Verify the model name is correct. Available vision models:

pixtral-12b-latest
pixtral-large-latest
mistral-small-latest
mistral-medium-latest

Memory Issues

If processing many large images, consider:

Reducing RECENT_MAX in .env
Using image URLs instead of base64 for large files
Restarting the server periodically
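
Base64 itself adds overhead: every 3 bytes of image data become 4 characters, so a payload is about a third larger than the file it encodes. The small function below computes the padded base64 length for a given file size; `base64_length` is a hypothetical name used only for this illustration.

```python
def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    # Each group of up to 3 input bytes is encoded as 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)
```

For example, a 3,000,000-byte image becomes a 4,000,000-character string before it even reaches the server, which is one reason image URLs are lighter for large files.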

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Run tests and linters
Submit a pull request

License

MIT License - see LICENSE file for details

Acknowledgments

Built with Model Context Protocol
Powered by Mistral AI
Based on MCP Server Template

Support

For issues or questions:

Open an issue on GitHub
Check Mistral AI Documentation
Review MCP Documentation
