svngoku/mcp-server-mistral-ocr-warp
Mistral OCR MCP Server
A Model Context Protocol (MCP) server that provides OCR (Optical Character Recognition) capabilities using Mistral's Pixtral vision models. This server enables AI assistants to extract text, analyze documents, and describe images through MCP tools.
Features
- Text Extraction: Extract all readable text from images while preserving layout and formatting
- Document Analysis: Extract specific fields from documents (receipts, invoices, forms)
- Image Description: Generate detailed descriptions of image contents
- Multiple Input Formats: Support for both image URLs and base64-encoded images
- Recent Results Cache: Access previously processed images
- Customizable: Override model selection and provide custom prompts
Requirements
- Python 3.10 or higher
- Node.js 18+ (for MCP Inspector)
- Mistral API key
Installation
- Clone or navigate to the repository:
  cd /path/to/mcp-server-mistral-ocr
- Create and activate a virtual environment:
  python -m venv .venv
  source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
  pip install -e .
- Configure environment:
  cp .env.example .env
  # Edit .env and add your MISTRAL_API_KEY
Configuration
Create a .env file based on .env.example:
# Required
MISTRAL_API_KEY=your_mistral_api_key_here
# Optional
PORT=3000
MISTRAL_MODEL=pixtral-12b-latest
RECENT_MAX=50
LOG_LEVEL=info
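As a rough illustration of how these variables could be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names used only for this example and are not taken from the server's code; the defaults mirror the values listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented above."""
    api_key: str
    port: int = 3000
    model: str = "pixtral-12b-latest"
    recent_max: int = 50
    log_level: str = "info"


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("MISTRAL_API_KEY", "").strip()
    if not api_key:
        # The only required variable; fail early with a clear message.
        raise RuntimeError("MISTRAL_API_KEY is required")
    return Settings(
        api_key=api_key,
        port=int(env.get("PORT", "3000")),
        model=env.get("MISTRAL_MODEL", "pixtral-12b-latest"),
        recent_max=int(env.get("RECENT_MAX", "50")),
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests, for example `load_settings({"MISTRAL_API_KEY": "k", "PORT": "8080"})`.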
Available Models
- pixtral-12b-latest (default): Fast and efficient
- pixtral-large-latest: Highest accuracy
- mistral-small-latest: With vision capabilities
- mistral-medium-latest: With vision capabilities
Usage
Starting the Server
python main.py
The server will start on http://localhost:3000 (or the port specified in .env).
Using MCP Inspector
The easiest way to test the server:
- Start the MCP Inspector:
  npx @modelcontextprotocol/inspector
- Open your browser to http://localhost:6274
- Connect to your server:
  - Transport Type: Streamable HTTP
  - URL: http://127.0.0.1:3000/mcp
- Try the tools with example payloads (see examples below)
Configuring with MCP Clients
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"mistral-ocr": {
"command": "python",
"args": ["/path/to/mcp-server-mistral-ocr/main.py"],
"env": {
"MISTRAL_API_KEY": "your_key_here"
}
}
}
}
Cline (VS Code Extension)
In Cline settings, add the MCP server:
{
"mistral-ocr": {
"command": "python",
"args": ["/path/to/mcp-server-mistral-ocr/main.py"],
"env": {"MISTRAL_API_KEY": "your_key_here"}
}
}
API Reference
Tools
1. extract_text_from_image
Extract text from an image while preserving layout and formatting.
Parameters:
- image_url (optional): HTTP(S) URL of the image
- image_base64 (optional): Base64-encoded image data
- prompt (optional): Custom extraction prompt
- model (optional): Model to use (default: pixtral-12b-latest)
- mime_type (optional): MIME type (e.g., image/png, image/jpeg)
Example:
{
"image_url": "https://example.com/receipt.jpg",
"prompt": "Extract all text from this receipt"
}
Response:
{
"id": "a3f2c1b8d9e7",
"text": "Store Name\n123 Main St\nTotal: $45.99\n..."
}
2. analyze_document
Extract specific fields from a document.
Parameters:
- image_url (optional): HTTP(S) URL of the document
- image_base64 (optional): Base64-encoded document image
- fields (optional): List of fields to extract (default: ["title", "date", "total", "address"])
- model (optional): Model to use
- mime_type (optional): MIME type
Example:
{
"image_url": "https://example.com/invoice.pdf",
"fields": ["invoice_number", "date", "total", "vendor"]
}
Response:
{
"id": "b4e3d2c1a0f9",
"data": {
"invoice_number": "INV-2024-001",
"date": "2024-01-15",
"total": "$1,250.00",
"vendor": "Acme Corp"
}
}
3. describe_image
Generate a detailed description of image contents.
Parameters:
- image_url (optional): HTTP(S) URL of the image
- image_base64 (optional): Base64-encoded image data
- model (optional): Model to use
- mime_type (optional): MIME type
Example:
{
"image_url": "https://example.com/photo.jpg"
}
Response:
{
"id": "c5f4e3d2b1a0",
"description": "The image shows a modern office space with large windows..."
}
Resources
recent://results
List all recent OCR results (up to RECENT_MAX items).
Response:
[
{
"id": "a3f2c1b8d9e7",
"ts": 1736776543000,
"tool": "extract_text_from_image",
"model": "pixtral-12b-latest"
}
]
recent://results/{result_id}
Get full details of a specific result.
Response:
{
"id": "a3f2c1b8d9e7",
"ts": 1736776543000,
"tool": "extract_text_from_image",
"inputs": {...},
"output": {...},
"model": "pixtral-12b-latest"
}
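The two resources above imply a bounded store that keeps full records but lists only summaries. The following is a minimal sketch of one way such a cache could work; the `RecentResults` class and its method names are hypothetical, and the server's actual storage may differ.

```python
import time
from collections import OrderedDict


class RecentResults:
    """Hypothetical bounded in-memory store; oldest entries are evicted first."""

    def __init__(self, max_items=50):
        self.max_items = max_items  # plays the role of RECENT_MAX
        self._items = OrderedDict()

    def add(self, result_id, tool, model, inputs, output):
        self._items[result_id] = {
            "id": result_id,
            "ts": int(time.time() * 1000),  # milliseconds since the epoch
            "tool": tool,
            "inputs": inputs,
            "output": output,
            "model": model,
        }
        self._items.move_to_end(result_id)
        # Drop the oldest records once the limit is exceeded.
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def summaries(self):
        """Shape of recent://results: summary fields only."""
        keys = ("id", "ts", "tool", "model")
        return [{k: rec[k] for k in keys} for rec in self._items.values()]

    def get(self, result_id):
        """Shape of recent://results/{result_id}: the full record, or None."""
        return self._items.get(result_id)
```

Keeping inputs and outputs out of the listing keeps that response small even when individual results hold large base64 payloads.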
Prompts
ocr_task_guidance
Get OCR task guidance for specific task types.
Parameters:
- task: Type of OCR task (extract, analyze, describe)
Examples
Extract Text from a Receipt
{
"tool": "extract_text_from_image",
"arguments": {
"image_url": "https://example.com/receipt.jpg"
}
}
Analyze an Invoice
{
"tool": "analyze_document",
"arguments": {
"image_url": "https://example.com/invoice.pdf",
"fields": ["invoice_number", "date", "total", "vendor", "due_date"]
}
}
Extract Text from Base64 Image
{
"tool": "extract_text_from_image",
"arguments": {
"image_base64": "iVBORw0KGgoAAAANSUhEUg...",
"mime_type": "image/png"
}
}
Describe a Complex Image
{
"tool": "describe_image",
"arguments": {
"image_url": "https://example.com/diagram.png",
"model": "pixtral-large-latest"
}
}
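The payloads above name a tool and its arguments. On the wire, MCP carries these in a JSON-RPC 2.0 `tools/call` request, which the Inspector and client SDKs normally build for you. The sketch below only shows that envelope; `tools_call_request` is a hypothetical helper, and a real session also needs the MCP initialization handshake before any tool call.

```python
import json


def tools_call_request(example, request_id=1):
    """Wrap a {"tool": ..., "arguments": ...} example in a JSON-RPC envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": example["tool"],
            "arguments": example["arguments"],
        },
    }


example = {
    "tool": "describe_image",
    "arguments": {
        "image_url": "https://example.com/diagram.png",
        "model": "pixtral-large-latest",
    },
}
print(json.dumps(tools_call_request(example), indent=2))
```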
Error Handling
The server validates inputs and provides helpful error messages:
- Missing image: "Provide either image_url or image_base64"
- Both inputs: "Provide only one of image_url or image_base64"
- Invalid URL: "Only http(s) URLs are allowed"
- API errors: Full error details from Mistral API
All errors are returned in JSON format:
{
"error": "Error message here"
}
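The input rules listed above can be expressed in a few lines. This is a sketch of equivalent checks, not the server's actual code; `validate_image_input` is a hypothetical name, and the messages are copied from the list above.

```python
from urllib.parse import urlparse


def validate_image_input(image_url=None, image_base64=None):
    """Return an error dict, or None when the input is acceptable."""
    if not image_url and not image_base64:
        return {"error": "Provide either image_url or image_base64"}
    if image_url and image_base64:
        return {"error": "Provide only one of image_url or image_base64"}
    if image_url:
        parsed = urlparse(image_url)
        # Reject file://, ftp://, and anything without a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return {"error": "Only http(s) URLs are allowed"}
    return None
```

Returning the error as a dict rather than raising keeps the result in the same JSON shape as the error response shown above.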
Security Considerations
- Only HTTP(S) URLs are allowed (no file://, ftp://, etc.)
- API key is never exposed in logs or responses
- Base64 payloads are validated before processing
- Consider setting RECENT_MAX based on memory constraints
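The README does not say how base64 payloads are validated. One plausible approach, sketched below, is strict decoding plus a size cap. The `decode_base64_image` helper and the 10 MiB `max_bytes` default are assumptions made for this example, not documented server behavior.

```python
import base64
import binascii


def decode_base64_image(data, max_bytes=10 * 1024 * 1024):
    """Strictly decode a base64 string, rejecting malformed or oversized input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_base64 is not valid base64") from exc
    if not raw:
        raise ValueError("image_base64 is empty")
    if len(raw) > max_bytes:  # max_bytes is an assumed limit
        raise ValueError("image_base64 exceeds the size limit")
    return raw
```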
Development
Install Development Dependencies
pip install -e ".[dev]"
Run Linter
ruff check .
ruff format .
Run Type Checker
mypy main.py
Run Tests
pytest
Troubleshooting
"MISTRAL_API_KEY is required"
Make sure you've created a .env file with your API key, or set the environment variable:
export MISTRAL_API_KEY=your_key_here
python main.py
"Connection refused" in MCP Inspector
Check that:
- The server is running (python main.py)
- The port matches (default: 3000)
- You're connecting to http://127.0.0.1:3000/mcp
"Model not found" errors
Verify the model name is correct. Available vision models:
pixtral-12b-latest
pixtral-large-latest
mistral-small-latest
mistral-medium-latest
Memory Issues
If processing many large images, consider:
- Reducing RECENT_MAX in .env
- Using image URLs instead of base64 for large files
- Restarting the server periodically
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linters
- Submit a pull request
License
MIT License - see LICENSE file for details
Acknowledgments
- Built with Model Context Protocol
- Powered by Mistral AI
- Based on MCP Server Template
Support
For issues or questions:
- Open an issue on GitHub
- Check Mistral AI Documentation
- Review MCP Documentation