demomagic/mineru-mcp-server
Mineru MCP Server
A Model Context Protocol (MCP) server that integrates with the Mineru API to provide document parsing capabilities.
Features
- Single File Parsing: Create document parsing tasks via URL
- Batch File Parsing: Upload and parse multiple files in one batch
- Task Status Monitoring: Query parsing progress and results in real time
- Multi-format Support: PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG and other formats
- OCR Functionality: Optional OCR text recognition
- Formula Recognition: Recognizes mathematical formulas
- Table Recognition: Recognizes table structure
- Multi-language Support: Chinese, English and other languages
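The format list above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of this project: `getExtension` and `isKnownFormat` are hypothetical helper names, and the extension set covers only the formats named above (the list also says "other formats", so a rejection here does not prove the API would refuse the file).

```javascript
// Extensions named in the feature list above. The list also mentions
// "other formats", so this set is a conservative subset.
const KNOWN_EXTENSIONS = new Set(["pdf", "doc", "docx", "ppt", "pptx", "png", "jpg", "jpeg"]);

// Hypothetical helper: returns the lowercase extension of a file name
// or URL path, or "" when there is none.
function getExtension(nameOrUrl) {
  // Drop any query string or fragment before looking for the extension.
  const path = nameOrUrl.split(/[?#]/)[0];
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : "";
}

// Hypothetical helper: true when the extension is one of the known formats.
function isKnownFormat(nameOrUrl) {
  return KNOWN_EXTENSIONS.has(getExtension(nameOrUrl));
}
```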
Installation
npm install
Configuration
Before using, you need to configure the Mineru API key:
const config = {
  mineruApiKey: "your-mineru-api-bearer-token", // Mineru API Bearer token
  mineruBaseUrl: "https://mineru.net/api/v4" // Mineru API base URL
};
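To keep the token out of source control, the same config object could be built from environment variables. This is a sketch under assumptions: the variable names `MINERU_API_KEY` and `MINERU_BASE_URL` and the helpers `loadConfig` and `authHeaders` are hypothetical and not defined by this project, and sending the token in an `Authorization: Bearer` header is inferred from the "Bearer token" wording above.

```javascript
// Hypothetical helper: builds the config object from environment variables.
// MINERU_API_KEY and MINERU_BASE_URL are assumed names, not project settings.
function loadConfig(env = process.env) {
  const mineruApiKey = env.MINERU_API_KEY;
  if (!mineruApiKey) {
    throw new Error("MINERU_API_KEY is not set");
  }
  return {
    mineruApiKey,
    // Fall back to the base URL shown in the configuration example.
    mineruBaseUrl: env.MINERU_BASE_URL || "https://mineru.net/api/v4",
  };
}

// Assumption: the bearer token travels in a standard Authorization header.
function authHeaders(config) {
  return { Authorization: `Bearer ${config.mineruApiKey}` };
}
```

Failing early on a missing key gives a clearer message than the A0202 token error the API would otherwise return.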
Available Tools
1. create_parsing_task
Create a document parsing task for a single file
Parameters:
- url (required): File URL
- is_ocr (optional): Enable OCR, default false
- enable_formula (optional): Enable formula recognition, default true
- enable_table (optional): Enable table recognition, default true
- language (optional): Document language, default "ch"
- page_ranges (optional): Page ranges, e.g., "1-10,15-20"
- model_version (optional): Model version, "v1" or "v2"
- extra_formats (optional): Additional export formats, ["docx", "html", "latex"]
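The defaults listed above can be applied on the caller's side before invoking the tool. The sketch below is illustrative only: `buildParsingArgs` and `isValidPageRanges` are hypothetical helpers, and the page range check accepts only the form shown in the example ("1-10,15-20", plus single pages). The real API may accept other forms.

```javascript
// Defaults as documented in the parameter list above.
const DEFAULTS = { is_ocr: false, enable_formula: true, enable_table: true, language: "ch" };

// Hypothetical helper: checks a page_ranges string such as "1-10,15-20".
// Assumption: each part is a single page ("3") or an ascending range ("1-10").
function isValidPageRanges(value) {
  return value.split(",").every((part) => {
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (!match) return false;
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    return start >= 1 && end >= start;
  });
}

// Hypothetical helper: merges caller arguments over the documented defaults.
function buildParsingArgs(args) {
  if (typeof args.url !== "string" || args.url === "") {
    throw new Error("url is required");
  }
  if (args.page_ranges !== undefined && !isValidPageRanges(args.page_ranges)) {
    throw new Error(`invalid page_ranges: ${args.page_ranges}`);
  }
  return { ...DEFAULTS, ...args };
}
```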
2. get_task_status
Query parsing task status
Parameters:
- task_id (required): Task ID
3. create_batch_parsing_task
Create a batch file upload parsing task (for local file uploads)
Parameters:
- files (required): Array of files; each entry contains name, is_ocr, page_ranges and other properties
- enable_formula (optional): Enable formula recognition
- enable_table (optional): Enable table recognition
- language (optional): Document language
- model_version (optional): Model version
- extra_formats (optional): Additional export formats
4. create_batch_url_parsing_task
Create a batch URL parsing task (for remote file URLs)
Parameters:
- files (required): Array of files; each entry contains url, is_ocr, page_ranges and other properties
- enable_formula (optional): Enable formula recognition
- enable_table (optional): Enable table recognition
- language (optional): Document language
- model_version (optional): Model version
- extra_formats (optional): Additional export formats
5. get_batch_task_results
Query batch parsing task results (supports both URL batch parsing and local upload batch parsing)
Parameters:
- batch_id (required): Batch task ID (from create_batch_url_parsing_task or create_batch_parsing_task)
Usage Examples
Single File Parsing
// Create parsing task
const taskResult = await create_parsing_task({
  url: "https://example.com/document.pdf",
  is_ocr: true,
  enable_formula: true,
  language: "en"
});

// Query task status
const status = await get_task_status({
  task_id: taskResult.task_id
});
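Parsing is asynchronous, so a single status query usually has to be repeated until the task finishes. The helper below is a generic sketch, not part of this server: `pollUntil` is a hypothetical name, and the caller supplies both the status function and the completion test because the response shape of get_task_status is not documented here.

```javascript
// Hypothetical polling helper.
// getStatus: async function returning the current status object.
// isFinished: function deciding whether that status is final.
async function pollUntil(getStatus, isFinished, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (isFinished(status)) return status;
    // Wait between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`task did not finish after ${maxAttempts} attempts`);
}
```

With the example above, `getStatus` would wrap `get_task_status({ task_id: taskResult.task_id })`, and `isFinished` would inspect whichever field of the response marks completion. Capping the number of attempts keeps a stuck task from blocking the caller indefinitely.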
Batch File Upload Parsing
// Create batch upload task
const batchResult = await create_batch_parsing_task({
  files: [
    { name: "document1.pdf", is_ocr: true },
    { name: "document2.docx" }
  ],
  enable_formula: true,
  language: "ch"
});

// Query batch task results (applicable to both batch parsing methods)
const batchStatus = await get_batch_task_results({
  batch_id: batchResult.batch_id
});
Batch URL Parsing
// Create batch URL parsing task
const batchUrlResult = await create_batch_url_parsing_task({
  files: [
    { url: "https://example.com/doc1.pdf", is_ocr: true },
    { url: "https://example.com/doc2.docx" }
  ],
  enable_formula: true,
  language: "en"
});

// Query batch task results (applicable to both batch parsing methods)
const batchUrlStatus = await get_batch_task_results({
  batch_id: batchUrlResult.batch_id
});
Development
npm run dev
Important Notes
- A single file cannot exceed 200MB or 600 pages
- Each account has a daily quota of 2000 pages parsed at the highest priority
- Due to network restrictions, foreign URLs such as GitHub and AWS may time out
- Batch upload file links are valid for 24 hours
- There is no need to set the Content-Type header when uploading files
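The size and page limits above can be checked locally before submitting a file, which avoids a round trip that would end in error -60005 or -60006. A minimal sketch follows; `checkLimits` is a hypothetical helper, and it assumes "200MB" means 200 × 1024 × 1024 bytes, which the notes do not specify.

```javascript
// Limits from the notes above.
const MAX_FILE_BYTES = 200 * 1024 * 1024; // assumption: 200MB read as 200 MiB
const MAX_PAGES = 600;

// Hypothetical helper: returns a list of problems.
// An empty list means the file is within the documented limits.
// pageCount is optional because it is not always known before parsing.
function checkLimits({ sizeBytes, pageCount }) {
  const problems = [];
  if (sizeBytes === 0) problems.push("file is empty");
  if (sizeBytes > MAX_FILE_BYTES) problems.push("file exceeds 200MB");
  if (pageCount !== undefined && pageCount > MAX_PAGES) problems.push("file exceeds 600 pages");
  return problems;
}
```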
Common Error Codes
Error Code | Description | Solution |
---|---|---|
A0202 | Token error | Check if the Token is correct, or replace with a new Token |
A0211 | Token expired | Replace with a new Token |
-500 | Parameter error | Ensure parameter types and Content-Type are correct |
-10001 | Service exception | Please try again later |
-10002 | Request parameter error | Check request parameter format |
-60001 | Failed to generate upload URL | Please try again later |
-60002 | Failed to get matching file format | File type detection failed. Ensure the requested filename and link have the correct extension and that the file is one of pdf, doc, docx, ppt, pptx, png, jp(e)g |
-60003 | File read failed | Check if the file is corrupted and re-upload |
-60004 | Empty file | Please upload a valid file |
-60005 | File size exceeds limit | Check the file size; the maximum supported size is 200MB |
-60006 | File page count exceeds limit | Please split the file and try again |
-60007 | Model service temporarily unavailable | Please try again later or contact technical support |
-60008 | File read timeout | Check if URL is accessible |
-60009 | Task submission queue is full | Please try again later |
-60010 | Parsing failed | Please try again later |
-60011 | Failed to get valid file | Please ensure the file has been uploaded |
-60012 | Task not found | Please ensure task_id is valid and not deleted |
-60013 | No permission to access this task | You can only access tasks that you submitted |
-60014 | Delete running task | Running tasks cannot be deleted |
-60015 | File conversion failed | Convert the file to PDF manually and upload it again |
-60016 | File conversion failed | Conversion to the specified export format failed; try another export format or retry |
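A client can use the table above to decide what to do with a failed call. The sketch below groups the codes into three actions. The grouping is this example's own reading of the "Solution" column, not something the API defines, and `classifyError` is a hypothetical helper. Codes not listed as token errors or retryable, including unknown ones, fall through to "fix-request".

```javascript
// Codes whose documented solution is to try again later.
const RETRYABLE = new Set([-10001, -60001, -60007, -60009, -60010]);
// Codes whose documented solution is to check or replace the token.
const TOKEN_ERRORS = new Set(["A0202", "A0211"]);

// Hypothetical helper: maps an error code (number or string) to a suggested action.
function classifyError(code) {
  if (TOKEN_ERRORS.has(String(code))) return "refresh-token";
  if (RETRYABLE.has(Number(code))) return "retry";
  // Everything else needs a change to the request or the file.
  return "fix-request";
}
```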
License
ISC