wognsths/universal-file-reader-mcp
Universal File Reader MCP
Universal File Reader is an MCP server for extracting text and structural information from PDF, CSV, Excel and image files. The server automatically selects the appropriate processor and can fall back to OCR when needed.
Installation
pip install -e .
Running the server
universal-file-reader
Running over SSH
You can also run the server on a remote machine via SSH:
ssh user@remote-host universal-file-reader
In Google's ADK this command can be used with ToolSubprocess to communicate with the remote server.
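As a rough illustration of what a subprocess-based client does with this command, the sketch below uses Python's standard subprocess module as a stand-in for ToolSubprocess. The helper names ssh_server_argv and start_remote_server are hypothetical, and the assumption that the server talks over its standard input and output is inferred from the SSH usage above, not confirmed by the project.

```python
import subprocess


def ssh_server_argv(user: str, host: str) -> list[str]:
    """Build the argv for launching the server on a remote host over SSH."""
    # Mirrors the command shown above: ssh user@remote-host universal-file-reader
    return ["ssh", f"{user}@{host}", "universal-file-reader"]


def start_remote_server(user: str, host: str) -> subprocess.Popen:
    """Start the remote server with pipes attached to its stdin and stdout.

    Assumption: the server exchanges messages over stdio, so the caller
    writes requests to proc.stdin and reads replies from proc.stdout.
    """
    return subprocess.Popen(
        ssh_server_argv(user, host),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Calling start_remote_server("user", "remote-host") requires working SSH access to the remote machine, preferably key-based so that no password prompt interferes with the pipes.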
The server exposes three tools: read_file, get_supported_formats and validate_file. See src/document_reader/mcp_server.py for detailed schemas.
Running the API server
You can also start a REST API that wraps the same functionality. The service exposes three endpoints:
POST /mcp: accept MCP JSON messages
POST /upload: upload a file and receive the server path
POST /test: upload a file and process it with optional parameters
Start the server locally:
universal-file-reader-api
Using Docker Compose:
docker-compose up
The API will be available on http://localhost:8000.
MCP message format
POST /mcp expects a JSON body with the following structure:
{
  "tool": "read_file",
  "arguments": {
    "file_path": "/path/to/file.pdf",
    "output_format": "markdown"
  }
}
The tool field corresponds to one of the MCP tools (read_file, get_supported_formats, or validate_file). arguments contains the parameters for that tool.
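A minimal sketch of sending such a message from Python, using only the standard library. The helper names build_mcp_request and call_mcp are hypothetical, the URL assumes the default http://localhost:8000 address mentioned above, and the response is returned as raw text because its schema is not described here.

```python
import json
import urllib.request

# Assumption: the API is reachable at the default address from this README.
API_URL = "http://localhost:8000/mcp"


def build_mcp_request(tool: str, arguments: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap a tool name and its arguments in a POST request for /mcp."""
    body = json.dumps({"tool": tool, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_mcp(tool: str, arguments: dict) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_mcp_request(tool, arguments)) as resp:
        return resp.read().decode("utf-8")
```

With the API server running, call_mcp("read_file", {"file_path": "/path/to/file.pdf", "output_format": "markdown"}) sends the example body shown above.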
Environment variables
MODEL_NAME: vision-capable model name (e.g. gpt-4o, gemini-2.0-flash)
MODEL_API_KEY: API key for the selected model
MAX_PAGE_PER_PROCESS: maximum number of PDF pages processed in one OCR batch
OCR_TIMEOUT_SECONDS: processing timeout in seconds per PDF page (default 30)
TIMEOUT_SECONDS: global processing timeout
EXTRACT_IMAGES: when set to true, images extracted from PDFs are OCR processed and appended to the output
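The sketch below shows one way a wrapper script might read and check these variables before launching the server. It is not the server's own parsing code: the ReaderSettings class and load_settings function are hypothetical, only the OCR_TIMEOUT_SECONDS default of 30 comes from the list above, and treating an unset EXTRACT_IMAGES as false is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ReaderSettings:
    model_name: Optional[str]
    model_api_key: Optional[str]
    max_page_per_process: Optional[int]
    ocr_timeout_seconds: int
    timeout_seconds: Optional[int]
    extract_images: bool


def _optional_int(env: Mapping[str, str], name: str) -> Optional[int]:
    """Return the variable as an int, or None when it is unset or empty."""
    raw = env.get(name)
    return int(raw) if raw else None


def load_settings(env: Mapping[str, str] = os.environ) -> ReaderSettings:
    """Collect the documented variables into one typed object."""
    return ReaderSettings(
        model_name=env.get("MODEL_NAME"),
        model_api_key=env.get("MODEL_API_KEY"),
        max_page_per_process=_optional_int(env, "MAX_PAGE_PER_PROCESS"),
        # Documented default: 30 seconds per PDF page.
        ocr_timeout_seconds=int(env.get("OCR_TIMEOUT_SECONDS", "30")),
        timeout_seconds=_optional_int(env, "TIMEOUT_SECONDS"),
        # Assumption: anything other than "true" (case-insensitive) is false.
        extract_images=env.get("EXTRACT_IMAGES", "").strip().lower() == "true",
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.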
Running tests
Install dependencies and run the test suite:
pip install -r requirements.txt
pytest
Development
pip install -e .[development]
ruff check