Migueel0/mcp-pdf-reader
If you are the rightful owner of mcp-pdf-reader and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
MCP-PDF-Reader is a server designed to extract data from PDF files using Model Context Protocol (MCP) technology.
mcp-pdf-reader
MCP server that extracts text from PDF files using
pypdf.
Optionally, if Tesseract OCR is installed, the server can also
extract text from images embedded inside PDFs.
The server exposes a single MCP tool: read_pdf(file_path), which
returns a JSON-serializable dictionary containing the original file path
and the extracted text.
Project Structure
- server/ --- MCP tool definitions used by the server
- tools/ --- Utility modules;
pdf_reader.pycontains PDF extraction logic - main.py --- MCP server launcher
- requirements.txt --- Runtime dependencies
Quick Summary
Tool: read_pdf(file_path)
Returns:
{
"file_path": "path/to/file.pdf",
"extracted_text": "Sample extracted text..."
}
Requirements
- Python ≥ 3.13 (recommended)
- virtualenv (recommended)
- Tesseract OCR (optional, required only for OCR on images inside PDFs)
Installing Dependencies (Windows)
- Create and activate a virtual environment:
python -m venv venv
.venv\Scripts\activate
- Verify your Python version:
python --version
- Install dependencies:
python -m pip install --upgrade pip
pip install -r requirements.txt
Installing Dependencies (Linux)
- Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
- Verify your Python version:
python3 --version
- Install dependencies:
python3 -m pip install --upgrade pip
pip install -r requirements.txt
Installing Tesseract (Optional)
Follow the official installation instructions:
https://tesseract-ocr.github.io/tessdoc/Installation.html
You may need to manually add tesseract to your system PATH.
More details available in the Tesseract User Manual:
https://github.com/tesseract-ocr/tessdoc
Configuring Environment Variables
Rename env.example to .env and set your Tesseract binary path.
Example:
# Windows
TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
# Linux
TESSERACT_CMD=/usr/bin/tesseract
Running the Server
Launch the MCP server using:
python main.py
If everything is configured correctly, the server will start and expose
the read_pdf tool.
MCP Usage Example
Call the tool from your MCP client:
Tool exposed:
read_pdf(file_path)
Example response:
{
"file_path": "sample/path/sample_file.pdf",
"extracted_text": "Sample text
..."
}
MCP Launcher Configuration
Windows
{
"mcpServers": {
"pdf-reader": {
"command": "PATH/TO/VENV/Scripts/python.exe",
"args": [
"PATH/TO/REPO/mcp-pdf-reader/main.py"
]
}
}
}
Linux
{
"mcpServers": {
"pdf-reader": {
"command": "PATH/TO/VENV/bin/python",
"args": [
"PATH/TO/REPO/mcp-pdf-reader/main.py"
]
}
}
}
Notes
This project is an early test version. Future updates will extend available tools and improve installation and deployment workflows.