nutrient-pdf-mcp-server
A powerful Model Context Protocol server for LLM-driven PDF document analysis and exploration.
A Model Context Protocol (MCP) server for investigating PDF object trees with lazy loading support. It lets LLMs explore PDF document structure efficiently without exceeding token limits.
Features
- Lazy Loading: Explore PDF structure without loading entire object trees
- Path Navigation: Navigate through PDF objects using dot notation (e.g., Pages.Kids.0)
- Selective Resolution: Resolve specific indirect objects on demand
- Token Efficient: Massive reduction in response sizes compared to full tree dumps
- Type Safe: Comprehensive type hints and error handling
Installation
Quick Start
git clone https://github.com/PSPDFKit/nutrient-pdf-mcp-server.git
cd nutrient-pdf-mcp-server
make install-dev # Sets up development environment
For Claude Code CLI
Recommended: Build and Install
pip install build
make build
pipx install dist/nutrient_pdf_mcp-1.0.0-py3-none-any.whl
claude mcp add nutrient-pdf-mcp nutrient-pdf-mcp
Development Mode
make install-dev
claude mcp add nutrient-pdf-mcp "$(pwd)/venv/bin/python" -m pdf_mcp.server
Manual Configuration
{
"mcpServers": {
"nutrient-pdf-mcp": {
"command": "python",
"args": ["-m", "pdf_mcp.server"]
}
}
}
Available Tools
get_pdf_object_tree
Nutrient PDF MCP Server - Get JSON representation of PDF object tree with lazy loading.
Parameters:
- pdf_path (required): Path to the PDF file
- object_id (optional): Specific object ID to retrieve (e.g., '1 0')
- path (optional): Object path to navigate (e.g., 'Pages.Kids.0')
- mode (optional): Parsing mode - 'lazy' (default) or 'full'
Examples:
{
"pdf_path": "document.pdf",
"mode": "lazy"
}
{
"pdf_path": "document.pdf",
"path": "Pages.Kids.0",
"mode": "lazy"
}
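For readers calling the server without a ready-made MCP client, the sketch below shows how the example arguments above could be wrapped in a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses for tool invocation. The helper name `build_tool_call` and the id counter are illustrative assumptions, not part of this project.

```python
import json
from itertools import count

# Simple incrementing JSON-RPC request ids (an assumption for illustration).
_request_ids = count(1)


def build_tool_call(name, arguments):
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "get_pdf_object_tree",
    {"pdf_path": "document.pdf", "path": "Pages.Kids.0", "mode": "lazy"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Code builds this envelope for you; the sketch only makes the argument structure explicit.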
resolve_indirect_object
Nutrient PDF MCP Server - Resolve a specific indirect object by its object and generation numbers.
Parameters:
- pdf_path (required): Path to the PDF file
- objnum (required): PDF object number (e.g., 3)
- gennum (optional): PDF generation number (defaults to 0)
- depth (optional): Resolution depth - 'shallow' (default) or 'deep'
Examples:
{
"pdf_path": "document.pdf",
"objnum": 3,
"gennum": 0,
"depth": "shallow"
}
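Note that get_pdf_object_tree takes an object ID as a single string such as '1 0', while resolve_indirect_object takes separate objnum and gennum values. A minimal sketch of converting between the two follows; the helper `object_id_to_args` is hypothetical and not provided by the server.

```python
def object_id_to_args(pdf_path, object_id, depth="shallow"):
    """Turn an object ID string like '3 0' into resolve_indirect_object arguments.

    Hypothetical helper: the generation number defaults to 0 when omitted,
    mirroring the documented default for `gennum`.
    """
    parts = object_id.split()
    if len(parts) not in (1, 2):
        raise ValueError(f"expected 'objnum' or 'objnum gennum', got {object_id!r}")
    if depth not in ("shallow", "deep"):
        raise ValueError(f"depth must be 'shallow' or 'deep', got {depth!r}")
    objnum = int(parts[0])
    gennum = int(parts[1]) if len(parts) == 2 else 0
    return {"pdf_path": pdf_path, "objnum": objnum, "gennum": gennum, "depth": depth}


print(object_id_to_args("document.pdf", "3 0"))
```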
Command Line Usage
# Run the server
make serve
# Or run with debug logging
make serve-debug
Architecture
Core Components
- parser.py: Main PDF parsing logic with lazy loading support
- server.py: MCP server implementation
- types.py: Type definitions for PDF objects and responses
- exceptions.py: Custom exception classes
Response Types
All PDF objects are serialized into a consistent JSON format:
{
"type": "dict",
"value": {
"/Type": {"type": "name", "value": "/Pages"},
"/Kids": {
"type": "array",
"value": [
{"type": "indirect_ref", "objnum": 2, "gennum": 0}
]
}
}
}
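Because every node carries a "type" field, a client can walk a response generically. The sketch below collects the unresolved references from the example above so they can be passed to resolve_indirect_object one at a time. The function name `iter_indirect_refs` is an assumption, and only the node types shown in the example ("dict", "array", "indirect_ref", "name") are handled.

```python
def iter_indirect_refs(node):
    """Yield (objnum, gennum) for every indirect_ref inside a serialized object."""
    kind = node.get("type")
    if kind == "indirect_ref":
        yield node["objnum"], node["gennum"]
    elif kind == "dict":
        for child in node["value"].values():
            yield from iter_indirect_refs(child)
    elif kind == "array":
        for child in node["value"]:
            yield from iter_indirect_refs(child)
    # Any other type (e.g. "name") is a leaf with no references.


response = {
    "type": "dict",
    "value": {
        "/Type": {"type": "name", "value": "/Pages"},
        "/Kids": {
            "type": "array",
            "value": [{"type": "indirect_ref", "objnum": 2, "gennum": 0}],
        },
    },
}

refs = list(iter_indirect_refs(response))
print(refs)  # [(2, 0)]
```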
Token Efficiency
Lazy loading keeps responses small. Approximate response sizes by mode:
- Lazy mode: ~5-50 lines (minimal tokens)
- Shallow resolution: ~50-100 lines (reasonable tokens)
- Deep resolution: 500+ lines (use sparingly)
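To illustrate why the modes differ in size, here is a toy model of lazy versus deep resolution. It is not the server's parser: the in-memory `OBJECTS` table, the `render` and `line_count` helpers, and the cycle guard are all assumptions made for the sketch. Lazy rendering leaves references in place, while deep rendering inlines every referenced object it has not already visited.

```python
import json


def _page():
    # Toy page object that points back at its parent, as real PDF pages do.
    return {"type": "dict", "value": {
        "/Type": {"type": "name", "value": "/Page"},
        "/Parent": {"type": "indirect_ref", "objnum": 1, "gennum": 0},
    }}


# Hypothetical object table: (objnum, gennum) -> serialized object.
OBJECTS = {
    (1, 0): {"type": "dict", "value": {
        "/Type": {"type": "name", "value": "/Pages"},
        "/Kids": {"type": "array", "value": [
            {"type": "indirect_ref", "objnum": 2, "gennum": 0},
            {"type": "indirect_ref", "objnum": 3, "gennum": 0},
        ]},
    }},
    (2, 0): _page(),
    (3, 0): _page(),
}


def resolve(node, deep, seen):
    kind = node["type"]
    if kind == "indirect_ref":
        key = (node["objnum"], node["gennum"])
        if not deep or key in seen:
            return node  # keep the reference; the caller can resolve it later
        return resolve(OBJECTS[key], deep, seen | {key})
    if kind == "dict":
        return {"type": "dict",
                "value": {k: resolve(v, deep, seen) for k, v in node["value"].items()}}
    if kind == "array":
        return {"type": "array",
                "value": [resolve(v, deep, seen) for v in node["value"]]}
    return node  # leaf types such as "name"


def render(key, deep):
    # Seed `seen` with the starting object so back-references do not loop.
    return resolve(OBJECTS[key], deep, frozenset({key}))


def line_count(node):
    return len(json.dumps(node, indent=2).splitlines())


lazy = render((1, 0), deep=False)
deep = render((1, 0), deep=True)
print("lazy lines:", line_count(lazy))
print("deep lines:", line_count(deep))
```

Even with three tiny objects the deep output is longer; on a real document with fonts, content streams, and annotations the gap grows quickly, which is why deep resolution is best reserved for small subtrees.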
Examples
Exploring PDF Structure
- Get overview:
get_pdf_object_tree(pdf_path="document.pdf", mode="lazy")
- Navigate to pages:
get_pdf_object_tree(pdf_path="document.pdf", path="Pages", mode="lazy")
- Resolve specific page:
resolve_indirect_object(pdf_path="document.pdf", objnum=3, gennum=0, depth="shallow")
- Deep dive when needed:
resolve_indirect_object(pdf_path="document.pdf", objnum=3, gennum=0, depth="deep")
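Path Navigation Examples
- "Pages": Navigate to the Pages object
- "Pages.Kids": Get the Kids array from Pages
- "Pages.Kids.0": Get the first page
- "Pages.Kids.0.MediaBox.2": Get the third MediaBox entry (the upper-right x coordinate, which equals the page width when the box starts at 0)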
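The dot notation can be read as "split on dots, then index step by step". A minimal sketch over plain Python dicts and lists is shown below. The `navigate` function and the sample `catalog` data are assumptions for illustration; the real server also has to resolve indirect references along the way, which this sketch omits.

```python
def navigate(node, path):
    """Follow a dot-separated path through nested dicts and lists."""
    for segment in path.split("."):
        if isinstance(node, list):
            node = node[int(segment)]  # numeric segments index into arrays
        else:
            node = node[segment]       # other segments are dictionary keys
    return node


# Made-up stand-in for a parsed document catalog (US Letter page size).
catalog = {"Pages": {"Kids": [{"MediaBox": [0, 0, 612, 792]}]}}

print(navigate(catalog, "Pages.Kids.0.MediaBox.2"))  # 612
```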
Development
Quick Start
# Set up development environment
make install-dev
# Run all quality checks (format, lint, typecheck, test)
make quality
# Or run individual commands
make test # Run tests
make format # Format code
make lint # Run linter
make typecheck # Type checking
Project Structure
nutrient-pdf-mcp-server/
├── pdf_mcp/
│   ├── __init__.py
│   ├── server.py          # MCP server
│   ├── parser.py          # PDF parsing logic
│   ├── types.py           # Type definitions
│   └── exceptions.py      # Custom exceptions
├── tests/                 # Test suite
├── res/                   # Sample PDFs
├── pyproject.toml         # Project configuration
└── README.md
Publishing to PyPI
# Build the package
make build
# Upload to test PyPI first
twine upload --repository testpypi dist/*
# Upload to production PyPI
twine upload dist/*
After publishing, users can install with:
pipx install nutrient-pdf-mcp
# or
pip install --user nutrient-pdf-mcp
Contributing
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure code quality checks pass
- Submit a pull request
License
MIT License - see LICENSE file for details.