baldawsari/pdf-downloader-mcp
The PDF Downloader MCP is a specialized server designed for efficient and reliable downloading of PDF files, featuring advanced retry logic and error handling.
PDF Downloader MCP
A robust Model Context Protocol (MCP) server for downloading PDF files with advanced retry logic, error handling, and partial download recovery.
Features
Core Functionality
- Single Purpose Tool: One tool (download_pdf) that does PDF downloading exceptionally well
- Robust Retry Logic: Exponential backoff with jitter to handle network issues gracefully
- Partial Download Recovery: Resume interrupted downloads using HTTP Range requests
- Comprehensive Error Handling: Smart classification of errors to determine retry strategies
Advanced Retry Strategies
Exponential Backoff
- First retry: Wait 5 seconds
- Second retry: Wait 10 seconds
- Third retry: Wait 20 seconds
- Prevents overwhelming servers with rapid retry attempts
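The schedule above doubles the base delay on each retry. A minimal sketch of that calculation follows. The function name `backoff_delay` and the jitter size (up to 10% extra) are assumptions for illustration; the README only says that jitter is applied, not how much.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_delay: float = 5.0, jitter: float = 0.1,
                  rng: Optional[random.Random] = None) -> float:
    """Return the wait in seconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    source = rng if rng is not None else random
    # Doubling schedule: 5 s, 10 s, 20 s, ... for the default base delay.
    delay = base_delay * 2 ** (attempt - 1)
    # Assumed jitter: add a random extra wait of up to `jitter` * delay so that
    # many clients retrying at once do not all hit the server at the same moment.
    return delay + source.uniform(0, jitter * delay)
```

With `jitter=0.0` the function reproduces the 5, 10 and 20 second waits listed above exactly.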
Smart Error Handling
- Network timeouts: Retry with longer timeout
- HTTP 429 (Rate Limited): Wait longer, respect rate limits
- HTTP 503 (Server Unavailable): Retry after delay
- HTTP 404/403: Don't retry, return error immediately
- SSL/Certificate errors: Retry with different SSL settings
Partial Download Recovery
- Use HTTP Range requests to resume interrupted downloads
- Check if server supports Accept-Ranges: bytes
- Resume from last downloaded byte position
- Fallback to full download if resume fails
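The resume steps above can be sketched with three small helpers. The names are hypothetical and this is not the project's actual code; it only illustrates the standard HTTP mechanics (an `Accept-Ranges` check, a `Range` header, and the 206 status that confirms the server honoured the range).

```python
from typing import Dict, Mapping


def supports_resume(headers: Mapping[str, str]) -> bool:
    """True if the response headers advertise byte-range support."""
    for name, value in headers.items():
        if name.lower() == "accept-ranges":
            return value.strip().lower() == "bytes"
    return False


def range_header(bytes_on_disk: int) -> Dict[str, str]:
    """Header asking for everything from the first missing byte onward."""
    if bytes_on_disk <= 0:
        return {}  # nothing saved yet: plain full download
    # Bytes 0..N-1 are already on disk, so the next byte needed is offset N.
    return {"Range": f"bytes={bytes_on_disk}-"}


def should_append(status: int) -> bool:
    """206 means the range was honoured; any other success code means the
    server sent the whole file, so the partial file must be overwritten."""
    return status == 206
```

A caller would append to the partial file only when `should_append` returns true, and otherwise restart from byte zero, which matches the fallback to a full download.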
Download Verification
- Verify file size matches Content-Length header
- Basic PDF header validation (starts with %PDF)
- Comprehensive PDF structure validation
- Retry if file appears corrupted
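As an illustration of these checks, the sketch below validates a downloaded payload held in memory. `verify_pdf` is a hypothetical helper. The README does not describe what the comprehensive structure validation covers, so the `%%EOF` trailer check here is only an assumed stand-in for it.

```python
from typing import List, Optional


def verify_pdf(data: bytes, content_length: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    problems: List[str] = []
    # Size check: only possible when the server sent a Content-Length header.
    if content_length is not None and len(data) != content_length:
        problems.append(f"size mismatch: got {len(data)}, expected {content_length}")
    # Header check: every PDF starts with the %PDF marker.
    if not data.startswith(b"%PDF"):
        problems.append("missing %PDF header")
    # Assumed structural check: a complete PDF has an %%EOF marker near its end.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF trailer")
    return problems
```

A non-empty result would map to the retry behaviour described above, since a truncated or corrupted file is worth downloading again.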
Fallback Strategies
- Try different User-Agent strings if blocked
- Attempt with/without SSL verification
- Multiple connection strategies for problematic servers
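One way to order these fallbacks is to exhaust the safe options first. The sketch below is an assumption about ordering, not the project's documented behaviour, and both User-Agent strings are placeholders.

```python
from itertools import product
from typing import Iterator, NamedTuple


class Strategy(NamedTuple):
    user_agent: str
    verify_ssl: bool


# Placeholder User-Agent strings, chosen only for illustration.
USER_AGENTS = (
    "pdf-downloader-mcp",
    "Mozilla/5.0 (X11; Linux x86_64)",
)


def fallback_strategies() -> Iterator[Strategy]:
    """Yield connection strategies, trying every User-Agent with SSL
    verification enabled before any attempt that disables it."""
    for verify_ssl, user_agent in product((True, False), USER_AGENTS):
        yield Strategy(user_agent=user_agent, verify_ssl=verify_ssl)
```

Keeping verification on for as long as possible matters because an unverified connection cannot confirm that the file really comes from the requested host.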
Installation
From Source
git clone https://github.com/baldawsari/pdf-downloader-mcp.git
cd pdf-downloader-mcp
pip install -e .
Development Installation
git clone https://github.com/baldawsari/pdf-downloader-mcp.git
cd pdf-downloader-mcp
pip install -e ".[dev]"
Usage
As an MCP Server
- Run the server:
pdf-downloader-mcp
- Or run as a Python module:
python -m pdf_downloader_mcp
Tool Parameters
The download_pdf tool accepts the following parameters:
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Yes | - | Direct PDF URL |
destination_path | string | Yes | - | Local folder path |
filename | string | No | URL filename | Custom filename |
max_retries | integer | No | 3 | Number of retry attempts (0-10) |
retry_delay | number | No | 5.0 | Base delay between retries (0.1-60.0 s) |
timeout | number | No | 30.0 | Request timeout (5.0-300.0 s) |
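The defaults and ranges in this table can be captured in a small validating container. This is a sketch only; `DownloadParams` and `_check_range` are hypothetical names, and the server may validate its input differently.

```python
from dataclasses import dataclass
from typing import Optional


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when `value` falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass
class DownloadParams:
    url: str
    destination_path: str
    filename: Optional[str] = None  # None: use the filename from the URL
    max_retries: int = 3
    retry_delay: float = 5.0
    timeout: float = 30.0

    def __post_init__(self) -> None:
        # The two parameters without a default must be non-empty.
        if not self.url:
            raise ValueError("url is required")
        if not self.destination_path:
            raise ValueError("destination_path is required")
        # Ranges taken from the parameter table.
        _check_range("max_retries", self.max_retries, 0, 10)
        _check_range("retry_delay", self.retry_delay, 0.1, 60.0)
        _check_range("timeout", self.timeout, 5.0, 300.0)
```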
Example Usage
{
  "tool": "download_pdf",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "destination_path": "/home/user/downloads",
    "filename": "important_document.pdf",
    "max_retries": 5,
    "retry_delay": 10.0,
    "timeout": 60.0
  }
}
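On the wire, an MCP client wraps a call like this in a JSON-RPC 2.0 `tools/call` request. The sketch below builds that envelope by hand to show its shape. `build_tool_call` is a hypothetical helper; MCP client libraries normally construct this message for you.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build the JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "download_pdf",
    {
        "url": "https://example.com/document.pdf",
        "destination_path": "/home/user/downloads",
        "max_retries": 5,
    },
)
print(json.dumps(request, indent=2))
```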
Response Format
Success Response
{
  "success": true,
  "local_path": "/home/user/downloads/important_document.pdf",
  "file_size": 1234567,
  "attempts_used": 2,
  "max_retries": 5,
  "download_time": 15.3,
  "total_time": 25.8,
  "average_speed": "4.85",
  "resumed": false,
  "bytes_downloaded": 1234567,
  "error_message": null
}
Error Response
{
  "success": false,
  "local_path": null,
  "file_size": 0,
  "attempts_used": 6,
  "max_retries": 5,
  "download_time": 0,
  "total_time": 45.2,
  "average_speed": "0.00",
  "resumed": false,
  "bytes_downloaded": 0,
  "error_message": "Failed after 6 attempts. Last error: HTTP 503: Service Unavailable"
}
Error Categories
The downloader classifies errors into three categories for optimal retry behavior:
RETRY Errors
- Network timeouts
- Server errors (5xx)
- Rate limits (429)
- Connection errors
- SSL errors
NO_RETRY Errors
- File not found (404)
- Access denied (403)
- Bad request (400)
- Authentication required (401)
PARTIAL_RETRY Errors
- Incomplete downloads
- Corruption detected
- Validation failures
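The three categories can be expressed as a small classifier. This is a sketch under stated assumptions: `ErrorCategory` and `classify` are hypothetical names, a missing status code stands for failures that produce no HTTP response (timeouts, connection and SSL errors), and treating status codes not listed above as NO_RETRY is a guess.

```python
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    RETRY = "retry"
    NO_RETRY = "no_retry"
    PARTIAL_RETRY = "partial_retry"


def classify(status: Optional[int] = None, corrupted: bool = False) -> ErrorCategory:
    """Map a failed attempt to one of the three retry categories."""
    if corrupted:
        # Incomplete, corrupted, or invalid file: retry, possibly resuming.
        return ErrorCategory.PARTIAL_RETRY
    if status is None:
        # No HTTP response at all: timeout, connection error, or SSL error.
        return ErrorCategory.RETRY
    if status == 429 or 500 <= status <= 599:
        # Rate limits and server errors are usually temporary.
        return ErrorCategory.RETRY
    # 400, 401, 403, 404; treating other codes the same way is an assumption.
    return ErrorCategory.NO_RETRY
```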
Configuration
MCP Server Configuration
Add to your MCP configuration file:
{
  "mcpServers": {
    "pdf-downloader": {
      "command": "pdf-downloader-mcp",
      "args": []
    }
  }
}
Environment Variables
Variable | Description | Default |
---|---|---|
PDF_DOWNLOADER_LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
PDF_DOWNLOADER_MAX_CONCURRENT | Maximum concurrent downloads | 5 |
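A sketch of how a process might read these two variables with their documented defaults. `load_settings` is a hypothetical helper, and accepting lowercase level names is an assumption rather than documented behaviour.

```python
import logging
import os
from typing import Mapping, Tuple

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (logging level, max concurrent downloads) from the environment."""
    level_name = env.get("PDF_DOWNLOADER_LOG_LEVEL", "INFO").upper()
    if level_name not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level_name}")
    max_concurrent = int(env.get("PDF_DOWNLOADER_MAX_CONCURRENT", "5"))
    if max_concurrent < 1:
        raise ValueError("PDF_DOWNLOADER_MAX_CONCURRENT must be at least 1")
    # logging.DEBUG, logging.INFO, ... are the integer constants of the stdlib.
    return getattr(logging, level_name), max_concurrent
```

In an asyncio-based server the second value could size an `asyncio.Semaphore` that caps simultaneous downloads, though the README does not say how the limit is enforced.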
Development
Running Tests
pytest
Code Formatting
black src/ tests/
isort src/ tests/
Type Checking
mypy src/
Linting
flake8 src/ tests/
Architecture
src/pdf_downloader_mcp/
├── __init__.py      # Package initialization
├── __main__.py      # CLI entry point
├── server.py        # MCP server implementation
├── downloader.py    # Core PDF downloader
├── exceptions.py    # Custom exception classes
├── validators.py    # PDF validation logic
└── utils.py         # Utility functions
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License; see the license file in the repository for details.
Acknowledgments
- Built with the Model Context Protocol (MCP)
- Uses aiohttp for robust HTTP operations
- Inspired by enterprise-grade download managers
Made with ❤️ for reliable PDF downloading