NimbleBrainInc/mcp-pdfco
If you are the rightful owner of mcp-pdfco and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
PDF.co MCP Server provides a comprehensive suite of tools for PDF manipulation, conversion, and automation, supporting a wide range of document processing needs.
PDF.co MCP Server
MCP Server - Abstract NimbleTools Registry NimbleBrain Platform Discord
Python 3.13+ License: MIT CI
MCP server for PDF.co API. Comprehensive PDF manipulation, conversion, OCR, text extraction, and document automation with support for barcodes, watermarks, and security features.
Features
- Text Extraction: Extract text from PDFs
- Format Conversion: PDF ↔ HTML, images, CSV
- OCR: Optical Character Recognition for scanned documents
- Merging & Splitting: Combine or separate PDFs
- Watermarking: Add text or image watermarks
- Security: Password protection and removal
- Compression: Reduce PDF file sizes
- Table Extraction: Extract tables to structured CSV
- Barcode Operations: Generate and read barcodes
- Web to PDF: Convert URLs to PDF documents
Setup
Prerequisites
- PDF.co account (free or paid)
- API key
Environment Variables
PDFCO_API_KEY
(required): Your PDF.co API key
How to get credentials:
- Go to pdf.co
- Sign up or log in
- Go to Dashboard (app.pdf.co/dashboard)
- Find your API key in the API section
- Copy the key and store as
PDFCO_API_KEY
Direct link: https://app.pdf.co/dashboard
Rate Limits
- Free Tier: 150 requests per month
- Paid Plans: Unlimited requests based on plan
- Monitor usage in dashboard
- Consider caching results for frequently accessed documents
Input Formats
PDF.co supports multiple input methods:
- Public URL: Direct link to a PDF or image
- Base64: Base64-encoded file content
- Temporary Storage: Upload to PDF.co temp storage first
Available Tools
Text Extraction
pdf_to_text
Extract text from PDF.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page range (e.g., "1-3" or "1,3,5")async_mode
(bool, optional): Async processing (default: false)
Example:
# Extract all text
result = await pdf_to_text(
url="https://example.com/document.pdf"
)
# Extract specific pages
result = await pdf_to_text(
url="https://example.com/document.pdf",
pages="1-5"
)
# Async for large files
result = await pdf_to_text(
url="https://example.com/large.pdf",
async_mode=True
)
# Check job status with returned job ID
pdf_to_json
Extract structured data from PDF.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page range
Example:
result = await pdf_to_json(
url="https://example.com/invoice.pdf"
)
# Returns structured JSON with:
# - Text content
# - Positioning information
# - Formatting details
Format Conversion
pdf_to_html
Convert PDF to HTML.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page rangesimple
(bool, optional): Simple HTML mode (default: false)
Example:
# Full HTML with formatting
result = await pdf_to_html(
url="https://example.com/report.pdf"
)
# Simple HTML without complex formatting
result = await pdf_to_html(
url="https://example.com/report.pdf",
simple=True
)
html_to_pdf
Convert HTML to PDF.
Parameters:
html
(string, required): HTML content or URLname
(string, optional): Output filename (default: document.pdf)margins
(string, optional): Margins "top,right,bottom,left"orientation
(string, optional): Portrait or Landscape (default: Portrait)page_size
(string, optional): Letter, A4, Legal, etc. (default: Letter)
Page Sizes:
- Letter (8.5 x 11 inches)
- Legal (8.5 x 14 inches)
- A4 (210 x 297 mm)
- A3 (297 x 420 mm)
Example:
# From HTML string
result = await html_to_pdf(
html="<html><body><h1>Invoice</h1><p>Total: $100</p></body></html>",
name="invoice.pdf",
page_size="Letter",
orientation="Portrait"
)
# From URL
result = await html_to_pdf(
html="https://example.com/report.html",
margins="10mm,10mm,10mm,10mm"
)
url_to_pdf
Convert web page URL to PDF.
Parameters:
url
(string, required): Web page URLname
(string, optional): Output filename (default: webpage.pdf)orientation
(string, optional): Portrait or Landscapepage_size
(string, optional): Letter, A4, Legal, etc.
Example:
result = await url_to_pdf(
url="https://example.com/article",
name="article.pdf",
page_size="A4"
)
# Useful for:
# - Archiving web pages
# - Generating reports from dashboards
# - Saving online content
image_to_pdf
Convert images to PDF.
Parameters:
images
(list, required): List of image URLs or base64name
(string, optional): Output filename (default: images.pdf)
Example:
# Multiple images to single PDF
result = await image_to_pdf(
images=[
"https://example.com/page1.jpg",
"https://example.com/page2.jpg",
"https://example.com/page3.jpg"
],
name="scanned-doc.pdf"
)
# Each image becomes a page
Table Extraction
pdf_to_csv
Extract tables from PDF to CSV.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page range
Example:
result = await pdf_to_csv(
url="https://example.com/financial-report.pdf",
pages="1-10"
)
# Returns CSV with table data
# Preserves:
# - Column structure
# - Row alignment
# - Cell values
PDF Manipulation
pdf_merge
Merge multiple PDFs into one.
Parameters:
urls
(list, required): List of PDF URLs or base64name
(string, optional): Output filename (default: merged.pdf)async_mode
(bool, optional): Async processing (default: false)
Example:
result = await pdf_merge(
urls=[
"https://example.com/part1.pdf",
"https://example.com/part2.pdf",
"https://example.com/part3.pdf"
],
name="complete-document.pdf"
)
# Useful for:
# - Combining reports
# - Merging invoices
# - Creating compilations
pdf_split
Split PDF into separate pages or ranges.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page ranges (e.g., "1-3,5-7")split_by_pages
(bool, optional): Split into individual pages (default: false)
Example:
# Extract specific pages
result = await pdf_split(
url="https://example.com/document.pdf",
pages="1-3,10-12"
)
# Split into individual pages
result = await pdf_split(
url="https://example.com/document.pdf",
split_by_pages=True
)
pdf_rotate
Rotate PDF pages.
Parameters:
url
(string, required): PDF URL or base64angle
(int, required): Rotation angle (90, 180, 270, -90)pages
(string, optional): Page range
Example:
# Rotate all pages
result = await pdf_rotate(
url="https://example.com/sideways.pdf",
angle=90
)
# Rotate specific pages
result = await pdf_rotate(
url="https://example.com/mixed.pdf",
angle=180,
pages="2,4,6"
)
Information & Metadata
pdf_info
Get PDF metadata (pages, size, etc.).
Parameters:
url
(string, required): PDF URL or base64
Example:
info = await pdf_info(url="https://example.com/document.pdf")
# Returns:
# {
# "pageCount": 25,
# "info": {
# "Title": "Annual Report",
# "Author": "Company Name",
# "CreationDate": "2025-01-01",
# "Producer": "Adobe PDF",
# "fileSize": 1024000
# }
# }
Watermarking
pdf_add_watermark
Add text watermark to PDF.
Parameters:
url
(string, required): PDF URL or base64text
(string, required): Watermark textx
(int, optional): X position (default: 100)y
(int, optional): Y position (default: 100)font_size
(int, optional): Font size (default: 24)color
(string, optional): Hex color without # (default: FF0000)opacity
(float, optional): Opacity 0.0-1.0 (default: 0.5)
Example:
# Add "CONFIDENTIAL" watermark
result = await pdf_add_watermark(
url="https://example.com/document.pdf",
text="CONFIDENTIAL",
x=200,
y=400,
font_size=48,
color="FF0000",
opacity=0.3
)
# Add "DRAFT" watermark
result = await pdf_add_watermark(
url="https://example.com/draft.pdf",
text="DRAFT",
font_size=72,
color="808080",
opacity=0.5
)
Compression
pdf_compress
Compress PDF file size.
Parameters:
url
(string, required): PDF URL or base64compression_level
(string, optional): low, balanced, high, extreme (default: balanced)
Compression Levels:
- low: Minimal compression, best quality
- balanced: Good balance of size and quality
- high: Aggressive compression, smaller files
- extreme: Maximum compression, may reduce quality
Example:
# Balanced compression
result = await pdf_compress(
url="https://example.com/large.pdf",
compression_level="balanced"
)
# Extreme compression for email
result = await pdf_compress(
url="https://example.com/photos.pdf",
compression_level="extreme"
)
# Typical size reductions:
# - low: 10-20%
# - balanced: 30-50%
# - high: 50-70%
# - extreme: 70-90%
Security
pdf_protect
Add password protection to PDF.
Parameters:
url
(string, required): PDF URL or base64owner_password
(string, required): Owner password (full access)user_password
(string, optional): User password (restricted access)allow_print
(bool, optional): Allow printing (default: true)allow_copy
(bool, optional): Allow copying text (default: false)
Example:
# Protect with owner password
result = await pdf_protect(
url="https://example.com/sensitive.pdf",
owner_password="admin123",
user_password="user123",
allow_print=True,
allow_copy=False
)
# Owner password: Full permissions
# User password: View-only or restricted permissions
pdf_unlock
Remove password from PDF.
Parameters:
url
(string, required): PDF URL or base64password
(string, required): PDF password
Example:
result = await pdf_unlock(
url="https://example.com/protected.pdf",
password="secret123"
)
OCR
ocr_pdf
OCR scanned PDFs to searchable text.
Parameters:
url
(string, required): PDF URL or base64pages
(string, optional): Page rangelang
(string, optional): Language code (default: eng)
Supported Languages:
- eng: English
- spa: Spanish
- fra: French
- deu: German
- ita: Italian
- por: Portuguese
- rus: Russian
- chi_sim: Chinese Simplified
- jpn: Japanese
- kor: Korean
Example:
# OCR scanned document
result = await ocr_pdf(
url="https://example.com/scanned-invoice.pdf",
lang="eng"
)
# OCR specific pages
result = await ocr_pdf(
url="https://example.com/mixed.pdf",
pages="1-5",
lang="eng"
)
# Useful for:
# - Scanned invoices
# - Old documents
# - Faxes
# - Screenshots
Barcode Operations
barcode_generate
Generate barcode images.
Parameters:
value
(string, required): Barcode value/textbarcode_type
(string, optional): Barcode type (default: QRCode)format
(string, optional): png, jpg, svg (default: png)
Barcode Types:
- QRCode: 2D QR codes
- Code128: Common 1D barcode
- Code39: Alphanumeric barcode
- EAN13: 13-digit product barcode
- EAN8: 8-digit product barcode
- UPC-A: 12-digit product barcode
- ITF: Interleaved 2 of 5
- Codabar: Numeric with special chars
Example:
# Generate QR code
result = await barcode_generate(
value="https://example.com/product/12345",
barcode_type="QRCode",
format="png"
)
# Generate product barcode
result = await barcode_generate(
value="012345678905",
barcode_type="EAN13",
format="png"
)
# Generate Code128
result = await barcode_generate(
value="ABC-123-XYZ",
barcode_type="Code128"
)
barcode_read
Read barcodes from images.
Parameters:
url
(string, required): Image URL or base64barcode_types
(list, optional): Types to detect
Example:
# Auto-detect all barcode types
result = await barcode_read(
url="https://example.com/product-image.jpg"
)
# Detect specific types only
result = await barcode_read(
url="https://example.com/qr-code.png",
barcode_types=["QRCode", "Code128"]
)
# Returns:
# {
# "barcodes": [
# {
# "type": "QRCode",
# "value": "https://example.com/product",
# "position": {...}
# }
# ]
# }
Common Workflows
Invoice Text Extraction
# 1. Extract text from invoice
text_result = await pdf_to_text(
url="https://example.com/invoice.pdf"
)
# 2. Extract tables for line items
csv_result = await pdf_to_csv(
url="https://example.com/invoice.pdf"
)
# 3. Process extracted data
invoice_text = text_result["body"]
table_data = csv_result["body"]
# Parse amounts, dates, etc.
Document Archival
# 1. Compress for storage
compressed = await pdf_compress(
url="https://example.com/document.pdf",
compression_level="high"
)
# 2. Add password protection
protected = await pdf_protect(
url=compressed["url"],
owner_password="archive2025",
allow_print=True,
allow_copy=False
)
# 3. Add watermark
archived = await pdf_add_watermark(
url=protected["url"],
text="ARCHIVED 2025",
opacity=0.2
)
Receipt OCR and Parsing
# 1. OCR scanned receipt
ocr_result = await ocr_pdf(
url="https://example.com/receipt-scan.pdf",
lang="eng"
)
# 2. Extract text
text = ocr_result["body"]
# 3. Parse for:
# - Total amount
# - Date
# - Vendor name
# - Line items
Report Generation
# 1. Convert HTML template to PDF
report = await html_to_pdf(
html=f"""
<html>
<body>
<h1>Monthly Report</h1>
<p>Generated: {date}</p>
<table>...</table>
</body>
</html>
""",
page_size="Letter"
)
# 2. Add watermark
watermarked = await pdf_add_watermark(
url=report["url"],
text="CONFIDENTIAL"
)
Contract Merging
# 1. Merge contract sections
merged = await pdf_merge(
urls=[
"https://example.com/contract-cover.pdf",
"https://example.com/terms.pdf",
"https://example.com/signature-page.pdf"
],
name="complete-contract.pdf"
)
# 2. Add password protection
protected = await pdf_protect(
url=merged["url"],
owner_password="legal2025",
user_password="client123"
)
Best Practices
- Use async for large files: Enable async mode for PDFs over 10MB
- Specify page ranges: Extract only needed pages to save time
- Choose compression wisely: Balance quality and file size
- Cache results: Store processed files to avoid repeated operations
- Use OCR selectively: Only OCR scanned documents (check with pdf_info first)
- Validate barcodes: Check barcode format before generation
- Secure sensitive documents: Use password protection
- Monitor usage: Track API calls in dashboard
- Handle timeouts: Use appropriate timeout values for large operations
- Test watermark positioning: Preview watermark placement
Error Handling
Common errors:
- 401 Unauthorized: Invalid API key
- 402 Payment Required: Quota exceeded (free tier limit)
- 400 Bad Request: Invalid parameters or file format
- 404 Not Found: URL not accessible
- 500 Internal Server Error: Processing error
- Timeout: File too large, use async mode
Async Processing
For large files or batch operations:
# Start async job
result = await pdf_to_text(
url="https://example.com/large.pdf",
async_mode=True
)
job_id = result["jobId"]
# Check status (implement polling)
# GET /v1/job/check/{job_id}