pdfbox-mcp

amannm/pdfbox-mcp

PDFBox MCP Server

A Model Context Protocol (MCP) server that provides PDF processing capabilities using Apache PDFBox.

Features

  • extract_text - Extract text content from PDF files with optional page range support
  • get_metadata - Extract PDF metadata (title, author, creation date, etc.)
  • get_page_count - Get the total number of pages in a PDF file

Dependencies

  • Java 17+
  • Apache PDFBox 3.0.5
  • MCP SDK 0.10.0
  • Maven for build management

Usage

Build the project:

mvn compile

Run the server:

mvn exec:java

Tools

extract_text

Extract text content from a PDF file.

Parameters:

  • file_path (required): Path to the PDF file
  • page_range (optional): Page range (e.g., '1-5' or 'all')
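
The page_range values shown above ('1-5' or 'all') could be parsed along the following lines. This is a minimal sketch, not the server's actual code: the PageRange name, the single-page form ("3"), and the clamping of the end page to the document length are all assumptions. The resulting 1-based inclusive bounds are the form that PDFBox's PDFTextStripper.setStartPage and setEndPage expect.

```java
// Hypothetical helper: the name PageRange and its exact behavior are
// assumptions for illustration, not taken from the pdfbox-mcp source.
record PageRange(int start, int end) {

    /**
     * Parses "all", a single page such as "3", or an inclusive range such
     * as "1-5". Pages are 1-based; the end page is clamped to pageCount.
     */
    static PageRange parse(String spec, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("Document has no pages");
        }
        // A missing or blank value is treated like "all" (an assumption).
        if (spec == null || spec.isBlank() || spec.trim().equalsIgnoreCase("all")) {
            return new PageRange(1, pageCount);
        }
        String[] parts = spec.trim().split("-", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("Invalid page range: " + spec);
        }
        try {
            int start = Integer.parseInt(parts[0].trim());
            int end = parts.length == 2 ? Integer.parseInt(parts[1].trim()) : start;
            if (start < 1 || end < start || start > pageCount) {
                throw new IllegalArgumentException("Invalid page range: " + spec);
            }
            return new PageRange(start, Math.min(end, pageCount));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid page range: " + spec, e);
        }
    }
}
```

Rejecting malformed input with an exception, rather than silently falling back to the whole document, lets the tool report a clear error to the caller.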

get_metadata

Extract metadata from a PDF file including title, author, creation date, etc.

Parameters:

  • file_path (required): Path to the PDF file
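
As an illustration of how the metadata might be shaped for a tool response, the sketch below turns a title, an author, and a creation date into a string map. It assumes the values come from PDFBox's PDDocumentInformation (getTitle(), getAuthor(), and getCreationDate(), which returns a java.util.Calendar). The MetadataFormatter name, the output keys, and the ISO-8601 date format are hypothetical choices, not the server's confirmed output.

```java
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: class name, key names, and date format are assumptions.
class MetadataFormatter {

    /** Formats a Calendar as an ISO-8601 timestamp with offset, or null if absent. */
    static String toIso(Calendar date) {
        if (date == null) {
            return null;
        }
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(
                date.toInstant().atZone(date.getTimeZone().toZoneId()));
    }

    /** Builds an ordered map of metadata fields, skipping any that are missing. */
    static Map<String, String> describe(String title, String author, Calendar created) {
        Map<String, String> out = new LinkedHashMap<>();
        putIfPresent(out, "title", title);
        putIfPresent(out, "author", author);
        putIfPresent(out, "creation_date", toIso(created));
        return out;
    }

    private static void putIfPresent(Map<String, String> map, String key, String value) {
        // PDF metadata fields are optional, so null values are simply omitted.
        if (value != null) {
            map.put(key, value);
        }
    }
}
```

Many PDFs leave some of these fields unset, which is why the sketch omits missing entries instead of emitting empty strings.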

get_page_count

Get the number of pages in a PDF file.

Parameters:

  • file_path (required): Path to the PDF file
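
All three tools are invoked the same way: an MCP client sends a JSON-RPC 2.0 request with the method tools/call, naming the tool and passing its parameters as arguments. The sketch below builds such a request by hand to show its shape. In practice an MCP client library would construct the message; the ToolCallRequest class and the example file path are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that shows the shape of an MCP tools/call request.
class ToolCallRequest {

    /** Wraps a string in double quotes, escaping characters JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds a JSON-RPC 2.0 tools/call request with string-valued arguments. */
    static String build(int id, String tool, Map<String, String> arguments) {
        String args = arguments.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":" + quote(tool)
                + ",\"arguments\":" + args + "}}";
    }

    public static void main(String[] unused) {
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("file_path", "/tmp/report.pdf"); // made-up path
        arguments.put("page_range", "1-5");
        System.out.println(build(1, "extract_text", arguments));
    }
}
```

For get_metadata and get_page_count, the request is the same except for the tool name and an arguments object that contains only file_path.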

Status

Currently in development. The core functionality is implemented, but the project does not yet compile: the code still needs adjustments to match the MCP SDK API.