pdf-mcp-server by an1shthomas - MCP Server

PDF Tools MCP Server

Overview

This repository contains a Spring Boot application that implements a Model Context Protocol (MCP) server for PDF processing operations. The application is a lightweight server that exposes PDF manipulation tools through the Spring AI MCP framework, so AI models can work with PDF files through a PDF processing API that you run in a Docker container.

The server exposes five main PDF processing tools:

Get page count from PDF files
Extract text content from PDFs
Retrieve PDF metadata
Compress PDF files
Split PDFs into individual pages

This implementation can serve as a starting point for integrating PDF processing capabilities with AI models through the Model Context Protocol.

Project Requirements

Java 21+
Maven 3.8+
Spring Boot 3.4.4
Spring AI 1.0.0-M6
Docker container running PDF processing API on port 8081

Dependencies

The project relies on the following key dependencies:

Spring AI MCP Server: Provides the foundation for creating MCP-compatible servers

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mcp-server-spring-boot-starter</artifactId>
</dependency>

Spring Boot WebFlux: For reactive HTTP client operations

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

Spring Boot Test: For testing the application

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

Getting Started

Prerequisites

Before running the application, make sure you have:

Java 21+ installed on your system
Maven installed for dependency management
Docker container with PDF processing API running on port 8081
Basic understanding of Spring Boot applications

Docker Container Requirements

Your PDF processing Docker container must expose the following API endpoints:

POST /api/v1/analysis/page-count - Get page count
POST /api/v1/convert/pdf/txt - Extract text content
POST /api/v1/analysis/metadata - Get PDF metadata
POST /api/v1/misc/compress-pdf - Compress PDF files
POST /api/v1/general/split-pages - Split PDF into pages

Setting Up the Project

Review the project structure to understand the components:
- PdfAnalysisResult.java: Record for standardizing PDF operation responses
- PdfService.java: Service with MCP tool annotations for PDF operations
- CoursesApplication.java: Main application class with tool registration
- application.properties: Configuration for the MCP server and PDF API

The application is configured to run as a non-web application using STDIO transport for MCP communication:

spring.main.web-application-type=none
spring.ai.mcp.server.name=pdf-tools-mcp
spring.ai.mcp.server.version=1.0.0

# PDF API Configuration
pdf.api.base-url=http://localhost:8081

# Required for STDIO transport: stdout carries the MCP messages, so the banner and console log output must be suppressed
spring.main.banner-mode=off
logging.pattern.console=

How to Build and Run

Building the Application

mvn clean package

Running the Application

mvn spring-boot:run

Or run the JAR directly:

java -jar target/pdf-tools-mcp-1.0.0-SNAPSHOT.jar

The application starts as a Model Context Protocol server that communicates over standard input/output. It doesn't open any network ports or provide a web interface, as indicated by the spring.main.web-application-type=none configuration.

When running, the server registers five PDF processing tools, which MCP clients can discover and call:

pdf_get_page_count: Get the number of pages in a PDF file
pdf_extract_text: Extract text content from a PDF file
pdf_get_metadata: Get metadata information from a PDF file
pdf_compress: Compress a PDF file to reduce its size
pdf_split_pages: Split a PDF file into individual pages
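
The tool names above line up naturally with the endpoint paths listed under Docker Container Requirements. The following is a minimal sketch of that pairing, not code from this repository: the class name PdfEndpointsSketch, the method endpointFor, and the exact tool-to-endpoint mapping are assumptions inferred from the two lists.

```java
import java.net.URI;
import java.util.Map;

class PdfEndpointsSketch {

    // Assumed pairing of tool names with the container's endpoint paths.
    static final Map<String, String> ENDPOINTS = Map.of(
            "pdf_get_page_count", "/api/v1/analysis/page-count",
            "pdf_extract_text", "/api/v1/convert/pdf/txt",
            "pdf_get_metadata", "/api/v1/analysis/metadata",
            "pdf_compress", "/api/v1/misc/compress-pdf",
            "pdf_split_pages", "/api/v1/general/split-pages");

    // Resolves the endpoint for a tool against the configured base URL.
    // Note: resolving an absolute path replaces any path already present in baseUrl.
    static URI endpointFor(String baseUrl, String toolName) {
        String path = ENDPOINTS.get(toolName);
        if (path == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return URI.create(baseUrl).resolve(path);
    }

    public static void main(String[] args) {
        System.out.println(endpointFor("http://localhost:8081", "pdf_get_page_count"));
    }
}
```

Keeping the mapping in one place means a change to pdf.api.base-url only affects the base URL passed in, not the individual paths.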

Understanding the Code

PDF Data Model

The application uses a record to represent PDF operation results:

public record PdfAnalysisResult(
    String operation,
    String filename,
    Object result,
    boolean success,
    String message
) {
    // Factory methods for success and error responses
}
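
The factory methods themselves are not shown above. One plausible shape for them is sketched below; the method names success and error, their parameters, and the "OK" message are assumptions for illustration, not the repository's actual code.

```java
class PdfAnalysisResultSketch {

    // Same five components as the record shown above.
    record PdfAnalysisResult(String operation, String filename, Object result,
                             boolean success, String message) {

        // Hypothetical factory for a successful operation.
        static PdfAnalysisResult success(String operation, String filename, Object result) {
            return new PdfAnalysisResult(operation, filename, result, true, "OK");
        }

        // Hypothetical factory for a failed operation; carries no result payload.
        static PdfAnalysisResult error(String operation, String filename, String message) {
            return new PdfAnalysisResult(operation, filename, null, false, message);
        }
    }

    public static void main(String[] args) {
        System.out.println(PdfAnalysisResult.success("page_count", "report.pdf", 12));
        System.out.println(PdfAnalysisResult.error("page_count", "missing.pdf", "File not found"));
    }
}
```

Returning an error record instead of throwing lets every tool hand the AI model the same response shape, whether the operation worked or not.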

Implementing PDF Tools

The PdfService class demonstrates how to create MCP tools using the @Tool annotation:

@Service
public class PdfService {
    @Tool(name = "pdf_get_page_count", description = "Get the number of pages in a PDF file")
    public PdfAnalysisResult getPageCount(String filePath) {
        // Implementation that calls Docker container API
    }
    
    @Tool(name = "pdf_extract_text", description = "Extract text content from a PDF file")
    public PdfAnalysisResult extractText(String filePath) {
        // Implementation that calls Docker container API
    }
    
    // Additional PDF tools...
}

The @Tool annotation transforms regular methods into MCP-compatible tools with:

A unique name for identification
A description that helps AI models understand the tool's purpose
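
To make the idea concrete, here is a small self-contained sketch of annotation-driven tool discovery using plain Java reflection. It does not use Spring AI and is not how Spring AI is implemented internally: ToolStandIn is a stand-in annotation defined only for this example, and DemoPdfService and discoverTools are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class ToolDiscoverySketch {

    // Stand-in for a tool annotation; defined here so the sketch runs without Spring.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ToolStandIn {
        String name();
        String description();
    }

    static class DemoPdfService {
        @ToolStandIn(name = "pdf_get_page_count", description = "Get the number of pages in a PDF file")
        public String getPageCount(String filePath) {
            return "page count for " + filePath;
        }

        @ToolStandIn(name = "pdf_extract_text", description = "Extract text content from a PDF file")
        public String extractText(String filePath) {
            return "text of " + filePath;
        }

        // Not annotated, so it is not picked up as a tool.
        public String helper() {
            return "internal";
        }
    }

    // Collects tool name -> description for every annotated method on the object.
    static Map<String, String> discoverTools(Object service) {
        Map<String, String> tools = new TreeMap<>();
        for (Method method : service.getClass().getDeclaredMethods()) {
            ToolStandIn tool = method.getAnnotation(ToolStandIn.class);
            if (tool != null) {
                tools.put(tool.name(), tool.description());
            }
        }
        return tools;
    }

    public static void main(String[] args) {
        discoverTools(new DemoPdfService())
                .forEach((name, description) -> System.out.println(name + ": " + description));
    }
}
```

The name and description collected this way are the kind of metadata an MCP client sees when it lists the server's tools.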

Registering Tools with MCP

In the main application class, tools are registered with the MCP framework:

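@SpringBootApplication
public class CoursesApplication {
    // Standard Spring Boot entry point
    public static void main(String[] args) {
        SpringApplication.run(CoursesApplication.class, args);
    }

    // Exposes every @Tool method on PdfService as a ToolCallback for the MCP server
    @Bean
    public List<ToolCallback> pdfTools(PdfService pdfService) {
        return List.of(ToolCallbacks.from(pdfService));
    }
}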

Configuration

PDF API Configuration

Configure the Docker container API endpoint in application.properties:

pdf.api.base-url=http://localhost:8081

To use a different host or port, update this property accordingly.

Using the MCP Server with AI Models

To utilize this MCP server with AI models:

Ensure your AI framework supports the Model Context Protocol
Ensure your Docker container is running and accessible on port 8081
Connect the AI model to the MCP server using STDIO transport
The AI model can then invoke the exposed PDF tools
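
Under the hood, invoking a tool means the client writes a JSON-RPC tools/call request to the server's standard input as a single line. The sketch below builds such a request by hand to show its shape. It assumes the tool's argument is named filePath, matching the Java parameter name in PdfService; the class and method names are hypothetical, and the escaping covers only quotes and backslashes.

```java
class ToolCallRequestSketch {

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a single-line JSON-RPC 2.0 "tools/call" request.
    // The argument name "filePath" is an assumption based on the method signature.
    static String toolCall(int id, String toolName, String filePath) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + escape(toolName)
                + "\",\"arguments\":{\"filePath\":\"" + escape(filePath) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(toolCall(1, "pdf_get_page_count", "/path/to/test.pdf"));
    }
}
```

In practice an MCP client library produces these messages and also performs the initialize handshake first, so you rarely write them yourself.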

Configuration for LM Studio

To use this MCP server with LM Studio, add the following configuration to your mcp_servers.json:

{
  "servers": {
    "pdf-tools-mcp": {
      "command": "java",
      "args": [
        "-jar",
        "/Users/anishthomas/Downloads/dv-courses-mcp-master/target/pdf-tools-mcp-1.0.0-SNAPSHOT.jar"
      ],
      "description": "PDF Processing Tools MCP Server"
    }
  }
}

Important Notes:

Make sure to use the correct JAR file name: pdf-tools-mcp-1.0.0-SNAPSHOT.jar
Adjust the full path to match your specific environment
If you previously had a configuration pointing to courses-0.0.1-SNAPSHOT.jar, update it to the new JAR name
After updating the configuration, restart LM Studio or reload the MCP server

Common Configuration Error: If you see an error like "Unable to access jarfile", ensure you're using the correct JAR file path. The project was renamed from courses to pdf-tools-mcp, so the JAR file name changed accordingly.

Configuration for Claude Desktop

To use this MCP server with Claude Desktop, add the following to your claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-tools-mcp": {
      "command": "java",
      "args": [
        "-jar",
        "/path/to/pdf-tools-mcp-1.0.0-SNAPSHOT.jar"
      ]
    }
  }
}

Sample Usage

Once integrated with an MCP client, you can use prompts like:

"How many pages are in the PDF file /Users/username/Documents/report.pdf?"
"Extract text from /path/to/contract.pdf and summarize the key points"
"Get metadata from /Users/username/Documents/presentation.pdf"
"Compress /path/to/large-file.pdf and save it as /path/to/compressed.pdf"
"Split /Users/username/Documents/multi-page.pdf into individual pages in /Users/username/Documents/output/"

Error Handling

The PDF tools handle the following error cases:

File not found errors
HTTP errors from the Docker container API
Network timeouts
Invalid file paths
API unavailability
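
One way such handling could be organized is to run each operation through a wrapper that turns exceptions into an error result. The sketch below uses only JDK exception types; the Outcome record, the run method, and the message texts are assumptions for illustration and may differ from what PdfService actually does.

```java
import java.net.ConnectException;
import java.net.http.HttpTimeoutException;
import java.nio.file.NoSuchFileException;
import java.util.concurrent.Callable;

class ErrorMappingSketch {

    // Simplified stand-in for the result record used by the tools.
    record Outcome(boolean success, Object result, String message) {}

    // Runs the operation and converts known failures into error outcomes.
    static Outcome run(Callable<Object> operation) {
        try {
            return new Outcome(true, operation.call(), "OK");
        } catch (NoSuchFileException e) {
            return new Outcome(false, null, "File not found: " + e.getFile());
        } catch (HttpTimeoutException e) {
            return new Outcome(false, null, "Request to the PDF API timed out");
        } catch (ConnectException e) {
            return new Outcome(false, null, "PDF API unavailable: " + e.getMessage());
        } catch (Exception e) {
            return new Outcome(false, null, "Unexpected error: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> 12));
        System.out.println(run(() -> { throw new NoSuchFileException("/path/to/missing.pdf"); }));
    }
}
```

Because the wrapper never rethrows, the AI model always receives a structured answer it can explain to the user.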

Performance Considerations

File size limits: WebClient is configured to handle files up to 50 MB
Timeouts: 30-120 seconds depending on operation complexity
Memory usage: Large PDF operations may require additional JVM memory
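
These limits can be checked before a file is sent to the container. The sketch below is illustrative only: it assumes 50 MB means 50 * 1024 * 1024 bytes, and the assignment of 30 seconds to the two analysis tools and 120 seconds to everything else is a hypothetical split, since the actual per-operation timeouts are not listed here.

```java
import java.time.Duration;

class LimitsSketch {

    // Assumed interpretation of the 50 MB limit.
    static final long MAX_BYTES = 50L * 1024 * 1024;

    // True when a file of this size fits within the assumed limit.
    static boolean withinSizeLimit(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_BYTES;
    }

    // Hypothetical timeout choice: short for lightweight analysis, long otherwise.
    static Duration timeoutFor(String toolName) {
        return switch (toolName) {
            case "pdf_get_page_count", "pdf_get_metadata" -> Duration.ofSeconds(30);
            default -> Duration.ofSeconds(120);
        };
    }

    public static void main(String[] args) {
        System.out.println(withinSizeLimit(10L * 1024 * 1024));
        System.out.println(timeoutFor("pdf_compress"));
    }
}
```

Rejecting oversized files up front avoids spending the full timeout on a request that is unlikely to succeed.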

Troubleshooting

Common Issues

Docker container not accessible: Ensure your container is running on port 8081
File not found errors: Use absolute file paths
Timeout errors: Large PDFs may require longer processing time
Connection refused: Verify Docker container port mapping

Testing Docker Container

Test your Docker container directly:

curl -X POST http://localhost:8081/api/v1/analysis/page-count \
  -F "fileInput=@/path/to/test.pdf"
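
The same request can be built in Java with the JDK's java.net.http API, which is handy for a quick check without Spring. This is a sketch rather than the project's code (the project uses WebClient): the class and method names are hypothetical, the form field name fileInput is taken from the curl command above, and the part's content type of application/pdf is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class MultipartRequestSketch {

    // Encodes one file as a multipart/form-data body.
    // The filename is inserted as-is, so it must not contain double quotes.
    static byte[] multipartBody(String boundary, String fieldName, String filename, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        out.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Builds (but does not send) a POST to the page-count endpoint.
    static HttpRequest pageCountRequest(String baseUrl, String filename, byte[] pdfBytes) {
        String boundary = "----sketch-" + UUID.randomUUID();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/analysis/page-count"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(
                        multipartBody(boundary, "fileInput", filename, pdfBytes)))
                .build();
    }

    public static void main(String[] args) {
        // Stand-in bytes for illustration; not a valid PDF.
        byte[] placeholder = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        HttpRequest request = pageCountRequest("http://localhost:8081", "test.pdf", placeholder);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) with real PDF bytes read via Files.readAllBytes, while the container is running.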

Extending the Project

You can extend this project by:

Adding more PDF operations: Implement additional @Tool annotated methods
Supporting more file formats: Extend to handle other document types
Adding batch processing: Implement tools for processing multiple files
Enhanced error handling: Add more sophisticated error recovery

Conclusion

This PDF Tools MCP Server provides a clean, extensible framework for exposing PDF processing capabilities through the Model Context Protocol. By leveraging Spring AI conventions and the tool annotation system, you can create powerful integrations between AI models and PDF processing services.

The project demonstrates how to structure your code for MCP compatibility while maintaining good software design practices and integrating with external Docker-based services.

For more information about Spring AI and the Model Context Protocol, refer to the official documentation.