an1shthomas/pdf-mcp-server
PDF Tools MCP Server
Overview
This repository contains a Spring Boot application that implements a Model Context Protocol (MCP) server for PDF processing operations. The application is a lightweight server that exposes PDF manipulation tools through the Spring AI MCP framework, so that AI models can work with PDF files through a PDF processing API running in a Docker container.
The server exposes five main PDF processing tools:
- Get page count from PDF files
- Extract text content from PDFs
- Retrieve PDF metadata
- Compress PDF files
- Split PDFs into individual pages
This implementation serves as an excellent foundation for integrating PDF processing capabilities with AI models through the Model Context Protocol.
Project Requirements
- Java 21+
- Maven 3.8+
- Spring Boot 3.4.4
- Spring AI 1.0.0-M6
- Docker container running PDF processing API on port 8081
Dependencies
The project relies on the following key dependencies:
- Spring AI MCP Server: Provides the foundation for creating MCP-compatible servers
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mcp-server-spring-boot-starter</artifactId>
  </dependency>
- Spring Boot WebFlux: For reactive HTTP client operations
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
  </dependency>
- Spring Boot Test: For testing the application
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
  </dependency>
Getting Started
Prerequisites
Before running the application, make sure you have:
- Java 21+ installed on your system
- Maven installed for dependency management
- Docker container with PDF processing API running on port 8081
- Basic understanding of Spring Boot applications
Docker Container Requirements
Your PDF processing Docker container must expose the following API endpoints:
- POST /api/v1/analysis/page-count: Get page count
- POST /api/v1/convert/pdf/txt: Extract text content
- POST /api/v1/analysis/metadata: Get PDF metadata
- POST /api/v1/misc/compress-pdf: Compress PDF files
- POST /api/v1/general/split-pages: Split PDF into pages
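As a minimal sketch, the endpoint list can be captured in one place so that each tool resolves its URL the same way. The enum name `PdfEndpoint` and its methods are hypothetical, and the pairing of tool names to endpoints is inferred from the matching descriptions rather than taken from the project source.

```java
import java.net.URI;

/** Hypothetical enum pairing each MCP tool name with a container endpoint path. */
enum PdfEndpoint {
    PAGE_COUNT("pdf_get_page_count", "/api/v1/analysis/page-count"),
    EXTRACT_TEXT("pdf_extract_text", "/api/v1/convert/pdf/txt"),
    METADATA("pdf_get_metadata", "/api/v1/analysis/metadata"),
    COMPRESS("pdf_compress", "/api/v1/misc/compress-pdf"),
    SPLIT_PAGES("pdf_split_pages", "/api/v1/general/split-pages");

    private final String toolName;
    private final String path;

    PdfEndpoint(String toolName, String path) {
        this.toolName = toolName;
        this.path = path;
    }

    String toolName() {
        return toolName;
    }

    String path() {
        return path;
    }

    /** Joins the configured base URL and the endpoint path, tolerating a trailing slash. */
    URI uri(String baseUrl) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + path);
    }
}
```

For example, `PdfEndpoint.PAGE_COUNT.uri("http://localhost:8081")` resolves to the page-count URL used in the curl test later in this document.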
Setting Up the Project
- Review the project structure to understand the components:
  - PdfAnalysisResult.java: Record for standardizing PDF operation responses
  - PdfService.java: Service with MCP tool annotations for PDF operations
  - CoursesApplication.java: Main application class with tool registration
  - application.properties: Configuration for the MCP server and PDF API
- The application is configured to run as a non-web application using STDIO transport for MCP communication:
  spring.main.web-application-type=none
  spring.ai.mcp.server.name=pdf-tools-mcp
  spring.ai.mcp.server.version=1.0.0
  # PDF API Configuration
  pdf.api.base-url=http://localhost:8081
  # These settings are critical for STDIO transport
  spring.main.banner-mode=off
  logging.pattern.console=
  The banner and console log pattern are disabled because the STDIO transport carries protocol messages over standard output, so any other text written there would corrupt the stream.
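To illustrate how the `pdf.api.base-url` key might be resolved with a fallback, here is a plain-JDK sketch. It is not how Spring binds properties; the class `PdfApiSettings` and the default value are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: reads the PDF API base URL from properties text. */
final class PdfApiSettings {
    static final String BASE_URL_KEY = "pdf.api.base-url";
    // Assumed default, matching the port named in the requirements.
    static final String DEFAULT_BASE_URL = "http://localhost:8081";

    private PdfApiSettings() {
    }

    /** Returns the configured base URL, or the default when the key is absent. */
    static String baseUrl(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(BASE_URL_KEY, DEFAULT_BASE_URL).trim();
    }
}
```

In the real application, Spring would typically inject this value (for example through `@Value` or a configuration properties class), so the sketch only shows the lookup-with-default idea.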
How to Build and Run
Building the Application
mvn clean package
Running the Application
mvn spring-boot:run
Or run the JAR directly:
java -jar target/pdf-tools-mcp-1.0.0-SNAPSHOT.jar
The application starts as a Model Context Protocol server that communicates over standard input/output. It does not open any network ports or provide a web interface, as indicated by the spring.main.web-application-type=none configuration.
When running, the server registers five PDF processing tools:
- pdf_get_page_count: Get the number of pages in a PDF file
- pdf_extract_text: Extract text content from a PDF file
- pdf_get_metadata: Get metadata information from a PDF file
- pdf_compress: Compress a PDF file to reduce its size
- pdf_split_pages: Split a PDF file into individual pages
Understanding the Code
PDF Data Model
The application uses a record to represent PDF operation results:
public record PdfAnalysisResult(
    String operation,
    String filename,
    Object result,
    boolean success,
    String message
) {
    // Factory methods for success and error responses
}
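The factory methods are elided in the snippet above. One possible shape is sketched below; the method names `ok` and `error`, and the "OK" message, are hypothetical and may differ from the project's actual code.

```java
/** Sketch of the result record with hypothetical factory methods. */
record PdfAnalysisResult(
        String operation,
        String filename,
        Object result,
        boolean success,
        String message) {

    /** Builds a successful result carrying the operation's payload. */
    static PdfAnalysisResult ok(String operation, String filename, Object result) {
        return new PdfAnalysisResult(operation, filename, result, true, "OK");
    }

    /** Builds a failed result with no payload and a human-readable reason. */
    static PdfAnalysisResult error(String operation, String filename, String message) {
        return new PdfAnalysisResult(operation, filename, null, false, message);
    }
}
```

Returning a uniform record for both outcomes lets every tool report failures as data instead of throwing, which tends to be easier for an MCP client to relay to the model.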
Implementing PDF Tools
The PdfService class demonstrates how to create MCP tools using the @Tool annotation:
@Service
public class PdfService {

    @Tool(name = "pdf_get_page_count", description = "Get the number of pages in a PDF file")
    public PdfAnalysisResult getPageCount(String filePath) {
        // Implementation that calls Docker container API
    }

    @Tool(name = "pdf_extract_text", description = "Extract text content from a PDF file")
    public PdfAnalysisResult extractText(String filePath) {
        // Implementation that calls Docker container API
    }

    // Additional PDF tools...
}
The @Tool annotation transforms regular methods into MCP-compatible tools with:
- A unique name for identification
- A description that helps AI models understand the tool's purpose
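To make the idea of annotation-driven tools concrete, the sketch below uses a stand-in annotation, `@DemoTool`, and plain reflection to collect tool names and descriptions. It is not Spring AI's `@Tool` and does not reflect how Spring AI is implemented internally; every name in it is hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a tool annotation; retained at runtime for reflection. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoTool {
    String name();
    String description();
}

/** Demo service: two annotated methods and one ordinary helper. */
class DemoPdfService {
    @DemoTool(name = "pdf_get_page_count", description = "Get the number of pages in a PDF file")
    public String getPageCount(String filePath) {
        return "page count for " + filePath;
    }

    @DemoTool(name = "pdf_extract_text", description = "Extract text content from a PDF file")
    public String extractText(String filePath) {
        return "text of " + filePath;
    }

    public String helper() {
        return "not a tool";
    }
}

/** Collects name -> description for every annotated method, sorted by name. */
final class DemoToolScanner {
    private DemoToolScanner() {
    }

    static Map<String, String> describe(Class<?> type) {
        Map<String, String> tools = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            DemoTool tool = method.getAnnotation(DemoTool.class);
            if (tool != null) {
                tools.put(tool.name(), tool.description());
            }
        }
        return tools;
    }
}
```

Only the annotated methods appear in the result, which mirrors why an unannotated helper in a service class is not offered to the model as a tool.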
Registering Tools with MCP
In the main application class, tools are registered with the MCP framework:
@SpringBootApplication
public class CoursesApplication {

    @Bean
    public List<ToolCallback> pdfTools(PdfService pdfService) {
        return List.of(ToolCallbacks.from(pdfService));
    }
}
Configuration
PDF API Configuration
Configure the Docker container API endpoint in application.properties:
pdf.api.base-url=http://localhost:8081
To use a different host or port, update this property accordingly.
Using the MCP Server with AI Models
To utilize this MCP server with AI models:
- Ensure your AI framework supports the Model Context Protocol
- Ensure your Docker container is running and accessible on port 8081
- Connect the AI model to the MCP server using STDIO transport
- The AI model can then invoke the exposed PDF tools
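As a rough sketch of what connecting over STDIO involves, a client launches the JAR as a child process and exchanges newline-delimited JSON-RPC messages over its standard input and output. The class `StdioClientSketch`, the client name, and the protocol version string are assumptions; a real MCP client library would normally handle this for you.

```java
import java.util.List;

/** Hypothetical sketch of the launch command and first message of an MCP STDIO client. */
final class StdioClientSketch {
    private StdioClientSketch() {
    }

    /** The command an MCP client would run to start the server. */
    static List<String> serverCommand(String jarPath) {
        return List.of("java", "-jar", jarPath);
    }

    /** Prepares the child process; stderr is inherited so it stays out of the protocol stream. */
    static ProcessBuilder launcher(String jarPath) {
        ProcessBuilder builder = new ProcessBuilder(serverCommand(jarPath));
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        return builder;
    }

    /** A single-line JSON-RPC initialize request (protocol version is an assumption). */
    static String initializeRequest(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"initialize\",\"params\":{"
                + "\"protocolVersion\":\"2024-11-05\",\"capabilities\":{},"
                + "\"clientInfo\":{\"name\":\"demo-client\",\"version\":\"0.0.1\"}}}";
    }
}
```

A client would call `launcher(...).start()`, write the initialize request followed by a newline to the process's output stream, and read the response line from its input stream.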
Configuration for LM Studio
To use this MCP server with LM Studio, add the following configuration to your mcp_servers.json:
{
  "servers": {
    "pdf-tools-mcp": {
      "command": "java",
      "args": [
        "-jar",
        "/Users/anishthomas/Downloads/dv-courses-mcp-master/target/pdf-tools-mcp-1.0.0-SNAPSHOT.jar"
      ],
      "description": "PDF Processing Tools MCP Server"
    }
  }
}
Important Notes:
- Make sure to use the correct JAR file name: pdf-tools-mcp-1.0.0-SNAPSHOT.jar
- Adjust the full path to match your specific environment
- If you previously had a configuration pointing to courses-0.0.1-SNAPSHOT.jar, update it to the new JAR name
- After updating the configuration, restart LM Studio or reload the MCP server
Common Configuration Error:
If you see an error like "Unable to access jarfile", ensure you're using the correct JAR file path. The project was renamed from courses to pdf-tools-mcp, so the JAR file name changed accordingly.
Configuration for Claude Desktop
To use this MCP server with Claude Desktop, add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "pdf-tools-mcp": {
      "command": "java",
      "args": [
        "-jar",
        "/path/to/pdf-tools-mcp-1.0.0-SNAPSHOT.jar"
      ]
    }
  }
}
Sample Usage
Once integrated with an MCP client, you can use prompts like:
- "How many pages are in the PDF file /Users/username/Documents/report.pdf?"
- "Extract text from /path/to/contract.pdf and summarize the key points"
- "Get metadata from /Users/username/Documents/presentation.pdf"
- "Compress /path/to/large-file.pdf and save it as /path/to/compressed.pdf"
- "Split /Users/username/Documents/multi-page.pdf into individual pages in /Users/username/Documents/output/"
Error Handling
The PDF tools handle the following error conditions:
- File not found errors
- HTTP errors from the Docker container API
- Network timeouts
- Invalid file paths
- API unavailability
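The file-related cases in this list can be checked before any HTTP call is made. The following is a minimal sketch under that assumption; the class `PdfPathValidator` and its messages are hypothetical, not the project's actual checks.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical pre-flight check for a tool's filePath argument. */
final class PdfPathValidator {
    private PdfPathValidator() {
    }

    /** Returns an error message, or an empty Optional when the path looks usable. */
    static Optional<String> validate(String filePath) {
        if (filePath == null || filePath.isBlank()) {
            return Optional.of("Invalid file path: path is empty");
        }
        Path path;
        try {
            path = Path.of(filePath);
        } catch (InvalidPathException e) {
            return Optional.of("Invalid file path: " + e.getReason());
        }
        if (!path.isAbsolute()) {
            return Optional.of("Invalid file path: use an absolute path");
        }
        if (!Files.isRegularFile(path)) {
            return Optional.of("File not found: " + path);
        }
        return Optional.empty();
    }
}
```

A tool method could return an error result with the message when the Optional is present and only call the container API otherwise.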
Performance Considerations
- File size limits: The WebClient is configured to handle files up to 50MB
- Timeouts: 30-120 seconds depending on operation complexity
- Memory usage: Large PDF operations may require additional JVM memory
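One way to express these limits in code is sketched below. The source states only the 30-120 second range and the 50MB cap; the per-tool timeout assignments, the binary interpretation of 50MB, and the class `PdfLimits` are all assumptions for illustration.

```java
import java.time.Duration;

/** Hypothetical limits table; per-tool values are illustrative, not from the project. */
final class PdfLimits {
    // Assumes 50MB means 50 * 1024 * 1024 bytes.
    static final long MAX_FILE_BYTES = 50L * 1024 * 1024;

    private PdfLimits() {
    }

    /** Picks a timeout inside the documented 30-120 second range (assumed mapping). */
    static Duration timeoutFor(String toolName) {
        return switch (toolName) {
            case "pdf_get_page_count", "pdf_get_metadata" -> Duration.ofSeconds(30);
            case "pdf_extract_text" -> Duration.ofSeconds(60);
            case "pdf_compress", "pdf_split_pages" -> Duration.ofSeconds(120);
            default -> throw new IllegalArgumentException("Unknown tool: " + toolName);
        };
    }

    /** True when a file of this size fits under the configured cap. */
    static boolean withinSizeLimit(long fileBytes) {
        return fileBytes >= 0 && fileBytes <= MAX_FILE_BYTES;
    }
}
```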
Troubleshooting
Common Issues
- Docker container not accessible: Ensure your container is running on port 8081
- File not found errors: Use absolute file paths
- Timeout errors: Large PDFs may require longer processing time
- Connection refused: Verify Docker container port mapping
Testing Docker Container
Test your Docker container directly:
curl -X POST http://localhost:8081/api/v1/analysis/page-count \
-F "fileInput=@/path/to/test.pdf"
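The same request can be built from Java. This sketch uses the JDK's `java.net.http` types rather than the WebClient the project uses, and assembles the multipart body by hand; the class `MultipartSketch` and the boundary format are hypothetical, while the `fileInput` field name and endpoint come from the curl command above.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that mirrors the curl test with JDK classes only. */
final class MultipartSketch {
    private MultipartSketch() {
    }

    /** Builds a multipart/form-data body with one file part named "fileInput". */
    static byte[] body(String boundary, String filename, byte[] content) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"fileInput\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds (but does not send) the POST request for the page-count endpoint. */
    static HttpRequest pageCountRequest(String baseUrl, String filename, byte[] content) {
        String boundary = "----pdf-mcp-" + System.nanoTime();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/analysis/page-count"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body(boundary, filename, content)))
                .build();
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the container is running.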
Extending the Project
You can extend this project by:
- Adding more PDF operations: Implement additional @Tool annotated methods
- Supporting more file formats: Extend to handle other document types
- Adding batch processing: Implement tools for processing multiple files
- Enhanced error handling: Add more sophisticated error recovery
Conclusion
This PDF Tools MCP Server provides a clean, extensible framework for exposing PDF processing capabilities through the Model Context Protocol. By leveraging Spring AI conventions and the tool annotation system, you can create powerful integrations between AI models and PDF processing services.
The project demonstrates how to structure your code for MCP compatibility while maintaining good software design practices and integrating with external Docker-based services.
For more information about Spring AI and the Model Context Protocol, refer to the official documentation.