xxx87/pdf-to-text-mcp
PDF to Text MCP Server
A Model Context Protocol (MCP) server for converting PDF files to text, designed for seamless integration with Cursor IDE and other MCP-compatible applications.
Quick Start
# Clone the repository
git clone https://github.com/xxx87/pdf-to-text-mcp.git
cd pdf-to-text-mcp
# Install dependencies
yarn install
# Build the project
yarn build
# Test the server
yarn test
Features
- Multi-file Support - Convert one or more PDF files in a single call
- Text Extraction - Extract text while preserving document structure
- Fast Processing - Efficient PDF parsing with the pdf-parse library
- MCP Protocol - Full Model Context Protocol compliance
- Cursor Integration - Designed specifically for Cursor IDE
- TypeScript - Fully typed for a better development experience
- Testing - Comprehensive test suite included
Installation
Prerequisites
- Node.js 18+
- Yarn package manager
- Cursor IDE (for MCP integration)
Local Installation
- Clone the repository
  git clone https://github.com/xxx87/pdf-to-text-mcp.git
  cd pdf-to-text-mcp
- Install dependencies
  yarn install
- Build the project
  yarn build
- Verify installation
  yarn test
Usage
Running as Standalone Server
yarn start
Integration with Cursor IDE
- Add to Cursor Configuration
  Add the following to your Cursor MCP settings:
  {
    "mcpServers": {
      "pdf-to-text": {
        "command": "node",
        "args": ["/absolute/path/to/pdf-to-text-mcp-server/dist/index.js"],
        "cwd": "/absolute/path/to/pdf-to-text-mcp-server"
      }
    }
  }
  Important: Replace /absolute/path/to/pdf-to-text-mcp-server with your actual project path.
- Using in Cursor
  - Add PDFs: Drag and drop PDF files into Cursor
  - Convert: Use the pdf_to_text tool for automatic conversion
  - Analyze: The extracted text becomes available for AI analysis
Manual MCP Usage
// Example MCP JSON-RPC request
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "pdf_to_text",
"arguments": {
"file_paths": ["document1.pdf", "document2.pdf"]
}
}
}
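The request above can also be assembled programmatically. The following is a minimal TypeScript sketch, not part of the project: `buildPdfToTextRequest` and the `ToolCallRequest` interface are hypothetical names introduced for illustration, and rejecting an empty path list is an assumption about sensible client behavior rather than documented server behavior.

```typescript
// Shape of the JSON-RPC 2.0 envelope shown in the example request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { file_paths: string[] };
  };
}

// Hypothetical helper: builds a pdf_to_text tool call for the given paths.
function buildPdfToTextRequest(id: number, filePaths: string[]): ToolCallRequest {
  // Assumption: a call without any paths is a client-side mistake.
  if (filePaths.length === 0) {
    throw new Error("file_paths must contain at least one path");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "pdf_to_text", arguments: { file_paths: [...filePaths] } },
  };
}

// Serialize to a single line of JSON, suitable for piping to the server's stdin.
const requestLine = JSON.stringify(
  buildPdfToTextRequest(1, ["document1.pdf", "document2.pdf"])
);
console.log(requestLine);
```

Typing the envelope keeps the method and tool name from drifting out of sync with the documented request when calls are built in more than one place.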
Configuration
Environment Variables
Variable | Description | Default |
---|---|---|
NODE_ENV | Environment mode | production |
LOG_LEVEL | Logging level | info |
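As an illustration of how the defaults in this table behave, here is a small TypeScript sketch. It is not taken from the project's source: `resolveSettings`, `ServerSettings`, and `Env` are hypothetical names, and the `debug` log level in the usage line is an assumed value that the table does not list.

```typescript
// A plain record stands in for process.env so the sketch has no Node dependency.
type Env = Record<string, string | undefined>;

interface ServerSettings {
  nodeEnv: string;
  logLevel: string;
}

// Hypothetical helper: unset or empty variables fall back to the documented defaults.
function resolveSettings(env: Env): ServerSettings {
  return {
    nodeEnv: env.NODE_ENV || "production",
    logLevel: env.LOG_LEVEL || "info",
  };
}

console.log(resolveSettings({}));
// "debug" is an assumed level, used here only to show an override.
console.log(resolveSettings({ NODE_ENV: "development", LOG_LEVEL: "debug" }));
```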
Custom Options
The server automatically handles PDF parsing with optimized settings. For custom configurations, modify the pdf-parse options in src/index.ts.
API Reference
Tools
pdf_to_text
Converts PDF files to readable text format.
Parameters:
- file_paths (string[]): Array of PDF file paths to convert
Returns:
{
content: [
{
type: "text",
text: string // Extracted text with file separators
}
];
}
Example Response:
{
"content": [
{
"type": "text",
"text": "Successfully converted 2 PDF file(s) to text:\n\n=== document1.pdf ===\nExtracted content here...\n\n=== document2.pdf ===\nMore content here..."
}
]
}
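Because the tool returns all files in one text block, a client may want to split the result back into per-file sections. The TypeScript sketch below assumes every file begins with a `=== <name> ===` line exactly as in the example response; `splitByFile` and `ExtractedFile` are hypothetical names, and the separator format is inferred from that example rather than from a documented contract.

```typescript
interface ExtractedFile {
  name: string;
  text: string;
}

// Splits the combined tool output into one entry per file.
// Lines before the first "=== <name> ===" separator (the summary line) are ignored.
function splitByFile(output: string): ExtractedFile[] {
  const files: ExtractedFile[] = [];
  let current: { name: string; lines: string[] } | null = null;

  for (const line of output.split("\n")) {
    const match = /^=== (.+) ===$/.exec(line);
    if (match) {
      // A new separator closes the previous file's section.
      if (current) {
        files.push({ name: current.name, text: current.lines.join("\n").trim() });
      }
      current = { name: match[1], lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) {
    files.push({ name: current.name, text: current.lines.join("\n").trim() });
  }
  return files;
}

const sampleOutput =
  "Successfully converted 2 PDF file(s) to text:\n\n" +
  "=== document1.pdf ===\nExtracted content here...\n\n" +
  "=== document2.pdf ===\nMore content here...";
console.log(splitByFile(sampleOutput));
```

One limitation of this approach: extracted text that itself contains a line of the form `=== something ===` would be treated as a new file boundary.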
Development
Project Structure
pdf-to-text-mcp-server/
├── src/
│   ├── index.ts              # Main MCP server implementation
│   └── types/
│       └── pdf-parse.d.ts    # Type definitions
├── dist/                     # Compiled JavaScript output
├── test-server.js            # Test utilities
├── package.json              # Project configuration
├── tsconfig.json             # TypeScript configuration
├── cursor-config.json        # Example Cursor configuration
└── README.md                 # This file
Available Scripts
Script | Description |
---|---|
yarn build | Compile TypeScript to JavaScript |
yarn start | Run the compiled server |
yarn dev | Run in development mode with hot reload |
yarn test | Execute test suite |
yarn lint | Run code linting |
Building from Source
# Development mode with file watching
yarn dev
# Production build
yarn build
# Run tests
yarn test
Dependencies
Package | Purpose | Version |
---|---|---|
@modelcontextprotocol/sdk | MCP protocol implementation | ^0.5.0 |
pdf-parse | PDF text extraction | ^1.1.1 |
zod | Runtime type validation | ^3.22.4 |
typescript | TypeScript compiler | ^5.0.0 |
Troubleshooting
Common Issues
Issue | Cause | Solution |
---|---|---|
ENOENT: no such file or directory | Invalid file path | Verify PDF file exists and path is correct |
File is not a PDF | Wrong file format | Ensure file has .pdf extension and is valid |
Empty text output | Image-based (scanned) PDF | The tool extracts only embedded text; run OCR on scanned PDFs first |
Build errors | Missing dependencies | Run yarn install to install all dependencies |
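Two of the issues above (wrong extension, wrong file format) can be caught before calling the tool. The TypeScript sketch below shows one possible pre-flight check; `hasPdfExtension` and `startsWithPdfHeader` are hypothetical helpers and say nothing about how the server itself validates input. The header check relies on PDF files starting with the ASCII marker `%PDF-`, and is deliberately strict: files with leading bytes before the marker would be rejected here.

```typescript
// ASCII bytes of the "%PDF-" marker that opens a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Case-insensitive extension check on the path string only.
function hasPdfExtension(filePath: string): boolean {
  return filePath.toLowerCase().endsWith(".pdf");
}

// True when the buffer begins with "%PDF-".
// In practice the bytes would be the first few bytes read from the file.
function startsWithPdfHeader(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((expected, i) => bytes[i] === expected)
  );
}

// "%PDF-1" as raw bytes, followed by a ZIP header for contrast.
console.log(startsWithPdfHeader(Uint8Array.of(0x25, 0x50, 0x44, 0x46, 0x2d, 0x31)));
console.log(startsWithPdfHeader(Uint8Array.of(0x50, 0x4b, 0x03, 0x04)));
console.log(hasPdfExtension("Report.PDF"));
```

Reading the file is left out so the sketch stays dependency-free; only the path string and a byte buffer are needed.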
Debug Mode
Enable verbose logging:
NODE_ENV=development yarn start
Testing
Run the comprehensive test suite:
# Run all tests
yarn test
# Test with specific PDF
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "pdf_to_text", "arguments": {"file_paths": ["your-file.pdf"]}}}' | node dist/index.js
Contributing
We welcome contributions! Please see our contributing guidelines for details.
Development Setup
- Fork the repository
- Clone your fork
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Test thoroughly: yarn test
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
Code Style
- Follow existing TypeScript conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol for the excellent MCP specification
- pdf-parse for reliable PDF text extraction
- Cursor IDE for MCP integration support
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Made with ❤️ for the MCP community