xxx87/pdf-to-text-mcp
📄 PDF to Text MCP Server
A Model Context Protocol (MCP) server for converting PDF files to text, designed for seamless integration with Cursor IDE and other MCP-compatible applications.
🚀 Quick Start
# Clone the repository
git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
cd pdf-to-text-mcp-server
# Install dependencies
yarn install
# Build the project
yarn build
# Test the server
yarn test
✨ Features
- 📑 Multi-file Support - Convert one or multiple PDF files simultaneously
- 🔍 Text Extraction - Extract text while preserving document structure
- ⚡ Fast Processing - Efficient PDF parsing with the pdf-parse library
- 🔧 MCP Protocol - Full Model Context Protocol compliance
- 🎯 Cursor Integration - Designed specifically for Cursor IDE
- 🛡️ TypeScript - Fully typed for better development experience
- ✅ Testing - Comprehensive test suite included
🛠️ Installation
Prerequisites
- Node.js 18+
- Yarn package manager
- Cursor IDE (for MCP integration)
Local Installation
- Clone the repository
    git clone https://github.com/xxx87/pdf-to-text-mcp.git pdf-to-text-mcp-server
    cd pdf-to-text-mcp-server
- Install dependencies
    yarn install
- Build the project
    yarn build
- Verify installation
    yarn test
🎯 Usage
Running as Standalone Server
yarn start
Integration with Cursor IDE
- Add to Cursor Configuration
  Add the following to your Cursor MCP settings:
    {
      "mcpServers": {
        "pdf-to-text": {
          "command": "node",
          "args": ["/absolute/path/to/pdf-to-text-mcp-server/dist/index.js"],
          "cwd": "/absolute/path/to/pdf-to-text-mcp-server"
        }
      }
    }
  ⚠️ Important: Replace /absolute/path/to/pdf-to-text-mcp-server with your actual project path.
- Using in Cursor
  - Add PDFs: Drag and drop PDF files into Cursor
  - Convert: Use the pdf_to_text tool for automatic conversion
  - Analyze: The extracted text becomes available for AI analysis
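The Cursor configuration relies on two absolute paths that have to agree with each other. As a minimal sketch, assuming POSIX-style paths and the dist/index.js entry point used in the configuration, a small helper could generate the JSON from the checkout location. The name buildCursorConfig is hypothetical and not part of this repository.

```typescript
// Hypothetical helper, not part of this repository: builds the Cursor MCP
// entry for this server from the absolute path of a local checkout.
interface McpServerEntry {
  command: string;
  args: string[];
  cwd: string;
}

function buildCursorConfig(projectRoot: string): { mcpServers: Record<string, McpServerEntry> } {
  // Assumption: POSIX-style paths. Windows paths would need separate handling.
  if (!projectRoot.startsWith("/")) {
    throw new Error(`Expected an absolute POSIX path, got: ${projectRoot}`);
  }
  // Drop trailing slashes so the joined path has exactly one separator.
  const root = projectRoot.replace(/\/+$/, "");
  return {
    mcpServers: {
      "pdf-to-text": {
        command: "node",
        args: [`${root}/dist/index.js`],
        cwd: root,
      },
    },
  };
}

// Print a config block that can be pasted into the Cursor MCP settings.
console.log(JSON.stringify(buildCursorConfig("/home/me/pdf-to-text-mcp-server/"), null, 2));
```

Generating both values from one input avoids the common mistake of updating args but not cwd after moving the project.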
Manual MCP Usage
Example MCP JSON-RPC request:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "pdf_to_text",
    "arguments": {
      "file_paths": ["document1.pdf", "document2.pdf"]
    }
  }
}
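For clients that assemble this request in code, a typed builder keeps the envelope fields consistent. The sketch below mirrors the request shown above; the helper name buildPdfToTextRequest is hypothetical, and rejecting an empty list is this sketch's own choice, since the server's behavior for an empty array is not documented here.

```typescript
// Shape of the JSON-RPC 2.0 request for the pdf_to_text tool.
interface PdfToTextRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: "pdf_to_text";
    arguments: { file_paths: string[] };
  };
}

// Hypothetical helper, not part of this repository.
function buildPdfToTextRequest(id: number, filePaths: string[]): PdfToTextRequest {
  // Assumption: an empty list is treated as a caller error.
  if (filePaths.length === 0) {
    throw new Error("file_paths must contain at least one path");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // Copy the array so later changes by the caller do not alter the request.
    params: { name: "pdf_to_text", arguments: { file_paths: [...filePaths] } },
  };
}

const exampleRequest = buildPdfToTextRequest(1, ["document1.pdf", "document2.pdf"]);
console.log(JSON.stringify(exampleRequest));
```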
⚙️ Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| NODE_ENV | Environment mode | production |
| LOG_LEVEL | Logging level | info |
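How the server reads these variables is not shown in this README. As an illustration only, a reader function that applies the documented defaults might look like the following; resolveConfig and ServerConfig are hypothetical names.

```typescript
// Hypothetical config reader; the defaults mirror the documented ones.
interface ServerConfig {
  nodeEnv: string;
  logLevel: string;
}

function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    // Unset and empty values both fall back to the default.
    nodeEnv: env["NODE_ENV"] || "production",
    logLevel: env["LOG_LEVEL"] || "info",
  };
}

// In a Node.js process this would typically be called as resolveConfig(process.env).
console.log(resolveConfig({ LOG_LEVEL: "debug" }));
```

Taking the environment as a parameter, rather than reading the global directly, keeps the function easy to test.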
Custom Options
The server automatically handles PDF parsing with optimized settings. For custom configurations, modify the pdf-parse options in src/index.ts.
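pdf-parse accepts an options object as its second argument, and its max option limits how many pages are parsed (0, the default, means all pages). The sketch below makes no claim about how src/index.ts is actually written; buildParseOptions is a hypothetical helper that turns an optional page limit into such an options object.

```typescript
// Subset of the pdf-parse options used in this sketch.
interface ParseOptions {
  // Maximum number of pages to parse; 0 means all pages.
  max: number;
}

// Hypothetical helper, not part of this repository.
function buildParseOptions(maxPages?: number): ParseOptions {
  if (maxPages === undefined) {
    return { max: 0 }; // no limit requested: parse every page
  }
  if (!Number.isInteger(maxPages) || maxPages < 0) {
    throw new Error(`maxPages must be a non-negative integer, got ${maxPages}`);
  }
  return { max: maxPages };
}

// Intended use inside the server (not executed here):
//   const result = await pdfParse(buffer, buildParseOptions(10));
console.log(buildParseOptions(10));
```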
📚 API Reference
Tools
pdf_to_text
Converts PDF files to readable text format.
Parameters:
- file_paths (string[]): Array of PDF file paths to convert
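Callers may want to check arguments before sending them. The project lists zod for runtime validation, but how it is used is not shown here, so this sketch uses a plain type guard to stay dependency-free. The name isPdfToTextArgs is hypothetical, and requiring at least one non-empty path is an assumption of the sketch.

```typescript
interface PdfToTextArgs {
  file_paths: string[];
}

// Plain type guard (hypothetical, not the server's own validation).
function isPdfToTextArgs(value: unknown): value is PdfToTextArgs {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const paths = (value as { file_paths?: unknown }).file_paths;
  return (
    Array.isArray(paths) &&
    paths.length > 0 && // assumption: at least one path is required
    paths.every((p) => typeof p === "string" && p.length > 0)
  );
}

console.log(isPdfToTextArgs({ file_paths: ["document1.pdf"] })); // true
console.log(isPdfToTextArgs({ file_paths: "document1.pdf" })); // false: not an array
```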
Returns:
{
  content: [
    {
      type: "text",
      text: string // Extracted text with file separators
    }
  ];
}
Example Response:
{
  "content": [
    {
      "type": "text",
      "text": "Successfully converted 2 PDF file(s) to text:\n\n=== document1.pdf ===\nExtracted content here...\n\n=== document2.pdf ===\nMore content here..."
    }
  ]
}
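The layout of the text field can be reproduced with a few lines of string formatting. This is a sketch inferred from the example response alone, not the server's actual code; formatResponse and ExtractedFile are hypothetical names.

```typescript
interface ExtractedFile {
  name: string; // file name shown in the "=== name ===" separator
  text: string; // text extracted from that file
}

interface ToolResponse {
  content: Array<{ type: "text"; text: string }>;
}

// Hypothetical formatter that reproduces the layout of the example response.
function formatResponse(files: ExtractedFile[]): ToolResponse {
  // One section per file: a separator line followed by the extracted text.
  const sections = files.map((f) => `=== ${f.name} ===\n${f.text}`);
  const header = `Successfully converted ${files.length} PDF file(s) to text:`;
  return {
    content: [{ type: "text", text: `${header}\n\n${sections.join("\n\n")}` }],
  };
}

const exampleResponse = formatResponse([
  { name: "document1.pdf", text: "Extracted content here..." },
  { name: "document2.pdf", text: "More content here..." },
]);
console.log(JSON.stringify(exampleResponse, null, 2));
```

Because every file's text is prefixed by a predictable separator line, a client can split the combined string back into per-file sections if it needs to.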
🏗️ Development
Project Structure
pdf-to-text-mcp-server/
├── src/
│   ├── index.ts              # Main MCP server implementation
│   └── types/
│       └── pdf-parse.d.ts    # Type definitions
├── dist/                     # Compiled JavaScript output
├── test-server.js            # Test utilities
├── package.json              # Project configuration
├── tsconfig.json             # TypeScript configuration
├── cursor-config.json        # Example Cursor configuration
└── README.md                 # This file
Available Scripts
| Script | Description |
|---|---|
| yarn build | Compile TypeScript to JavaScript |
| yarn start | Run the compiled server |
| yarn dev | Run in development mode with hot reload |
| yarn test | Execute test suite |
| yarn lint | Run code linting |
Building from Source
# Development mode with file watching
yarn dev
# Production build
yarn build
# Run tests
yarn test
Dependencies
| Package | Purpose | Version |
|---|---|---|
| @modelcontextprotocol/sdk | MCP protocol implementation | ^0.5.0 |
| pdf-parse | PDF text extraction | ^1.1.1 |
| zod | Runtime type validation | ^3.22.4 |
| typescript | TypeScript compiler | ^5.0.0 |
🐛 Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| ENOENT: no such file or directory | Invalid file path | Verify PDF file exists and path is correct |
| File is not a PDF | Wrong file format | Ensure file has .pdf extension and is valid |
| Empty text output | Image-based PDF | This tool only extracts text-based content |
| Build errors | Missing dependencies | Run yarn install to install all dependencies |
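A client that wraps this server could turn the first three rows of the table into automatic hints. The following is a sketch with hypothetical names (KNOWN_ISSUES, suggestFix, looksImageBased); it matches on the message fragments listed in the table and assumes the server reports errors using those same strings.

```typescript
// Hypothetical lookup: maps error message fragments to the documented fixes.
const KNOWN_ISSUES: Array<{ pattern: RegExp; solution: string }> = [
  { pattern: /ENOENT/, solution: "Verify PDF file exists and path is correct" },
  { pattern: /not a PDF/i, solution: "Ensure file has .pdf extension and is valid" },
];

// Returns the documented fix for a known error, or undefined if none matches.
function suggestFix(errorMessage: string): string | undefined {
  return KNOWN_ISSUES.find((issue) => issue.pattern.test(errorMessage))?.solution;
}

// Empty output is not reported as an error, so it is detected separately:
// whitespace-only text suggests an image-based (scanned) PDF.
function looksImageBased(extractedText: string): boolean {
  return extractedText.trim().length === 0;
}

console.log(suggestFix("ENOENT: no such file or directory, open 'missing.pdf'"));
console.log(looksImageBased("   \n"));
```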
Debug Mode
Enable verbose logging:
NODE_ENV=development yarn start
Testing
Run the comprehensive test suite:
# Run all tests
yarn test
# Test with specific PDF
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "pdf_to_text", "arguments": {"file_paths": ["your-file.pdf"]}}}' | node dist/index.js
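The one-liner above sends a single tools/call message. The MCP specification describes a handshake before other requests (an initialize request followed by a notifications/initialized notification), and over stdio each JSON-RPC message occupies one line. Whether this server enforces the handshake is not stated here, so the following is a sketch under that assumption. The function name buildSessionInput is hypothetical, the client name and version are placeholders, and the protocolVersion is one published MCP revision that the server may answer with a different one.

```typescript
// Builds newline-delimited JSON-RPC messages for an MCP stdio session:
// initialize, the initialized notification, then the pdf_to_text call.
function buildSessionInput(filePaths: string[]): string {
  const messages = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: "manual-test", version: "0.0.0" }, // placeholder values
      },
    },
    // Notifications carry no id because no response is expected.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    {
      jsonrpc: "2.0",
      id: 2,
      method: "tools/call",
      params: { name: "pdf_to_text", arguments: { file_paths: filePaths } },
    },
  ];
  // JSON.stringify without indentation emits no raw newlines,
  // so each message maps to exactly one line.
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}

console.log(buildSessionInput(["your-file.pdf"]));
```

The resulting text could be piped into node dist/index.js in place of the echo command.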
🤝 Contributing
We welcome contributions! Please see our contributing guidelines for details.
Development Setup
- Fork the repository
- Clone your fork
- Create a feature branch:
    git checkout -b feature/amazing-feature
- Make your changes
- Test thoroughly:
    yarn test
- Commit changes:
    git commit -m 'Add amazing feature'
- Push to branch:
    git push origin feature/amazing-feature
- Open a Pull Request
Code Style
- Follow existing TypeScript conventions
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
🙏 Acknowledgments
- Model Context Protocol for the excellent MCP specification
- pdf-parse for reliable PDF text extraction
- Cursor IDE for MCP integration support
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Made with ❤️ for the MCP community