mcp-pdf-reader by a3tai - MCP Server

MCP PDF Reader

A robust open source Model Context Protocol (MCP) server for reading and analyzing PDF documents. This server enables AI assistants and tools to seamlessly interact with PDF files through a standardized protocol.

🌟 Open Source & Community Driven - Built with ❤️ by the community, for the community.

🚀 Features

🧠 Smart Content Analysis: Intelligent PDF content type detection (text, scanned images, mixed, or no content)
📋 Server Intelligence: New pdf_server_info tool provides comprehensive setup guidance and directory insights
📄 Enhanced PDF Processing: Read, validate, and extract text with automatic recommendations for next steps
🎯 Workflow Guidance: Context-aware suggestions on when to use asset extraction based on content analysis
🖼️ Visual Asset Extraction: Detect and extract images from PDFs with format identification
🔍 Smart Search: Find PDF files with fuzzy search capabilities
📊 Statistics: Get comprehensive directory and file statistics
🏗️ Structured Data Extraction: Extract content with positioning coordinates, formatting, and semantic relationships
📊 Table Detection: Intelligent table structure recognition and data extraction
🔍 Content Querying: Search and filter extracted content using flexible criteria
📋 Comprehensive Metadata: Extract document properties, page information, and custom metadata
🔄 Dual Mode Support:
- Stdio Mode: Standard MCP protocol for AI assistants (Zed, Claude Desktop, etc.)
- Server Mode: HTTP REST API with SSE transport for web integration
⚡ Production Ready: Comprehensive error handling, logging, and graceful shutdown
🧪 Well Tested: 65-76% test coverage with unit and integration tests
🛠️ Easy Integration: Simple installation and configuration

🎯 Use Cases

AI Code Editors: Integrate with Zed editor for PDF document analysis
Documentation Tools: Extract and analyze technical documentation with structure preservation
Research Assistants: Process academic papers and research documents with semantic understanding
Data Extraction: Extract structured data from forms, tables, and formatted documents
Content Management: Organize and search large PDF collections with intelligent querying
Web Applications: HTTP API for web-based PDF processing and analysis

📦 Installation

Direct Install (Fastest)

If you have Go installed, you can install directly:

# Install directly from GitHub
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest

# Verify installation
mcp-pdf-reader --help

Quick Install (Recommended)

# Clone the repository
git clone https://github.com/a3tai/mcp-pdf-reader.git
cd mcp-pdf-reader

# Build and install using Go's standard install method
make install

# Ensure Go's bin directory is in your PATH (usually already is)
export PATH="$(go env GOPATH)/bin:$PATH"

# Verify installation
mcp-pdf-reader --help

Manual Build

# Build from source (creates local binary)
make build

# Or install Go dependencies and build locally
go mod tidy
go build -o mcp-pdf-reader ./cmd/mcp-pdf-reader

# Or install directly with Go (installs to GOPATH/bin)
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest

System Requirements

Go 1.21+ for building from source
Linux, macOS, or Windows (tested on all platforms)

🖥️ Usage

MCP Protocol Mode (Default)

Perfect for AI assistants and editors like Zed:

# Use current directory for PDFs (default)
mcp-pdf-reader

# Specify PDF directory
mcp-pdf-reader --dir=/path/to/documents

# Debug mode
mcp-pdf-reader --dir=/path/to/documents --log-level=debug

HTTP Server Mode

For web applications and REST API access:

# Start HTTP server
mcp-pdf-reader --mode=server --dir=/path/to/documents

# Custom host and port
mcp-pdf-reader --mode=server --host=0.0.0.0 --port=9090 --dir=/docs

# Health check
curl http://localhost:8080/health

🔧 Configuration Options

| Flag | Default | Description |
| --- | --- | --- |
| `--mode` | `stdio` | Server mode: `stdio` or `server` |
| `--dir` | current directory | Directory containing PDF files |
| `--host` | `127.0.0.1` | Server host (server mode only) |
| `--port` | `8080` | Server port (server mode only) |
| `--log-level` | `info` | Log level: `debug`, `info`, `warn`, `error` |
| `--max-file-size` | `104857600` | Maximum PDF file size in bytes (100MB) |
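
Because `--max-file-size` takes a raw byte count, the value is easy to mistype. The following is a minimal Go sketch for deriving it from a size in mebibytes; `mibToBytes` is a hypothetical helper for illustration and is not part of mcp-pdf-reader.

```go
package main

import "fmt"

// mibToBytes converts mebibytes to bytes (1 MiB = 1024 * 1024 bytes).
// Hypothetical helper for illustration; not part of mcp-pdf-reader.
func mibToBytes(mib int64) int64 {
	return mib * 1024 * 1024
}

func main() {
	// Print ready-to-paste flag values for a few common limits.
	for _, mib := range []int64{100, 200, 512} {
		fmt.Printf("--max-file-size=%d  # %d MiB\n", mibToBytes(mib), mib)
	}
}
```

For 100 MiB this yields 104857600, which matches the documented default.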

⚡ Quick Reference

Common Commands

# Basic usage (stdio mode for MCP clients) - uses current directory
mcp-pdf-reader

# Specify custom directory
mcp-pdf-reader --dir=/path/to/pdfs

# Server mode for testing/debugging
mcp-pdf-reader --mode=server --dir=./docs

# Custom port and host
mcp-pdf-reader --mode=server --host=0.0.0.0 --port=9090

# Debug mode
mcp-pdf-reader --mode=server --log-level=debug --dir=./docs

# Larger file size limit (200MB)
mcp-pdf-reader --max-file-size=209715200 --dir=./docs

# Environment variables (alternative to flags)
MCP_PDF_DIR=/path/to/pdfs mcp-pdf-reader
MCP_PDF_MODE=server MCP_PDF_PORT=9090 mcp-pdf-reader

Quick Setup for Popular Editors

| Editor | Config File | Configuration |
| --- | --- | --- |
| Zed | `~/.config/zed/settings.json` | `"mcp-pdf-reader": {"command": {"path": "mcp-pdf-reader", "args": []}}` |
| Cursor | `~/.cursor/settings.json` | `"mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=${workspaceFolder}"]}` |
| Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` | `"mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=/path/to/docs"]}` |
| VS Code | `.vscode/settings.json` | `"claude.mcpServers": {"mcp-pdf-reader": {"command": "mcp-pdf-reader", "args": ["--dir=${workspaceFolder}"]}}` |

Testing Your Setup

# 1. Verify installation
mcp-pdf-reader --help

# 2. Test with sample directory
mkdir -p ~/test-pdfs
mcp-pdf-reader -mode=server -pdfdir="$HOME/test-pdfs"  # "~" is not expanded after "=" in a flag

# 3. Check health endpoint (server mode)
curl http://localhost:8080/health

# 4. Test MCP tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | mcp-pdf-reader
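
The `tools/list` request above only enumerates tools. To invoke one, MCP clients send a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The Go sketch below builds such a request as a single line of JSON suitable for piping to the server in stdio mode. The `rpcRequest` type and `toolCall` function are hypothetical names for illustration, and some MCP servers expect an `initialize` handshake first; whether this one does is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest models a JSON-RPC 2.0 request as used by MCP.
// Hypothetical type for illustration.
type rpcRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params"`
}

// toolCall builds a tools/call request for the named tool.
func toolCall(id int, tool string, args map[string]interface{}) (string, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: map[string]interface{}{
			"name":      tool,
			"arguments": args,
		},
	}
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	line, err := toolCall(2, "pdf_read_file", map[string]interface{}{
		"path": "/home/user/documents/research.pdf",
	})
	if err != nil {
		panic(err)
	}
	// One JSON object per line, ready to pipe into: mcp-pdf-reader
	fmt.Println(line)
}
```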

📡 MCP Tools

The server provides comprehensive PDF analysis tools via the MCP protocol, including both basic extraction and advanced structured analysis:

`pdf_read_file`

Extract text content from a PDF file.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/research.pdf"
}

`pdf_assets_file`

Extract visual assets like images from a PDF file.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/presentation.pdf"
}

`pdf_validate_file`

Validate if a file is a readable PDF.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/document.pdf"
}

`pdf_stats_file`

Get detailed statistics about a PDF file including metadata.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/report.pdf"
}

`pdf_search_directory`

List and search PDF files in a directory with optional fuzzy search.

Parameters:

directory (string): Directory path to search
query (string): Optional fuzzy search query

Example:

{
  "directory": "/home/user/documents",
  "query": "machine learning"
}

`pdf_stats_directory`

Get statistics about PDF files in a directory.

Parameters:

directory (string): Directory path to analyze

Example:

{
  "directory": "/home/user/documents"
}

`pdf_extract_structured`

Extract structured content with positioning coordinates and formatting information.

Parameters:

path (string): Full path to the PDF file
mode (string): Extraction mode - "raw", "structured", "semantic", "table", or "complete" (default: "structured")
config (object): Configuration options
- extract_text (bool): Extract text content
- extract_images (bool): Extract images
- extract_tables (bool): Extract tables
- extract_forms (bool): Extract form fields
- extract_annotations (bool): Extract annotations
- include_coordinates (bool): Include positioning coordinates
- include_formatting (bool): Include formatting information
- pages (array): Specific pages to extract (default: all)
- min_confidence (number): Minimum confidence threshold

Example:

{
  "path": "/home/user/documents/form.pdf",
  "mode": "structured",
  "config": {
    "extract_text": true,
    "include_coordinates": true,
    "include_formatting": true,
    "pages": [1, 2, 3]
  }
}

`pdf_extract_tables`

Extract tabular data from PDF with structure preservation and cell-level analysis.

Parameters:

path (string): Full path to the PDF file
config (object): Configuration options
- include_coordinates (bool): Include positioning coordinates
- pages (array): Specific pages to extract (default: all)
- min_confidence (number): Minimum confidence threshold

Example:

{
  "path": "/home/user/documents/spreadsheet.pdf",
  "config": {
    "include_coordinates": true,
    "min_confidence": 0.7
  }
}

`pdf_extract_semantic`

Extract content with semantic grouping and relationship detection.

Parameters:

path (string): Full path to the PDF file
config (object): Configuration options
- include_coordinates (bool): Include positioning coordinates
- include_formatting (bool): Include formatting information
- pages (array): Specific pages to extract (default: all)
- min_confidence (number): Minimum confidence threshold

Example:

{
  "path": "/home/user/documents/document.pdf",
  "config": {
    "include_coordinates": true,
    "include_formatting": true
  }
}

`pdf_extract_complete`

Comprehensive extraction of all content types (text, images, tables, forms, annotations).

Parameters:

path (string): Full path to the PDF file
config (object): Configuration options
- pages (array): Specific pages to extract (default: all)
- min_confidence (number): Minimum confidence threshold

Example:

{
  "path": "/home/user/documents/complex.pdf",
  "config": {
    "pages": [1, 2, 3],
    "min_confidence": 0.8
  }
}

`pdf_query_content`

Query and filter extracted PDF content using flexible search criteria.

Parameters:

path (string): Full path to the PDF file
query (object): Query criteria for filtering content
- content_types (array): Content types to filter ("text", "image", "table", "form", "annotation")
- pages (array): Pages to search
- text_query (string): Text search query
- min_confidence (number): Minimum confidence threshold
- bounding_box (object): Spatial filter area
  - x (number): X coordinate
  - y (number): Y coordinate
  - width (number): Width
  - height (number): Height

Example:

{
  "path": "/home/user/documents/report.pdf",
  "query": {
    "content_types": ["text", "table"],
    "text_query": "revenue",
    "pages": [1, 2, 3],
    "min_confidence": 0.7
  }
}
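
The example above does not use `bounding_box`. As a rough sketch, the Go code below builds `pdf_query_content` arguments that restrict a table search to one region of one page. The field names come from the parameter list above; the struct and function names are hypothetical, and the coordinate values are illustrative only, since the unit and origin of the coordinates are not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boundingBox mirrors the bounding_box parameter documented above.
type boundingBox struct {
	X      float64 `json:"x"`
	Y      float64 `json:"y"`
	Width  float64 `json:"width"`
	Height float64 `json:"height"`
}

// contentQuery mirrors the query object; unset fields are omitted.
type contentQuery struct {
	ContentTypes  []string     `json:"content_types,omitempty"`
	Pages         []int        `json:"pages,omitempty"`
	TextQuery     string       `json:"text_query,omitempty"`
	MinConfidence float64      `json:"min_confidence,omitempty"`
	BoundingBox   *boundingBox `json:"bounding_box,omitempty"`
}

// queryArgs is the full argument object for pdf_query_content.
type queryArgs struct {
	Path  string       `json:"path"`
	Query contentQuery `json:"query"`
}

// regionQuery builds arguments that look for tables inside box on one page.
func regionQuery(path string, page int, box boundingBox) (string, error) {
	args := queryArgs{
		Path: path,
		Query: contentQuery{
			ContentTypes: []string{"table"},
			Pages:        []int{page},
			BoundingBox:  &box,
		},
	}
	b, err := json.Marshal(args)
	return string(b), err
}

func main() {
	// Illustrative region; adjust to the coordinate system the server reports.
	out, err := regionQuery("/home/user/documents/report.pdf", 2,
		boundingBox{X: 50, Y: 100, Width: 400, Height: 200})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```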

`pdf_get_page_info`

Get detailed information about PDF pages including dimensions, layout, and properties.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/document.pdf"
}

`pdf_get_metadata`

Extract comprehensive document metadata and properties.

Parameters:

path (string): Full path to the PDF file

Example:

{
  "path": "/home/user/documents/document.pdf"
}

🔥 Enhanced Features

Smart Content Analysis

The PDF reader now provides intelligent content type detection and recommendations:

`pdf_server_info` - Get Started Faster

A new tool that provides comprehensive server information and usage guidance.

What it provides:

📋 Server capabilities and configuration
📁 Current directory contents (PDF files found)
🛠️ Complete list of available tools with usage guidance
📖 Step-by-step workflow recommendations
🖼️ Supported image formats for asset extraction

Usage:

{
  "name": "pdf_server_info",
  "arguments": {}
}

Why use it: Start here to understand what PDFs are available and how to best analyze them.

Enhanced PDF Reading with Content Intelligence

The pdf_read_file tool now provides smart content analysis:

Content Type Detection:

📝 text - PDF contains readable text content
🖼️ scanned_images - PDF contains scanned images with minimal text
🔀 mixed - PDF contains both text and images
❌ no_content - PDF appears empty or unreadable

Smart Recommendations:

✅ Automatic guidance on whether to use pdf_assets_file
📊 Image count detection - know if images are present before extraction
🎯 Next step suggestions based on content type

Enhanced Response Format:

Successfully read PDF: /path/to/document.pdf
Pages: 15
Size: 2458392 bytes
Content Type: mixed
Has Images: true
Image Count: 8

💡 INFO: This PDF contains both text and images. You may want to use 'pdf_assets_file' to extract the images as well.

Content:
[extracted text content...]
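
A client that wants to act on these fields has to pick them out of the text response. The Go sketch below shows one way to do that, assuming the response keeps exactly the layout shown above (`Key: value` lines followed by a `Content:` line); the real output may differ, and `parseReadResponse` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// parseReadResponse splits a pdf_read_file text response into header fields
// and the extracted body. It assumes the sample layout shown in this README.
func parseReadResponse(resp string) (map[string]string, string) {
	fields := make(map[string]string)
	// Everything before the "Content:" line is treated as the header.
	head, body, _ := strings.Cut(resp, "\nContent:\n")
	for _, line := range strings.Split(head, "\n") {
		key, value, ok := strings.Cut(line, ": ")
		if ok {
			fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
		}
	}
	return fields, body
}

func main() {
	resp := "Successfully read PDF: /path/to/document.pdf\n" +
		"Pages: 15\nSize: 2458392 bytes\nContent Type: mixed\n" +
		"Has Images: true\nImage Count: 8\n\nContent:\nHello"
	fields, body := parseReadResponse(resp)
	fmt.Println(fields["Content Type"], fields["Image Count"], body)
}
```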

Intelligent Workflow Guidance

The system now provides contextual recommendations:

For text-based PDFs: Content is ready to use, no further action needed
For scanned documents: Recommends using pdf_assets_file to extract images
For mixed content: Suggests optional image extraction based on your needs
For problematic files: Provides specific troubleshooting guidance
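
The guidance above can be mirrored on the client side as a simple lookup from the reported content type to a follow-up action. This is a hypothetical sketch rather than the server's own logic; in particular, mapping `no_content` to `pdf_validate_file` is an assumption, since the text only promises troubleshooting guidance for problematic files.

```go
package main

import "fmt"

// nextStep maps a content type reported by pdf_read_file to a follow-up
// action. Hypothetical client-side helper for illustration.
func nextStep(contentType string) string {
	switch contentType {
	case "text":
		return "use the extracted text; no further action needed"
	case "scanned_images":
		return "call pdf_assets_file to extract the page images"
	case "mixed":
		return "optionally call pdf_assets_file if the images matter"
	case "no_content":
		// Assumption: validating the file is a reasonable first check.
		return "call pdf_validate_file to check the file"
	default:
		return "unknown content type"
	}
}

func main() {
	for _, ct := range []string{"text", "scanned_images", "mixed", "no_content"} {
		fmt.Printf("%s: %s\n", ct, nextStep(ct))
	}
}
```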

Better Error Handling and User Experience

🔍 Proactive validation - tools suggest when files might not be readable
📋 Rich context - understand your PDF directory contents upfront
🎯 Targeted recommendations - know which tools to use when
📖 Comprehensive guidance - built-in usage instructions and examples

🎨 Integration Examples

🎯 Zed Editor

Add to your Zed settings (~/.config/zed/settings.json):

{
  "context_servers": {
    "mcp-pdf-reader": {
      "command": {
        "path": "mcp-pdf-reader",
        "args": ["-pdfdir=${workspaceFolder}"],
        "env": null
      },
      "settings": {}
    }
  }
}

Project-specific Zed configuration (.zed/settings.json in your project):

{
  "context_servers": {
    "mcp-pdf-reader": {
      "command": {
        "path": "mcp-pdf-reader",
        "args": ["-pdfdir=./docs"],
        "env": null
      },
      "settings": {}
    }
  }
}

🎯 Cursor IDE

Add to your Cursor settings (~/.cursor/settings.json):

{
  "mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "${workspaceFolder}"],
      "env": {}
    }
  }
}

For specific PDF directories:

{
  "mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "/path/to/your/documents"],
      "env": {}
    }
  }
}

🎯 Windsurf

Add to your Windsurf configuration (~/.windsurf/settings.json):

{
  "mcp": {
    "servers": {
      "mcp-pdf-reader": {
        "command": "mcp-pdf-reader",
        "args": ["-pdfdir", "${workspaceRoot}"],
        "env": {}
      }
    }
  }
}

Project-specific Windsurf config (.windsurf/settings.json):

{
  "mcp": {
    "servers": {
      "mcp-pdf-reader": {
        "command": "mcp-pdf-reader",
        "args": ["-pdfdir", "./documentation"],
        "env": {}
      }
    }
  }
}

🎯 Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "/path/to/your/documents"]
    }
  }
}

For multiple document directories:

{
  "mcpServers": {
    "mcp-pdf-reader-docs": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "/Users/yourname/Documents"]
    },
    "mcp-pdf-reader-research": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "/Users/yourname/Research/papers"]
    }
  }
}

🎯 Claude Code (VS Code Extension)

Add to your VS Code settings (settings.json):

{
  "claude.mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "${workspaceFolder}"],
      "env": {}
    }
  }
}

Workspace-specific settings (.vscode/settings.json):

{
  "claude.mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "./docs"],
      "env": {}
    }
  }
}

🎯 Roo Code

Add to your Roo configuration (~/.roo/config.json):

{
  "mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "{{workspace}}"],
      "cwd": "{{workspace}}"
    }
  }
}

For specific directories:

{
  "mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "/path/to/pdfs"],
      "cwd": "/path/to/pdfs"
    }
  }
}

🎯 Cline (VS Code Extension)

Add to your Cline settings in VS Code (settings.json):

{
  "cline.mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "${workspaceFolder}/docs"],
      "env": {}
    }
  }
}

Global Cline configuration:

{
  "cline.mcpServers": {
    "mcp-pdf-reader": {
      "command": "mcp-pdf-reader",
      "args": ["-pdfdir", "${env:HOME}/Documents"],
      "env": {}
    }
  }
}

📁 Common Configuration Patterns

Use Current Project Directory

# Most editors support workspace variables
-pdfdir=${workspaceFolder}      # Zed, VS Code-based
-pdfdir=${workspaceRoot}        # Windsurf
-pdfdir={{workspace}}           # Roo

Use Specific Subdirectory

# For documentation in your project
-pdfdir=./docs
-pdfdir=./documentation
-pdfdir=./papers

Use Home Directory

# For personal document collections
-pdfdir=${env:HOME}/Documents
-pdfdir=/Users/yourname/Documents      # macOS
-pdfdir=/home/yourname/Documents       # Linux
-pdfdir=C:\Users\yourname\Documents    # Windows

Multiple Instances

You can run multiple instances for different directories:

{
  "context_servers": {
    "mcp-pdf-reader-docs": {
      "command": {
        "path": "mcp-pdf-reader",
        "args": ["-pdfdir=./docs", "-port=8080"]
      }
    },
    "mcp-pdf-reader-research": {
      "command": {
        "path": "mcp-pdf-reader",
        "args": ["-pdfdir=/path/to/research", "-port=8081"]
      }
    }
  }
}

🚀 Quick Setup Tips

After Installation: The mcp-pdf-reader binary will be globally available if $(go env GOPATH)/bin is in your PATH (default with Go installations).
Verify Installation: Run mcp-pdf-reader --help to ensure it's working.
Test Configuration: Start with stdio mode (default) for MCP clients, use server mode for debugging.
Path Variables: Most editors support workspace variables - use them for portable configurations.
Multiple Directories: Create separate MCP server instances for different PDF collections.

🔧 Troubleshooting

Installation Issues

❌ Command not found: `mcp-pdf-reader`

Problem: After installation, the binary is not found in PATH.

Solutions:

# Check if Go's bin directory is in your PATH
echo "$PATH" | grep -F "$(go env GOPATH)/bin"

# If not found, add to your shell profile
echo 'export PATH="$(go env GOPATH)/bin:$PATH"' >> ~/.bashrc  # Linux/WSL
echo 'export PATH="$(go env GOPATH)/bin:$PATH"' >> ~/.zshrc   # macOS (if using zsh)

# Reload your shell
source ~/.bashrc  # or ~/.zshrc

❌ Permission denied during installation

Problem: Installation fails with permission errors.

Solutions:

# Don't use sudo with go install - it should install to your user directory
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest

# If still having issues, check your GOPATH
go env GOPATH
go env GOBIN

❌ Module not found or build errors

Problem: Build fails with module or dependency errors.

Solutions:

# Clean module cache and retry
go clean -modcache
go install github.com/a3tai/mcp-pdf-reader/cmd/mcp-pdf-reader@latest

# Or build from source
git clone https://github.com/a3tai/mcp-pdf-reader.git
cd mcp-pdf-reader
go mod tidy
make install

Configuration Issues

❌ MCP server not connecting in editors

Problem: Editor can't connect to the MCP server.

Solutions:

Verify binary is accessible:

which mcp-pdf-reader
mcp-pdf-reader --help

Test in stdio mode:

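echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"manual-test","version":"0.0.0"}}}' | mcp-pdf-reader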

Check editor-specific config location:
- Zed: ~/.config/zed/settings.json
- Cursor: ~/.cursor/settings.json
- Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
- VS Code: .vscode/settings.json (workspace) or user settings

❌ "Directory does not exist" errors

Problem: PDF directory path is invalid.

Solutions:

# Use absolute paths
"args": ["-pdfdir=/home/user/Documents"]

# Or verify workspace variables work in your editor
"args": ["-pdfdir=${workspaceFolder}/docs"]

# Create the directory if it doesn't exist
mkdir -p ~/Documents/pdfs

❌ "No PDF files found" but files exist

Problem: Server can't find PDFs in the specified directory.

Solutions:

Check file extensions (must be .pdf):
```
ls -la /path/to/pdfs/*.pdf
```

Test directory access:

mcp-pdf-reader -mode=server -pdfdir=/path/to/pdfs
# Then visit http://localhost:8080/health

Check permissions:

ls -la /path/to/pdfs/
# Ensure read permissions on directory and files

Runtime Issues

❌ Server crashes or exits immediately

Problem: MCP server terminates unexpectedly.

Solutions:

Run in server mode for debugging:

mcp-pdf-reader -mode=server -pdfdir=./docs -loglevel=debug

Check for port conflicts (server mode):

lsof -i :8080  # Check if port 8080 is in use
mcp-pdf-reader -mode=server -port=8081  # Try different port

Verify PDF directory permissions:

# Test with a simple directory
mkdir -p ~/test-pdfs
mcp-pdf-reader -mode=server -pdfdir="$HOME/test-pdfs"  # "~" is not expanded after "=" in a flag

❌ Large PDF files cause errors

Problem: "File too large" or memory errors.

Solutions:

# Increase file size limit (default: 100MB)
mcp-pdf-reader -maxfilesize=209715200  # 200MB

# Check file sizes
ls -lh /path/to/pdfs/*.pdf

❌ PDF text extraction fails

Problem: PDF content appears empty or garbled.

Solutions:

Test with different PDFs (some PDFs may be image-only or encrypted)

Use validation tool:

mcp-pdf-reader -mode=server -pdfdir=./docs
# Then test with the pdf_validate_file tool

Editor-Specific Issues

🎯 Zed Editor

Restart Zed after config changes
Check Zed's output panel for MCP errors
Use absolute paths if workspace variables don't work

🎯 Cursor IDE

Restart Cursor after configuration changes
Check the "Output" tab for MCP-related logs
Ensure the MCP extension is enabled

🎯 Claude Desktop

Restart Claude Desktop after config changes
Check ~/Library/Logs/Claude/ for error logs (macOS)
Verify JSON syntax in config file
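
A stray trailing comma is a common reason a config file fails to load. As a minimal sketch, the Go program below checks that a file parses as strict JSON; `checkConfig` is a hypothetical helper. It only verifies syntax, not that the keys are ones Claude Desktop understands, and it would reject editor settings files that allow comments.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// checkConfig returns the first JSON syntax error in data, or nil if it parses.
func checkConfig(data []byte) error {
	var v interface{}
	return json.Unmarshal(data, &v)
}

func main() {
	// Inline sample with a trailing comma; pass a file path to check a real config.
	data := []byte(`{"mcpServers": {"mcp-pdf-reader": {"command": "mcp-pdf-reader",}}}`)
	if len(os.Args) > 1 {
		fileData, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		data = fileData
	}
	if err := checkConfig(data); err != nil {
		fmt.Println("invalid JSON:", err)
		return
	}
	fmt.Println("JSON syntax OK")
}
```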

🎯 VS Code Extensions

Check extension logs in the "Output" panel
Verify the extension supports MCP servers
Try disabling/re-enabling the extension

Getting Help

If you're still having issues:

Check the server health (server mode):
```
curl http://localhost:8080/health
```

Enable debug logging:

mcp-pdf-reader -mode=server -loglevel=debug -pdfdir=./docs

Create a minimal test case:

mkdir test-mcp
cd test-mcp
echo "Test content" > test.pdf  # Not a real PDF, but tests basic functionality
mcp-pdf-reader -mode=server -pdfdir=.

Open an issue on GitHub with:
- Your operating system
- Go version (go version)
- Editor/tool being used
- Complete error messages
- Configuration file contents

🧪 Development

Building and Testing

# Install dependencies
make deps

# Run tests
make test

# Run tests with coverage
make test-coverage

# Build for development
make build

# Run development server
make run

# Run in server mode
make run-server

Code Quality

# Format code
make fmt

# Run linter (requires golangci-lint)
make lint

# Cross-compile for all platforms
make build-all

Project Structure

mcp-pdf-reader/
├── cmd/mcp-pdf-reader/     # Main application entry point
├── internal/
│   ├── config/             # Configuration management
│   ├── mcp/               # MCP server implementation
│   └── pdf/               # PDF processing logic
├── Makefile               # Build and development commands
├── go.mod                 # Go module definition
└── README.md             # This file

🌐 API Reference (Server Mode)

Health Check

GET /health

Returns server health status and version information.

MCP Endpoints

GET /sse                   # Server-Sent Events endpoint
POST /message              # MCP message endpoint

🤝 Contributing

We love contributions! This is an open source project and we welcome contributions from everyone. Whether you're fixing bugs, adding features, improving documentation, or helping with tests - every contribution matters.

How to Contribute

🍴 Fork the repository on GitHub
🌿 Create a feature branch: git checkout -b feature/amazing-feature
✨ Make your changes and add comprehensive tests
🧪 Run the test suite: make test (ensure all tests pass)
🎨 Format your code: make fmt
📝 Update documentation if needed
🚀 Submit a pull request with a clear description

Ways to Contribute

🐛 Bug Reports: Found a bug? Open an issue with reproduction steps
💡 Feature Requests: Have an idea? We'd love to hear it!
📖 Documentation: Help improve our docs and examples
🧪 Testing: Add tests or improve existing ones
🔧 Code: Fix bugs or implement new features
🌍 Translation: Help make this accessible to more people

Development Guidelines

Write clear, documented code
Add tests for new functionality
Follow Go best practices and idioms
Keep pull requests focused and atomic
Be respectful and constructive in discussions

📊 Performance

Memory Efficient: Streaming PDF processing with configurable limits
Fast Search: Optimized file system traversal and indexing
Concurrent Safe: Handle multiple requests simultaneously
Resource Limits: Configurable file size limits and timeouts

🔒 Security

Input Validation: Comprehensive validation of all inputs
Path Sanitization: Prevents directory traversal attacks
File Size Limits: Configurable limits to prevent resource exhaustion
Secure Defaults: Safe configuration out of the box
Automated Security Scanning: Continuous security analysis with gosec
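
How the server implements path sanitization is not described here. As a general illustration of the technique, the Go sketch below rejects any path that would resolve outside a base directory. `withinBase` is a hypothetical helper; the check is purely lexical and does not resolve symlinks, so a hardened implementation would need more than this.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinBase reports whether target, resolved against base, stays inside base.
// Lexical check only: symlinks are not resolved.
func withinBase(base, target string) (bool, error) {
	absBase, err := filepath.Abs(base)
	if err != nil {
		return false, err
	}
	if !filepath.IsAbs(target) {
		target = filepath.Join(absBase, target)
	}
	// Rel yields a path starting with ".." when target lies outside base.
	rel, err := filepath.Rel(absBase, filepath.Clean(target))
	if err != nil {
		return false, err
	}
	inside := rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return inside, nil
}

func main() {
	for _, p := range []string{"reports/q1.pdf", "../etc/passwd", "/srv/pdfs-other/x.pdf"} {
		ok, err := withinBase("/srv/pdfs", p)
		fmt.Println(p, ok, err)
	}
}
```

Comparing the relative path rather than testing a string prefix avoids accepting sibling directories such as `/srv/pdfs-other` when the base is `/srv/pdfs`.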

Security Scanning

This project uses gosec for automated security scanning of Go code. Security scans are automatically run on every pull request and release.

Running Security Scans Locally

# Install gosec
go install github.com/securego/gosec/v2/cmd/gosec@latest

# Run security scan
make gosec

# Or run directly with gosec
gosec -conf .gosec.json ./...

Security Configuration

Security scanning is configured via .gosec.json with:

Customized rules for Go security best practices
Exclusions for test files and false positives
Integration with GitHub Security tab via SARIF reports

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🌟 Open Source Community

This project is proudly open source and maintained by contributors from around the world. We believe in the power of community-driven development to create better tools for everyone.

Join Our Community

💬 Discussions: Share ideas and get help in GitHub Discussions
🐛 Issues: Report bugs or request features in GitHub Issues
🎉 Contributors: Check out our amazing contributors

Project Values

🔓 Open: Transparent development and decision-making
🤝 Inclusive: Welcoming to all contributors regardless of experience level
🚀 Quality: Maintaining high standards through testing and code review
📖 Documentation: Keeping documentation up-to-date and comprehensive

🏢 About Rude Company LLC

Rude Company LLC is building innovative AI-powered development tools and open source solutions. We create intelligent systems that enhance developer productivity and enable seamless human-AI collaboration.

A3T is brought to you by Rude Company LLC and focuses on AI development tools and automation.

Website: https://rude.la
A3T Project GitHub: https://github.com/a3tai

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Support: For support, please use GitHub Issues

Built with ❤️ by Rude Company LLC.
