corneyc/documents-mcp
Documents MCP is a Model Context Protocol server designed to integrate local document management with cloud synchronization, specifically for AI agent interaction.
Documents MCP Server
Advanced Model Context Protocol Document Management System
Table of Contents
- Problem Statement
- Solution Overview
- Use Case Scenarios
- System Architecture
- Design Process
- Implementation
- Features
- Installation
- Configuration
- Usage
- API Reference
- Development
- Testing
- Deployment
- Contributing
- License
Problem Statement
The Challenge
Modern knowledge workers face significant inefficiencies in document management:
- Fragmented Ecosystems: Documents scattered across local storage, cloud platforms, and collaborative tools
- Limited AI Integration: Existing document systems don't provide seamless AI agent access
- Manual Sync Overhead: Constant manual synchronization between local work and cloud storage
- Context Loss: AI assistants can't access local documents for contextual assistance
- Version Control Issues: Difficulty tracking changes across multiple platforms
Market Gap
Traditional document management solutions fall short in the emerging AI-first workflow era:
- No Protocol Standardization: Lack of standardized protocols for AI-document interaction
- Platform Lock-in: Vendor-specific solutions that don't interoperate
- Limited Real-time Capabilities: Insufficient support for real-time AI collaboration
- Security Concerns: Inadequate access control for sensitive local documents
Solution Overview
Documents MCP is a production-ready Model Context Protocol server that bridges local document management with cloud synchronization, specifically designed for AI agent integration.
Core Value Proposition
- Unified Interface: Single protocol for AI agents to access both local and cloud documents
- Real-time Sync: Bidirectional synchronization with conflict resolution
- Enterprise Security: Path validation, access control, and audit trails
- Protocol Compliance: Full MCP specification implementation
- Rich Metadata: Comprehensive file information and sync status tracking
Use Case Scenarios
Scenario 1: AI-Powered Content Creation
Actor: Content Creator
Goal: Leverage AI assistance while maintaining local document control
Workflow:
- Writer maintains drafts locally for version control
- AI agent (Claude Desktop) accesses documents via MCP protocol
- Real-time suggestions and edits are synchronized to Notion
- Team collaboration occurs through cloud interface
- Final versions sync back to local storage
Benefits:
- Maintains local control while enabling cloud collaboration
- AI assistant has full context of writing projects
- Seamless version tracking across platforms
Scenario 2: Technical Documentation Management
Actor: Software Development Team
Goal: Maintain synchronized technical documentation across environments
Workflow:
- Developers write documentation locally alongside code
- MCP server automatically syncs to shared Notion workspace
- AI agents help maintain consistency and completeness
- Non-technical stakeholders access via Notion interface
- Changes propagate bidirectionally with conflict resolution
Benefits:
- Documentation stays current with codebase
- Reduces documentation maintenance overhead
- Enables AI-assisted technical writing
Scenario 3: Research Data Organization
Actor: Academic Researcher
Goal: Organize and analyze research documents with AI assistance
Workflow:
- Research files stored locally for security and offline access
- AI agent helps categorize and analyze document content
- Structured metadata synced to Notion for team visibility
- Search and discovery enhanced through AI understanding
- Publication-ready documents maintained locally
Benefits:
- Sensitive research data remains local
- AI-enhanced document analysis and organization
- Team collaboration without compromising data security
System Architecture
High-Level Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AI Agents     │    │  Documents MCP   │    │  Cloud Storage  │
│ (Claude, etc.)  │◄──►│     Server       │◄──►│    (Notion)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │ Local File System│
                       └──────────────────┘
Component Architecture
Documents MCP Server
├── MCP Protocol Layer
│ ├── JSON-RPC Handler
│ ├── Tool Registration
│ └── Request/Response Processing
├── Document Management
│ ├── File System Operations
│ ├── Metadata Extraction
│ ├── Search & Indexing
│ └── Security & Validation
├── Sync Engine
│ ├── Notion API Integration
│ ├── Conflict Resolution
│ ├── Status Tracking
│ └── Bidirectional Sync
└── Configuration & Utilities
    ├── Environment Management
    ├── Error Handling
    └── Logging & Monitoring
Design Process
Phase 1: Requirements Analysis
Stakeholder Research:
- AI application developers needing document access
- Knowledge workers seeking AI-enhanced workflows
- Development teams requiring synchronized documentation
Technical Requirements:
- MCP protocol compliance for AI agent compatibility
- Secure local file system access with path validation
- Cloud synchronization with conflict resolution
- Rich metadata and search capabilities
Phase 2: Protocol Selection
Decision: Model Context Protocol (MCP)
- Rationale: Emerging standard for AI-system integration
- Benefits: Future-proofing, standardization, ecosystem compatibility
- Trade-offs: Early adoption risks, limited tooling
Alternative Considered: REST API
- Rejected: Less suitable for AI agent integration patterns
Phase 3: Architecture Design
Modular Architecture Principles:
- Separation of Concerns: Distinct layers for protocol, documents, and sync
- Extensibility: Plugin architecture for additional cloud providers
- Testability: Isolated components for unit testing
- Security: Defense-in-depth with multiple validation layers
Phase 4: Technology Stack Selection
Core Technologies:
- TypeScript: Type safety, developer experience, ecosystem maturity
- Node.js: JavaScript ecosystem, npm packages, deployment flexibility
- Notion API: Rich collaboration features, extensive API capabilities
Development Tools:
- tsx: TypeScript execution for development and testing
- dotenv: Environment configuration management
- MCP SDK: Official protocol implementation libraries
Implementation
Development Stages
Stage 1: Core MCP Protocol Implementation
Objective: Establish basic MCP server functionality
Implementation:
- JSON-RPC request/response handling
- Tool registration and discovery
- Basic document operations (list, read, write)
- Protocol compliance testing
Challenges Addressed:
- MCP specification interpretation
- TypeScript type definitions for protocol
- Request validation and error handling
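The request handling and tool registration in this stage can be pictured with a small hand-rolled dispatcher. This is only a sketch: the project itself lists the MCP SDK as a dependency, and the `registry`, `handleLine`, and `fail` names below are hypothetical, not taken from the repository. The error codes are the standard JSON-RPC 2.0 ones.

```typescript
// Illustrative only: a hand-rolled dispatcher, not the MCP SDK.
type Id = number | string | null;

interface RpcRequest {
  jsonrpc: "2.0";
  id?: Id;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type RpcResponse =
  | { jsonrpc: "2.0"; id: Id; result: unknown }
  | { jsonrpc: "2.0"; id: Id; error: { code: number; message: string } };

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical registry: tool name -> handler function.
const registry = new Map<string, ToolHandler>();
registry.set("list_documents", () => [{ key: "example.md" }]);

function fail(id: Id, code: number, message: string): RpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

function handleLine(line: string): RpcResponse {
  let request: RpcRequest;
  try {
    request = JSON.parse(line) as RpcRequest;
  } catch {
    return fail(null, -32700, "Parse error");
  }
  if (typeof request !== "object" || request === null) {
    return fail(null, -32600, "Invalid Request");
  }
  const id = request.id ?? null;

  if (request.method === "tools/list") {
    // Discovery: report the names of all registered tools.
    const tools = Array.from(registry.keys()).map((name) => ({ name }));
    return { jsonrpc: "2.0", id, result: { tools } };
  }
  if (request.method === "tools/call") {
    const name = request.params?.name ?? "";
    const handler = registry.get(name);
    if (!handler) return fail(id, -32602, `Unknown tool: ${name}`);
    const value = handler(request.params?.arguments ?? {});
    // MCP tool results carry a list of content items; text is the simplest kind.
    return {
      jsonrpc: "2.0",
      id,
      result: { content: [{ type: "text", text: JSON.stringify(value) }] },
    };
  }
  return fail(id, -32601, `Method not found: ${request.method}`);
}

console.log(JSON.stringify(handleLine('{"jsonrpc":"2.0","id":1,"method":"tools/list"}')));
```

A real server would also handle the `initialize` handshake and validate arguments against each tool's schema. The sketch leaves both out to keep the dispatch path visible.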
Stage 2: Enhanced Document Operations
Objective: Add production-ready file management capabilities
Implementation:
- Rich metadata extraction (size, dates, file types)
- Path security and validation
- File type detection and filtering
- Search functionality (name and content)
- Directory operations and traversal
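As a rough illustration of the metadata extraction listed above, the sketch below reads size, dates, and file type from `fs.statSync`. The `DocumentMetadata` shape, the `getMetadata` name, and the set of text extensions are assumptions made for this example, not the project's actual types.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical metadata shape; the real project may expose different fields.
interface DocumentMetadata {
  key: string;       // path relative to the documents root
  size: number;      // size in bytes
  modified: string;  // ISO 8601 timestamp
  created: string;   // ISO 8601 timestamp
  extension: string; // lower-cased, including the leading dot
  isText: boolean;
}

// Assumed list of extensions treated as text files.
const TEXT_EXTENSIONS = new Set([".md", ".txt", ".json", ".csv"]);

function getMetadata(root: string, key: string): DocumentMetadata {
  const stats = fs.statSync(path.join(root, key));
  const extension = path.extname(key).toLowerCase();
  return {
    key,
    size: stats.size,
    modified: stats.mtime.toISOString(),
    created: stats.birthtime.toISOString(),
    extension,
    isText: TEXT_EXTENSIONS.has(extension),
  };
}

// Usage: inspect a throwaway file in the OS temp directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "docs-mcp-meta-"));
fs.writeFileSync(path.join(demoRoot, "note.md"), "hello");
console.log(getMetadata(demoRoot, "note.md"));
```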
Security Measures:
- Path traversal attack prevention
- File size limits and validation
- Type checking and sanitization
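One common way to block path traversal is to resolve the requested key against the documents root and reject anything that lands outside it. The sketch below shows that idea. `resolveSafePath` is a hypothetical helper name, and the project's own validation may differ.

```typescript
import * as path from "node:path";

// Resolves `key` inside `root` and throws if the result escapes the root.
function resolveSafePath(root: string, key: string): string {
  const absoluteRoot = path.resolve(root);
  const candidate = path.resolve(absoluteRoot, key);
  const relative = path.relative(absoluteRoot, candidate);

  const escapesRoot =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows

  if (escapesRoot) {
    throw new Error(`Path escapes documents root: ${key}`);
  }
  return candidate;
}

console.log(resolveSafePath("/srv/docs", "notes/a.md"));
```

This check is purely lexical. It does not follow symbolic links, so a stricter version would also compare real paths (for example via `fs.realpathSync`) before touching the file.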
Stage 3: Cloud Synchronization
Objective: Implement bidirectional Notion sync with conflict resolution
Implementation:
- Notion API integration and authentication
- Database schema design and management
- Sync status tracking and monitoring
- Conflict detection and resolution algorithms
- Error handling and retry logic
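Retry logic for a remote API is often built on exponential backoff. The sketch below is one plausible shape for it, not the project's implementation: `backoffDelayMs` and `withRetry` are hypothetical names, and the default delays are arbitrary values chosen for illustration.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `operation` up to `maxAttempts` times, sleeping between failed attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No sleep after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}

console.log([0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt)));
```

In practice a sync call would be wrapped as `withRetry(() => syncOneDocument(path))`, where `syncOneDocument` stands in for whatever function performs the Notion request. A production version would likely retry only on transient failures such as rate limiting, and fail fast on everything else.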
Synchronization Logic:
- Compare local vs. cloud modification timestamps
- Detect and flag conflicts for user resolution
- Maintain sync history and audit trails
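The timestamp comparison above can be reduced to a small decision function. This is a minimal sketch under the assumption that the server records when each document was last synced. `SyncAction` and `decideSyncAction` are illustrative names, and the first sync of a document (no previous sync time) is left out of scope.

```typescript
type SyncAction = "in_sync" | "push_local" | "pull_remote" | "conflict";

// All arguments are epoch milliseconds.
function decideSyncAction(
  localModifiedMs: number,
  remoteModifiedMs: number,
  lastSyncedMs: number,
): SyncAction {
  const localChanged = localModifiedMs > lastSyncedMs;
  const remoteChanged = remoteModifiedMs > lastSyncedMs;

  if (localChanged && remoteChanged) return "conflict"; // flag for user resolution
  if (localChanged) return "push_local";
  if (remoteChanged) return "pull_remote";
  return "in_sync";
}

const lastSynced = Date.parse("2025-01-10T12:00:00Z");
console.log(
  decideSyncAction(
    Date.parse("2025-01-10T13:00:00Z"), // edited locally after the last sync
    Date.parse("2025-01-10T11:00:00Z"), // cloud copy untouched since then
    lastSynced,
  ),
);
```

Comparing against the last sync time, rather than comparing the two modification times directly, is what lets the function tell "both sides changed" apart from "one side is simply newer". Clock skew and coarse timestamp granularity on either side can still blur the result, so a tolerance window may be needed.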
Code Quality and Testing
TypeScript Implementation:
- Strict type checking enabled
- Comprehensive interface definitions
- Generic types for reusability
- Proper error type handling
Testing Strategy:
- Manual JSON-RPC protocol testing
- Integration testing with Notion API
- File system operation validation
- Error condition testing
Features
Core Document Operations
- List Documents: Recursive directory scanning with metadata
- Read Documents: Content extraction with encoding detection
- Write Documents: Atomic operations with backup support
- Search Documents: Full-text search across name and content
Synchronization Capabilities
- Notion Integration: Bidirectional sync with rich metadata
- Conflict Resolution: Intelligent handling of concurrent changes
- Sync Status: Real-time monitoring of synchronization state
- Timestamp Tracking: Modification time comparison and validation
Security & Performance
- Path Validation: Prevention of directory traversal attacks
- File Size Limits: Protection against resource exhaustion
- Type Detection: Safe handling of different file formats
- Efficient Operations: Optimized for large document collections
Developer Experience
- Environment Configuration: Flexible setup via environment variables
- Rich Error Messages: Comprehensive error reporting and debugging
- TypeScript Support: Full type safety and IntelliSense
- Protocol Compliance: Adherence to MCP specifications
Installation
Prerequisites
- Node.js 18+
- npm or yarn package manager
- TypeScript 5.0+
- Active Notion account (for cloud sync features)
Quick Start
# Clone the repository
git clone https://github.com/corneyc/documents-mcp.git
cd documents-mcp
# Install dependencies
npm install
# Configure environment
cp .env.example .env
# Edit .env with your configuration
# Build the project
npm run build
# Start the MCP server
npm run mcp
Docker Installation
# Build Docker image
docker build -t documents-mcp .
# Run container (-i keeps stdin open, which the stdio-based server needs)
docker run -i -v $(pwd)/documents:/app/documents \
  -e NOTION_TOKEN=your_token \
  -e NOTION_DATABASE_ID=your_db_id \
  documents-mcp
Configuration
Environment Variables
Create a .env file in the project root:
# Notion Integration (Required for sync features)
NOTION_TOKEN=secret_your_notion_integration_token
NOTION_DATABASE_ID=your_database_uuid
# Document Storage (Optional)
DOCUMENTS_ROOT=/path/to/documents
# Server Configuration (Optional)
LOG_LEVEL=info
# Maximum file size in bytes (10 MB default)
MAX_FILE_SIZE=10485760
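To show how these variables might be consumed, here is a hedged sketch of a config loader. The `ServerConfig` shape and `loadConfig` name are hypothetical, and the `./documents` fallback is an assumption based on the project's default `documents/` directory.

```typescript
// Hypothetical config shape; field names are chosen for this example only.
interface ServerConfig {
  notionToken?: string;
  notionDatabaseId?: string;
  documentsRoot: string;
  logLevel: string;
  maxFileSize: number; // bytes
}

const DEFAULT_MAX_FILE_SIZE = 10 * 1024 * 1024; // 10485760 bytes

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const maxFileSize =
    env.MAX_FILE_SIZE === undefined ? DEFAULT_MAX_FILE_SIZE : Number(env.MAX_FILE_SIZE);
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error(`MAX_FILE_SIZE must be a positive integer, got: ${env.MAX_FILE_SIZE}`);
  }
  return {
    notionToken: env.NOTION_TOKEN,           // only required for sync features
    notionDatabaseId: env.NOTION_DATABASE_ID,
    documentsRoot: env.DOCUMENTS_ROOT ?? "./documents", // assumed default
    logLevel: env.LOG_LEVEL ?? "info",
    maxFileSize,
  };
}

console.log(loadConfig({ DOCUMENTS_ROOT: "/path/to/documents" }));
```

At startup this would typically be called as `loadConfig(process.env)` after dotenv has populated the environment. Rejecting a value that does not parse as a positive integer surfaces configuration mistakes early instead of silently disabling the size limit.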
Notion Setup
- Create Integration:
  - Visit https://www.notion.so/my-integrations
  - Create new integration named "Documents MCP Sync"
  - Copy the integration token
- Create Database:
  - Create new database in Notion
  - Add required properties:
    - Title (title type)
    - Local Path (text type)
    - File Size (number type)
    - Last Modified (date type)
    - Status (select type)
- Share Database:
  - Click Share on your database
  - Add your integration
  - Copy database ID from URL
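For orientation, the database properties listed above map onto Notion API property values roughly as follows. This is a sketch, not the project's sync code: `LocalDocument` and `toNotionProperties` are hypothetical names and the `"Synced"` status value is made up. Notion's "text" column type is called `rich_text` in the API.

```typescript
// Hypothetical local document record used only for this example.
interface LocalDocument {
  title: string;
  localPath: string;
  sizeBytes: number;
  modified: Date;
  status: string; // must match an option name of the Status select property
}

// Builds the `properties` object for a Notion page in the database described above.
function toNotionProperties(doc: LocalDocument) {
  return {
    "Title": { title: [{ text: { content: doc.title } }] },
    "Local Path": { rich_text: [{ text: { content: doc.localPath } }] },
    "File Size": { number: doc.sizeBytes },
    "Last Modified": { date: { start: doc.modified.toISOString() } },
    "Status": { select: { name: doc.status } },
  };
}

console.log(
  JSON.stringify(
    toNotionProperties({
      title: "example.md",
      localPath: "notes/example.md",
      sizeBytes: 1234,
      modified: new Date("2025-01-10T12:00:00Z"),
      status: "Synced",
    }),
    null,
    2,
  ),
);
```

The resulting object would be sent as the `properties` field of a page-creation request whose parent is the database ID copied in the last step. The property names must match the database columns exactly, including spaces and capitalization.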
Usage
Claude Desktop Integration
Configure Claude Desktop to use your MCP server:
// macOS: ~/Library/Application Support/Claude/claude_desktop_config.json (path label only; JSON does not allow comments)
{
"mcpServers": {
"documents-mcp": {
"command": "/opt/homebrew/bin/npm",
"args": ["run", "mcp"],
"cwd": "/path/to/documents-mcp"
}
}
}
Manual Testing
Test server functionality with JSON-RPC calls:
# List documents
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "list_documents", "arguments": {}}}' | npm run mcp
# Read document
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "read_document", "arguments": {"key": "example.md"}}}' | npm run mcp
# Sync to Notion
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "sync_to_notion", "arguments": {"path": "example.md"}}}' | npm run mcp
API Reference
MCP Tools
list_documents
Lists all documents with metadata filtering options.
Parameters:
- includeDirectories (boolean): Include directories in results
- textFilesOnly (boolean): Filter to text files only
- maxSize (number): Maximum file size in bytes
Response: Array of file metadata objects
read_document
Reads document content with metadata.
Parameters:
- key (string): Document path to read
Response: Document content and metadata
write_document
Writes content to document with options.
Parameters:
- key (string): Document path to write
- content (string): Content to write
- createBackup (boolean): Create backup before overwrite
- overwrite (boolean): Allow overwriting existing files
Response: Write operation result and metadata
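A write that honours `createBackup` and `overwrite` could look like the sketch below, which writes to a temporary file and renames it over the target so readers never see a half-written document. The `writeDocument` function, the `.bak` suffix, and the default of refusing to overwrite are assumptions for illustration. The server's real behavior may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface WriteOptions {
  createBackup?: boolean; // copy the existing file to <name>.bak first
  overwrite?: boolean;    // assumed default: false
}

function writeDocument(
  filePath: string,
  content: string,
  options: WriteOptions = {},
): { backupPath?: string } {
  const exists = fs.existsSync(filePath);
  if (exists && !options.overwrite) {
    throw new Error(`Refusing to overwrite existing file: ${filePath}`);
  }
  fs.mkdirSync(path.dirname(filePath), { recursive: true });

  let backupPath: string | undefined;
  if (exists && options.createBackup) {
    backupPath = `${filePath}.bak`;
    fs.copyFileSync(filePath, backupPath);
  }

  // Write next to the target, then rename: on POSIX systems a rename within
  // one filesystem replaces the target in a single step.
  const tempPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tempPath, content, "utf8");
  fs.renameSync(tempPath, filePath);
  return { backupPath };
}

// Usage: write twice into a throwaway directory, keeping a backup of the first version.
const writeDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), "docs-mcp-write-"));
const writeDemoFile = path.join(writeDemoDir, "draft.md");
writeDocument(writeDemoFile, "first version");
console.log(writeDocument(writeDemoFile, "second version", { overwrite: true, createBackup: true }));
```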
sync_to_notion
Synchronizes local document to Notion.
Parameters:
- path (string): Local document path to sync
Response: Sync operation result and Notion page ID
get_sync_status
Retrieves sync status for all documents.
Parameters: None
Response: Array of sync status objects
search_documents
Searches documents by name or content.
Parameters:
- query (string): Search query
- searchContent (boolean): Search file content
- caseSensitive (boolean): Case-sensitive search
- filePattern (string): File name regex pattern
Response: Array of matching documents
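How the four parameters might combine is sketched below over an in-memory list of documents. The `Doc` type and `searchDocuments` function are hypothetical, and the defaults (name-only, case-insensitive) are assumptions. The actual tool reads from the file system and may rank or shape results differently.

```typescript
interface SearchOptions {
  query: string;
  searchContent?: boolean; // assumed default: false (match names only)
  caseSensitive?: boolean; // assumed default: false
  filePattern?: string;    // regular expression applied to the file name
}

// Hypothetical in-memory document used only for this example.
interface Doc {
  key: string;
  content: string;
}

function searchDocuments(docs: Doc[], options: SearchOptions): Doc[] {
  const { query, searchContent = false, caseSensitive = false, filePattern } = options;
  const normalize = (text: string) => (caseSensitive ? text : text.toLowerCase());
  const needle = normalize(query);
  const pattern = filePattern ? new RegExp(filePattern) : null;

  return docs.filter((doc) => {
    // filePattern narrows the candidate set before the query is applied.
    if (pattern && !pattern.test(doc.key)) return false;
    if (normalize(doc.key).includes(needle)) return true;
    return searchContent && normalize(doc.content).includes(needle);
  });
}

const sampleDocs: Doc[] = [
  { key: "notes/Roadmap.md", content: "Q1 goals" },
  { key: "notes/todo.txt", content: "update the roadmap" },
];
console.log(searchDocuments(sampleDocs, { query: "roadmap", searchContent: true }).map((d) => d.key));
```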
Development
Project Structure
documents-mcp/
├── src/
│ ├── mcp-server.ts # Main MCP server implementation
│ ├── utils/
│ │ ├── local-documents-enhanced.ts # Enhanced file operations
│ │ ├── local-documents.ts # Basic file operations
│ │ └── notion-sync.ts # Notion integration
│ └── types.ts # TypeScript definitions
├── documents/ # Default document storage
├── tests/ # Test files
├── package.json # Node.js configuration
├── tsconfig.json # TypeScript configuration
├── wrangler.toml # Cloudflare Workers config
└── README.md # Project documentation
Development Commands
# Install dependencies
npm install
# Start development server
npm run dev
# Run MCP server
npm run mcp
# Run tests
npm test
# Type checking
npm run type-check
# Build for production
npm run build
Adding New Features
- Implement Tool Logic: Add function to appropriate utility module
- Register Tool: Add tool definition to MCP server
- Add Handler: Implement the tool's branch in the CallToolRequestSchema request handler
- Update Types: Add TypeScript interfaces
- Test Integration: Verify with JSON-RPC calls
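The steps above can be sketched for an invented tool. Everything here is hypothetical: `count_documents` does not exist in the project, and the `ToolDefinition` interface only mirrors the name, description, and JSON Schema input fields that MCP tool definitions carry. It is not the SDK's own type.

```typescript
// Update Types: a simplified stand-in for an MCP tool definition.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Implement Tool Logic: plain function that would live in a utility module.
function countDocuments(keys: string[], extension: string): number {
  return keys.filter((key) => key.endsWith(extension)).length;
}

// Register Tool: the definition returned to clients during tool discovery.
const countDocumentsTool: ToolDefinition = {
  name: "count_documents",
  description: "Counts documents whose path ends with the given extension.",
  inputSchema: {
    type: "object",
    properties: {
      extension: { type: "string", description: "File extension, e.g. .md" },
    },
    required: ["extension"],
  },
};

// Add Handler: validate arguments, call the logic, wrap the result as text content.
function handleCountDocuments(args: Record<string, unknown>, keys: string[]) {
  for (const field of countDocumentsTool.inputSchema.required ?? []) {
    if (typeof args[field] !== "string") {
      throw new Error(`Missing or invalid argument: ${field}`);
    }
  }
  const count = countDocuments(keys, args.extension as string);
  return { content: [{ type: "text", text: String(count) }] };
}

console.log(handleCountDocuments({ extension: ".md" }, ["a.md", "b.txt", "c.md"]));
```

In the real server the handler would obtain the document keys from the file system rather than a parameter. Passing them in here keeps the example self-contained.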
Testing
Manual Testing
The project includes comprehensive manual testing procedures:
# Test basic protocol functionality
./scripts/test-protocol.sh
# Test document operations
./scripts/test-documents.sh
# Test Notion integration
./scripts/test-notion.sh
# Test error handling
./scripts/test-errors.sh
Integration Testing
Test with actual AI agents:
- Configure Claude Desktop with MCP server
- Test document listing and reading
- Verify Notion synchronization
- Test error handling and recovery
Performance Testing
# Test with large document collections
./scripts/test-performance.sh
# Monitor memory usage
npm run monitor
# Benchmark sync operations
npm run benchmark
Deployment
Local Development
# Start server locally
npm run mcp
# Run in development mode with auto-restart
npm run dev
Production Deployment
Docker Deployment
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install devDependencies too: the build step below needs the TypeScript toolchain
RUN npm ci
COPY src/ ./src/
COPY tsconfig.json ./
RUN npm run build
EXPOSE 3000
CMD ["npm", "run", "mcp"]
Cloudflare Workers
Deploy the companion web API to Cloudflare Workers:
# Deploy to Cloudflare
npm run deploy
# Deploy to specific environment
npm run deploy:staging
Environment Setup
Production environment configuration:
NODE_ENV=production
NOTION_TOKEN=secret_production_token
NOTION_DATABASE_ID=production_database_id
DOCUMENTS_ROOT=/app/documents
LOG_LEVEL=info
Performance Considerations
Optimization Strategies
- Lazy Loading: Documents loaded only when accessed
- Caching: In-memory caching of frequently accessed files
- Batch Operations: Efficient bulk synchronization
- Connection Pooling: Optimized Notion API connections
Scaling Recommendations
- Horizontal Scaling: Multiple MCP server instances
- Load Balancing: Distribute document operations
- Database Sharding: Separate Notion databases by team/project
- CDN Integration: Cache static document content
Contributing
Development Process
- Fork Repository: Create personal fork of the project
- Feature Branch: Create branch for new feature or fix
- Implementation: Develop with comprehensive testing
- Documentation: Update documentation for changes
- Pull Request: Submit PR with detailed description
Code Standards
- TypeScript: Strict mode enabled with comprehensive typing
- ESLint: Code quality and consistency enforcement
- Prettier: Automated code formatting
- Conventional Commits: Standardized commit messages
Testing Requirements
- Unit Tests: All utility functions must have unit tests
- Integration Tests: MCP protocol compliance testing
- Documentation: All public APIs must be documented
License
MIT License - see the LICENSE file for details.
Acknowledgments
- Anthropic: MCP protocol specification and SDK
- Notion: Comprehensive API and developer documentation
- TypeScript Team: Excellent tooling and type system
- Node.js Community: Rich ecosystem and package availability
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Wiki
Roadmap
Short Term (Q1 2025)
- Google Drive integration
- Enhanced search with full-text indexing
- Automated testing suite
- Performance monitoring dashboard
Medium Term (Q2 2025)
- Multi-cloud sync support
- Real-time collaboration features
- Advanced conflict resolution UI
- Plugin architecture for extensions
Long Term (Q3-Q4 2025)
- Enterprise authentication integration
- Advanced analytics and reporting
- Mobile app companion
- AI-powered document insights
Built with love for the AI-powered future of document management