gmossy/claude_document_mcp_server
Document Management MCP Server
Created by Glenn Mossy
Sr. AI Software Developer & Data Scientist, Booz Allen Hamilton
November 27, 2024
Overview
A production-ready, enterprise-grade Model Context Protocol (MCP) server that provides AI assistants with comprehensive document management capabilities. Built with modern Python 3.13, this server demonstrates advanced software engineering practices including clean architecture, comprehensive testing, and multi-format document processing.
Key Highlights
- 13 Production-Ready MCP Tools for complete document lifecycle management
- Multi-Format Support: Word (.docx), PDF, Excel (.xlsx), PowerPoint (.pptx), Markdown, and plain text
- Advanced Search: Full-text search with FTS5 indexing plus tag, status, and date filtering
- Version Control: Complete document history with diff comparison
- Enterprise Features: Bulk operations, analytics, and export capabilities
- Robust Architecture: SQLite with FTS5, async operations, comprehensive error handling
Features
Core Document Operations
- Create documents with titles, content, tags, metadata, and status
- Read documents with optional version history
- Update documents with automatic versioning
- Delete or archive documents safely
Advanced Capabilities
- Full-text search with FTS5 indexing across titles and content
- Tag-based filtering with AND logic for precise results
- Version control with complete history and comparison tools
- Content analysis including word count, reading time, and keyword extraction
- Multi-format export (Markdown, HTML, JSON, TXT, Word, PDF, Excel)
- Bulk operations for efficient tag management
- Comprehensive statistics and system monitoring
Document Format Support
- Microsoft Word (.docx) - Read and write with metadata extraction
- PDF - Read and create with multi-page support
- Microsoft Excel (.xlsx) - Multi-sheet extraction and creation
- Microsoft PowerPoint (.pptx) - Slide extraction and presentation creation
- Markdown (.md) - Full support with formatting
- Plain Text (.txt) - Universal compatibility
Technical Architecture
Technology Stack
- Python 3.13 - Latest Python with performance improvements
- FastMCP - Modern MCP server framework with async support
- SQLite with FTS5 - Full-text search indexing for performance
- Pydantic v2 - Type-safe data validation and serialization
- openpyxl - Excel file processing
- python-docx - Word document manipulation
- python-pptx - PowerPoint presentation handling
- pypdf & reportlab - PDF reading and generation
Design Patterns
- Clean Architecture - Separation of concerns with clear boundaries
- Async/Await - Non-blocking I/O for scalability
- Type Safety - Comprehensive type hints and Pydantic models
- Error Handling - Graceful degradation with detailed error messages
- Version Control - Automatic versioning with complete audit trail
Code Quality
- Comprehensive Testing - Unit tests for all major components
- Documentation - Detailed docstrings and user guides
- Type Checking - Full mypy compatibility
- Code Formatting - Black and Ruff for consistency
- Best Practices - Following PEP 8 and modern Python standards
Quick Start
Installation
Using UV (Recommended):
# Install UV if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Navigate to MCP document server subproject
cd backend/mcp_document_server
# Install Python 3.13 and sync dependencies
uv python install 3.13
uv venv --python 3.13
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv sync
Using pip:
cd backend/mcp_document_server
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e ".[dev]"
Running the Server
The server runs using stdio transport for MCP communication:
cd backend/mcp_document_server
source .venv/bin/activate # or source venv/bin/activate
python document_mcp_server.py
The server will start and wait for MCP protocol messages on stdin/stdout. It's designed to be used with MCP clients like Claude Desktop or the MCP Inspector.
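The stdio transport carries newline-delimited JSON-RPC 2.0 messages. As a rough illustration of what a client such as the Inspector writes to this server's stdin, the sketch below builds the initialize handshake and a tools/call request for document_create. It only constructs and prints the messages; the protocol version string and the client name are assumptions, and real MCP clients perform this exchange for you.

```python
import json

# Assumed protocol revision; client and server negotiate this during initialize.
PROTOCOL_VERSION = "2024-11-05"


def jsonrpc_request(msg_id: int, method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}


def build_session() -> list[dict]:
    """Messages a client would send, in order, to call document_create."""
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical client
        }),
        # A notification has no "id"; it tells the server the handshake is complete.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        jsonrpc_request(2, "tools/call", {
            "name": "document_create",
            "arguments": {"title": "Test Doc", "content": "Hello world", "tags": ["test"]},
        }),
    ]


if __name__ == "__main__":
    # The stdio transport frames each message as a single line of JSON.
    for message in build_session():
        print(json.dumps(message))
```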
Testing the Server
Option 1: MCP Inspector (Recommended)
The MCP Inspector provides a web UI to interact with your server. Use one of the following:
Option A — Direct (no config, simplest)
cd /path/to/claude_document_mcp_server
npx @modelcontextprotocol/inspector python backend/mcp_document_server/document_mcp_server.py
Option B — Using an Inspector config file
- Create inspector.config.json in the repo root:
{
"mcpServers": {
"document-mcp": {
"command": "uv",
"args": [
"run",
"--project",
"backend/mcp_document_server",
"python",
"backend/mcp_document_server/document_mcp_server.py"
]
}
}
}
- Start Inspector with that server:
cd /path/to/claude_document_mcp_server
npx @modelcontextprotocol/inspector --config inspector.config.json --server document-mcp
Then:
- Open the URL printed in the terminal (contains MCP_PROXY_AUTH_TOKEN).
- In the left panel, Transport Type should be STDIO. Click Connect.
- In the sidebar, select server document_mcp to see the tools.
Troubleshooting:
- If you see HTTP 404 or “Connection Error” when using Streamable HTTP, switch to STDIO and click Connect (this server speaks stdio only and does not expose an HTTP endpoint such as /sse).
- If Inspector says the server isn’t found, ensure your config uses the key mcpServers (not servers) and that you passed --config.
- If you accidentally launched bare npx and dropped into sh-3.2$, type exit and run the full command.
JSON examples you can paste into MCP Inspector
All tools accept JSON. Below are ready-to-paste examples for common tasks.
- List all documents (paginated, newest first): use tool document_search
{
"response_format": "json",
"limit": 100,
"offset": 0
}
- Search by keywords and tags: use tool document_search
{
"query": "quarterly report",
"tags": ["finance", "2024"],
"status": "published",
"limit": 20,
"offset": 0,
"response_format": "json"
}
- Create a document: use tool document_create
{
"title": "Q4 Report",
"content": "Executive summary...\n\nHighlights...",
"tags": ["finance", "2024"],
"status": "draft",
"metadata": { "author": "Glenn", "department": "Finance" }
}
- Get a document (with content and versions): use tool document_get
{
"document_id": "doc_abc123def456",
"include_content": true,
"include_versions": true,
"response_format": "json"
}
- Update a document (creates a new version if the content changes): use tool document_update
{
"document_id": "doc_abc123def456",
"content": "Updated body...",
"tags": ["finance", "2024", "reviewed"],
"version_comment": "Added CFO notes"
}
- Archive vs. permanently delete: use tool document_delete
Archive (default):
{ "document_id": "doc_abc123def456", "permanent": false }
Permanent delete:
{ "document_id": "doc_abc123def456", "permanent": true }
- List all tags: use tool document_list_tags
{
"sort_by_count": true,
"min_count": 1,
"response_format": "json"
}
- System statistics: use tool document_statistics
{ "response_format": "json" }
Option 2: Manual Testing with Claude Desktop
Add to your Claude Desktop config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
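{
"mcpServers": {
"document-mcp": {
"command": "/absolute/path/to/backend/mcp_document_server/.venv/bin/python",
"args": ["/absolute/path/to/backend/mcp_document_server/document_mcp_server.py"]
}
}
}
Pointing command at the virtual environment's interpreter ensures the server runs under Python 3.13 with its installed dependencies (on Windows, use .venv\Scripts\python.exe and write each backslash as \\ in the JSON). Then restart Claude Desktop and the tools will be available.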
Option 3: Quick Syntax Check
# Verify Python syntax
python -m py_compile document_mcp_server.py
# Check for import errors
python -c "import document_mcp_server; print('✓ Server loads successfully')"
Quick Test Workflow
Once you have the server running in MCP Inspector:
- Create a document:
  - Tool: document_create
  - Input: {"title": "Test Doc", "content": "Hello world", "tags": ["test"]}
- Search for it:
  - Tool: document_search
  - Input: {"query": "hello"}
- Get statistics:
  - Tool: document_statistics
  - Input: {}
- Analyze content:
  - Tool: document_analyze
  - Input: {"document_id": "<id_from_create>"}
Available Tools
Document CRUD Operations
document_create
Create a new document with automatic versioning.
{
"title": "Q3 Financial Report",
"content": "## Executive Summary\n\nThis quarter showed...",
"tags": ["finance", "quarterly", "2024"],
"status": "draft",
"metadata": {
"author": "Jane Smith",
"department": "Finance"
}
}
document_get
Retrieve a document with optional content and version history.
{
"document_id": "doc_abc123def456",
"include_content": true,
"include_versions": true,
"response_format": "markdown"
}
document_update
Update document content, tags, or metadata with versioning.
{
"document_id": "doc_abc123def456",
"content": "Updated content...",
"tags": ["finance", "quarterly", "2024", "reviewed"],
"version_comment": "Added review notes from CFO"
}
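An update creates a new version only when the content actually changes. One way to detect that is to compare SHA-256 hashes, since the data model stores a content_hash. The sketch below uses a hypothetical in-memory class rather than the server's SQLite tables, and it assumes tag-only updates do not create a version.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def content_hash(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class VersionedDocument:
    """Hypothetical in-memory stand-in for a document row plus its version rows."""

    content: str
    tags: list[str] = field(default_factory=list)
    versions: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Version 1 is recorded when the document is created.
        self._snapshot("Initial version")

    def _snapshot(self, comment: str) -> None:
        self.versions.append({
            "version_number": len(self.versions) + 1,
            "content": self.content,
            "tags": list(self.tags),
            "comment": comment,
            "content_hash": content_hash(self.content),
        })

    def update(self, content: Optional[str] = None, tags: Optional[list[str]] = None,
               version_comment: str = "") -> bool:
        """Apply an update; return True only if a new version was created."""
        if tags is not None:
            self.tags = list(tags)
        latest_hash = self.versions[-1]["content_hash"]
        if content is None or content_hash(content) == latest_hash:
            return False  # tag-only update or unchanged content: no new version
        self.content = content
        self._snapshot(version_comment)
        return True
```

Comparing hashes rather than full strings mirrors the stored content_hash field, so a check against the latest version row does not need to load its content.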
document_delete
Archive or permanently delete a document.
{
"document_id": "doc_abc123def456",
"permanent": false
}
Search and Discovery
document_search
Powerful search with full-text, tag filtering, and pagination.
{
"query": "financial report quarterly",
"tags": ["finance"],
"status": "published",
"created_after": "2024-01-01T00:00:00Z",
"sort_by": "updated_at",
"sort_order": "desc",
"limit": 20,
"offset": 0,
"response_format": "json"
}
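Tag filters use AND logic, but how the other filters combine is not spelled out. The sketch below assumes every filter narrows the result set: a document must carry all requested tags, match the status, and be created strictly after created_after, and the survivors are then sorted. It works on plain dicts shaped like the Document Structure under Data Model, leaves the full-text query to FTS5, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_documents(docs, tags=None, status=None, created_after=None,
                     sort_by="updated_at", sort_order="desc"):
    """Apply document_search-style filters to a list of document dicts.

    Hypothetical helper. Tags use AND logic: every requested tag must be present.
    """
    required = set(tags or [])
    cutoff = parse_ts(created_after) if created_after else None
    results = [
        doc for doc in docs
        if required.issubset(doc["tags"])
        and (status is None or doc["status"] == status)
        and (cutoff is None or parse_ts(doc["created_at"]) > cutoff)
    ]
    # Timestamps share one ISO 8601 UTC format, so string order is time order.
    results.sort(key=lambda d: d[sort_by], reverse=(sort_order == "desc"))
    return results
```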
document_list_tags
List all tags with usage counts.
{
"sort_by_count": true,
"min_count": 1,
"response_format": "markdown"
}
Version Control
document_get_version
Retrieve a specific historical version.
{
"document_id": "doc_abc123def456",
"version_number": 2,
"response_format": "json"
}
document_compare_versions
Compare two versions to see changes.
{
"document_id": "doc_abc123def456",
"version_a": 1,
"version_b": 3
}
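A comparison like this can be produced with Python's standard difflib module. This is an illustrative approach, not necessarily the diff algorithm the server uses; the version dicts follow the Version Structure described under Data Model, trimmed to the fields the diff needs.

```python
import difflib


def compare_versions(version_a: dict, version_b: dict) -> str:
    """Return a unified diff of the content of two version records."""
    diff_lines = difflib.unified_diff(
        version_a["content"].splitlines(),
        version_b["content"].splitlines(),
        fromfile=f"version {version_a['version_number']}",
        tofile=f"version {version_b['version_number']}",
        lineterm="",  # the split lines carry no trailing newlines
    )
    return "\n".join(diff_lines)


v1 = {"version_number": 1, "content": "Executive summary\nRevenue grew."}
v3 = {"version_number": 3, "content": "Executive summary\nRevenue grew in Q3.\nCFO notes added."}
print(compare_versions(v1, v3))
```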
Analysis and Export
document_analyze
Get content statistics and extract keywords.
{
"document_id": "doc_abc123def456",
"include_stats": true,
"include_keywords": true,
"response_format": "markdown"
}
Output includes:
- Word count, character count
- Line and paragraph counts
- Average word length
- Estimated reading time
- Top 15 keywords
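These statistics are straightforward to compute. The sketch below is one possible implementation, not the server's own: it assumes a reading speed of 200 words per minute, treats blank-line-separated blocks as paragraphs, and ranks keywords by raw frequency after dropping words of two characters or fewer and a small, hypothetical stop-word list.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real implementation would use a fuller one.
STOP_WORDS = {"the", "and", "for", "this", "that", "with", "are", "was", "from", "have", "has"}
WORDS_PER_MINUTE = 200  # assumed reading speed


def analyze_text(content: str, top_n: int = 15) -> dict:
    """Compute the statistics document_analyze reports (illustrative version)."""
    words = re.findall(r"[A-Za-z0-9']+", content)
    # Paragraphs are taken to be blocks separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", content) if p.strip()]
    keyword_counts = Counter(
        w.lower() for w in words
        if len(w) > 2 and w.lower() not in STOP_WORDS
    )
    return {
        "word_count": len(words),
        "character_count": len(content),
        "line_count": len(content.splitlines()),
        "paragraph_count": len(paragraphs),
        "average_word_length": round(sum(map(len, words)) / len(words), 2) if words else 0.0,
        # Nearest minute, with a one-minute floor for any non-empty text.
        "reading_time_minutes": max(1, round(len(words) / WORDS_PER_MINUTE)) if words else 0,
        "keywords": [w for w, _ in keyword_counts.most_common(top_n)],
    }
```

With this rounding, a 1,234-word document reports 6 minutes, consistent with the sample output under Response Formats.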
document_export
Export to Markdown, HTML, JSON, or plain text.
{
"document_id": "doc_abc123def456",
"format": "html",
"include_metadata": true
}
Bulk Operations
document_bulk_tag
Add or remove tags from multiple documents.
{
"document_ids": ["doc_abc123", "doc_def456", "doc_ghi789"],
"add_tags": ["reviewed", "2024"],
"remove_tags": ["draft"]
}
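How add_tags and remove_tags combine in a single call is not specified. This sketch assumes removals are applied first, then additions, with duplicates dropped, existing tag order preserved, and the 50-tag-per-document limit from Configuration enforced. The function name and the in-memory tag store are hypothetical.

```python
MAX_TAGS = 50  # per-document limit listed under Configuration


def apply_bulk_tags(tags_by_doc, document_ids, add_tags=None, remove_tags=None):
    """Return updated tag lists for the given documents (the input is not mutated).

    Assumptions: removals run before additions; an unknown ID raises KeyError.
    """
    to_remove = set(remove_tags or [])
    updated = {}
    for doc_id in document_ids:
        tags = [t for t in tags_by_doc[doc_id] if t not in to_remove]
        for tag in add_tags or []:
            if tag not in tags:  # skip tags the document already has
                tags.append(tag)
        if len(tags) > MAX_TAGS:
            raise ValueError(f"{doc_id}: would exceed {MAX_TAGS} tags")
        updated[doc_id] = tags
    return updated
```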
System Monitoring
document_statistics
Get comprehensive system statistics.
{
"response_format": "markdown"
}
Provides:
- Total documents and storage usage
- Status distribution (draft/published/archived)
- Version statistics
- Recent activity
- Most versioned documents
Data Model
Document Structure
{
"id": "doc_abc123def456",
"title": "Document Title",
"content": "Document content in markdown or plain text",
"tags": ["tag1", "tag2"],
"status": "draft|published|archived",
"metadata": {"key": "value"},
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-16T14:20:00Z",
"size": 1234,
"content_hash": "sha256_hash"
}
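The server validates this structure with Pydantic v2. To keep the example dependency-free, the sketch below uses a standard-library dataclass instead, and it assumes how the derived fields are filled in: a doc_ prefix plus 12 hex characters for the ID (matching the shape of the examples), size as the UTF-8 byte length of the content, and content_hash as its SHA-256 hex digest.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

VALID_STATUSES = {"draft", "published", "archived"}


def utc_now() -> str:
    """Current UTC time in the ISO 8601 form used by the examples."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class Document:
    title: str
    content: str
    tags: list[str] = field(default_factory=list)
    status: str = "draft"
    metadata: dict[str, object] = field(default_factory=dict)
    # Hypothetical ID scheme: "doc_" followed by 12 hex characters.
    id: str = field(default_factory=lambda: f"doc_{uuid.uuid4().hex[:12]}")
    created_at: str = field(default_factory=utc_now)
    updated_at: str = field(default_factory=utc_now)

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")

    @property
    def size(self) -> int:
        # Assumed to be the content length in UTF-8 bytes.
        return len(self.content.encode("utf-8"))

    @property
    def content_hash(self) -> str:
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()
```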
Version Structure
{
"document_id": "doc_abc123def456",
"version_number": 1,
"title": "Title at this version",
"content": "Content at this version",
"tags": ["tags", "at", "version"],
"status": "status_at_version",
"metadata": {},
"created_at": "2024-01-15T10:30:00Z",
"comment": "Version change description",
"content_hash": "sha256_hash"
}
Response Formats
All data-returning tools support two formats:
Markdown (default)
Human-readable format with headers, lists, and formatting:
# Document Analysis
**Document**: Q3 Financial Report
**ID**: `doc_abc123def456`
## Statistics
- **Word Count**: 1,234
- **Estimated Reading Time**: 6 minutes
...
JSON
Machine-readable structured data:
{
"document_id": "doc_abc123def456",
"title": "Q3 Financial Report",
"stats": {
"word_count": 1234,
"reading_time_minutes": 6
}
}
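Both formats can be produced from the same result dictionary. The sketch below renders the analysis example above either way; the function name and the exact Markdown layout are assumptions modeled on the sample output.

```python
import json


def render_analysis(result: dict, response_format: str = "markdown") -> str:
    """Render an analysis result as Markdown (the default) or JSON."""
    if response_format == "json":
        return json.dumps(result, indent=2)
    if response_format != "markdown":
        raise ValueError("response_format must be 'markdown' or 'json'")
    stats = result["stats"]
    lines = [
        "# Document Analysis",
        f"**Document**: {result['title']}",
        f"**ID**: `{result['document_id']}`",
        "## Statistics",
        f"- **Word Count**: {stats['word_count']:,}",  # thousands separator
        f"- **Estimated Reading Time**: {stats['reading_time_minutes']} minutes",
    ]
    return "\n".join(lines)
```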
Database Schema
The server uses SQLite with the following tables:
- documents - Main document storage
- document_versions - Version history
- documents_fts - Full-text search index (FTS5)
Database and document storage are automatically initialized on first run.
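The exact schema is not shown here. As a rough sketch of how the three tables could fit together, the code below uses Python's built-in sqlite3 module to create a documents table, a version table, and a standalone FTS5 index over titles and content, then runs a relevance-ranked MATCH query. Column names beyond the Data Model fields are assumptions, and the code needs an SQLite build with FTS5 enabled, which most Python distributions provide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    tags TEXT NOT NULL DEFAULT '[]',  -- JSON-encoded list (assumption)
    status TEXT NOT NULL DEFAULT 'draft',
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS document_versions (
    document_id TEXT NOT NULL REFERENCES documents(id),
    version_number INTEGER NOT NULL,
    content TEXT NOT NULL,
    comment TEXT,
    created_at TEXT NOT NULL,
    PRIMARY KEY (document_id, version_number)
);
-- Standalone FTS5 index; doc_id is stored but not tokenized.
CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts
    USING fts5(doc_id UNINDEXED, title, content);
"""


def index_document(conn, doc_id, title, content, now="2024-01-15T10:30:00Z"):
    with conn:  # commit both inserts as one transaction
        conn.execute(
            "INSERT INTO documents (id, title, content, created_at, updated_at) "
            "VALUES (?, ?, ?, ?, ?)", (doc_id, title, content, now, now))
        conn.execute(
            "INSERT INTO documents_fts (doc_id, title, content) VALUES (?, ?, ?)",
            (doc_id, title, content))


def search(conn, query):
    # FTS5's built-in rank column orders by BM25 relevance, best match first.
    rows = conn.execute(
        "SELECT doc_id FROM documents_fts WHERE documents_fts MATCH ? ORDER BY rank",
        (query,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_document(conn, "doc_1", "Q3 Financial Report", "Quarterly revenue summary")
index_document(conn, "doc_2", "Team offsite", "Agenda and travel notes")
print(search(conn, "quarterly"))
```

A standalone index like this must also be kept in sync on update and delete, for example with triggers or an external-content FTS5 table.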
Configuration
Default constants (configurable in source):
- DATABASE_PATH: ./documents.db
- DOCUMENTS_DIR: ./document_storage
- MAX_CONTENT_SIZE: 10MB
- MAX_TAGS: 50 per document
- MAX_SEARCH_RESULTS: 100
- DEFAULT_PAGE_SIZE: 20
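A sketch of how these limits might be enforced before a write or a search, assuming MAX_CONTENT_SIZE counts UTF-8 bytes with 10MB meaning 10 × 1024 × 1024, and assuming out-of-range page sizes are clamped rather than rejected. The function names are hypothetical; the server itself may perform these checks in its Pydantic input models instead.

```python
from typing import Optional

MAX_CONTENT_SIZE = 10 * 1024 * 1024  # 10MB, assumed to mean UTF-8 bytes
MAX_TAGS = 50
MAX_SEARCH_RESULTS = 100
DEFAULT_PAGE_SIZE = 20


def validate_write(content: str, tags: list[str]) -> None:
    """Raise ValueError if a create/update request exceeds the configured limits."""
    size = len(content.encode("utf-8"))
    if size > MAX_CONTENT_SIZE:
        raise ValueError(f"content is {size} bytes; limit is {MAX_CONTENT_SIZE}")
    unique_tags = len(set(tags))
    if unique_tags > MAX_TAGS:
        raise ValueError(f"{unique_tags} tags given; limit is {MAX_TAGS} per document")


def clamp_limit(limit: Optional[int]) -> int:
    """Apply the default page size and cap requests at MAX_SEARCH_RESULTS."""
    if limit is None:
        return DEFAULT_PAGE_SIZE
    return max(1, min(limit, MAX_SEARCH_RESULTS))
```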
Best Practices
Tool Annotations
All tools include MCP annotations:
- readOnlyHint: Whether the tool only reads data (true) or may modify it (false)
- destructiveHint: Whether it may perform destructive operations, such as permanent deletion
- idempotentHint: Whether repeating a call with the same arguments has no additional effect
- openWorldHint: Whether it interacts with external services
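As an illustration of how these hints might be assigned across the tools documented in this README, the mapping below is an assumption inferred from each tool's description, not read from the server source. It also assumes document_export returns content rather than writing files, and that document_update only versions changed content, which makes repeating an identical update a no-op. A small check flags contradictory combinations.

```python
# Hypothetical hint assignment inferred from the tool descriptions; not the server's source.
READ_ONLY = {"readOnlyHint": True, "destructiveHint": False,
             "idempotentHint": True, "openWorldHint": False}

TOOL_ANNOTATIONS = {
    # Assumes document_export returns content instead of writing files.
    **{name: READ_ONLY for name in (
        "document_get", "document_search", "document_list_tags",
        "document_get_version", "document_compare_versions",
        "document_analyze", "document_export", "document_statistics")},
    # Each call creates a new document, so repeating it is not idempotent.
    "document_create": {"readOnlyHint": False, "destructiveHint": False,
                        "idempotentHint": False, "openWorldHint": False},
    # An identical repeat adds no version if versions track content changes only.
    "document_update": {"readOnlyHint": False, "destructiveHint": False,
                        "idempotentHint": True, "openWorldHint": False},
    # permanent=true removes data irreversibly.
    "document_delete": {"readOnlyHint": False, "destructiveHint": True,
                        "idempotentHint": True, "openWorldHint": False},
    "document_bulk_tag": {"readOnlyHint": False, "destructiveHint": False,
                          "idempotentHint": True, "openWorldHint": False},
}


def check_annotations(annotations: dict) -> list[str]:
    """Return a message for each tool whose hints contradict each other."""
    problems = []
    for name, hints in annotations.items():
        if hints["readOnlyHint"] and hints["destructiveHint"]:
            problems.append(f"{name}: a read-only tool cannot be destructive")
    return problems
```

In the MCP specification these hints are advisory: clients may use them, for example to ask for confirmation before a destructive call, but should not rely on them for security decisions.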
Error Handling
All tools return structured error responses with:
- Clear error messages
- Specific suggestions for resolution
- Consistent JSON format
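The error format itself is not shown. A minimal sketch of a consistent structured error follows; the field names (error, message, suggestion), the helpers, and the dict-backed store are hypothetical.

```python
import json


def error_response(error_type: str, message: str, suggestion: str) -> str:
    """Serialize an error as JSON with a clear message and an actionable suggestion."""
    return json.dumps({
        "error": error_type,
        "message": message,
        "suggestion": suggestion,
    })


def get_document(store: dict, document_id: str) -> str:
    """Look up a document, returning a structured error instead of raising."""
    doc = store.get(document_id)
    if doc is None:
        return error_response(
            "not_found",
            f"Document '{document_id}' does not exist.",
            "Use document_search to list available document IDs.",
        )
    return json.dumps(doc)
```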
Pagination
Search tools support pagination with:
- limit: Results per page (1-100)
- offset: Number of results to skip
- Responses include has_more and next_offset
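One way to compute has_more and next_offset from limit and offset is sketched below. It assumes the full result list (or at least its total count) is available, and that next_offset is null when there are no further pages; the function name and the total field are hypothetical.

```python
def paginate(items: list, limit: int = 20, offset: int = 0) -> dict:
    """Slice one page of results and report whether another page exists."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    page = items[offset:offset + limit]
    has_more = offset + len(page) < len(items)
    return {
        "results": page,
        "total": len(items),
        "has_more": has_more,
        # Pass this value back as offset to fetch the next page.
        "next_offset": offset + len(page) if has_more else None,
    }
```

A client can keep requesting pages, feeding next_offset back in as offset, until has_more is false.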
Integration Examples
Claude Desktop Configuration
Add to your Claude Desktop config:
{
"mcpServers": {
"document-mcp": {
"command": "python",
"args": ["/path/to/document_mcp_server.py"]
}
}
}
Example Workflows
Creating and Publishing a Report:
- document_create - Create initial draft
- document_update - Add content revisions
- document_analyze - Check statistics
- document_update - Set status to "published"
Organizing Documents:
- document_search - Find related documents
- document_bulk_tag - Apply consistent tags
- document_list_tags - Review tag organization
Reviewing Changes:
- document_get - Get current version with history
- document_compare_versions - See what changed
- document_get_version - Retrieve specific version
Development
Project Structure
backend/
  mcp_document_server/
    document_mcp_server.py   # Main MCP server implementation
    document_parsers.py      # Document parsing utilities (Word, PDF, Excel, PPTX, etc.)
    docs/                    # MCP/server docs
    document_storage/        # Storage directory (auto-created)
    documents.db             # SQLite database (auto-created)
    tests/                   # Test suite and sample office files
    pyproject.toml           # MCP server project configuration
    uv.lock                  # uv dependency lockfile
    Dockerfile               # Container image for this server
    README-mcp.md            # Subproject README
    dist/
      document_mcp-*.whl, *.tar.gz   # Built artifacts
Code Quality
The codebase follows:
- PEP 8 style guidelines
- Type hints throughout
- Pydantic v2 for validation
- Comprehensive docstrings
- DRY principles with shared utilities
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run linting
ruff check .
black --check .
mypy .
License
MIT License - See LICENSE file for details.
Contributing
Contributions welcome! Please ensure:
- Code follows existing patterns
- All tools have proper annotations
- Input validation uses Pydantic
- Error messages are actionable
- Documentation is updated
Project Metrics
- Lines of Code: 7,300+
- Test Coverage: Comprehensive unit and integration tests
- Documentation: 5 detailed guides + inline documentation
- Supported Formats: 6 (Word, PDF, Excel, PowerPoint, Markdown, Text)
- MCP Tools: 13 production-ready endpoints
- Dependencies: Minimal, well-maintained packages
- Performance: Sub-second response for most operations
Skills Demonstrated
This project showcases proficiency in:
Software Engineering
- Clean Code Architecture - Modular design with clear separation of concerns
- API Design - RESTful principles applied to MCP tool design
- Database Design - Efficient schema with FTS5 indexing
- Error Handling - Comprehensive exception handling and validation
- Documentation - Professional-grade documentation and examples
Data Science & AI
- Document Processing - Multi-format parsing and text extraction
- Search & Retrieval - Full-text search with ranking algorithms
- Content Analysis - Statistical analysis and keyword extraction
- Version Control - Data versioning and diff algorithms
- AI Integration - MCP protocol for LLM tool use
Modern Python
- Python 3.13 - Latest language features and optimizations
- Async Programming - Non-blocking I/O with asyncio
- Type Safety - Comprehensive type hints and Pydantic validation
- Package Management - Modern tooling with UV
- Testing - Unit tests and integration testing
DevOps & Tools
- Git - Version control and repository management
- Virtual Environments - Dependency isolation
- CI/CD Ready - Structured for automated deployment
- Cross-Platform - Works on macOS, Linux, and Windows
About the Creator
Glenn Mossy is a Senior AI Software Developer and Data Scientist with expertise in building production-ready AI systems. This project demonstrates the ability to:
- Design and implement complex systems from scratch
- Write clean, maintainable, and well-documented code
- Integrate multiple technologies into cohesive solutions
- Follow software engineering best practices
- Deliver enterprise-grade applications
Contact & Links
- Project Date: November 26, 2024
- Role: Creator & Lead Developer
Acknowledgments
Built following the Model Context Protocol specification and best practices.
This project serves as a portfolio piece demonstrating advanced software engineering, AI integration, and data science capabilities.