SankaiAI/mydoc-mcp
mydoc-mcp is a Model Context Protocol (MCP) server designed to enhance AI coding agents' capabilities by leveraging your personal document history for intelligent template generation and pattern recognition.
mydoc-mcp
Personal Document Intelligence MCP Server
A Model Context Protocol server that enables AI coding agents like Claude Code to intelligently search, index, and retrieve your personal documents with sub-200ms performance.
Quick Start
Prerequisites
- Python 3.11 or higher
- Claude Code or any MCP-compatible client
- 500MB disk space for database and logs
Installation
Option 1: Standard Installation
# Clone the repository
git clone https://github.com/SankaiAI/mydoc-mcp.git
cd mydoc-mcp
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run the server
python -m src.main
Option 2: Docker Installation
# Using Docker Compose
docker-compose up
# Or build and run manually
docker build -t mydoc-mcp .
docker run -i --rm -v "$(pwd)/data:/app/data" -v "$(pwd)/documents:/app/documents" mydoc-mcp  # -i keeps stdin open for the stdio transport
For a detailed Docker deployment guide, see
Configure Claude Code
Add to your Claude Code MCP settings:
{
  "mcpServers": {
    "mydocs": {
      "command": "python",
      "args": ["-m", "src.main"],
      "cwd": "/path/to/mydoc-mcp",
      "env": {
        "DOCUMENT_ROOT": "/path/to/your/documents",
        "DATABASE_URL": "sqlite:///data/mydoc.db"
      }
    }
  }
}
For a detailed Claude Code setup guide, see
mydoc-mcp vs Traditional Claude Code File Lookup
How Claude Code Works Today (Current Capabilities)
Claude Code is quite capable with built-in tools:
User: "Create API docs like the good one I wrote before"
Claude: "Let me search for API documentation in your project"
→ Uses: find . -name "*.md" -exec grep -l "API" {} +
→ Uses: grep -r "API documentation" docs/
Claude: "I found several API docs. Let me read the most recent one..."
Result: Finds files in the current project, but is limited to the current session/project
Claude Code CAN:
- Search files with terminal commands (find, grep)
- Use pattern matching (Glob) to discover files
- Read and analyze project structure
- Understand file relationships within the current project
How mydoc-mcp Works (Intelligent Approach)
With mydoc-mcp, the same request becomes:
User: "Create API docs like the good ones I've written before"
mydoc-mcp: *Automatically finds your top 5 API docs across ALL projects*
Claude: "I found your best API documentation patterns. Based on your most successful approaches..."
Result: Instant access to proven patterns (2-3 minutes)
Key Differences
Claude Code (Current) | mydoc-mcp Enhanced | The Gap We Fill |
---|---|---|
Current project only | Cross-project intelligence | Access ALL your historical documents |
Session-based discovery | Persistent document memory | Remembers documents across sessions |
Pattern matching search | Relevance-ranked results | Finds your BEST examples, not just any match |
File-system limited | Intelligence about content quality | Knows which docs were successful |
Each session starts fresh | Learns your document patterns | Builds knowledge of your writing style |
Find files that exist | Surface relevant examples proactively | Suggests what you didn't know you needed |
Real-World Example: Creating a Technical Specification
Claude Code Today (Current Session):
User: "Help me write a technical spec for the new payment system"
Claude: "Let me search for existing technical specs in this project"
→ find . -name "*spec*" -o -name "*technical*"
→ grep -r "technical specification" docs/
Claude: "I found 2 spec files in this project. Let me analyze them..."
Time: 5-8 minutes (good file discovery in the current project)
Quality: Based on current project examples only
Limitation: Can't access your best specs from other projects
mydoc-mcp Enhanced Workflow:
User: "Help me write a technical spec for the new payment system"
mydoc-mcp automatically finds:
- 3 of your best technical specifications
- Similar payment/financial system docs
- Your preferred spec structure and terminology
Claude: "Based on your most successful technical specs, especially your payment gateway and auth system designs, I'll create a spec that follows your proven patterns..."
Time: 3-5 minutes (instant context)
Quality: Based on proven patterns from multiple successful projects
Why This Matters
Speed: 60-80% Faster
- No manual file hunting
- Instant access to relevant examples
- Automated pattern recognition
Quality: Better Outcomes
- Based on your BEST work, not just any example
- Learns what patterns work for you
- Maintains consistency across projects
Intelligence: Personal Learning
- Remembers your successful approaches
- Identifies document relationships
- Suggests improvements based on your evolution
Workflow: Seamless Integration
- Works transparently with Claude Code
- No workflow changes required
- Enhanced capabilities without complexity
Current MVP vs Future Vision
Available Now (Phase 1):
- Intelligent keyword search and relevance ranking
- Automatic document indexing and discovery
- Persistent document database across sessions
- Fast pattern-based retrieval (<200ms)
- Cross-project document access
Coming Soon (Phase 2):
- Full semantic understanding with AI embeddings
- Advanced pattern recognition and template generation
- Multi-project document relationship analysis
- Proactive document suggestions based on context
The workflows shown above represent the full vision. The current MVP provides the foundation: keyword-based search that already goes beyond single-project file lookup by indexing documents across projects and sessions.
Note about Claude Code's Future: If Claude Code adds embedding-based search, mydoc-mcp would still provide unique value through cross-project learning, persistent memory, and document quality intelligence.
What mydoc-mcp Enables That Claude Code Can't Do
Cross-Project Document Intelligence
What Claude Code Does:
- Searches files in the current project directory only
- Starts fresh each session
- No memory of past projects or documents
What mydoc-mcp Adds:
- Access ALL your historical documents across every project
- Persistent document database that remembers every indexed document
- Cross-project pattern recognition - find similar approaches from any past work
- Continuous learning - builds knowledge from your document history
Intelligent Document Discovery & Ranking
What Claude Code Does:
- Basic pattern matching (find, grep)
- Returns files that match search terms
- No understanding of document quality
What mydoc-mcp Adds:
- Relevance-based ranking - finds your BEST examples, not just matches
- Content quality intelligence - learns which documents were successful
- Semantic similarity (Phase 2) - understands meaning, not just keywords
- Automatic metadata extraction - title, structure, relationships
Performance & Production Features
What Claude Code Does:
- File operations depend on system performance
- No caching or optimization for document access
- No specialized document handling
What mydoc-mcp Adds:
- Sub-200ms target response times (measured averages under 100ms; see Performance Metrics)
- Intelligent caching - search results and parsed documents
- Auto-indexing with file watching - new documents indexed automatically
- Batch processing - handle multiple documents efficiently
- Production-ready reliability - comprehensive error handling and logging
Developer Experience Enhancement
What Claude Code Does:
- Requires manual file path specification
- Generic document processing
- Session-limited context
What mydoc-mcp Adds:
- "Find documents like my best API specs" - intent-based discovery
- Personal writing pattern recognition - adapts to YOUR style
- Proactive document suggestions - surfaces relevant examples automatically
- Template generation from patterns (Phase 2) - create based on your proven approaches
Comprehensive File Type Support (25+ Types)
Beyond Just Documentation - Index Your Entire Project Intelligence
mydoc-mcp supports 25+ file types, making it truly comprehensive for project document intelligence:
Documentation & Content
- Markdown: .md, .markdown, .mdown, .mkd, .mkdn
- Text Files: .txt, .text, .readme, .changelog, .license
- Project Notes: .notes, .todo, .fixme, .authors, .contributors
Code & Scripts
- Programming Languages: .py, .js, .html, .htm, .css, .sql
- Shell Scripts: .sh, .bat, .cmd, .ps1
- Infrastructure: .dockerfile, .gitignore
Configuration & Data
- Structured Data: .json, .xml, .yaml, .yml, .csv, .tsv
- Configuration: .cfg, .conf, .config, .ini, .properties, .env
- Logs & Data: .log, .dat
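To make the extension lists above concrete, here is a minimal sketch of an indexing filter. It is an illustration, not mydoc-mcp's actual code: INDEXABLE_SUFFIXES (a subset of the list above) and the file_key/is_indexable helpers are hypothetical names. One detail worth handling: pathlib reports no suffix for dotfiles such as .gitignore or .env, so the sketch falls back to the full file name for those.

```python
from pathlib import Path

# Hypothetical subset of the supported extensions (lowercase, leading dot).
INDEXABLE_SUFFIXES = {
    ".md", ".markdown", ".txt", ".py", ".js", ".html", ".css", ".sql",
    ".sh", ".ps1", ".json", ".yaml", ".yml", ".csv", ".ini", ".env",
    ".log", ".gitignore", ".dockerfile",
}


def file_key(path: Path) -> str:
    """Return the lowercase extension used for matching.

    pathlib reports no suffix for dotfiles (Path(".gitignore").suffix == ""),
    so for those the whole name (e.g. ".gitignore") is used instead.
    """
    if path.suffix:
        return path.suffix.lower()
    if path.name.startswith("."):
        return path.name.lower()
    return ""


def is_indexable(path: Path) -> bool:
    """True if the file's extension (or dotfile name) is in the allow-list."""
    return file_key(path) in INDEXABLE_SUFFIXES
```

Files with no extension at all (for example Makefile) fall outside this check; how the real server treats them is not specified here.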
Why This Matters: Beyond Traditional "Document" Search
Unlike document-only solutions, mydoc-mcp learns from your entire project ecosystem:
- Code Comments & Documentation: Learn patterns from your Python docstrings, JavaScript comments
- Configuration Consistency: Find your best practices in Docker, YAML, JSON configurations
- Script Templates: Discover your proven shell scripts and automation patterns
- Data Patterns: Learn from your CSV structures, log formats, and data organization
Real-World Intelligence Examples
Python Development:
User: "Create a new API endpoint"
mydoc-mcp finds:
- Your best Python API implementations (.py files)
- Associated configuration patterns (.yaml, .json)
- Documentation examples (.md files)
- Deployment scripts (.sh, .dockerfile)
Frontend Projects:
User: "Set up a new component"
mydoc-mcp discovers:
- Successful component patterns (.js files)
- Styling approaches (.css files)
- Configuration setups (.json files)
- Documentation formats (.md files)
Competitive Advantage: Holistic Project Intelligence
Approach | File Types | Intelligence Level |
---|---|---|
Traditional Tools | Documentation only | Surface-level file matching |
GitHub MCP | Repository structure | Code discovery & navigation |
mydoc-mcp | 25+ project file types | Holistic project pattern learning |
Key Insight: mydoc-mcp doesn't just find your documentation - it learns from your entire development pattern ecosystem to help you replicate successful approaches across all file types.
mydoc-mcp vs GitHub MCP: Why Both Matter
"Why not just use GitHub MCP to access my historical repos?"
Excellent question! GitHub MCP is incredibly powerful for repository-based work, but mydoc-mcp serves a different, complementary purpose:
GitHub MCP Strengths
- Repository management: Code discovery across multiple repos
- Version control integration: Git history, commits, branches
- Code-centric search: Find functions, classes, implementation patterns
- Project structure navigation: Repository organization and relationships
mydoc-mcp Unique Value
- Document quality intelligence: Learns which documents were most successful
- Writing pattern recognition: Adapts to your personal documentation style
- Performance-optimized: Sub-200ms document retrieval (no API limits)
- Privacy-first: 100% local, works with any documents (non-Git files included)
Real-World Comparison
Scenario: "Create a technical specification like my best ones"
GitHub MCP Approach:
1. Search across multiple repos for "technical specification"
2. Find 15+ spec files across different projects
3. Manual review to identify the best examples
4. Time: 8-12 minutes + quality assessment
mydoc-mcp Approach:
1. Instantly surface top 3 technical specifications based on:
- Document reuse frequency and success patterns
- Cross-reference success (docs that led to successful projects)
- Your personal writing evolution and improvements
2. Time: 2-3 minutes with pre-filtered quality ranking
Different Problem Domains
Focus Area | GitHub MCP | mydoc-mcp | Best Use Case |
---|---|---|---|
Primary Purpose | Repository & code discovery | Document quality intelligence | Code structure vs writing patterns |
Search Target | "What code patterns exist?" | "What documentation works best for me?" | Different questions entirely |
Intelligence Type | Repository structure awareness | Personal writing pattern learning | Complementary strengths |
Performance | Network/API dependent | Local, sub-200ms target | Speed vs breadth trade-off |
Scope | Git repositories only | Any documents anywhere | Repository vs filesystem |
Why Use Both Together
Optimal Workflow:
- GitHub MCP: Discover code patterns and project structure across repositories
- mydoc-mcp: Generate documentation templates based on your proven successful approaches
- Result: Code structure insights + personalized documentation patterns = faster, better outcomes
Example Combined Usage:
User: "Create API documentation for this new service"
Claude Code Workflow:
1. GitHub MCP → Find similar API implementations across your repos
2. mydoc-mcp → Retrieve your most successful API documentation templates
3. Generate → New API docs using proven code patterns + your best writing style
When to Choose Which
Use GitHub MCP when:
- Discovering code implementations across projects
- Understanding repository relationships and history
- Finding specific functions or technical implementations
- Working within Git-based workflows
Use mydoc-mcp when:
- Creating documentation that matches your successful patterns
- Learning from your personal document evolution
- Optimizing for document retrieval speed and quality
- Working with documents outside of Git repositories
Use Both when:
- Building comprehensive project documentation
- Maintaining consistency across code and documentation
- Leveraging both technical and writing pattern intelligence
Key Insight: Complementary, Not Competitive
mydoc-mcp doesn't replace GitHub MCP - it enhances your documentation workflow while GitHub MCP enhances your code discovery workflow. Together, they provide comprehensive historical intelligence for both your technical implementations and your documentation patterns.
Usage
Basic Commands
Index a Document
# Through Claude Code
"Index the document at /path/to/document.md"
# Response
{
  "success": true,
  "document_id": "doc_12345",
  "indexed_at": "2025-09-04T15:00:00Z"
}
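Conceptually, indexing reads a file, stores it, and returns an identifier with a timestamp. The sketch below shows that round trip with Python's built-in sqlite3 module. The documents table, the open_db and index_document functions, and the doc_<row id> naming are assumptions made for illustration; the project's real schema lives in src/database/ and uses an async connection layer.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical schema; the project's real models live in src/database/models.py.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    path       TEXT UNIQUE NOT NULL,
    content    TEXT NOT NULL,
    indexed_at TEXT NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def index_document(conn: sqlite3.Connection, path: Path) -> dict:
    """Store or refresh one file and return a response shaped like the example above."""
    content = path.read_text(encoding="utf-8")
    key = str(path.resolve())
    # UTC timestamp in the same "...Z" form as the sample response.
    indexed_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Upsert: re-indexing the same path updates the row instead of duplicating it.
    conn.execute(
        "INSERT INTO documents (path, content, indexed_at) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET content = excluded.content, "
        "indexed_at = excluded.indexed_at",
        (key, content, indexed_at),
    )
    conn.commit()
    row_id = conn.execute("SELECT id FROM documents WHERE path = ?", (key,)).fetchone()[0]
    return {"success": True, "document_id": f"doc_{row_id}", "indexed_at": indexed_at}
```

The upsert keeps the document ID stable across re-indexing, which matters if clients cache IDs returned by earlier calls.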
Search Documents
# Search for documents
"Search for documents about API design"
# Response
{
  "results": [
    {
      "id": "doc_12345",
      "title": "API Design Guidelines",
      "relevance_score": 0.95,
      "snippet": "...REST API design patterns..."
    }
  ],
  "total": 5,
  "search_time_ms": 45
}
Retrieve Document
# Get specific document
"Get the document with ID doc_12345"
# Response
{
  "success": true,
  "content": "# API Design Guidelines\n\n...",
  "metadata": {
    "title": "API Design Guidelines",
    "file_type": "markdown",
    "word_count": 1500
  }
}
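The metadata block in this response can be derived at index time. A minimal sketch, assuming the title is the first level-1 Markdown heading (falling back to the file name) and that word_count counts whitespace-separated tokens containing at least one letter or digit; extract_metadata and the FILE_TYPES mapping are hypothetical, not taken from src/parsers/.

```python
from pathlib import Path

# Hypothetical mapping from suffix to the "file_type" label used in responses.
FILE_TYPES = {".md": "markdown", ".markdown": "markdown", ".txt": "text", ".py": "python"}


def extract_metadata(path: Path, content: str) -> dict:
    """Derive title, file type, and a rough word count for a document."""
    title = path.stem  # fallback when no heading is found
    for line in content.splitlines():
        stripped = line.strip()
        # First level-1 heading wins ("# Title", not "## Section").
        if stripped.startswith("# "):
            title = stripped[2:].strip()
            break
    # Skip pure-markup tokens such as "#" or "---".
    words = [tok for tok in content.split() if any(ch.isalnum() for ch in tok)]
    return {
        "title": title,
        "file_type": FILE_TYPES.get(path.suffix.lower(), "text"),
        "word_count": len(words),
    }
```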
Configuration Options
Environment Variables
# Core server settings
TRANSPORT=stdio # MCP transport protocol
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
LOG_FILE=logs/mydocs.log # Optional log file path
# Database & storage
DATABASE_URL=sqlite:///data/mydoc.db # Database connection string
DOCUMENT_ROOT=./data/documents # Root directory for documents
CACHE_DIRECTORY=./data/cache # Cache directory for processed files
# Performance tuning
MAX_CONCURRENT_CONNECTIONS=10 # Maximum concurrent MCP connections
REQUEST_TIMEOUT=30.0 # Request timeout in seconds
RESPONSE_TIMEOUT=30.0 # Response timeout in seconds
MAX_SEARCH_RESULTS=50 # Maximum search results returned
DEFAULT_SEARCH_LIMIT=10 # Default number of search results
# Document processing
MAX_DOCUMENT_SIZE=10485760 # Max document size (10MB)
SUPPORTED_EXTENSIONS=.md,.txt,.py,.js,.json,.yaml,.html,.css,.sql,.sh,.dockerfile,.env,.log,.csv # Comma-separated file extensions (subset shown; 25+ types supported)
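These variables could be loaded into a typed settings object as sketched below. This is an illustration built on assumptions, not the contents of src/config.py: the Settings dataclass and load_settings function are hypothetical, the values shown above serve as fallback defaults, the .md,.txt fallback for SUPPORTED_EXTENSIONS is invented for the example, and clamping DEFAULT_SEARCH_LIMIT to MAX_SEARCH_RESULTS is a design choice made only for this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    transport: str
    log_level: str
    database_url: str
    document_root: str
    max_search_results: int
    default_search_limit: int
    max_document_size: int
    supported_extensions: frozenset[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the values above."""
    exts = env.get("SUPPORTED_EXTENSIONS", ".md,.txt")  # hypothetical fallback
    max_results = int(env.get("MAX_SEARCH_RESULTS", "50"))
    default_limit = int(env.get("DEFAULT_SEARCH_LIMIT", "10"))
    return Settings(
        transport=env.get("TRANSPORT", "stdio"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        database_url=env.get("DATABASE_URL", "sqlite:///data/mydoc.db"),
        document_root=env.get("DOCUMENT_ROOT", "./data/documents"),
        max_search_results=max_results,
        # Assumption: the default limit is clamped to the maximum.
        default_search_limit=min(default_limit, max_results),
        max_document_size=int(env.get("MAX_DOCUMENT_SIZE", "10485760")),
        # Normalise ".MD, .txt" -> {".md", ".txt"} and skip empty entries.
        supported_extensions=frozenset(
            e.strip().lower() for e in exts.split(",") if e.strip()
        ),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to test without touching the process environment.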
Configuration File (.env)
# Create a .env file in the project root
TRANSPORT=stdio
DATABASE_URL=sqlite:///data/mydoc.db
DOCUMENT_ROOT=/home/user/Documents
LOG_LEVEL=INFO
WATCH_ENABLED=true
Performance Metrics
Operation | Target | Actual | Status |
---|---|---|---|
Index Document | < 200ms | 45ms avg | PASS |
Search Documents | < 200ms | 67ms avg | PASS |
Get Document | < 200ms | 23ms avg | PASS |
Bulk Index (10 docs) | < 2s | 450ms | PASS |
Test Environment: Windows 11, Python 3.11, SQLite, 1000 test documents
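Averages like these come from timing each operation repeatedly and comparing the mean with the target. The harness below is a minimal sketch of that approach using time.perf_counter; it is not the project's tests/test_performance.py, and the benchmark function and its 200 ms default are illustrative.

```python
import statistics
import time
from typing import Callable


def benchmark(op: Callable[[], object], runs: int = 50, target_ms: float = 200.0) -> dict:
    """Time `op` `runs` times; report mean/max latency in ms and a PASS/FAIL status."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples)
    return {
        "mean_ms": round(mean_ms, 2),
        "max_ms": round(max(samples), 2),
        # Mirrors the table above: the average is compared against the target.
        "status": "PASS" if mean_ms < target_ms else "FAIL",
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real search or index call.
    print(benchmark(lambda: sorted(range(10_000))))
```

Reporting the maximum alongside the mean helps spot occasional slow calls that an average-only target would hide.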
Architecture
System Components
mydoc-mcp/
├── MCP Server Core (src/)
│   ├── main.py # Entry point & MCP server bootstrap
│   ├── server.py # MCP server implementation
│   ├── config.py # Configuration management
│   ├── logging_config.py # Structured logging setup
│   └── tool_registry.py # MCP tool registration system
├── MCP Tools (src/tools/)
│   ├── base.py # Abstract tool base class
│   ├── indexDocument.py # Document indexing tool
│   ├── searchDocuments.py # Intelligent search tool
│   ├── getDocument.py # Document retrieval tool
│   └── registration.py # Tool auto-registration
├── Storage Layer (src/database/)
│   ├── connection.py # Async SQLite connection management
│   ├── models.py # Database schema & models
│   ├── database_manager.py # Document CRUD operations
│   ├── queries.py # Optimized SQL queries
│   └── migrations.py # Schema migrations
├── Document Processing (src/parsers/)
│   ├── base.py # Abstract parser interface
│   ├── parser_factory.py # Parser selection & creation
│   ├── markdown_parser.py # Markdown document parsing
│   ├── text_parser.py # Plain text parsing
│   └── database_integration.py # Parser → database integration
└── File System Monitoring (src/watcher/)
    ├── file_watcher.py # File system event monitoring
    ├── event_handler.py # Document change processing
    └── config.py # Watcher configuration
Data Flow Architecture
Document Indexing Flow
File Change → File Watcher → Event Handler → Parser Factory →
Specific Parser → Database Manager → SQLite → Search Index Update
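The first stage of this flow, turning file changes into indexing work, can be shown without third-party dependencies by comparing modification times between two scans. Note that this swaps in a polling technique for the sake of a self-contained example; src/watcher/ is described as event-based monitoring, and the scan and diff_snapshots helpers are hypothetical.

```python
import os
from pathlib import Path

Snapshot = dict[str, float]  # path -> modification time


def scan(root: Path, suffixes: set[str]) -> Snapshot:
    """Record the mtime of every file under root whose suffix is in `suffixes`."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in suffixes:
                snap[str(path)] = path.stat().st_mtime
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Classify paths as created, modified, or deleted between two scans."""
    return {
        "created": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }
```

In a full pipeline, created and modified paths would go on to the parser factory and database manager, and deleted paths would be removed from the index.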
Search Query Flow
MCP Tool Request → Query Validation → Database Manager →
Optimized SQL Query → Relevance Scoring → Result Ranking → JSON Response
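The Relevance Scoring and Result Ranking steps can be illustrated with a simple term-coverage score: each document earns credit for every query term it contains, with title matches counting double. This is a stand-in for illustration only; the project's actual scoring formula is not described here, and the Doc type, the score/rank functions, the weights, and the 0-1 normalisation are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    title: str
    content: str


TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, deduplicated."""
    return set(TOKEN.findall(text.lower()))


def score(query: str, doc: Doc, title_weight: float = 2.0) -> float:
    """Weighted share of query terms present in the doc, normalised to 0.0-1.0."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    title_terms = tokenize(doc.title)
    body_terms = tokenize(doc.content)
    raw = 0.0
    for term in terms:
        if term in title_terms:
            raw += title_weight  # title hits count double by default
        elif term in body_terms:
            raw += 1.0
    return raw / (title_weight * len(terms))


def rank(query: str, docs: list[Doc], limit: int = 10) -> list[tuple[str, float]]:
    """Return (doc id, score) pairs for docs with a non-zero score, best first."""
    scored = [(doc.id, round(score(query, doc), 2)) for doc in docs]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:limit]
```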
System Integration Flow
Claude Code → MCP Protocol → Tool Registry → Async Tool Execution →
Storage Layer → Performance Validation → Response (< 200ms)
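The Tool Registry, Async Tool Execution, and Performance Validation steps can be sketched with a decorator-based registry and plain asyncio. The tool decorator, the call_tool function, the getDocument placeholder, and the choice to log slow calls rather than reject them are all assumptions; this is neither src/tool_registry.py nor the official MCP Python SDK.

```python
import asyncio
import logging
import time
from typing import Any, Awaitable, Callable

logger = logging.getLogger("mydoc.sketch")
ToolFn = Callable[..., Awaitable[dict[str, Any]]]
TOOLS: dict[str, ToolFn] = {}


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Register an async function under an MCP-style tool name."""
    def decorator(fn: ToolFn) -> ToolFn:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("getDocument")
async def get_document(document_id: str) -> dict[str, Any]:
    # Placeholder body; a real tool would query the storage layer.
    return {"success": True, "document_id": document_id}


async def call_tool(name: str, budget_ms: float = 200.0, **kwargs: Any) -> dict[str, Any]:
    """Dispatch a tool call, time it, and warn when it exceeds the latency budget."""
    if name not in TOOLS:
        return {"success": False, "error": f"unknown tool: {name}"}
    start = time.perf_counter()
    result = await TOOLS[name](**kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > budget_ms:
        logger.warning("%s took %.1f ms (budget %.0f ms)", name, elapsed_ms, budget_ms)
    return result


if __name__ == "__main__":
    print(asyncio.run(call_tool("getDocument", document_id="doc_12345")))
```

Logging instead of failing keeps a slow but correct answer usable while still surfacing the budget overrun.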
Key Architectural Decisions
Performance-First Design
- Async/await throughout: All I/O operations are non-blocking
- Connection pooling: Efficient database connection management
- Optimized queries: Targeting sub-200ms response times
- Smart caching: Result caching with TTL expiration
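Result caching with TTL expiration can be sketched in a few lines. This hypothetical TTLCache is a single-process illustration rather than the project's cache: it uses time.monotonic so expiry is unaffected by wall-clock changes, evicts stale entries lazily on read, and accepts an injectable clock to make testing easy.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Minimal key-value cache whose entries expire `ttl` seconds after insertion."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict stale entries on read
            return default
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
```

A search layer might key entries on the normalised query plus limit, for example cache.set(("api design", 10), results).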
Extensible Plugin Architecture
- Factory patterns: Easy addition of new parsers and tools
- Interface-based design: Clean separation of concerns
- Modular components: Independent development and testing
- Event-driven updates: Real-time file system monitoring
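The factory pattern is what lets a new file type be supported by registering a single class. A minimal sketch, assuming a suffix-keyed registry with a plain-text fallback; Parser, ParserFactory, TextParser, and MarkdownParser here are hypothetical stand-ins for the classes in src/parsers/.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Interface every document parser implements."""

    @abstractmethod
    def parse(self, content: str) -> dict:
        ...


class TextParser(Parser):
    def parse(self, content: str) -> dict:
        return {"type": "text", "lines": len(content.splitlines())}


class MarkdownParser(Parser):
    def parse(self, content: str) -> dict:
        headings = [
            line.lstrip("#").strip() for line in content.splitlines() if line.startswith("#")
        ]
        return {"type": "markdown", "headings": headings}


class ParserFactory:
    """Maps file suffixes to parser classes; new parsers only need register()."""

    def __init__(self, fallback: type[Parser] = TextParser) -> None:
        self._registry: dict[str, type[Parser]] = {}
        self._fallback = fallback

    def register(self, suffix: str, parser_cls: type[Parser]) -> None:
        self._registry[suffix.lower()] = parser_cls

    def create(self, suffix: str) -> Parser:
        # Unknown suffixes fall back to plain-text parsing.
        return self._registry.get(suffix.lower(), self._fallback)()


factory = ParserFactory()
for ext in (".md", ".markdown"):
    factory.register(ext, MarkdownParser)
```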
Docker Deployment
Quick Start with Docker
# Development mode
docker-compose -f docker-compose.dev.yml up
# Production mode
docker-compose up -d
# View logs
docker-compose logs -f
# Stop server
docker-compose down
Docker Compose Configuration
version: '3.8'
services:
  mydoc-mcp:
    image: mydoc-mcp:latest
    volumes:
      - ./data:/app/data
      - ~/Documents:/app/documents:ro
    environment:
      - DOCUMENT_ROOT=/app/documents
      - LOG_LEVEL=INFO
    restart: unless-stopped
Testing
Run Tests
# Run all tests
python -m pytest tests/
# Run integration tests
python tests/test_integration.py
# Run performance tests
python tests/test_performance.py
# Validate MCP compliance
python tests/test_mcp_validation.py
Test Coverage
- Unit Tests: 72% coverage
- Integration Tests: 100% of critical paths
- Performance Tests: All operations validated < 200ms
- MCP Compliance: A grade (86% validation)
Documentation
User Guides
- - Detailed setup instructions
- - How to use with Claude Code
- - All configuration options
Technical Documentation
- - Complete MCP tool documentation
- - System design
- - Storage structure
Developer Resources
- - How to contribute
- - Dev environment setup
- - Version history
Troubleshooting
Common Issues
Server won't start
# Check Python version
python --version # Must be 3.11+
# Verify dependencies
pip list | grep mcp
# Check logs
tail -f logs/mydocs.log  # path set by LOG_FILE
Documents not indexing
# Check document root
echo $DOCUMENT_ROOT
# Verify permissions
ls -la $DOCUMENT_ROOT
# Force reindex
python -m src.tools.reindex --force
Slow search performance
# Check database size
du -h data/mydoc.db
# Optimize database
python -m src.tools.optimize
# Clear cache
python -m src.tools.clear-cache
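For reference, SQLite ships with the maintenance commands an optimize step typically relies on: ANALYZE refreshes the query planner's statistics and VACUUM rebuilds the file to reclaim free pages. The sketch below uses Python's sqlite3 module and is not a description of what src.tools.optimize actually does; optimize_database is a hypothetical name. VACUUM needs to lock the whole database, so it is safest to stop the server first.

```python
import os
import sqlite3


def optimize_database(db_path: str) -> tuple[int, int]:
    """Run ANALYZE and VACUUM on a SQLite file; return (bytes_before, bytes_after)."""
    before = os.path.getsize(db_path)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("ANALYZE")  # refresh query planner statistics
        conn.commit()            # ensure no transaction is open before VACUUM
        conn.execute("VACUUM")   # rebuild the file, reclaiming free pages
    finally:
        conn.close()
    after = os.path.getsize(db_path)
    return before, after
```

For example: optimize_database("data/mydoc.db").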
Debug Mode
# Enable debug logging
export LOG_LEVEL=DEBUG
python -m src.main
# Or in .env file
LOG_LEVEL=DEBUG
DEBUG_MODE=true
Roadmap
Phase 1: MVP (Complete)
- Core MCP server with stdio transport
- Document indexing and storage
- Keyword search with ranking
- Three core MCP tools
- Docker deployment
Phase 2: Enhanced Search (Planned)
- Semantic search with embeddings
- Advanced query syntax
- Search filters and facets
- Search history and suggestions
Phase 3: Advanced Features
- PDF and DOCX support
- Template generation from patterns
- Document clustering
- Cross-document insights
Phase 4: Enterprise
- Multi-user support
- Remote deployment (HTTP+SSE)
- Authentication and permissions
- Audit logging
Contributing
We welcome contributions! Please see our for details.
Development Process
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Code Style
- Python 3.11+ type hints
- Black formatting
- Comprehensive docstrings
- 80% test coverage minimum
License
MIT License - see for details
Acknowledgments
- Anthropic - For the Model Context Protocol
- MCP Community - For inspiration and best practices
- Contributors - For making this project better
Support
Getting Help
- Check the
- Report issues on GitHub Issues
- Join our Discord community
Project Status
- Current Version: 1.0.0-beta
- Status: Day 2 Complete, Ready for Production Testing
- Last Updated: September 4, 2025
Transform your document workflow with intelligent MCP-powered search and retrieval!
Built with ❤️ for the AI development community