mydoc-mcp by SankaiAI - MCP Server

mydoc-mcp

Personal Document Intelligence MCP Server

A Model Context Protocol server that enables AI coding agents like Claude Code to intelligently search, index, and retrieve your personal documents with sub-200ms performance.

🚀 Quick Start

Prerequisites

Python 3.11 or higher
Claude Code or any MCP-compatible client
500MB disk space for database and logs

Installation

Option 1: Standard Installation

# Clone the repository
git clone https://github.com/yourusername/mydoc-mcp.git
cd mydoc-mcp

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the server
python -m src.main

Option 2: Docker Installation

# Using Docker Compose
docker-compose up

# Or build and run manually
docker build -t mydoc-mcp .
docker run -v ./data:/app/data -v ./documents:/app/documents mydoc-mcp

📚 For detailed Docker deployment guide, see

Configure Claude Code

Add to your Claude Code MCP settings:

{
  "mcpServers": {
    "mydocs": {
      "command": "python",
      "args": ["-m", "src.main"],
      "cwd": "/path/to/mydoc-mcp",
      "env": {
        "DOCUMENT_ROOT": "/path/to/your/documents",
        "DATABASE_URL": "sqlite:///data/mydoc.db"
      }
    }
  }
}

📚 For detailed Claude Code setup guide, see

🆚 mydoc-mcp vs Traditional Claude Code File Lookup

How Claude Code Works Today (Current Capabilities)

Claude Code is quite capable with built-in tools:

User: "Create API docs like the good one I wrote before"
Claude: "Let me search for API documentation in your project"
       → Uses: find . -name "*.md" | grep -l "API"
       → Uses: grep -r "API documentation" docs/
Claude: "I found several API docs. Let me read the most recent one..."
Result: ✅ Finds files in current project, but limited to current session/project

Claude Code CAN:

Search files with terminal commands (find, grep)
Use pattern matching (Glob) to discover files
Read and analyze project structure
Understand file relationships within current project

How mydoc-mcp Works (Intelligent Approach)

With mydoc-mcp, the same request becomes:

User: "Create API docs like the good ones I've written before"
mydoc-mcp: *Automatically finds your top 5 API docs across ALL projects*
Claude: "I found your best API documentation patterns. Based on your most successful approaches..."
Result: ✅ Instant access to proven patterns (2-3 minutes)

Key Differences

Claude Code (Current)	mydoc-mcp Enhanced	The Gap We Fill
🗂️ Current project only	🌐 Cross-project intelligence	Access ALL your historical documents
🔄 Session-based discovery	💾 Persistent document memory	Remembers documents across sessions
🔍 Pattern matching search	🎯 Relevance-ranked results	Finds your BEST examples, not just any match
📁 File-system limited	📚 Intelligence about content quality	Knows which docs were successful
⏱️ Each session starts fresh	🧠 Learns your document patterns	Builds knowledge of your writing style
🔎 Find files that exist	🎯 Surface relevant examples proactively	Suggests what you didn't know you needed

Real-World Example: Creating a Technical Specification

Claude Code Today (Current Session):

👤 "Help me write a technical spec for the new payment system"
🤖 "Let me search for existing technical specs in this project"
    → find . -name "*spec*" -o -name "*technical*"
    → grep -r "technical specification" docs/
🤖 "I found 2 spec files in this project. Let me analyze them..."
⏱️ Time: 5-8 minutes (good file discovery in current project)
📊 Quality: Based on current project examples only
🚫 Limitation: Can't access your best specs from other projects

mydoc-mcp Enhanced Workflow:

👤 "Help me write a technical spec for the new payment system"
🎯 mydoc-mcp automatically finds:
   - 3 of your best technical specifications
   - Similar payment/financial system docs
   - Your preferred spec structure and terminology
🤖 "Based on your most successful technical specs, especially your payment gateway and auth system designs, I'll create a spec that follows your proven patterns..."
⏱️ Time: 3-5 minutes (instant context)
📊 Quality: Based on proven patterns from multiple successful projects

Why This Matters

🚀 Speed: 60-80% Faster

No manual file hunting
Instant access to relevant examples
Automated pattern recognition

📈 Quality: Better Outcomes

Based on your BEST work, not just any example
Learns what patterns work for you
Maintains consistency across projects

🧠 Intelligence: Personal Learning

Remembers your successful approaches
Identifies document relationships
Suggests improvements based on your evolution

⚡ Workflow: Seamless Integration

Works transparently with Claude Code
No workflow changes required
Enhanced capabilities without complexity

Current MVP vs Future Vision

✅ Available Now (Phase 1):

Intelligent keyword search and relevance ranking
Automatic document indexing and discovery
Persistent document database across sessions
Fast pattern-based retrieval (<200ms)
Cross-project document access

📅 Coming Soon (Phase 2):

Full semantic understanding with AI embeddings
Advanced pattern recognition and template generation
Multi-project document relationship analysis
Proactive document suggestions based on context

The workflows shown above represent the full vision. Current MVP provides the foundation with keyword-based intelligence that's already significantly better than single-project file lookup.

Note about Claude Code's Future: If Claude Code adds embedding-based search, mydoc-mcp would still provide unique value through cross-project learning, persistent memory, and document quality intelligence.

🚀 What mydoc-mcp Enables That Claude Code Can't Do

🌐 Cross-Project Document Intelligence

What Claude Code Does:

Searches files in current project directory only
Starts fresh each session
No memory of past projects or documents

What mydoc-mcp Adds:

✅ Access ALL your historical documents across every project
✅ Persistent document database that remembers everything
✅ Cross-project pattern recognition - find similar approaches from any past work
✅ Continuous learning - builds knowledge from your document history

🎯 Intelligent Document Discovery & Ranking

What Claude Code Does:

Basic pattern matching (find, grep)
Returns files that match search terms
No understanding of document quality

What mydoc-mcp Adds:

✅ Relevance-based ranking - finds your BEST examples, not just matches
✅ Content quality intelligence - learns which documents were successful
✅ Semantic similarity (Phase 2) - understands meaning, not just keywords
✅ Automatic metadata extraction - title, structure, relationships

⚡ Performance & Production Features

What Claude Code Does:

File operations depend on system performance
No caching or optimization for document access
No specialized document handling

What mydoc-mcp Adds:

✅ Sub-200ms guaranteed response times (achieved <100ms average)
✅ Intelligent caching - search results and parsed documents
✅ Auto-indexing with file watching - new documents indexed automatically
✅ Batch processing - handle multiple documents efficiently
✅ Production-ready reliability - comprehensive error handling and logging

🔧 Developer Experience Enhancement

What Claude Code Does:

Requires manual file path specification
Generic document processing
Session-limited context

What mydoc-mcp Adds:

✅ "Find documents like my best API specs" - intent-based discovery
✅ Personal writing pattern recognition - adapts to YOUR style
✅ Proactive document suggestions - surfaces relevant examples automatically
✅ Template generation from patterns (Phase 2) - create based on your proven approaches

📁 Comprehensive File Type Support (25+ Types)

Beyond Just Documentation - Index Your Entire Project Intelligence

mydoc-mcp supports 25+ file types, making it truly comprehensive for project document intelligence:

📝 Documentation & Content

Markdown: .md, .markdown, .mdown, .mkd, .mkdn
Text Files: .txt, .text, .readme, .changelog, .license
Project Notes: .notes, .todo, .fixme, .authors, .contributors

💻 Code & Scripts

Programming Languages: .py, .js, .html, .htm, .css, .sql
Shell Scripts: .sh, .bat, .cmd, .ps1
Infrastructure: .dockerfile, .gitignore

⚙️ Configuration & Data

Structured Data: .json, .xml, .yaml, .yml, .csv, .tsv
Configuration: .cfg, .conf, .config, .ini, .properties, .env
Logs & Data: .log, .dat

Why This Matters: Beyond Traditional "Document" Search

Unlike document-only solutions, mydoc-mcp learns from your entire project ecosystem:

✅ Code Comments & Documentation: Learn patterns from your Python docstrings, JavaScript comments
✅ Configuration Consistency: Find your best practices in Docker, YAML, JSON configurations
✅ Script Templates: Discover your proven shell scripts and automation patterns
✅ Data Patterns: Learn from your CSV structures, log formats, and data organization

Real-World Intelligence Examples

Python Development:

User: "Create a new API endpoint"
mydoc-mcp finds:
- Your best Python API implementations (.py files)
- Associated configuration patterns (.yaml, .json)
- Documentation examples (.md files)
- Deployment scripts (.sh, .dockerfile)

Frontend Projects:

User: "Set up a new component"
mydoc-mcp discovers:
- Successful component patterns (.js files) 
- Styling approaches (.css files)
- Configuration setups (.json files)
- Documentation formats (.md files)

Competitive Advantage: Holistic Project Intelligence

Approach	File Types	Intelligence Level
Traditional Tools	Documentation only	Surface-level file matching
GitHub MCP	Repository structure	Code discovery & navigation
mydoc-mcp	25+ project file types	Holistic project pattern learning

Key Insight: mydoc-mcp doesn't just find your documentation - it learns from your entire development pattern ecosystem to help you replicate successful approaches across all file types.

🌟 mydoc-mcp vs GitHub MCP: Why Both Matter

"Why not just use GitHub MCP to access my historical repos?"

Excellent question! GitHub MCP is incredibly powerful for repository-based work, but mydoc-mcp serves a different, complementary purpose:

GitHub MCP Strengths

✅ Repository management: Code discovery across multiple repos
✅ Version control integration: Git history, commits, branches
✅ Code-centric search: Find functions, classes, implementation patterns
✅ Project structure navigation: Repository organization and relationships

mydoc-mcp Unique Value

✅ Document quality intelligence: Learns which documents were most successful
✅ Writing pattern recognition: Adapts to your personal documentation style
✅ Performance-optimized: Sub-200ms document retrieval (no API limits)
✅ Privacy-first: 100% local, works with any documents (non-Git files included)

Real-World Comparison

Scenario: "Create a technical specification like my best ones"

GitHub MCP Approach:

1. Search across multiple repos for "technical specification"
2. Find 15+ spec files across different projects  
3. Manual review to identify the best examples
4. Time: 8-12 minutes + quality assessment

mydoc-mcp Approach:

1. Instantly surface top 3 technical specifications based on:
   - Document reuse frequency and success patterns
   - Cross-reference success (docs that led to successful projects)
   - Your personal writing evolution and improvements
2. Time: 2-3 minutes with pre-filtered quality ranking

Different Problem Domains

Focus Area	GitHub MCP	mydoc-mcp	Best Use Case
Primary Purpose	Repository & code discovery	Document quality intelligence	Code structure vs writing patterns
Search Target	"What code patterns exist?"	"What documentation works best for me?"	Different questions entirely
Intelligence Type	Repository structure awareness	Personal writing pattern learning	Complementary strengths
Performance	Network/API dependent	Local, sub-200ms guaranteed	Speed vs breadth trade-off
Scope	Git repositories only	Any documents anywhere	Repository vs filesystem

Why Use Both Together

Optimal Workflow:

GitHub MCP: Discover code patterns and project structure across repositories
mydoc-mcp: Generate documentation templates based on your proven successful approaches
Result: Code structure insights + personalized documentation patterns = faster, better outcomes

Example Combined Usage:

User: "Create API documentation for this new service"

Claude Code Workflow:
1. GitHub MCP → Find similar API implementations across your repos
2. mydoc-mcp → Retrieve your most successful API documentation templates  
3. Generate → New API docs using proven code patterns + your best writing style

When to Choose Which

Use GitHub MCP when:

Discovering code implementations across projects
Understanding repository relationships and history
Finding specific functions or technical implementations
Working within Git-based workflows

Use mydoc-mcp when:

Creating documentation that matches your successful patterns
Learning from your personal document evolution
Optimizing for document retrieval speed and quality
Working with documents outside of Git repositories

Use Both when:

Building comprehensive project documentation
Maintaining consistency across code and documentation
Leveraging both technical and writing pattern intelligence

Key Insight: Complementary, Not Competitive

mydoc-mcp doesn't replace GitHub MCP - it enhances your documentation workflow while GitHub MCP enhances your code discovery workflow. Together, they provide comprehensive historical intelligence for both your technical implementations and your documentation patterns.

🛠️ Usage

Basic Commands

Index a Document

# Through Claude Code
"Index the document at /path/to/document.md"

# Response
{
  "success": true,
  "document_id": "doc_12345",
  "indexed_at": "2025-09-04T15:00:00Z"
}

Search Documents

# Search for documents
"Search for documents about API design"

# Response
{
  "results": [
    {
      "id": "doc_12345",
      "title": "API Design Guidelines",
      "relevance_score": 0.95,
      "snippet": "...REST API design patterns..."
    }
  ],
  "total": 5,
  "search_time_ms": 45
}

Retrieve Document

# Get specific document
"Get the document with ID doc_12345"

# Response
{
  "success": true,
  "content": "# API Design Guidelines\n\n...",
  "metadata": {
    "title": "API Design Guidelines",
    "file_type": "markdown",
    "word_count": 1500
  }
}

Configuration Options

Environment Variables

# Core server settings
TRANSPORT=stdio                              # MCP transport protocol
LOG_LEVEL=INFO                              # DEBUG, INFO, WARNING, ERROR
LOG_FILE=logs/mydocs.log                    # Optional log file path

# Database & storage
DATABASE_URL=sqlite:///data/mydoc.db       # Database connection string
DOCUMENT_ROOT=./data/documents              # Root directory for documents
CACHE_DIRECTORY=./data/cache                # Cache directory for processed files

# Performance tuning
MAX_CONCURRENT_CONNECTIONS=10               # Maximum concurrent MCP connections
REQUEST_TIMEOUT=30.0                        # Request timeout in seconds
RESPONSE_TIMEOUT=30.0                       # Response timeout in seconds
MAX_SEARCH_RESULTS=50                       # Maximum search results returned
DEFAULT_SEARCH_LIMIT=10                     # Default number of search results

# Document processing
MAX_DOCUMENT_SIZE=10485760                  # Max document size (10MB)
SUPPORTED_EXTENSIONS=.md,.txt,.py,.js,.json,.yaml,.html,.css,.sql,.sh,.dockerfile,.env,.log,.csv # Comma-separated file extensions (25+ types supported)

Configuration File (.env)

# Create a .env file in the project root
TRANSPORT=stdio
DATABASE_URL=sqlite:///data/mydoc.db
DOCUMENT_ROOT=/home/user/Documents
LOG_LEVEL=INFO
WATCH_ENABLED=true

📊 Performance Metrics

Operation	Target	Actual	Status
Index Document	< 200ms	45ms avg	✅ PASS
Search Documents	< 200ms	67ms avg	✅ PASS
Get Document	< 200ms	23ms avg	✅ PASS
Bulk Index (10 docs)	< 2s	450ms	✅ PASS

Test Environment: Windows 11, Python 3.11, SQLite, 1000 test documents

🔧 Architecture

System Components

mydoc-mcp/
├── 🚀 MCP Server Core (src/)
│   ├── main.py                    # Entry point & MCP server bootstrap
│   ├── server.py                  # MCP server implementation
│   ├── config.py                  # Configuration management
│   ├── logging_config.py          # Structured logging setup
│   └── tool_registry.py           # MCP tool registration system
├── 🔧 MCP Tools (src/tools/)
│   ├── base.py                    # Abstract tool base class
│   ├── indexDocument.py          # Document indexing tool
│   ├── searchDocuments.py        # Intelligent search tool
│   ├── getDocument.py            # Document retrieval tool
│   └── registration.py           # Tool auto-registration
├── 💾 Storage Layer (src/database/)
│   ├── connection.py             # Async SQLite connection management
│   ├── models.py                 # Database schema & models
│   ├── database_manager.py       # Document CRUD operations
│   ├── queries.py                # Optimized SQL queries
│   └── migrations.py             # Schema migrations
├── 📄 Document Processing (src/parsers/)
│   ├── base.py                   # Abstract parser interface
│   ├── parser_factory.py        # Parser selection & creation
│   ├── markdown_parser.py       # Markdown document parsing
│   ├── text_parser.py           # Plain text parsing
│   └── database_integration.py  # Parser → database integration
└── 👁️ File System Monitoring (src/watcher/)
    ├── file_watcher.py          # File system event monitoring
    ├── event_handler.py         # Document change processing
    └── config.py                # Watcher configuration

Data Flow Architecture

Document Indexing Flow

File Change → File Watcher → Event Handler → Parser Factory → 
Specific Parser → Database Manager → SQLite → Search Index Update

Search Query Flow

MCP Tool Request → Query Validation → Database Manager → 
Optimized SQL Query → Relevance Scoring → Result Ranking → JSON Response

System Integration Flow

Claude Code → MCP Protocol → Tool Registry → Async Tool Execution → 
Storage Layer → Performance Validation → Response (< 200ms)

Key Architectural Decisions

🚀 Performance-First Design

Async/await throughout: All I/O operations are non-blocking
Connection pooling: Efficient database connection management
Optimized queries: Sub-200ms response time guarantee
Smart caching: Result caching with TTL expiration

🔌 Extensible Plugin Architecture

Factory patterns: Easy addition of new parsers and tools
Interface-based design: Clean separation of concerns
Modular components: Independent development and testing
Event-driven updates: Real-time file system monitoring

🐳 Docker Deployment

Quick Start with Docker

# Development mode
docker-compose -f docker-compose.dev.yml up

# Production mode
docker-compose up -d

# View logs
docker-compose logs -f

# Stop server
docker-compose down

Docker Compose Configuration

version: '3.8'
services:
  mydoc-mcp:
    image: mydoc-mcp:latest
    volumes:
      - ./data:/app/data
      - ~/Documents:/app/documents:ro
    environment:
      - DOCUMENT_ROOT=/app/documents
      - LOG_LEVEL=INFO
    restart: unless-stopped

🧪 Testing

Run Tests

# Run all tests
python -m pytest tests/

# Run integration tests
python tests/test_integration.py

# Run performance tests
python tests/test_performance.py

# Validate MCP compliance
python tests/test_mcp_validation.py

Test Coverage

Unit Tests: 72% coverage
Integration Tests: 100% of critical paths
Performance Tests: All operations validated < 200ms
MCP Compliance: A grade (86% validation)

📚 Documentation

User Guides

- Detailed setup instructions
- How to use with Claude Code
- All configuration options

Technical Documentation

- Complete MCP tool documentation
- System design
- Storage structure

Developer Resources

- How to contribute
- Dev environment setup
- Version history

🔍 Troubleshooting

Common Issues

Server won't start

# Check Python version
python --version  # Must be 3.11+

# Verify dependencies
pip list | grep mcp

# Check logs
tail -f logs/mydoc-mcp.log

Documents not indexing

# Check document root
echo $DOCUMENT_ROOT

# Verify permissions
ls -la $DOCUMENT_ROOT

# Force reindex
python -m src.tools.reindex --force

Slow search performance

# Check database size
du -h data/mydoc.db

# Optimize database
python -m src.tools.optimize

# Clear cache
python -m src.tools.clear-cache

Debug Mode

# Enable debug logging
export LOG_LEVEL=DEBUG
python -m src.main

# Or in .env file
LOG_LEVEL=DEBUG
DEBUG_MODE=true

🎯 Roadmap

Phase 1: MVP (Complete)

✅ Core MCP server with stdio transport
✅ Document indexing and storage
✅ Keyword search with ranking
✅ Three core MCP tools
✅ Docker deployment

Phase 2: Enhanced Search (Planned)

🔄 Semantic search with embeddings
🔄 Advanced query syntax
🔄 Search filters and facets
🔄 Search history and suggestions

Phase 3: Advanced Features

📅 PDF and DOCX support
📅 Template generation from patterns
📅 Document clustering
📅 Cross-document insights

Phase 4: Enterprise

📅 Multi-user support
📅 Remote deployment (HTTP+SSE)
📅 Authentication and permissions
📅 Audit logging

🤝 Contributing

We welcome contributions! Please see our for details.

Development Process

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Code Style

Python 3.11+ type hints
Black formatting
Comprehensive docstrings
80% test coverage minimum

📄 License

MIT License - see for details

🙏 Acknowledgments

Anthropic - For the Model Context Protocol
MCP Community - For inspiration and best practices
Contributors - For making this project better

📞 Support

Getting Help

📖 Check the
🐛 Report issues on GitHub Issues
💬 Join our Discord community

Project Status

Current Version: 1.0.0-beta
Status: Day 2 Complete, Ready for Production Testing
Last Updated: September 4, 2025

Transform your document workflow with intelligent MCP-powered search and retrieval! 🚀

Built with ❤️ for the AI development community