shenyimings/DocNav-MCP
DocNav MCP Server
DocNav is a Model Context Protocol (MCP) server that lets LLM agents read, analyze, and manage lengthy documents, navigating them much as a human reader would.
Features
- Document Navigation: Navigate through document sections, headings, and content structure
- Content Extraction: Extract and summarize specific document sections
- Search & Query: Find specific content within documents using intelligent search
- Multi-format Support: Currently supports Markdown (.md) files, with planned support for PDF and other formats
- MCP Integration: Seamless integration with MCP-compatible LLMs and applications
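To make the Search & Query feature concrete, here is a minimal sketch of a search that returns each match together with its surrounding lines, similar in spirit to the "results with context" that search_document reports. It uses plain case-insensitive substring matching. The helper search_with_context and the SearchHit type are hypothetical and are not DocNav's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    line_no: int        # 1-based line number of the matching line
    context: list[str]  # the matching line plus up to context_lines on each side


def search_with_context(text: str, query: str, context_lines: int = 1) -> list[SearchHit]:
    """Hypothetical case-insensitive substring search returning hits with nearby lines."""
    lines = text.splitlines()
    needle = query.lower()
    hits: list[SearchHit] = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context_lines)
            end = min(len(lines), i + context_lines + 1)
            hits.append(SearchHit(line_no=i + 1, context=lines[start:end]))
    return hits


if __name__ == "__main__":
    doc = "# Intro\nDocNav loads documents.\n## Search\nSearch finds text.\n"
    for hit in search_with_context(doc, "search"):
        print(hit.line_no, hit.context)
```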
Architecture
DocNav follows a modular, extensible architecture:
- Core MCP Server: Main server implementation using the MCP protocol
- Document Processors: Pluggable processors for different file types
- Navigation Engine: Handles document structure analysis and navigation
- Content Extractors: Extract and format content from documents
- Search Engine: Provides search and query capabilities across documents
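As an illustration of the structure analysis the Navigation Engine performs, the sketch below extracts Markdown ATX headings (`#`, `##`, ...) and renders an outline limited by a max_depth parameter, like the one get_outline accepts. The IDs imitate the 'h1_0', 'h2_1' examples given for read_section; treating the numeric suffix as a per-level counter is an assumption, and extract_headings and format_outline are hypothetical names.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker
# ATX heading: 1-6 '#', whitespace, the title, and an optional closing '#' run.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


@dataclass
class Heading:
    section_id: str  # assumed scheme: h<level>_<per-level counter>
    level: int
    title: str


def extract_headings(markdown: str) -> list[Heading]:
    """Collect ATX headings in document order and assign IDs like 'h1_0', 'h2_1'."""
    counters: dict[int, int] = {}
    headings: list[Heading] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
            continue
        match = HEADING_RE.match(line)
        if in_fence or match is None:
            continue
        level = len(match.group(1))
        index = counters.get(level, 0)
        counters[level] = index + 1
        headings.append(Heading(f"h{level}_{index}", level, match.group(2)))
    return headings


def format_outline(headings: list[Heading], max_depth: int = 3) -> str:
    """Render an indented outline, dropping headings deeper than max_depth."""
    return "\n".join(
        f"{'  ' * (h.level - 1)}- [{h.section_id}] {h.title}"
        for h in headings
        if h.level <= max_depth
    )


if __name__ == "__main__":
    sample = "# Guide\n## Install\n## Usage\n### Tools\n# Appendix\n"
    print(format_outline(extract_headings(sample), max_depth=2))
```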
Installation
Prerequisites
- Python 3.10+
- uv package manager
Setup
- Clone the repository:
```bash
git clone https://github.com/shenyimings/DocNav-MCP.git
cd DocNav-MCP
```
- Install dependencies:
```bash
uv sync
```
Usage
Starting the MCP Server
```bash
uv run server.py
```
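Connect to the MCP server
Add the following to your MCP client configuration. Replace {{PATH_TO_UV}} with the output of `which uv`, and {{PATH_TO_SRC}} with the path to the cloned DocNav-MCP directory (the one containing server.py):
```json
{
  "mcpServers": {
    "docnav": {
      "command": "{{PATH_TO_UV}}",
      "args": [
        "--directory",
        "{{PATH_TO_SRC}}",
        "run",
        "server.py"
      ]
    }
  }
}
```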
Available Tools
- load_document: Load a document for navigation and analysis
  - Args: file_path (path to the document file)
  - Returns: Success message with the auto-generated document ID
- get_outline: Get the document outline/table of contents
  - Args: doc_id (document identifier), max_depth (maximum heading depth, default 3)
  - Returns: Formatted document outline
  - Tip: Call this first after loading a document to understand its structure
- read_section: Read the content of a specific document section
  - Args: doc_id (document identifier), section_id (e.g., 'h1_0', 'h2_1')
  - Returns: Section content, including subsections
- search_document: Search for specific content within a document
  - Args: doc_id (document identifier), query (search term or phrase)
  - Returns: Formatted search results with context
- navigate_section: Get navigation context for a section
  - Args: doc_id (document identifier), section_id (section to navigate to)
  - Returns: Navigation context with parent, siblings, and children
- list_documents: List all currently loaded documents
  - Returns: List of loaded documents with metadata
- get_document_stats: Get statistics about a loaded document
  - Args: doc_id (document identifier)
  - Returns: Document statistics and structure info
- remove_document: Remove a document from the navigator
  - Args: doc_id (document identifier)
  - Returns: Success or error message
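Example Usage
```python
# `tools` stands for a client-side handle to the DocNav tools; run inside an async function.
# doc_id is the ID reported by load_document; section_id comes from get_outline (e.g. 'h1_0').

# Load a document
result = await tools.load_document("path/to/document.md")
# Get document outline
outline = await tools.get_outline(doc_id)
# Get specific section content
section = await tools.read_section(doc_id, section_id)
# Search within document
results = await tools.search_document(doc_id, "search query")
```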
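Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request whose params carry the tool name and its arguments. This sketch serializes such requests using the argument names from the Available Tools list. tool_call is a hypothetical helper, "<doc_id>" is a placeholder for the ID returned by load_document, and a real client (for example, the official MCP Python SDK) must also complete the initialize handshake before sending these.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request IDs must be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines, so each message fits on one line (as stdio transport expects).
    return json.dumps(request)


if __name__ == "__main__":
    print(tool_call("load_document", {"file_path": "path/to/document.md"}))
    print(tool_call("get_outline", {"doc_id": "<doc_id>", "max_depth": 2}))
    print(tool_call("read_section", {"doc_id": "<doc_id>", "section_id": "h1_0"}))
    print(tool_call("search_document", {"doc_id": "<doc_id>", "query": "search query"}))
```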
Development
Project Structure
```
docnav-mcp/
├── server.py              # Main MCP server
├── docnav/
│   ├── __init__.py        # Package initialization
│   ├── models.py          # Data models
│   ├── navigator.py       # Document navigation engine
│   └── processors/
│       ├── __init__.py    # Processor package
│       ├── base.py        # Base processor interface
│       └── markdown.py    # Markdown processor
└── tests/
    └── ...                # Test files
```
Development Guidelines
See CLAUDE.md for detailed development guidelines, including:
- Code quality standards
- Testing requirements
- Package management with uv
- Formatting and linting rules
Adding New Document Processors
- Create a new processor class that inherits from BaseProcessor
- Implement the required methods: can_process, process, extract_section, and search
- Register the processor in the DocumentNavigator
- Add comprehensive tests
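The steps above can be sketched as follows. Because this README does not show BaseProcessor's signatures, the abstract class here is a stand-in with assumed parameters and return types; PlainTextProcessor is a hypothetical .txt processor (TXT support is on the roadmap), and Navigator only imitates the registration step of DocumentNavigator.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseProcessor(ABC):
    """Stand-in for docnav/processors/base.py; the real signatures may differ."""

    @abstractmethod
    def can_process(self, file_path: str) -> bool: ...

    @abstractmethod
    def process(self, file_path: str) -> list[str]: ...

    @abstractmethod
    def extract_section(self, document: list[str], section_id: str) -> str: ...

    @abstractmethod
    def search(self, document: list[str], query: str) -> list[str]: ...


class PlainTextProcessor(BaseProcessor):
    """Hypothetical .txt processor: each blank-line-separated paragraph is a section."""

    def can_process(self, file_path: str) -> bool:
        return Path(file_path).suffix.lower() == ".txt"

    def process(self, file_path: str) -> list[str]:
        text = Path(file_path).read_text(encoding="utf-8")
        return [p.strip() for p in text.split("\n\n") if p.strip()]

    def extract_section(self, document: list[str], section_id: str) -> str:
        # Assumed ID scheme for this sketch only: "p<index>", e.g. "p0", "p1".
        return document[int(section_id.removeprefix("p"))]

    def search(self, document: list[str], query: str) -> list[str]:
        return [p for p in document if query.lower() in p.lower()]


class Navigator:
    """Minimal registry imitating DocumentNavigator's processor lookup."""

    def __init__(self) -> None:
        self.processors: list[BaseProcessor] = []

    def register(self, processor: BaseProcessor) -> None:
        self.processors.append(processor)

    def processor_for(self, file_path: str) -> BaseProcessor:
        # The first registered processor that accepts the file wins.
        for processor in self.processors:
            if processor.can_process(file_path):
                return processor
        raise ValueError(f"no processor registered for {file_path}")
```

Dispatching on can_process keeps the navigator unaware of individual file formats, so a new processor plugs in through registration alone.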
Running Tests
```bash
# Run all tests
uv run tests/run_tests.py
```
Code Quality
```bash
# Format code
uv run --frozen ruff format .
# Check linting
uv run --frozen ruff check .
# Type checking
uv run --frozen pyright
```
Roadmap
- Complete Markdown processor implementation
- Add PDF document support (PyMuPDF)
- Improve test coverage and quality
- Implement advanced search capabilities
- Add document summarization features
- Support for additional document formats (DOCX, TXT, etc.)
- Performance optimizations for large documents
- Caching mechanisms for frequently accessed documents
- Add persistent storage for loaded documents
Contributing
- Fork the repository
- Create a feature branch
- Follow the development guidelines in CLAUDE.md
- Add tests for new functionality
- Submit a pull request
License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
Support
For issues and questions:
- Open an issue on GitHub
- Check the documentation in CLAUDE.md
- Review existing issues and discussions