Simple Document MCP Server
A minimal MCP (Model Context Protocol) server for document processing and search with multi-language support.
Features
- Multi-format Support: PDF, DOCX, XLSX, and TXT files
- Multi-language Support: English, Japanese, Bangla (Bengali), and more
- Full-text Search: Search across all indexed documents with context
- Document Metadata: Extract file type, language, size, and modification date
- Interactive Client: Easy-to-use command-line interface
- Flexible Directory: Custom documents directory via command-line arguments
- Configurable Logging: Adjustable log levels (DEBUG, INFO, WARNING, ERROR)
- Error Handling: Robust error handling and logging
- Auto-create Directories: Automatically creates missing document directories
Quick Start
Option 1: Automated Setup
# Run the setup script
./setup.sh
# Activate the virtual environment
source venv/bin/activate
Option 2: Manual Setup
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create documents directory
mkdir -p documents
Usage
Starting the Server
# Terminal 1: Start the MCP server
python simple_mcp_server.py # Use default ./documents directory
python simple_mcp_server.py --dir /path/docs # Use custom directory
python simple_mcp_server.py -d ~/Documents # Use home Documents folder
python simple_mcp_server.py --log-level DEBUG # Enable debug logging
Using the Client
# Terminal 2: Start the interactive client
python simple_client.py # Use default settings
python simple_client.py --dir /path/docs # Use custom documents directory
python simple_client.py demo # Run demo mode
python simple_client.py demo --dir ~/docs # Run demo with custom directory
python simple_client.py --log-level DEBUG # Enable debug logging
Available Commands
Command | Description | Example |
---|---|---|
scan | Scan and index all documents | scan |
search <query> | Search for text in documents | search machine learning |
list | List all processed documents | list |
stats | Show document collection statistics | stats |
content <filename> | Get full content of a document | content sample.txt |
tools | Show available MCP tools | tools |
quit | Exit the client | quit |
Directory Structure
docmcp/
├── simple_mcp_server.py   # Main MCP server
├── simple_client.py       # Interactive client
├── requirements.txt       # Python dependencies
├── setup.sh               # Automated setup script
├── README.md              # This file
└── documents/             # Document storage
    ├── english/           # English documents
    ├── japanese/          # Japanese documents
    └── bangla/            # Bangla documents
MCP Tools
The server provides 5 MCP tools:
- scan_documents: Index all documents in the documents directory
- search_documents: Search for text with configurable result limits
- list_documents: List all processed documents with metadata
- get_document_stats: Get collection statistics (size, languages, types)
- get_document_content: Retrieve full content of a specific document
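As a rough illustration of how a client might invoke one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP uses. The helper name `build_tool_call` is hypothetical, and the argument names `query` and `max_results` are assumptions borrowed from the `search_documents` method signature in the Development section; the server's actual tool schema may differ.

```python
import json
from typing import Any, Dict


def build_tool_call(tool_name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP-style JSON-RPC 2.0 `tools/call` request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names are assumed, not confirmed by the server's tool schema.
print(build_tool_call("search_documents", {"query": "machine learning", "max_results": 5}))
```

In practice the bundled `simple_client.py` handles this exchange, so a helper like this is mainly useful for understanding what travels over the wire.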
Language Support
The server automatically detects document language using langdetect. Supported languages include:
- English (en)
- Japanese (ja)
- Bangla/Bengali (bn)
- And many more (any language supported by langdetect)
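A minimal sketch of how detection with a safe fallback could look, assuming the server calls `langdetect.detect` on extracted text. The function name `detect_language`, the `min_chars` threshold, and the `"unknown"` fallback value are all hypothetical and are not taken from the server's code.

```python
from typing import Callable, Optional


def detect_language(text: str,
                    detector: Optional[Callable[[str], str]] = None,
                    min_chars: int = 20) -> str:
    """Return a language code such as 'en', or 'unknown' when detection is unreliable."""
    cleaned = text.strip()
    if len(cleaned) < min_chars:
        # Very short text tends to give unstable results, so skip detection.
        return "unknown"
    if detector is None:
        # Imported lazily so the helper can be exercised without langdetect installed.
        from langdetect import detect
        detector = detect
    try:
        return detector(cleaned)
    except Exception:
        # langdetect raises LangDetectException when no usable features are found.
        return "unknown"
```

Passing a custom `detector` makes it possible to swap in another library or a stub for testing without changing the calling code.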
Supported File Types
Extension | Type | Library Used |
---|---|---|
.pdf | PDF Documents | PyPDF2 |
.docx | Word Documents | python-docx |
.xlsx | Excel Spreadsheets | openpyxl |
.txt | Text Files | Built-in (multi-encoding) |
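The "multi-encoding" handling for text files can be pictured as trying a list of encodings in order until one decodes cleanly. The sketch below is one way to do that; the function names and the specific encoding list are assumptions, since the encodings the server actually tries are not listed here.

```python
from pathlib import Path
from typing import Sequence

# Assumed candidate list; the server's real list may differ.
DEFAULT_ENCODINGS = ("utf-8-sig", "shift_jis", "cp1252")


def decode_with_fallback(raw: bytes, encodings: Sequence[str] = DEFAULT_ENCODINGS) -> str:
    """Decode bytes with the first encoding that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be read and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def read_text_with_fallback(file_path: Path) -> str:
    """Read a text file whose encoding is not known in advance."""
    return decode_with_fallback(file_path.read_bytes())
```

Order matters: strict encodings such as UTF-8 go first, because permissive single-byte encodings accept almost any input and would otherwise mask the correct one.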
Search Features
- Full-text search across all document content
- Context highlighting around matches
- Multiple matches per document with position tracking
- Result limiting to prevent overwhelming output
- Case-insensitive search
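The features above can be sketched in a few lines of standard-library Python. This is an illustrative approximation, not the server's implementation: the function name `search_text`, the `context_chars` parameter, and the result keys are hypothetical.

```python
import re
from typing import Any, Dict, List


def search_text(documents: Dict[str, str], query: str,
                max_results: int = 50, context_chars: int = 30) -> List[Dict[str, Any]]:
    """Case-insensitive literal search returning position and surrounding context."""
    results: List[Dict[str, Any]] = []
    if not query:
        return results
    # re.escape treats the query as plain text rather than a regular expression.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for name, content in documents.items():
        for match in pattern.finditer(content):
            before = content[max(0, match.start() - context_chars):match.start()]
            after = content[match.end():match.end() + context_chars]
            results.append({
                "document": name,
                "position": match.start(),
                "context": f"...{before}**{match.group(0)}**{after}...",
            })
            if len(results) >= max_results:
                # Stop early so large collections do not flood the output.
                return results
    return results
```

Matching on the original text, rather than on a lowercased copy, keeps the reported positions valid for the original content.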
Development
Adding New File Types
To add support for new file types, extend the SimpleDocumentProcessor class:
def extract_text_from_newtype(self, file_path: Path) -> str:
    # Your extraction logic here
    pass

# Add to the extractors dictionary in process_document()
extractors = {
    '.newext': (self.extract_text_from_newtype, "New Type"),
    # ... existing extractors
}
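As a concrete illustration of the pattern, here is a hypothetical extractor for CSV files built on the standard `csv` module. CSV is not among the supported types listed in this README, and the function is written standalone; inside `SimpleDocumentProcessor` it would take `self` as its first parameter and be registered under `'.csv'` in the extractors dictionary.

```python
import csv
from pathlib import Path


def extract_text_from_csv(file_path: Path) -> str:
    """Flatten a CSV file into plain text, one row per line."""
    with file_path.open(newline="", encoding="utf-8") as handle:
        lines = []
        for row in csv.reader(handle):
            # Drop empty cells so searches are not cluttered by blank columns.
            cells = [cell.strip() for cell in row if cell.strip()]
            lines.append(" ".join(cells))
    return "\n".join(lines)


# Hypothetical registration, mirroring the extractors dictionary shown above:
# extractors = {'.csv': (self.extract_text_from_csv, "CSV File"), ...}
```

Returning plain text keeps the new type compatible with the existing search and language detection steps, which operate on extracted strings.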
Customizing Search
The search functionality can be enhanced by modifying the search_documents method:
def search_documents(self, query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    # Add regex support, fuzzy matching, etc.
    pass
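One possible way to add regex support is to compile the query up front and fall back to a literal match when the pattern is invalid. The helper `compile_query` and its `use_regex` flag are hypothetical names for illustration and are not part of the existing code.

```python
import re
from typing import Pattern


def compile_query(query: str, use_regex: bool = False) -> Pattern[str]:
    """Compile a case-insensitive pattern, falling back to literal text on a bad regex."""
    if use_regex:
        try:
            return re.compile(query, re.IGNORECASE)
        except re.error:
            # Invalid pattern: treat the query as plain text instead of failing.
            pass
    return re.compile(re.escape(query), re.IGNORECASE)


match = compile_query(r"doc\w+", use_regex=True).search("My Documents folder")
print(match.group(0) if match else None)
```

A modified `search_documents` could then iterate over `pattern.finditer(content)` for each document, leaving the rest of the method unchanged.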
Error Handling
The server includes comprehensive error handling:
- File reading errors: Gracefully handles corrupted or unreadable files
- Encoding issues: Tries multiple encodings for text files
- Missing dependencies: Clear error messages for missing libraries
- Server errors: JSON error responses for client handling
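To show what a JSON error response for client handling might look like, the sketch below wraps a tool handler so that any exception is converted into a structured payload. The wrapper name `run_tool_safely` and the field names `success`, `result`, `error`, and `message` are assumptions; the server's actual response format is not documented here.

```python
import json
from typing import Any, Callable


def run_tool_safely(handler: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
    """Run a tool handler and always return a JSON string, even on failure."""
    try:
        return json.dumps({"success": True, "result": handler(*args, **kwargs)})
    except Exception as exc:
        # Report the failure as data so the client can decide how to react.
        return json.dumps({
            "success": False,
            "error": type(exc).__name__,
            "message": str(exc),
        })
```

Because the serialization happens inside the `try` block, a handler that returns a non-serializable value is also reported as an error rather than crashing the server.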
Example Output
Server Help
$ python simple_mcp_server.py --help
usage: simple_mcp_server.py [-h] [--dir DIR] [--log-level {DEBUG,INFO,WARNING,ERROR}] [--version]
Simple Document MCP Server
options:
-h, --help show this help message and exit
--dir DIR, -d DIR Directory containing documents to process (default: ./documents)
--log-level {DEBUG,INFO,WARNING,ERROR}
Set logging level (default: INFO)
--version show program's version number and exit
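For readers extending the command-line interface, the help text above is consistent with an `argparse` setup along the following lines. This is a reconstruction from the help output, not the server's source; the function name `build_parser` is hypothetical and the version string is a placeholder because the real version is not shown in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the options shown in the server help output."""
    parser = argparse.ArgumentParser(
        prog="simple_mcp_server.py",
        description="Simple Document MCP Server",
    )
    parser.add_argument(
        "--dir", "-d", default="./documents",
        help="Directory containing documents to process (default: ./documents)",
    )
    parser.add_argument(
        "--log-level", choices=["DEBUG", "INFO", "WARNING", "ERROR"], default="INFO",
        help="Set logging level (default: INFO)",
    )
    # Placeholder version string; the actual value is an unknown here.
    parser.add_argument("--version", action="version", version="%(prog)s (version unknown)")
    return parser


args = build_parser().parse_args(["-d", "/path/docs", "--log-level", "DEBUG"])
print(args.dir, args.log_level)
```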
Client Help
$ python simple_client.py --help
usage: simple_client.py [-h] [--dir DIR] [--log-level {DEBUG,INFO,WARNING,ERROR}] [{interactive,demo}]
Simple Document MCP Client
positional arguments:
{interactive,demo} Run in interactive mode or demo mode (default: interactive)
options:
-h, --help show this help message and exit
--dir DIR, -d DIR Directory containing documents to process (uses server default if not specified)
--log-level {DEBUG,INFO,WARNING,ERROR}
Set server logging level (default: INFO)
Document Scan with Custom Directory
Running Demo Commands...
Documents directory: /custom/path/docs
Scanned and processed 3 documents
Documents (3):
1. sample.txt (Text File, en)
   Path: /custom/path/docs/sample.txt
   Size: 1.2 KB
   Preview: This is a sample English document for testing...
2. sample.txt (Text File, ja)
   Path: /custom/path/docs/sample.txt
   Size: 0.8 KB
   Preview: これは日本語のサンプル文書です...
Search Results
Search Results for 'processing' (2 matches):
1. sample.txt (Text File, en)
   Path: documents/english/sample.txt
   Position: 156
   Context: ...document **processing** system supports...
2. readme.txt (Text File, en)
   Path: documents/english/readme.txt
   Position: 89
   Context: ...server can **processing**: - PDF files...
Author
Shaiful Islam Shabuj
- GitHub: @shaifulshabuj
- Repository: simple-document-mcp-server
Contributing
- Fork the repository
- Create a feature branch
- Add your improvements
- Test with the provided client
- Submit a pull request
License
This project is licensed under the MIT License; see the license file in the repository for details.
Copyright (c) 2024 Shaiful Islam Shabuj
Troubleshooting
Common Issues
"Missing required dependency"
pip install -r requirements.txt
"Server not found"
- Make sure you're in the correct directory
- Check that simple_mcp_server.py exists
- Verify Python path in the client
"No documents found"
- Check that documents exist in the documents/ directory
- Run the scan command first
- Verify file permissions
"Language detection failed"
- Document might be too short for reliable detection
- Try with longer text content
- Check for non-text content in files
Getting Help
- Check the server logs for detailed error messages
- Run the demo client: python simple_client.py demo
- Verify your setup with the sample documents
- Check file permissions and encoding issues