BierschneiderEmanuel/McpDocServer
MCP VectorStore Server
A Model Context Protocol (MCP) server that provides advanced vector store operations for document search, PDF processing, and information retrieval. This server wraps the functionality from vectorstore.py into a standardized MCP interface.
Features
- Vector Store Operations: Create, search, and manage document vector stores
- PDF Processing: Extract and index content from PDF documents using LLMSherpa
- Semantic Search: Advanced document search using HuggingFace embeddings
- Web Search Integration: Google, Wikipedia, and DuckDuckGo search capabilities
- File Operations: Read and process local files
- Mathematical Calculations: Built-in calculator functionality
Prerequisites
System Requirements
- Python: 3.8 or higher
- Operating System: Linux, macOS, or Windows
- Memory: Minimum 4GB RAM (8GB+ recommended for large document collections)
- Storage: At least 2GB free space for models and vector stores
- Network: Internet connection for downloading models and web searches
Optional GPU Support
For improved performance with large document collections:
- CUDA: 11.8 or higher
- GPU: NVIDIA GPU with 4GB+ VRAM
- cuDNN: Compatible version for your CUDA installation
Installation
Step 1: Clone or Download the Repository
# If you have the files locally, navigate to the directory
cd /path/to/McpDocServer
# Or clone from a repository (if available)
# git clone <repository-url>
# cd McpDocServer
Step 2: Create a Virtual Environment
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
# On Linux/macOS:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
Step 3: Install Dependencies
# Upgrade pip
pip install --upgrade pip
# Install all required packages
pip install -r requirements.txt
Step 4: Install LLMSherpa (Optional but Recommended)
For optimal PDF processing, install LLMSherpa locally:
# Install LLMSherpa
pip install llmsherpa
# Start the LLMSherpa server (in a separate terminal)
llmsherpa --port 5001
Step 5: Download Embedding Models
The server will automatically download the required embedding model on first use, but you can pre-download it:
# Download the embedding model
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
Configuration
Environment Variables
Create a .env file in the project directory:
# LLMSherpa API URL (use local if available, otherwise cloud)
LLMSHERPA_API_URL=http://localhost:5001/api/parseDocument?renderFormat=all
# Vector store directory
VECTORSTORE_DIR=/path/to/your/documents
# User agent for web scraping
USER_AGENT=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
# Optional: CUDA device for GPU acceleration
CUDA_VISIBLE_DEVICES=0
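As a minimal sketch, the variables above could be read in Python as follows. The helper name `load_settings` and the fallback defaults are assumptions made for illustration; they are not taken from the server code. Python does not read a .env file on its own, so this assumes the variables are already present in the process environment.

```python
import os


def load_settings(env=None):
    """Collect the settings named in the .env example (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return {
        "llmsherpa_api_url": env.get(
            "LLMSHERPA_API_URL",
            "http://localhost:5001/api/parseDocument?renderFormat=all",
        ),
        # Assumed default: the current directory when VECTORSTORE_DIR is unset.
        "vectorstore_dir": env.get("VECTORSTORE_DIR", "."),
        "user_agent": env.get("USER_AGENT", ""),
        # None means "not set", leaving device selection to the libraries.
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key} = {value!r}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.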
Directory Structure
Prepare your document directory:
your_documents/
├── pdfs/
│ ├── document1.pdf
│ ├── document2.pdf
│ └── ...
├── text_files/
│ ├── notes.txt
│ └── ...
└── other_documents/
└── ...
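Before building a vector store, it can help to confirm which files such a directory actually contains. The following is a small sketch using only the standard library; the function name `list_documents` and the extension filter are assumptions, and the server may accept a different set of file types.

```python
from pathlib import Path


def list_documents(root, extensions=(".pdf", ".txt")):
    """Return all files under root whose suffix is in extensions, sorted by path."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")  # walk subdirectories such as pdfs/ and text_files/
        if path.is_file() and path.suffix.lower() in extensions
    )


if __name__ == "__main__":
    for path in list_documents("."):
        print(path)
```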
Usage
Starting the MCP Server
# Make the server executable
chmod +x mcp_vectorstore_server.py
# Start the server on Linux
python /home/em/McpDocServer/mcp_vectorstore_server.py
# Or on Windows with WSL
wsl -d Ubuntu-24.04 bash -c "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
Using with MCP Clients
0. Claude Desktop
Add to your MCP configuration:
{
"mcpServers": {
"vectorstore": {
"command": "python",
"args": ["/home/em/McpDocServer/mcp_vectorstore_server.py"],
"env": {
"PYTHONPATH": "/home/em/McpDocServer/McpDocServer"
}
}
}
}
1. GitHub Copilot
- Click on Configure Tools in the GitHub Copilot Chat window.
- Click on Add More Tools in the top search bar.
- Click on Add MCP Server in the top search bar.
- Click on command (stdio) in the top search bar.
- Enter the command to run:
python /home/em/McpDocServer/mcp_vectorstore_server.py
or on Windows:
wsl -d Ubuntu-24.04 /mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh
- Enter the MCP server ID / name, e.g. McpDocServer-19be5552.
- Configure settings.json:
{
"security.workspace.trust.untrustedFiles": "open",
"python.defaultInterpreterPath": "/mnt/c/Users/emanu/Desktop/LLM/venv/venv/bin/python",
"terminal.integrated.inheritEnv": false,
"git.openRepositoryInParentFolders": "never",
"terminal.integrated.scrollback": 100000,
"mcp": {
"servers": {
"McpDocServer-19be5552": {
"type": "stdio",
"command": "python",
"args": [
"/mnt/c/Users/emanu/Desktop/McpDocServer/mcp_vectorstore_server.py"
]
}
}
}
}
- Check that the following tools appear in the MCP server tool list when you click on Configure Tools in the GitHub Copilot Chat window and scroll to the bottom:
vectorstore_search
vectorstore_create
vectorstore_info
vectorstore_clear
read_file
google_search
wikipedia_search
duckduckgo_search
calculate
- Select Agent mode in the GitHub Copilot Chat window and use vectorstore_search to get information:
use vectorstore_search to get information on unit testing
- Confirm the tool call.
2. Continue MCP Client
name: McpDocServer
version: 1.0.1
schema: v1
mcpServers:
- name: McpDocServer
command: wsl
args:
- "-d"
- "Ubuntu-24.04"
- "/mnt/c/Users/emanu/Desktop/McpDocServer/start_mcp.sh"
env: {}
mcp_timeout: 180 # set timeout to 180 sec
timeout: 9999
connectionTimeout: 120000 # 120 seconds = 2 minutes
3. Other MCP Clients
Configure your MCP client to use the server:
# Example with a generic MCP client
mcp-client --server python --args /path/to/McpDocServer/mcp_vectorstore_server.py
Available Tools
Vector Store Operations
vectorstore_search
Search the vector store for relevant documents.
Parameters:
- query (string, required): Search query
- k (integer, optional): Number of results (default: 2)
Example:
{
"name": "vectorstore_search",
"arguments": {
"query": "machine learning algorithms",
"k": 5
}
}
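In the MCP protocol, a tool invocation like the one above travels inside a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below shows how such a request could be assembled; the helper name `make_tool_call` is hypothetical, and most MCP clients build this envelope for you.

```python
import json


def make_tool_call(request_id, name, arguments):
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = make_tool_call(
        1, "vectorstore_search", {"query": "machine learning algorithms", "k": 5}
    )
    print(json.dumps(request, indent=2))
```

The same helper covers the other tools in this section by changing `name` and `arguments`.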
vectorstore_create
Create a new vector store from documents in a directory.
Parameters:
- directory_path (string, required): Path to directory containing documents
Example:
{
"name": "vectorstore_create",
"arguments": {
"directory_path": "/home/user/documents/research_papers"
}
}
vectorstore_info
Get information about the current vector store.
Example:
{
"name": "vectorstore_info",
"arguments": {}
}
vectorstore_clear
Clear all documents from the vector store.
Example:
{
"name": "vectorstore_clear",
"arguments": {}
}
File Operations
read_file
Read the contents of a file on the system.
Parameters:
- filename (string, required): Path to the file to read
Example:
{
"name": "read_file",
"arguments": {
"filename": "/home/user/documents/notes.txt"
}
}
Web Search Operations
google_search
Search Google for information.
Parameters:
- query (string, required): Search query
- max_results (integer, optional): Maximum number of results (default: 3)
Example:
{
"name": "google_search",
"arguments": {
"query": "latest AI developments 2024",
"max_results": 5
}
}
wikipedia_search
Search Wikipedia for information.
Parameters:
- query (string, required): Search query
Example:
{
"name": "wikipedia_search",
"arguments": {
"query": "artificial intelligence"
}
}
duckduckgo_search
Search DuckDuckGo for information.
Parameters:
- query (string, required): Search query
Example:
{
"name": "duckduckgo_search",
"arguments": {
"query": "privacy-focused search engines"
}
}
Utility Operations
calculate
Perform mathematical calculations.
Parameters:
- operation (string, required): Mathematical operation to perform
Example:
{
"name": "calculate",
"arguments": {
"operation": "2 + 2 * 3"
}
}
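How the server evaluates the operation string is not documented here. As an illustration only, one common way to evaluate arithmetic such as the example above without calling `eval` is to walk the parsed syntax tree and allow only numeric literals and basic operators. The function name `safe_calculate` and the supported operator set are assumptions.

```python
import ast
import operator

# Operators this sketch is willing to evaluate; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression):
    """Evaluate +, -, *, / and parentheses over numeric literals."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Function calls, names, attribute access and so on end up here.
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(safe_calculate("2 + 2 * 3"))  # 8
```

Rejecting every node type that is not explicitly listed keeps inputs such as function calls from being executed.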
Resources
The server provides the following resources:
vectorstore://info
Returns information about the current vector store in JSON format.
Example Response:
{
"num_documents": 150,
"directory": "/home/user/documents",
"embeddings_model": "sentence-transformers/all-mpnet-base-v2"
}
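MCP clients fetch a resource with a JSON-RPC 2.0 request whose method is `resources/read` and whose parameters carry the resource URI. The sketch below builds that request and parses a payload shaped like the example response; the helper names are hypothetical, and the field names are taken from the example above.

```python
import json


def make_resource_read(request_id, uri="vectorstore://info"):
    """Build a JSON-RPC 2.0 resources/read request for the given URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def summarize_info(payload_text):
    """Turn the JSON text of the info resource into a one-line summary."""
    info = json.loads(payload_text)
    return (
        f"{info['num_documents']} documents in {info['directory']} "
        f"(model: {info['embeddings_model']})"
    )


if __name__ == "__main__":
    print(json.dumps(make_resource_read(1)))
    example = (
        '{"num_documents": 150, "directory": "/home/user/documents", '
        '"embeddings_model": "sentence-transformers/all-mpnet-base-v2"}'
    )
    print(summarize_info(example))
```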
Troubleshooting
Common Issues
1. Import Errors
Problem: ModuleNotFoundError for various packages
Solution: Ensure all dependencies are installed:
pip install -r requirements.txt
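To see which packages are still missing after installation, a quick check with the standard library can help. This is a sketch: the function name `missing_modules` is hypothetical, and the module names in the usage example are assumptions based on the packages mentioned in this README; requirements.txt remains the authoritative list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; adjust to match requirements.txt.
    candidates = ["sentence_transformers", "llmsherpa", "faiss"]
    missing = missing_modules(candidates)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All checked modules are importable")
```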
2. CUDA/GPU Issues
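Problem: CUDA-related errors
Solution: Install CPU-only versions:
pip uninstall faiss-gpu torch
pip install faiss-cpu
# Reinstall a CPU-only PyTorch build (sentence-transformers depends on torch)
pip install torch --index-url https://download.pytorch.org/whl/cpu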
3. LLMSherpa Connection Issues
Problem: Cannot connect to LLMSherpa API
Solution:
- Start the LLMSherpa server:
llmsherpa --port 5001
- Or use the cloud API by updating the URL in the code
4. Memory Issues
Problem: Out of memory errors with large documents
Solution:
- Reduce chunk size in the text splitter
- Use smaller embedding models
- Process documents in batches
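The first and third suggestions can be sketched in plain Python as follows. This is an illustration of the idea only, not the text splitter the server uses: the function names `split_text` and `batched` are hypothetical, and the chunks are measured in characters.

```python
def split_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into chunks of at most chunk_size characters that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def batched(items, batch_size=10):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    chunks = split_text("x" * 2500, chunk_size=1000, chunk_overlap=100)
    print([len(c) for c in chunks])  # [1000, 1000, 700]
    for batch in batched(chunks, batch_size=2):
        print(len(batch), "chunks in this batch")
```

Smaller chunks and smaller batches both reduce how much text is embedded at once, at the cost of more passes.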
5. Permission Issues
Problem: Cannot read files or directories
Solution: Check file permissions:
chmod 644 /path/to/documents/*
chmod 755 /path/to/documents/
Performance Optimization
For Large Document Collections
- Use GPU acceleration:
# In vectorstore.py, ensure CUDA is enabled
model_kwargs={'device': 'cuda'}
- Optimize chunk size:
# Adjust in PDFVectorStoreTool.__init__
chunk_size=1000, # Smaller chunks for better performance
chunk_overlap=100,
- Batch processing:
# Process documents in smaller batches
batch_size = 10
For Better Search Results
- Adjust similarity threshold:
# In vectorstore_search method
similarity_threshold = 0.7
- Use different embedding models:
# Try different models for better results
model_name="sentence-transformers/all-MiniLM-L6-v2" # Faster
model_name="sentence-transformers/all-mpnet-base-v2" # Better quality
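To make the effect of a similarity threshold concrete, here is a small self-contained sketch that ranks toy vectors by cosine similarity and drops results below the threshold. The function names and the two-dimensional vectors are made up for illustration; how the server scores and filters results may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_threshold(query_vec, doc_vecs, similarity_threshold=0.7, k=2):
    """Return up to k (score, name) pairs at or above the threshold, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()
    ]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    return sorted(kept, reverse=True)[:k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for score, name in filter_by_threshold([1.0, 0.0], docs):
        print(f"{name}: {score:.3f}")
```

Raising the threshold trades recall for precision: fewer results come back, but each is closer to the query.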
Development
Project Structure
McpDocServer/
├── mcp_vectorstore_server.py # Main MCP server
├── vectorstore.py # Original vectorstore implementation
├── requirements.txt # Python dependencies
├── README.md # This documentation
└── .env # Environment variables (create this)
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Testing
# Run basic functionality tests
python -c "
from mcp_vectorstore_server import *
print('Server imports successfully')
"
# Test vector store operations
python -c "
from vectorstore import PDFVectorStoreTool
tool = PDFVectorStoreTool()
print(f'Vector store initialized with {tool.vectorstore_get_num_items()} documents')
"
License
This project is provided as-is for educational and research purposes. Please ensure you comply with the licenses of all included dependencies.
Support
For issues and questions:
- Check the troubleshooting section above
- Review the error logs
- Ensure all dependencies are correctly installed
- Verify your system meets the requirements
Changelog
Version 1.0.0
- Initial release
- MCP server implementation
- Vector store operations
- Web search integration
- File operations
- Mathematical calculations