a-pogany/librarian-mcp
If you are the rightful owner of librarian-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
Librarian MCP is a documentation search system that provides technical documentation access to both LLMs and humans through an MCP server.
Librarian MCP - Enterprise RAG Documentation Search System
Version: 2.0.3 Status: Production Ready
A production-grade documentation search system that makes technical documentation accessible to LLMs and humans through an MCP (Model Context Protocol) server with enterprise-grade RAG (Retrieval Augmented Generation) capabilities.
Features
✅ Phase 1 (Complete):
- HTTP/SSE MCP server for Claude Desktop/Cline integration
- Keyword-based search with relevance ranking
- Multi-format support (.md, .txt, .docx)
- Real-time file watching with automatic index updates
- Product/component hierarchical organization
✅ Phase 2 (Complete - v2.0.0):
- E5-large-v2 embeddings (1024-dimensional, 30-40% better quality)
- Hierarchical document chunking (512-token chunks, 128-token overlap)
- Persistent vector storage (ChromaDB with optimized HNSW)
- Two-stage reranking (cross-encoder for 2x precision improvement)
- Query embedding cache (5x faster repeated queries)
- BM25 keyword search (probabilistic scoring)
- Reciprocal Rank Fusion (RRF) (hybrid search optimization)
- Semantic + Keyword hybrid search (best of both worlds)
✅ Phase 2.5 (Complete - v2.0.3):
- Reranking Mode (two-stage search: semantic retrieval + keyword refinement)
- Filters semantically similar but contextually irrelevant documents
- Combines semantic (70%) and keyword (30%) scores
- Configurable candidates (default: 50) and threshold (default: 0.1)
- Enhanced Chunking (all file types with semantic/fixed strategies)
- Semantic chunking for Markdown (heading-based on ## and ###)
- Fixed-size chunking for text files (512 tokens, 128 overlap)
- Sentence boundary preservation
- Rich Metadata (tags, doc types, date filtering for temporal queries)
- Tag extraction from YAML frontmatter (list or comma-separated)
- Document type inference (6 types: api, guide, architecture, reference, readme, documentation)
- Temporal filtering with
modified_after/modified_before(ISO 8601)
🔜 Phase 3 (Planned):
- REST API for HTTP access
- React web UI for human users
- Advanced section-level filtering
Quick Start
Prerequisites
- Python 3.10 or higher
- pip
- ~2GB RAM for RAG features
- ~2GB disk for vector database
Installation
- Clone the repository
git clone <repository-url>
cd librarian-mcp
- Create virtual environment
python3 -m venv venv
source venv/bin/activate # On macOS/Linux
# venv\Scripts\activate # On Windows
- Install dependencies
cd backend
pip install -r requirements.txt
Note: First run will download ~1.4GB of models:
- E5-large-v2 embedding model (~1.3GB)
- Cross-encoder reranking model (~80MB)
- Models are cached in
~/.cache/torch/sentence_transformers/
- Create documentation folder
mkdir -p docs/product-name/component-name
- Configure environment (optional)
cp .env.example .env
# Edit .env to customize settings
Running the Server
cd backend
python main.py
The server will start on http://127.0.0.1:3001
Initialization Output:
INFO Embeddings enabled: True
INFO Search mode: hybrid
INFO Loading embedding model: intfloat/e5-large-v2
INFO Model loaded successfully. Embedding dimension: 1024
INFO Reranker model loaded successfully
INFO Hybrid search engine initialized in 'hybrid' mode (RRF)
Documentation Structure
Organize your documentation in this hierarchy:
docs/
├── product-name/ # e.g., symphony, project-x
│ ├── component-name/ # e.g., PAM, auth, database
│ │ ├── file.md
│ │ ├── spec.docx # Large DOCX files (200-600 pages supported)
│ │ └── notes.txt
│ └── architecture/
├── meetings/
│ └── product-name/
└── shared/ # Cross-product docs
Claude Desktop Integration
- Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"doc-search": {
"url": "http://127.0.0.1:3001/mcp"
}
}
}
-
Restart Claude Desktop
-
Verify integration
The following tools should appear in Claude's available tools:
search_documentation- Hybrid search (semantic + keyword)get_document- Retrieve full document contentlist_products- List all productslist_components- List components for a productget_index_status- Get indexing statistics
Usage Examples
Basic Search
You (to Claude): "How do I implement OAuth2 authentication?"
Claude will use semantic search to understand intent:
- Finds "OAuth implementation guide" even without exact keywords
- Returns relevant sections from 200-page DOCX files
- Understands related concepts (SSO, tokens, authorization flows)
Hybrid Search (Best Results)
You: "Search for Python machine learning libraries"
Hybrid search combines:
- Keyword matching: exact terms "Python" and "machine learning"
- Semantic understanding: related concepts (NumPy, Pandas, scikit-learn)
- RRF fusion: optimal ranking from both engines
Get Specific Document
You: "Show me the OAuth spec from symphony/PAM"
Claude will use get_document:
{
"path": "symphony/PAM/oauth-spec.md"
}
List Available Products
You: "What products do we have documentation for?"
Claude will use list_products
Metadata Filtering (v2.0.3)
You: "Find API documentation about authentication modified in the last 30 days"
Claude will search with metadata filters:
{
"query": "authentication",
"doc_type": "api",
"modified_after": "2024-11-04"
}
Returns only API docs tagged with authentication from the last month
Tag-Based Search (v2.0.3)
You: "Show me all security-related guides"
Claude will search with tag filter:
{
"query": "security",
"tags": ["security", "auth", "encryption"]
}
Matches documents with YAML frontmatter:
---
tags: [security, best-practices]
---
Temporal Queries (v2.0.3)
You: "What changed in the architecture docs this week?"
Claude will filter by date range:
{
"query": "architecture",
"doc_type": "architecture",
"modified_after": "2024-11-27"
}
Architecture
Search Pipeline (Hybrid Mode)
User Query
↓
Parallel Retrieval:
├─ Keyword Engine → BM25 scoring → 30 results
└─ Semantic Engine:
├─ Query Embedding (e5-large-v2, cached)
├─ Vector Search (ChromaDB) → 50 candidates
└─ Cross-Encoder Rerank → 30 results
↓
RRF Fusion: Combine 30 + 30 → top 10
↓
Return Results (150-200ms latency)
Document Indexing Pipeline
DOCX File (300 pages)
↓
Parser → Enhanced Metadata (sections, headings, tables)
↓
Hierarchical Chunker → 200 chunks (512 tokens, 128 overlap)
↓
Embedding Generator (e5-large-v2, 1024d)
↓
Batch Insert → Persistent Vector DB (ChromaDB)
↓
Indexed: 200 chunks × 1024d embeddings
Configuration
config.json (Production Settings)
{
"system": {
"version": "2.0.0"
},
"search": {
"mode": "hybrid",
"use_reranking": true,
"reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
"use_rrf": true
},
"embeddings": {
"enabled": true,
"model": "intfloat/e5-large-v2",
"dimension": 1024,
"persist_directory": "./vector_db",
"chunk_size": 512,
"chunk_overlap": 128
},
"chunking": {
"strategy": "hierarchical",
"respect_boundaries": true,
"preserve_tables": true
},
"cache": {
"query_embedding_cache_size": 10000
},
"mcp": {
"host": "127.0.0.1",
"port": 3001
}
}
Search Mode Options
Hybrid Mode (Default - Best Results):
{"search": {"mode": "hybrid", "use_rrf": true}}
- Combines keyword and semantic search
- RRF fusion for optimal ranking
- Best precision and recall
Semantic-Only Mode (Context-Aware):
{"search": {"mode": "semantic"}}
- Vector similarity search only
- Best for conceptual queries
- Understands intent and context
Keyword-Only Mode (Fastest):
{"search": {"mode": "keyword"}}
- Traditional keyword matching
- Sub-millisecond queries
- Good for exact term searches
Performance
Expected Performance (v2.0.0)
Relevance Metrics:
- Precision@10: ~85% (was ~40% in v1.0)
- Recall@10: ~75% (was ~30% in v1.0)
- Document Coverage: 100% (was 0.5% with truncation)
Speed Metrics:
- Cold query: 150-200ms (first query with model loading)
- Warm query: 150-200ms (semantic + reranking + RRF)
- Cached query: 10-20ms (5x speedup)
- Keyword-only: <1ms
Indexing Performance:
- Initial indexing: ~30 sec for 500 docs (includes model download)
- Re-indexing: ~5 sec for 500 docs (models cached)
- Large DOCX: ~1 sec for 300-page document
Scale Metrics:
- Documents: 1,000-10,000 (tested up to 1,000)
- Disk usage: ~1-2GB for 1,000 large docs
- RAM usage: ~2GB (models + working memory)
Development
Running Tests
cd backend
# Run all tests
pytest tests/ -v
# Run RAG feature tests
python test_rag_features.py
# Run search pipeline tests
python test_search_pipelines.py
# With coverage
pytest tests/ --cov=core --cov=mcp --cov-report=html
Project Structure
librarian-mcp/
├── backend/
│ ├── core/ # Core components
│ │ ├── parsers.py # File parsers (MD, TXT, DOCX)
│ │ ├── indexer.py # Document indexing
│ │ ├── search.py # Keyword search engine
│ │ ├── embeddings.py # E5-large-v2 embeddings
│ │ ├── vector_db.py # ChromaDB wrapper
│ │ ├── chunking.py # Hierarchical chunking
│ │ ├── reranker.py # Cross-encoder reranking
│ │ ├── cache.py # Query embedding cache
│ │ ├── bm25_search.py # BM25 keyword search
│ │ ├── semantic_search.py # Semantic search engine
│ │ └── hybrid_search.py # Hybrid search (RRF)
│ ├── mcp/ # MCP server
│ │ └── tools.py # MCP tool definitions
│ ├── config/ # Configuration
│ │ └── settings.py # Config management
│ ├── tests/ # Unit tests
│ ├── main.py # Server entry point
│ └── requirements.txt # Dependencies
├── docs/ # Documentation files
├── config.json # Configuration file
├── CLAUDE.md # Developer guide
├── QUICKSTART.md # 5-minute setup
├── IMPLEMENTATION_COMPLETE.md # v2.0 implementation details
└── README.md # This file
MCP Tools Reference
search_documentation
Search across all documentation using hybrid search (semantic + keyword).
Parameters:
query(str): Search query (natural language supported)product(str, optional): Filter by productcomponent(str, optional): Filter by componentfile_types(list, optional): Filter by file extensionsmax_results(int, default=10): Maximum resultsmode(str, optional): Override search mode (keyword/semantic/hybrid)
Returns:
results: List of matching documents with relevance scorestotal: Number of resultsquery: Search query usedsearch_mode: Mode used (keyword/semantic/hybrid_rrf)
Example:
{
"query": "How to implement authentication",
"product": "symphony",
"max_results": 5
}
get_document
Retrieve full content of a specific document.
Parameters:
path(str): Relative path from docs rootsection(str, optional): Extract specific section by heading
Returns:
content: Full document contentheadings: List of headingsmetadata: Document metadata (sections, pages, tables)
list_products
List all available products.
Returns:
products: List of products with component countstotal: Number of products
list_components
List components for a specific product.
Parameters:
product(str): Product name
Returns:
components: List of components with document countstotal: Number of components
get_index_status
Get current indexing status and statistics.
Returns:
status: Index statustotal_documents: Number of indexed documentstotal_chunks: Number of indexed chunks (with RAG)products: Number of productsembedding_model: Current embedding modelsearch_mode: Current search modelast_indexed: Last index update time
Troubleshooting
Server won't start
Check port availability:
lsof -i :3001
Check configuration:
cat config.json
# Verify docs.root_path exists
Model download issues
First run downloads ~1.4GB of models:
# Manual download test
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('intfloat/e5-large-v2')"
Check disk space:
df -h ~/.cache/torch
# Need ~2GB free space
No documents indexed
Check documentation path:
ls -la docs/
Check file permissions:
chmod -R 755 docs/
Claude Desktop not connecting
Verify server is running:
curl http://127.0.0.1:3001/health
Check Claude Desktop config:
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
Restart Claude Desktop
Search returns no results
Check index status:
curl http://127.0.0.1:3001/health
# Or use get_index_status tool in Claude
Verify embeddings are enabled:
grep -A5 "embeddings" config.json
# Should show "enabled": true
Check vector database:
ls -la vector_db/
# Should contain chroma.sqlite3 and other files
Migration from v1.0
Automatic Migration
No action required. System will:
- Download models on first startup (~1.4GB, one-time)
- Re-index existing documents with new chunking
- Generate embeddings for all chunks
- Use hybrid search by default
Keep v1.0 Behavior
To disable RAG and keep keyword-only search:
{
"search": { "mode": "keyword" },
"embeddings": { "enabled": false }
}
Clear Old Data
If upgrading from v2.0 beta:
rm -rf ./vector_db
# Restart server to rebuild with new settings
Documentation
- QUICKSTART.md - 5-minute setup guide
- CLAUDE.md - Developer guide for Claude Code
- IMPLEMENTATION_COMPLETE.md - v2.0.0 implementation details
- COMPREHENSIVE_TEST_REPORT.md - Test results and validation
- INDEXING_GUIDE.md - Document indexing documentation
- ENTERPRISE_RAG_ROADMAP.md - RAG enhancement roadmap
Changelog
v2.0.0 (December 3, 2025) - Enterprise RAG Release
New Features:
- E5-large-v2 embeddings (1024d, 30-40% better quality)
- Hierarchical document chunking (512-token, 128-overlap)
- Persistent vector storage (ChromaDB with optimization)
- Two-stage reranking (cross-encoder, 2x precision improvement)
- Query embedding cache (5x faster repeated queries)
- BM25 keyword search (probabilistic scoring)
- Reciprocal Rank Fusion (RRF hybrid search)
Performance Improvements:
- 100% document coverage (was 0.5% with truncation)
- Precision@10: ~85% (was ~40%)
- Recall@10: ~75% (was ~30%)
- Scales to 10,000+ documents
Breaking Changes:
- None - fully backward compatible
v1.0.0 (November 2024) - Initial Release
- HTTP/SSE MCP server
- Keyword-based search
- Multi-format support (.md, .txt, .docx)
- Real-time file watching
- Product/component organization
License
[Your License]
Contributing
[Contributing guidelines]
Support
For issues and questions:
- GitHub Issues: https://github.com/anthropics/librarian-mcp/issues
- Documentation: See CLAUDE.md and QUICKSTART.md
- Test Reports: See COMPREHENSIVE_TEST_REPORT.md
Built with:
- FastAPI + FastMCP (HTTP/SSE MCP server)
- sentence-transformers (E5-large-v2 embeddings + cross-encoders)
- ChromaDB (persistent vector database)
- rank-bm25 (BM25Okapi keyword search)
- python-docx (DOCX parsing)
- watchdog (file monitoring)