buzzwomen-dev2/docs-mcp-server
If you are the rightful owner of docs-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The MCP Server provides local documentation for Django, DRF, and Psycopg, facilitating easy access and search capabilities for developers.
Docs MCP Server - Hybrid Search for Technical Documentation
A high-performance Model Context Protocol (MCP) server that provides hybrid search capabilities combining:
- Semantic search using FastEmbed (BAAI/bge-small-en-v1.5) + Qdrant vector database
- Keyword search using Elasticsearch BM25 with inverted index
- Weighted fusion (0.4 BM25 + 0.6 semantic by default)
Designed for local developer documentation search across Django, Django REST Framework, Psycopg, and other technical docs.
Prerequisites
This server requires Elasticsearch and Qdrant to be running. The easiest way is using Docker:
Option 1: Using Docker Compose (Recommended)
# Start services (Elasticsearch + Qdrant)
./start-services.sh
# Stop services
./stop-services.sh
The script will:
- Start Elasticsearch on port 9200
- Start Qdrant on port 6333
- Wait for both services to be healthy
- Persist data in Docker volumes
Option 2: Manual Docker Commands
# Elasticsearch
docker run -d \
--name docs-elasticsearch \
-p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
elasticsearch:8.11.0
# Qdrant
docker run -d \
--name docs-qdrant \
-p 6333:6333 \
-p 6334:6334 \
qdrant/qdrant
Option 3: Native Installation
- Elasticsearch: https://www.elastic.co/downloads/elasticsearch
- Qdrant: https://qdrant.tech/documentation/quick-start/
Features
- 🔍 Hybrid Search: Combines semantic understanding with keyword precision
- 🏷️ Rich Metadata: Auto-tags documents by tech, component, topic, version
- 📊 Structured Results: Pydantic models with individual and combined scores
- 🚀 Dual Interface: MCP tools + REST API
- 💾 Persistent Index: Elasticsearch + Qdrant with disk-based metadata cache
- 📝 Smart Chunking: Semantic for prose (~100 tokens), structural for code (80-150 lines)
- 🔒 Localhost-only: Secure by default, no external access
- ⚡ Fast Embeddings: CPU-optimized FastEmbed (no GPU required)
Quick Start
1. Install Dependencies
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .
2. Build the Search Index
First time setup - index your documentation:
# Use the build script (recommended)
uv run scripts/build_index.py
# Or index programmatically
uv run python -c "
import asyncio
from pathlib import Path
from search_engine import IndexManager
async def main():
manager = IndexManager()
stats = await manager.index_documents([Path('docs')])
print(f'Indexed {stats["chunks_created"]} chunks from {stats["files_processed"]} files')
asyncio.run(main())
"
Or use the MCP tool after starting the server (see below).
3. Run the Server
As MCP Server (stdio)
uv run server.py
Then configure in your MCP client (Claude Desktop, VSCode, etc):
{
"mcpServers": {
"docs-server": {
"command": "uv",
"args": ["run", "server.py"],
"cwd": "/path/to/docs-mcp-server"
}
}
}
As HTTP API Server
uv run server.py --http
Access at:
- API:
http://localhost:8000 - Docs:
http://localhost:8000/docs
4. Validate Setup
# Run validation script
uv run scripts/test_setup.py
5. Test with MCP Inspector
# In one terminal: start server
uv run server.py
# In another terminal: start inspector
npx @modelcontextprotocol/inspector uv run server.py
Navigate to http://localhost:5173 and test the tools.
MCP Features
This server implements the full MCP specification with Tools, Resources, and Prompts.
Tools
search
Hybrid search with semantic + keyword matching.
Parameters:
query(required): Search query texttech(optional): Filter by technology (django/drf/psycopg)component(optional): Filter by component/moduletop_k(default 10): Number of results
Example:
search(
query="How does Django auth middleware work?",
tech="django",
top_k=5
)
Returns: List of SearchResultModel with:
chunk_id: Unique identifiercontent: Chunk text (truncated to 500 chars)source_path: Original file pathtech,component,topic,version,file_type: Metadatabm25_score,semantic_score,hybrid_score: Individual and combined scores
retrieve
Fetch a specific chunk by ID.
Parameters:
chunk_id(required): From search results
Returns: Full SearchResultModel with complete content.
reindex_docs
Build or rebuild the search index. Uses checksums to skip unchanged files.
Parameters:
paths(optional): Specific file/directory paths to indextech(optional): Index specific technology (django/drf/psycopg)
Examples:
# Reindex all docs (incremental - skips unchanged)
reindex_docs()
# Index only Django docs
reindex_docs(tech="django")
# Index specific paths
reindex_docs(paths=["docs/django-6.0/ref"])
Returns: Statistics with files_processed, chunks_created, errors.
Note: This performs incremental updates by default. To force a complete rebuild, use the build script with --force flag.
get_index_stats
Get index statistics and health.
Returns: IndexStats with chunk counts, model info, technologies, etc.
list_sources
List all indexed source files.
Parameters:
tech(optional): Filter by technology
Returns: List of SourceInfo with file metadata and chunk counts.
search_docs (deprecated)
Legacy keyword-only search. Use search instead for better results.
list_docs
Show directory structure for a technology.
list_technologies
List available documentation sets.
Resources
MCP resources provide direct access to documentation for use as context:
doc://{tech}/{path}
Access raw documentation files:
doc://django/ref/models/fields.md
doc://drf/api-guide/serializers.md
chunk://{chunk_id}
Access indexed chunks from search results:
chunk://middleware_3_1234
Usage in VS Code: Select Add Context > MCP Resources in Chat view.
Prompts
Preconfigured search prompts available as slash commands:
/mcp.docsSearch.find_authentication_docs- Authentication & authorization docs/mcp.docsSearch.find_database_docs- Database & ORM documentation/mcp.docsSearch.find_api_docs- API & serialization docs
Each prompt accepts a tech parameter (django/drf/psycopg).
Usage in VS Code: Type / in chat input to see available MCP prompts.
REST API Endpoints
All endpoints bind to 127.0.0.1:8000 (localhost only).
POST /search
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"query": "Django middleware authentication",
"tech": "django",
"top_k": 10
}'
GET /retrieve/{chunk_id}
curl http://localhost:8000/retrieve/middleware_3_1234
GET /status
curl http://localhost:8000/status
GET /sources?tech=django
curl http://localhost:8000/sources?tech=django
POST /reindex
curl -X POST http://localhost:8000/reindex \
-H "Content-Type: application/json" \
-d '{"tech": "django"}'
Configuration
Environment Variables
All configuration is managed via environment variables. Set these before running the server or build script.
Core Settings
| Variable | Default | Description |
|---|---|---|
INDEX_DIR | .index | Directory for metadata cache (checksums, pickled data) |
EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | FastEmbed model for semantic embeddings |
CHUNK_SIZE_TOKENS | 100 | Target chunk size in tokens for semantic splitting |
Search Weights
| Variable | Default | Description |
|---|---|---|
BM25_WEIGHT | 0.4 | Weight for BM25 keyword scores (0.0-1.0) |
SEMANTIC_WEIGHT | 0.6 | Weight for semantic similarity scores (0.0-1.0) |
Note: Weights should sum to 1.0. Increase BM25 for keyword-heavy queries, increase semantic for concept matching.
Elasticsearch Configuration
| Variable | Default | Description |
|---|---|---|
ES_HOST | http://localhost:9200 | Elasticsearch server URL |
ES_INDEX_NAME | docs_index | Index name for BM25 documents |
Qdrant Configuration
| Variable | Default | Description |
|---|---|---|
QDRANT_HOST | localhost | Qdrant server host |
QDRANT_PORT | 6333 | Qdrant gRPC port |
QDRANT_COLLECTION | docs_collection | Collection name for vector embeddings |
Redis Configuration (Prepared, Not Yet Used)
| Variable | Default | Description |
|---|---|---|
REDIS_HOST | localhost | Redis server host (for future caching) |
REDIS_PORT | 6379 | Redis server port |
REDIS_ENABLED | false | Enable Redis caching (currently unused) |
Build Script Flags
The scripts/build_index.py script supports the following command-line flags:
Index Building
# Index all documentation
uv run scripts/build_index.py
# Index specific technology
uv run scripts/build_index.py --tech django
uv run scripts/build_index.py --tech drf
uv run scripts/build_index.py --tech psycopg
# Force reindex (ignore checksums, rebuild everything)
uv run scripts/build_index.py --force
# Index specific paths
uv run scripts/build_index.py --paths docs/django-6.0/ref docs/django-6.0/topics
Clear Index
# Clear ALL documentation from index
uv run scripts/build_index.py --clear
# Clear specific technology only (keeps others intact)
uv run scripts/build_index.py --tech django --clear
uv run scripts/build_index.py --tech drf --clear
Warning:
--clearwithout--techwill delete ALL indexed documentation. Use--techflag to selectively clear.
Combined Operations
# Clear Django docs and reindex them
uv run scripts/build_index.py --tech django --clear
uv run scripts/build_index.py --tech django
# Force rebuild specific technology
uv run scripts/build_index.py --tech drf --force
Configuration Examples
High semantic weight for conceptual queries:
export BM25_WEIGHT=0.3
export SEMANTIC_WEIGHT=0.7
uv run server.py
High BM25 weight for exact keyword matching:
export BM25_WEIGHT=0.7
export SEMANTIC_WEIGHT=0.3
uv run server.py
Custom embedding model:
export EMBEDDING_MODEL="sentence-transformers/all-mpnet-base-v2"
uv run server.py
Larger chunks for more context:
export CHUNK_SIZE_TOKENS=200
uv run scripts/build_index.py --force # Rebuild with new chunk size
Custom Elasticsearch instance:
export ES_HOST="http://192.168.1.100:9200"
export ES_INDEX_NAME="my_docs"
python server.py
Example Queries
Tune your hybrid weights based on these common developer queries:
"How does Django auth middleware work?"
"What is the DRF request lifecycle?"
"Configure Nuxt SSR"
"Redis cache invalidation flow"
"psycopg transaction patterns"
"Django model field options"
"DRF serializer validation"
Architecture
┌─────────────────┐
│ MCP Client │ (Claude, VSCode, etc)
│ or HTTP Client │
└────────┬────────┘
│
v
┌─────────────────┐
│ server.py │ FastMCP + FastAPI
│ (Tools/API) │ • Lifespan context manager
│ │ • Context[ServerSession, AppContext]
│ │ • Structured Pydantic outputs
└────────┬────────┘
│
v
┌────────────────────────────────────────┐
│ search_engine.py │
│ (IndexManager) │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ Elasticsearch │ │ Qdrant │ │
│ │ BM25 Index │ │ Vector Index │ │
│ │ (Inverted) │ │ (HNSW) │ │
│ └──────────────┘ └───────────────┘ │
│ │ │ │
│ v v │
│ ┌────────────────────────┐ │
│ │ FastEmbed (CPU) │ │
│ │ bge-small-en-v1.5 │ │
│ │ 384-dim vectors │ │
│ └────────────────────────┘ │
└────────────────────────────────────────┘
│
v
┌─────────┐
│ .index/ │ (Metadata cache)
└─────────┘
Implementation Notes:
- Lifespan Management: IndexManager initialized once via
@asynccontextmanagerand injected into tool context - Context Pattern: Tools receive
Context[ServerSession, AppContext]for accessing shared state - Structured Output: All results returned as validated Pydantic models
- Security: Localhost-only binding, query logging for monitoring
Document Sources
Current documentation sets in docs/:
- Django 6.0 (~300 files): Models, views, ORM, middleware, admin
- Django REST Framework 3.16.1 (~30 files): Serializers, views, authentication
- Psycopg 3.3.1 (~20 files): PostgreSQL adapter, connection pooling
Supports: .md, .txt, .rst, .py, .json, .yaml, .yml, .html
Performance
Initial Indexing
- 350-400 files → 2-5 minutes
- Generates embeddings (384-dim FastEmbed with CPU optimization)
- Builds Elasticsearch BM25 inverted index + Qdrant HNSW vector index
- Memory usage: ~500MB-1GB peak during embedding generation
- Disk usage: ~50-100MB for
.index/cache + Qdrant vectors
Incremental Updates
The system uses checksums to detect file changes and skip reindexing unchanged files:
# First run: indexes all files
uv run scripts/build_index.py
# Processing 350 files...
# Second run: skips unchanged files
uv run scripts/build_index.py
# Skipped 350 files (unchanged)
# After modifying one file
uv run scripts/build_index.py
# Skipped 349 files (unchanged)
# Processing 1 file...
Checksum behavior:
- Saved incrementally every 10 files to prevent loss on crashes
- Stored in
.index/checksums.json - Use
--forceflag to bypass checksum validation and rebuild everything
Search Latency
- Semantic search (Qdrant): ~50-100ms (HNSW index lookup)
- BM25 search (Elasticsearch): ~10-20ms (inverted index)
- Hybrid fusion: ~5ms (score normalization and combination)
- Total per query: ~100-150ms
Optimizations Implemented
-
Batched Processing:
- Embeddings: 50 chunks per batch (prevents OOM)
- Qdrant uploads: 100 points per batch
- Elasticsearch indexing: 500 documents per batch
-
Concurrent Operations:
- Separate read/write locks allow searches during indexing
- Async I/O for all network operations
-
Resource Management:
- Disk space check before cache writes (1GB minimum)
- Rate limiting between batches (0.1s delays)
- Incremental checkpoint saving
-
Memory Efficiency:
- Streaming file processing
- Batch cleanup after processing
- No GPU required (FastEmbed CPU-optimized)
System Requirements
- CPU: Modern multi-core (4+ cores recommended)
- RAM: 2GB minimum, 4GB recommended
- Disk: 500MB free for indices and cache
- Services:
- Elasticsearch 8.x+ on localhost:9200
- Qdrant 1.x+ on localhost:6333
- GPU: Not required (FastEmbed uses CPU)
Troubleshooting
Docker Services Not Running
Symptom: Connection errors to Elasticsearch or Qdrant
Solution:
# Check service status
docker ps
# Start services if not running
docker-compose up -d
# Check logs
docker-compose logs elasticsearch
docker-compose logs qdrant
Index is Empty
Symptom: get_index_stats shows 0 chunks
Solutions:
# Check status via REST API
curl http://localhost:8000/status
# Rebuild index manually
uv run scripts/build_index.py
# Or via Python
uv run python -c "
import asyncio
from pathlib import Path
from search_engine import IndexManager
asyncio.run(IndexManager().index_documents([Path('docs')]))
"
Build Script Crashes (Exit 143 / SIGTERM)
Symptom: Indexing process terminates with Exit Code 143 or runs out of memory
Causes:
- System OOM (Out Of Memory) killer terminating process
- Insufficient RAM for large batch processing
Solutions:
-
Reduce batch sizes (already optimized in current version):
- Embeddings: 50 chunks per batch
- Qdrant: 100 points per batch
- Elasticsearch: 500 documents per batch
-
Close memory-intensive applications during indexing
-
Monitor memory usage:
# Watch memory in real-time watch -n 1 free -h # Monitor indexing process ps aux | grep build_index -
Index in smaller batches:
# Index one technology at a time uv run scripts/build_index.py --tech django uv run scripts/build_index.py --tech drf uv run scripts/build_index.py --tech psycopg
Semantic Search Returns All Scores = 0
Symptom: Search results show semantic_score: 0.0 for all results
Causes:
- Qdrant vectors not properly indexed
- Stale MCP client cache
- Server not restarted after reindexing
Solutions:
-
Restart the server:
# Stop server (Ctrl+C) # Restart uv run server.py -
Verify Qdrant has vectors:
curl http://localhost:6333/collections/docs_collection # Check "points_count" > 0 and "vectors_count" > 0 -
Test semantic search directly:
uv run python -c "
import asyncio from search_engine import IndexManager
async def test(): manager = IndexManager() await manager.initialize() results = await manager.search('Django authentication', top_k=5) for r in results: print(f'Semantic: {r.semantic_score}, BM25: {r.bm25_score}')
asyncio.run(test()) "
4. **Rebuild Qdrant collection:**
```bash
uv run scripts/build_index.py --clear
uv run scripts/build_index.py
Accidentally Cleared All Documentation
Symptom: Used --tech X --clear expecting to clear only technology X, but all docs were deleted
Cause: Older versions of the script had incorrect clear logic
Current Behavior (Fixed):
--clearalone: Clears ALL documentation--tech django --clear: Clears ONLY Django docs, keeps DRF and Psycopg--tech drf --clear: Clears ONLY DRF docs
Recovery:
# Reindex everything
uv run scripts/build_index.py
# Or reindex specific techs
uv run scripts/build_index.py --tech django
uv run scripts/build_index.py --tech drf
uv run scripts/build_index.py --tech psycopg
Poor Search Quality
Symptom: Results don't match expectations or miss relevant documents
Solutions:
-
Check query log at
.index/search_queries.logto see how queries are being processed -
Tune hybrid search weights:
- Keyword-heavy queries (exact terms, function names): Increase
BM25_WEIGHTexport BM25_WEIGHT=0.7 export SEMANTIC_WEIGHT=0.3 - Conceptual queries (how to, best practices): Increase
SEMANTIC_WEIGHTexport BM25_WEIGHT=0.3 export SEMANTIC_WEIGHT=0.7
- Keyword-heavy queries (exact terms, function names): Increase
-
Try different embedding models:
# Larger, more accurate (slower) export EMBEDDING_MODEL="sentence-transformers/all-mpnet-base-v2" # Smaller, faster (less accurate) export EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" # Rebuild with new model uv run scripts/build_index.py --force -
Increase result count:
search(query="...", top_k=20) # Default is 10
Disk Space Issues
Symptom: Index building fails with disk write errors
Cause: Insufficient disk space for metadata cache writes
Solution:
- The system checks for minimum 1GB free space before writing caches
- Free up disk space if needed:
df -h # Check available space docker system prune # Clean Docker artifacts
Embeddings Taking Too Long
Solutions:
- FastEmbed is already CPU-optimized and batched (50 chunks at a time)
- Try smaller model:
export EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" - Verify Elasticsearch and Qdrant are running:
docker ps - Check CPU usage isn't bottlenecked:
toporhtop
Missing Await Errors
Symptom: RuntimeWarning: coroutine 'get_stats' was never awaited
Cause: Async functions called without await
Solution: Already fixed in current version. All async calls properly awaited:
# Correct (current version)
stats = await manager.get_stats()
# Incorrect (would cause warning)
stats = manager.get_stats()
Development
Local Cache Directory (.index/)
The .index/ directory contains local metadata cache and should NOT be committed to Git:
.index/
├── checksums.json # File checksums for incremental indexing
├── metadata_cache.pkl # Pickled chunk metadata
└── search_queries.log # Query history for debugging
Git Configuration:
# Already in .gitignore
echo ".index/" >> .gitignore
Why not commit?
- Contains machine-specific paths
- Large binary pickle files
- Rebuilt automatically on first run
- Elasticsearch and Qdrant hold the actual indices
Rebuilding on new machine:
# Clone repo
git clone <repo-url>
cd docs-mcp-server
# Start services
docker-compose up -d
# Build index (will create .index/ automatically)
uv run scripts/build_index.py
Run Tests
# Quick sanity check (30 seconds)
uv run tests/smoke_test.py
# Comprehensive test suite (8-10 minutes)
uv run tests/run_comprehensive_tests.py
# Individual test suites
uv run tests/test_search_engine.py # Core functionality
uv run tests/test_score_accuracy.py # Score validation
uv run tests/test_integration.py # Integration tests
uv run tests/test_mcp_accuracy.py # 50+ query tests
# See tests/README.md for full documentation
Test Coverage:
- ✅ Core search functionality (20+ tests)
- ✅ Score accuracy & thresholds (15+ tests)
- ✅ Integration workflows (7+ tests)
- ✅ Real-world queries (50+ queries)
- ✅ Hybrid scoring validation
- ✅ Tech/component filters
- ✅ Edge cases & error handling
Debug Logging
Enable debug output:
# Set environment variable
export PYTHONVERBOSE=1
# Or add to Python code
uv run python -c "import logging; logging.basicConfig(level=logging.DEBUG)"
Debug semantic search:
# Already enabled in search_engine.py
# Check logs for:
# - "Semantic search returned X results"
# - "Normalized semantic scores: [...]"
# - "Final hybrid scores: [...]"
Inspect Query Log
# Watch queries in real-time
tail -f .index/search_queries.log
# Analyze query patterns
grep "query=" .index/search_queries.log | head -20
Testing Changes
After modifying search weights:
# Update environment
export BM25_WEIGHT=0.7
export SEMANTIC_WEIGHT=0.3
# Restart server (no reindex needed)
uv run server.py
After modifying chunk size:
# Update environment
export CHUNK_SIZE_TOKENS=200
# Must rebuild index with new chunks
uv run scripts/build_index.py --force
After modifying embedding model:
# Update environment
export EMBEDDING_MODEL="sentence-transformers/all-mpnet-base-v2"
# Must rebuild with new embeddings
uv run scripts/build_index.py --force
Future Enhancements
Post-MVP improvements:
- Local reranker (BAAI/bge-reranker) for top-k refinement
- Document diffing for incremental updates
- File watcher for auto-reindex
- Version-aware filtering (Django 4 vs 5)
- Query expansion with synonyms
- Usage analytics dashboard
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test with MCP Inspector
- Submit a pull request
Support
- GitHub Issues: [Report bugs or request features]
- MCP Docs: https://modelcontextprotocol.io/
- SentenceTransformers: https://www.sbert.net/
Built with: