bborbe/semantic-search
If you are the rightful owner of semantic-search and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
Semantic Search MCP is a server designed for semantic search over markdown files, utilizing sentence-transformers for embeddings and FAISS for vector search.
Semantic Search
Semantic search over markdown files. Find related notes by meaning, not just keywords. Detect duplicates before creating new notes.
Supports two server modes:
- MCP mode — For Claude Code integration
- REST mode — For OpenClaw, scripts, and HTTP clients
Features
- Semantic search using sentence-transformers
- Duplicate/similar note detection
- Auto-updating index with file watcher
- Multi-directory support
- Inline tag extraction (
#tag-name)
Installation
Permanent install (recommended)
# Install as a tool (creates ~/.local/bin/semantic-search-mcp)
uv tool install git+https://github.com/bborbe/semantic-search
💡 No GPU? Use CPU-only PyTorch
The default install includes CUDA support (~7GB). If you don't have a dedicated GPU, install with CPU-only PyTorch to save ~5GB disk space:
uv tool install --index https://download.pytorch.org/whl/cpu \ git+https://github.com/bborbe/semantic-searchPerformance is identical for typical vault sizes — embedding models run fine on CPU.
One-off usage
# Run directly with uvx (no install needed)
uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve
From PyPI (when published)
pip install semantic-search-mcp
Server Modes
MCP Mode (for Claude Code)
claude mcp add -s project semantic-search \
--env CONTENT_PATH=/path/to/vault \
-- \
uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve
Tools available:
search_related(query, top_k=5)— Find semantically related notescheck_duplicates(file_path)— Detect duplicate/similar notes
REST Mode (for OpenClaw/HTTP)
# Start server
CONTENT_PATH=/path/to/vault semantic-search-mcp serve --mode rest --port 8321
# Or with uvx
CONTENT_PATH=/path/to/vault uvx --from git+https://github.com/bborbe/semantic-search \
semantic-search-mcp serve --mode rest --port 8321
Endpoints:
| Endpoint | Method | Description |
|---|---|---|
/search?q=...&top_k=5 | GET | Semantic search |
/duplicates?file=...&threshold=0.85 | GET | Find duplicate notes |
/health | GET | Health check with index stats |
/reindex | GET/POST | Force index rebuild |
Example queries:
# Search
curl 'http://localhost:8321/search?q=kubernetes+deployment'
# Find duplicates
curl 'http://localhost:8321/duplicates?file=notes/my-note.md'
# Health check
curl 'http://localhost:8321/health'
CLI Commands
One-shot commands without running a server:
# Search
CONTENT_PATH=/path/to/vault semantic-search-mcp search "kubernetes deployment"
# Find duplicates
CONTENT_PATH=/path/to/vault semantic-search-mcp duplicates path/to/note.md
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
CONTENT_PATH | Directory to index (comma-separated for multiple) | ./content |
LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
Multiple Directories
Index multiple directories by separating paths with commas:
CONTENT_PATH=/path/to/vault1,/path/to/vault2,/path/to/docs
All directories are indexed together and searched as one unified index.
How It Works
First run downloads a small embedding model (~90MB) and indexes your markdown files (<1s for typical vaults). The index auto-updates when files change via filesystem watcher.
Indexed Content
Each markdown file is indexed with weighted components:
| Component | Weight | Notes |
|---|---|---|
| Filename | 3x | |
Frontmatter title | 3x | |
Frontmatter tags | 2x | Merged with inline tags |
Frontmatter aliases | 2x | |
Inline tags (#tag) | 2x | Extracted from body |
| First H1 heading | 2x | |
| Body content | 1x | First 500 words |
Development
# Clone
git clone https://github.com/bborbe/semantic-search
cd semantic-search
# Install dev dependencies
make install
# Run checks
make check
# Run tests
make test
License
BSD 2-Clause License — see .