semantic-search

bborbe/semantic-search



Semantic Search MCP is a server for semantic search over markdown files. It uses sentence-transformers to compute embeddings and FAISS for vector search.


Semantic Search

Semantic search over markdown files. Find related notes by meaning, not just keywords. Detect duplicates before creating new notes.

Supports two server modes:

  • MCP mode — For Claude Code integration
  • REST mode — For OpenClaw, scripts, and HTTP clients

Features

  • Semantic search using sentence-transformers
  • Duplicate/similar note detection
  • Auto-updating index with file watcher
  • Multi-directory support
  • Inline tag extraction (#tag-name)
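Inline tag extraction can be pictured as a regex pass over the note body. The sketch below is a hypothetical approximation, not the project's code. The pattern (a `#` not preceded by a word character, followed by letters, digits, `_`, `-` or `/`), the skipping of fenced code, and the helper names `extract_inline_tags` and `merge_tags` are all assumptions.

```python
import re

# Hypothetical tag pattern: '#' not preceded by a word character, then one or
# more letters, digits, underscores, hyphens or slashes (#tag-name, #area/k8s).
# A heading like "# Title" does not match because '#' is followed by a space.
TAG_RE = re.compile(r"(?<!\w)#([\w/-]+)")
FENCE = "`" * 3  # markdown code fence marker


def extract_inline_tags(body: str) -> list[str]:
    """Return unique inline tags in order of first appearance, skipping fenced code."""
    tags: list[str] = []
    in_fence = False
    for line in body.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on each opening/closing fence
            continue
        if in_fence:
            continue
        for tag in TAG_RE.findall(line):
            if tag not in tags:
                tags.append(tag)
    return tags


def merge_tags(frontmatter_tags: list[str], inline_tags: list[str]) -> list[str]:
    """Merge frontmatter and inline tags, dropping duplicates (frontmatter first)."""
    return list(dict.fromkeys([*frontmatter_tags, *inline_tags]))
```

For example, `extract_inline_tags("# Title\nDeploy via #kubernetes and #ci-cd.")` yields `['kubernetes', 'ci-cd']`. The lookbehind keeps URL fragments such as `page#section` from being read as tags.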

Installation

Permanent install (recommended)

# Install as a tool (creates ~/.local/bin/semantic-search-mcp)
uv tool install git+https://github.com/bborbe/semantic-search

💡 No GPU? Use CPU-only PyTorch

The default install includes CUDA support (~7GB). If you don't have a dedicated GPU, install with CPU-only PyTorch to save ~5GB disk space:

uv tool install --index https://download.pytorch.org/whl/cpu \
  git+https://github.com/bborbe/semantic-search

For typical vault sizes, performance is effectively the same: the embedding model is small and runs fine on CPU.

One-off usage

# Run directly with uvx (no install needed)
uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve

From PyPI (when published)

pip install semantic-search-mcp

Server Modes

MCP Mode (for Claude Code)

claude mcp add -s project semantic-search \
  --env CONTENT_PATH=/path/to/vault \
  -- \
  uvx --from git+https://github.com/bborbe/semantic-search semantic-search-mcp serve

Tools available:

  • search_related(query, top_k=5) — Find semantically related notes
  • check_duplicates(file_path) — Detect duplicate/similar notes

REST Mode (for OpenClaw/HTTP)

# Start server
CONTENT_PATH=/path/to/vault semantic-search-mcp serve --mode rest --port 8321

# Or with uvx
CONTENT_PATH=/path/to/vault uvx --from git+https://github.com/bborbe/semantic-search \
  semantic-search-mcp serve --mode rest --port 8321

Endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /search?q=...&top_k=5 | GET | Semantic search |
| /duplicates?file=...&threshold=0.85 | GET | Find duplicate notes |
| /health | GET | Health check with index stats |
| /reindex | GET/POST | Force index rebuild |

Example queries:

# Search
curl 'http://localhost:8321/search?q=kubernetes+deployment'

# Find duplicates
curl 'http://localhost:8321/duplicates?file=notes/my-note.md'

# Health check
curl 'http://localhost:8321/health'
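A small standard-library client for the REST endpoints might look like this. It is a sketch: the helper names are hypothetical, the base URL matches the `--port 8321` example above, and `get_json` assumes the server replies with JSON (the response schema is not documented here).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8321"  # port from the serve example above


def search_url(query: str, top_k: int = 5, base: str = BASE_URL) -> str:
    """URL for GET /search; urlencode turns spaces into '+'."""
    return f"{base}/search?{urlencode({'q': query, 'top_k': top_k})}"


def duplicates_url(file: str, threshold: float = 0.85, base: str = BASE_URL) -> str:
    """URL for GET /duplicates with the documented default threshold."""
    return f"{base}/duplicates?{urlencode({'file': file, 'threshold': threshold})}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def reindex(base: str = BASE_URL, timeout: float = 60.0) -> int:
    """POST /reindex and return the HTTP status code."""
    req = Request(f"{base}/reindex", method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With the server running, `get_json(search_url("kubernetes deployment"))` issues the same request as the first `curl` example.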

CLI Commands

One-shot commands without running a server:

# Search
CONTENT_PATH=/path/to/vault semantic-search-mcp search "kubernetes deployment"

# Find duplicates
CONTENT_PATH=/path/to/vault semantic-search-mcp duplicates path/to/note.md
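When scripting these one-shot commands from Python, passing the argument as a single argv element avoids shell quoting problems. A minimal sketch, assuming `semantic-search-mcp` is on `PATH` (for example after `uv tool install`); the helpers `cli_command` and `run_cli` are hypothetical.

```python
import os
import subprocess


def cli_command(subcommand: str, argument: str) -> list[str]:
    """argv for a one-shot call, e.g. ('search', 'kubernetes deployment')."""
    if subcommand not in ("search", "duplicates"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["semantic-search-mcp", subcommand, argument]


def run_cli(subcommand: str, argument: str, content_path: str) -> str:
    """Run the CLI with CONTENT_PATH set and return its stdout."""
    env = {**os.environ, "CONTENT_PATH": content_path}
    completed = subprocess.run(
        cli_command(subcommand, argument),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

For example, `run_cli("duplicates", "path/to/note.md", "/path/to/vault")` mirrors the second command above.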

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| CONTENT_PATH | Directory to index (comma-separated for multiple) | ./content |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |

Multiple Directories

Index multiple directories by separating paths with commas:

CONTENT_PATH=/path/to/vault1,/path/to/vault2,/path/to/docs

All directories are indexed together and searched as one unified index.
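Reading this configuration might look like the sketch below. It is not the project's code. Trimming whitespace around each path, ignoring empty entries, and falling back to `INFO` for an unknown `LOG_LEVEL` are assumptions; the defaults come from the table above.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

DEFAULT_CONTENT_PATH = "./content"
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def content_dirs(env: Optional[Mapping[str, str]] = None) -> list[Path]:
    """Parse CONTENT_PATH into a list of directories."""
    env = os.environ if env is None else env
    raw = env.get("CONTENT_PATH", DEFAULT_CONTENT_PATH)
    # Split on commas, trim whitespace, drop empty entries (e.g. a trailing comma).
    return [Path(p.strip()) for p in raw.split(",") if p.strip()]


def log_level(env: Optional[Mapping[str, str]] = None) -> str:
    """Return LOG_LEVEL in upper case, falling back to INFO if unset or unknown."""
    env = os.environ if env is None else env
    level = env.get("LOG_LEVEL", "INFO").upper()
    return level if level in LOG_LEVELS else "INFO"


def markdown_files(dirs: Iterable[Path]) -> list[Path]:
    """Collect *.md files from every directory into one list (one unified index)."""
    return [f for d in dirs for f in sorted(d.rglob("*.md"))]
```

The returned level name can be passed straight to `logging.basicConfig(level=log_level())`, since `logging` accepts level names as strings.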

How It Works

On first run, semantic-search-mcp downloads a small embedding model (~90MB) and indexes your markdown files (under 1s for typical vaults). After that, a filesystem watcher keeps the index up to date as files change.
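The server computes embeddings with sentence-transformers and searches them with FAISS. The pure-Python stand-in below only illustrates the search step. It performs an exact cosine-similarity search over L2-normalized vectors, which is what a flat FAISS inner-product index computes. Whether the project uses a flat index, and how it applies the duplicate threshold, are assumptions. The 0.85 default matches the `/duplicates` endpoint, and the toy vectors are made up.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine_top_k(
    query: list[float], docs: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest neighbours by cosine similarity, best first."""
    q = normalize(query)
    scored = [
        (name, sum(a * b for a, b in zip(q, normalize(vec))))
        for name, vec in docs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def near_duplicates(
    target: str, docs: dict[str, list[float]], threshold: float = 0.85
) -> list[tuple[str, float]]:
    """Other notes whose similarity to `target` is at least `threshold`."""
    hits = cosine_top_k(docs[target], docs, top_k=len(docs))
    return [(name, score) for name, score in hits if name != target and score >= threshold]
```

With `docs = {"a.md": [1.0, 0.0], "b.md": [0.9, 0.1], "c.md": [0.0, 1.0]}`, a query of `[1.0, 0.0]` ranks `a.md` first and `b.md` second, and `near_duplicates("a.md", docs)` reports only `b.md`.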

Indexed Content

Each markdown file is indexed with weighted components:

| Component | Weight | Notes |
|---|---|---|
| Filename | 3x | |
| Frontmatter title | 3x | |
| Frontmatter tags | 2x | Merged with inline tags |
| Frontmatter aliases | 2x | |
| Inline tags (#tag) | 2x | Extracted from body |
| First H1 heading | 2x | |
| Body content | 1x | First 500 words |
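One simple way to apply weights like these is to repeat each component in the text that gets embedded. The sketch below assumes that repetition scheme; the README does not say how the weights are actually applied. The helper names, the merging of frontmatter and inline tags into a single 2x field, and the frontmatter dict shape are also hypothetical.

```python
from pathlib import Path

WEIGHTS = {"filename": 3, "title": 3, "tags": 2, "aliases": 2, "h1": 2, "body": 1}
BODY_WORD_LIMIT = 500


def first_h1(body: str) -> str:
    """Text of the first '# ' heading, or an empty string."""
    for line in body.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return ""


def weighted_text(path: str, frontmatter: dict, inline_tags: list[str], body: str) -> str:
    """Build the text to embed, repeating each component by its weight."""
    tags = list(dict.fromkeys([*frontmatter.get("tags", []), *inline_tags]))
    parts = {
        "filename": Path(path).stem,
        "title": str(frontmatter.get("title", "")),
        "tags": " ".join(tags),
        "aliases": " ".join(frontmatter.get("aliases", [])),
        "h1": first_h1(body),
        "body": " ".join(body.split()[:BODY_WORD_LIMIT]),  # first 500 words only
    }
    chunks: list[str] = []
    for key, text in parts.items():
        if text:  # skip missing components instead of embedding empty strings
            chunks.extend([text] * WEIGHTS[key])
    return "\n".join(chunks)
```

Repetition is a blunt but model-agnostic way to weight fields. An alternative is to embed each component separately and combine the vectors with weights.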

Development

# Clone
git clone https://github.com/bborbe/semantic-search
cd semantic-search

# Install dev dependencies
make install

# Run checks
make check

# Run tests
make test

License

BSD 2-Clause License — see .