signalstash by HamStudy - MCP Server

Signal Stash

Process Markdown documentation into a searchable vector database with REST and MCP APIs.

Features

🗂️ Markdown Ingestion: Process .md files with frontmatter support
🔍 Semantic Search: Vector-based search using OpenAI embeddings
🌐 REST API: Query documents via HTTP endpoints
🔌 HTTP MCP API: Model Context Protocol via JSON-RPC 2.0 over HTTP
📊 Qdrant Integration: Efficient vector storage and retrieval with automatic replication
🎯 Multi-Source Support: Handle multiple documentation sources with context-aware relevance ranking
⚡ Importance Scoring: Prioritize critical information in search results

Quick Start

Install dependencies
```
npm install
```

Set up environment

```
cp .env.example .env
# Edit .env with your API keys
```

Start Qdrant (using Docker)
```
docker run -p 6333:6333 qdrant/qdrant
```

Ingest documents

```
# Basic ingestion (to default collection)
npm run ingest -- /path/to/markdown/files

# Ingest to specific collection
npm run ingest -- /path/to/emails --collection=emails
npm run ingest -- /path/to/documentation --collection=docs

# With source context (improves search relevance)
npm run ingest -- /path/to/vendure-docs --source="vendure" --context="e-commerce backend"
npm run ingest -- /path/to/react-docs --source="react" --context="frontend framework"

# With importance scoring (higher = more important, default: 10)
npm run ingest -- /path/to/critical-docs --importance=20
npm run ingest -- /path/to/archive-docs --importance=5

# Combine all options
npm run ingest -- /path/to/emails --collection=emails --source="gmail" --context="personal emails" --importance=15
```
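The ingest flags above all use the `--key=value` form. The following is a minimal sketch of how those flags could be parsed, with the defaults this README documents: collection `default` and importance `10`. The `IngestOptions` type and `parseIngestArgs` function are hypothetical illustrations and are not taken from the project's CLI.

```typescript
// Hypothetical sketch: parse `npm run ingest -- <path> [--key=value ...]`.
// Not the project's actual CLI code; names are illustrative.
interface IngestOptions {
  path: string;
  collection: string; // defaults to "default"
  source?: string;
  context?: string;
  importance: number; // defaults to 10; higher = more important
}

function parseIngestArgs(argv: string[]): IngestOptions {
  const flags: Record<string, string> = {};
  const positional: string[] = [];
  for (const arg of argv) {
    // Only the --key=value form is handled, matching the examples above.
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (match) {
      flags[match[1]] = match[2];
    } else {
      positional.push(arg);
    }
  }
  if (positional.length === 0) {
    throw new Error("Missing path to Markdown files");
  }
  const importance = Number(flags["importance"] ?? "10");
  if (isNaN(importance)) {
    throw new Error(`Invalid --importance value: ${flags["importance"]}`);
  }
  return {
    path: positional[0],
    collection: flags["collection"] ?? "default",
    source: flags["source"],
    context: flags["context"],
    importance,
  };
}

// The shell strips the quotes, so --source="gmail" arrives as --source=gmail.
const opts = parseIngestArgs([
  "/path/to/emails",
  "--collection=emails",
  "--source=gmail",
  "--context=personal emails",
  "--importance=15",
]);
console.log(opts);
```

Space-separated flags such as `--collection emails` are deliberately not handled in this sketch. The README only shows the `=` form.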

Start the server
```
npm run dev
```

Configuration

Environment variables (see .env.example):

OPENAI_API_KEY - OpenAI API key for embeddings
QDRANT_HOST - Qdrant server URL (default: http://localhost:6333)
QDRANT_API_KEY - Qdrant API key (optional)
QDRANT_COLLECTION - Collection name (default: docs)
PORT - Server port (default: 3000)
HOST - Server host/interface to bind to (default: 0.0.0.0)
LOG_LEVEL - Logging level (default: info)
EMBEDDING_MODEL - Embedding provider to use (openai or huggingface, default: openai)
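The sketch below resolves the environment variables listed above and applies their documented defaults. `loadConfig` and the `Config` field names are hypothetical, and the project may load and validate its configuration differently. The function accepts a plain object so it stays testable; in a Node.js process you would pass `process.env`.

```typescript
// Hypothetical sketch: apply the documented defaults to environment variables.
type Env = Record<string, string | undefined>;

interface Config {
  openaiApiKey?: string;
  qdrantHost: string;
  qdrantApiKey?: string;
  qdrantCollection: string;
  port: number;
  host: string;
  logLevel: string;
  embeddingModel: "openai" | "huggingface";
}

function loadConfig(env: Env): Config {
  const embeddingModel = env.EMBEDDING_MODEL ?? "openai";
  // Reject values other than the two documented providers.
  if (embeddingModel !== "openai" && embeddingModel !== "huggingface") {
    throw new Error(`Unsupported EMBEDDING_MODEL: ${embeddingModel}`);
  }
  const port = parseInt(env.PORT ?? "3000", 10);
  if (isNaN(port)) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    qdrantHost: env.QDRANT_HOST ?? "http://localhost:6333",
    qdrantApiKey: env.QDRANT_API_KEY, // optional
    qdrantCollection: env.QDRANT_COLLECTION ?? "docs",
    port,
    host: env.HOST ?? "0.0.0.0",
    logLevel: env.LOG_LEVEL ?? "info",
    embeddingModel: embeddingModel as "openai" | "huggingface",
  };
}

console.log(loadConfig({ PORT: "8080" }).port); // 8080
```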

API Endpoints

REST API

Search Endpoints:

GET /search/:collection?q=<query> - Semantic search (JSON format)
GET /search/:collection?q=<query>&format=markdown - Semantic search (Markdown format)
GET /search/:collection?q=<query>&source=<source> - Search within specific documentation source

Search Parameters:

q - Search query (required)
format - Response format: json (default) or markdown
limit - Maximum results (default: 5)
source - Filter by documentation source
scoreThreshold - Minimum relevance score 0-1 (default: 0.7, markdown only)
expandSections - Expand full sections with multiple matches (default: true, markdown only)
maxResponseChars - Maximum response size (default: 50000, markdown only)
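The query parameters above can be assembled on the client side as follows. `buildSearchUrl` is a hypothetical helper rather than part of Signal Stash. It includes only the parameters that are set and URL-encodes each value. The resulting URL can be passed to `fetch` or `curl`.

```typescript
// Hypothetical client-side helper for GET /search/:collection.
interface SearchParams {
  q: string;
  format?: "json" | "markdown";
  limit?: number;
  source?: string;
  scoreThreshold?: number; // markdown only
  expandSections?: boolean; // markdown only
  maxResponseChars?: number; // markdown only
}

function buildSearchUrl(baseUrl: string, collection: string, params: SearchParams): string {
  const query = Object.entries(params)
    // Skip parameters left undefined so the server applies its defaults.
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  return `${baseUrl}/search/${encodeURIComponent(collection)}?${query}`;
}

console.log(
  buildSearchUrl("http://localhost:3000", "emails", { q: "invoice", format: "markdown", scoreThreshold: 0.8 }),
);
// http://localhost:3000/search/emails?q=invoice&format=markdown&scoreThreshold=0.8
```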

Other Endpoints:

GET /sources/:collection - List all ingested documentation sources
GET /section/:collection/:hash - Get section by heading hash
GET /document/:collection/:filename - Get document by filename
GET /health - Health check

Note: The :collection parameter is optional in all routes. If it is omitted, the collection named default is used. For backward compatibility, the older routes without a collection segment (e.g., /search?q=...) continue to work.

MCP API (HTTP-based)

The MCP API is available as an HTTP endpoint at /mcp/:collection and uses the JSON-RPC 2.0 protocol.

Available Methods:

searchDocs - Search documents with semantic similarity
getSection - Get all paragraphs under a specific heading
getFile - Get all paragraphs from a specific file
listCollections - List all available collections

Collection Handling:

The collection in the URL (e.g., /mcp/emails) serves as the default collection for all tools
Each tool accepts an optional collection parameter to override the URL default
Use listCollections to discover available collections
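The Collection Handling rules above work like this: the URL segment supplies a default, and a `collection` parameter overrides it. The sketch below builds a JSON-RPC 2.0 request and resolves the collection a call would target. `buildMcpRequest` and `effectiveCollection` are hypothetical names, and only `searchDocs` parameters shown in the examples are used. The serialized request would be POSTed with `Content-Type: application/json`, as in the curl examples.

```typescript
// Hypothetical sketch of a JSON-RPC 2.0 request for the /mcp/:collection endpoint.
type McpMethod = "searchDocs" | "getSection" | "getFile" | "listCollections";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  method: McpMethod;
  params: Record<string, unknown>;
  id: number;
}

function buildMcpRequest(method: McpMethod, params: Record<string, unknown>, id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", method, params, id };
}

// A non-empty params.collection overrides the collection from the URL.
function effectiveCollection(urlCollection: string, params: Record<string, unknown>): string {
  const override = params["collection"];
  return typeof override === "string" && override.length > 0 ? override : urlCollection;
}

// Equivalent to the "collection override" curl example: POST to /mcp/emails.
const request = buildMcpRequest("searchDocs", { query: "todo", collection: "notes" }, 1);
console.log(JSON.stringify(request));
console.log(effectiveCollection("emails", request.params)); // notes
```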

Example Requests:

```
# Search in default collection
curl -X POST http://localhost:3000/mcp/default \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "authentication",
      "limit": 5
    },
    "id": 1
  }'

# Search in emails collection
curl -X POST http://localhost:3000/mcp/emails \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "invoice",
      "limit": 5
    },
    "id": 1
  }'

# List all collections
curl -X POST http://localhost:3000/mcp/default \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "listCollections",
    "params": {},
    "id": 1
  }'

# Search with collection override (search "notes" collection while using emails endpoint)
curl -X POST http://localhost:3000/mcp/emails \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "todo",
      "collection": "notes"
    },
    "id": 1
  }'
```

Example Response:

```
{
  "jsonrpc": "2.0",
  "result": [
    {
      "match": {
        "text": "The login mutation allows...",
        "score": 0.89,
        "document": {
          "path": "guides/auth/index.md",
          "source": "vendure",
          "context": "e-commerce backend",
          "title": "Authentication Guide"
        },
        "location": {
          "paragraphIndex": 12,
          "headingHierarchy": ["API Reference", "Authentication", "Login"]
        }
      },
      "context": {
        "before": ["Previous paragraph..."],
        "after": ["Next paragraph..."]
      },
      "section": {
        "headingHash": "a1b2c3d4",
        "chunks": ["Other paragraphs in the same section..."]
      }
    }
  ],
  "id": 1
}
```
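A client can type the example response above as shown below. These interfaces are inferred from that single example, so the real payload may include fields not listed here, and some listed fields may be optional. `summarizeHit` and `sectionUrl` are hypothetical helpers. The second one builds the documented `GET /section/:collection/:hash` URL from a hit's `headingHash`.

```typescript
// Types inferred from the example response; not an official schema.
interface SearchHit {
  match: {
    text: string;
    score: number;
    document: { path: string; source?: string; context?: string; title?: string };
    location: { paragraphIndex: number; headingHierarchy: string[] };
  };
  context: { before: string[]; after: string[] };
  section: { headingHash: string; chunks: string[] };
}

interface SearchDocsResponse {
  jsonrpc: "2.0";
  result: SearchHit[];
  id: number;
}

// One line per hit: file path, heading breadcrumb, and score.
function summarizeHit(hit: SearchHit): string {
  const { document, location, score } = hit.match;
  const breadcrumb = location.headingHierarchy.join(" > ");
  return `${document.path} [${breadcrumb}] (score ${score.toFixed(2)})`;
}

// URL for fetching the whole section a hit belongs to.
function sectionUrl(baseUrl: string, collection: string, hit: SearchHit): string {
  return `${baseUrl}/section/${encodeURIComponent(collection)}/${encodeURIComponent(hit.section.headingHash)}`;
}

const example: SearchDocsResponse = {
  jsonrpc: "2.0",
  result: [
    {
      match: {
        text: "The login mutation allows...",
        score: 0.89,
        document: { path: "guides/auth/index.md", source: "vendure", context: "e-commerce backend", title: "Authentication Guide" },
        location: { paragraphIndex: 12, headingHierarchy: ["API Reference", "Authentication", "Login"] },
      },
      context: { before: ["Previous paragraph..."], after: ["Next paragraph..."] },
      section: { headingHash: "a1b2c3d4", chunks: ["Other paragraphs in the same section..."] },
    },
  ],
  id: 1,
};

for (const hit of example.result) {
  console.log(summarizeHit(hit));
  console.log(sectionUrl("http://localhost:3000", "default", hit));
}
```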

Example: Markdown Search Results

```
# Search with markdown format (default collection)
curl "http://localhost:3000/search/default?q=authentication&format=markdown"

# Search in specific collection
curl "http://localhost:3000/search/emails?q=invoice&format=markdown"
```

This returns a formatted Markdown document like the following:

```
# Search Results

**Query:** "authentication"

---

## Authentication Guide
**Source:** vendure | **Context:** e-commerce backend | **Path:** `guides/auth/index.md`

### Authentication > Login

The login mutation allows users to authenticate with the system...

### Authentication > JWT Tokens

JWT tokens are used for maintaining session state...

*[Section expanded - 3 relevant matches, scores: 0.92, 0.89, 0.87]*

---

*Found 5 relevant results*
```
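The markdown-only options `scoreThreshold` and `expandSections` can be approximated as follows. This sketch keeps hits at or above the threshold, groups them by section, and marks any section with more than one match for expansion. It then produces a note in the same form as the example above. This is an approximation written for illustration only. The `Hit` and `SectionGroup` types and both functions are hypothetical and are not the server's renderer.

```typescript
// Hypothetical approximation of the markdown-only search options.
interface Hit {
  score: number;
  headingHash: string;
}

interface SectionGroup {
  headingHash: string;
  scores: number[]; // sorted highest first
  expand: boolean;
}

function groupForMarkdown(hits: Hit[], scoreThreshold = 0.7, expandSections = true): SectionGroup[] {
  const groups = new Map<string, number[]>();
  for (const hit of hits) {
    if (hit.score < scoreThreshold) continue; // drop weak matches
    const scores = groups.get(hit.headingHash) ?? [];
    scores.push(hit.score);
    groups.set(hit.headingHash, scores);
  }
  // Map preserves insertion order, so sections appear in first-hit order.
  return Array.from(groups, ([headingHash, scores]) => ({
    headingHash,
    scores: scores.sort((a, b) => b - a),
    expand: expandSections && scores.length > 1,
  }));
}

function expansionNote(group: SectionGroup): string {
  const scores = group.scores.map((s) => s.toFixed(2)).join(", ");
  return `*[Section expanded - ${group.scores.length} relevant matches, scores: ${scores}]*`;
}

const sampleHits: Hit[] = [
  { score: 0.92, headingHash: "a1b2c3d4" },
  { score: 0.89, headingHash: "a1b2c3d4" },
  { score: 0.55, headingHash: "e5f6a7b8" },
];
for (const group of groupForMarkdown(sampleHits)) {
  if (group.expand) console.log(expansionNote(group));
}
```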

Development

```
npm run dev       # Start dev server with hot reload
npm run build     # Build TypeScript
npm run test      # Run tests
npm run lint      # Lint code
npm run format    # Format code
```

Advanced Features

Multiple Collections (Namespaces)

Signal Stash supports multiple collections to organize different types of content:

Collection Naming: All collections use the pattern docs-<collection> internally
Default Collection: If no collection is specified, default is used
Isolation: Each collection is completely isolated from others
Use Cases:
- Separate documentation from emails
- Isolate different projects or domains
- Create test vs production collections

```
# Ingest different content types to separate collections
npm run ingest -- /docs/api --collection=api-docs
npm run ingest -- /emails/archive --collection=emails
npm run ingest -- /notes/personal --collection=notes

# Search within specific collections
curl "http://localhost:3000/search/api-docs?q=authentication"
curl "http://localhost:3000/search/emails?q=invoice"
curl "http://localhost:3000/search/notes?q=todo"
```
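The naming scheme described above maps each logical collection to a Qdrant collection named `docs-<collection>`, and a missing collection falls back to `default`. The sketch assumes the `docs-` prefix is a fixed string. The README does not say whether `QDRANT_COLLECTION` changes it. The reverse mapping in `logicalCollectionNames` shows how something like `listCollections` could recover logical names. Both functions are hypothetical.

```typescript
// Hypothetical sketch: assumes the "docs-" prefix is fixed.
const COLLECTION_PREFIX = "docs-";

// Map a logical collection (URL segment or --collection) to its Qdrant name.
function qdrantCollectionName(collection?: string): string {
  return `${COLLECTION_PREFIX}${collection || "default"}`;
}

// Recover logical names from Qdrant collection names, skipping unrelated ones.
function logicalCollectionNames(qdrantNames: string[]): string[] {
  return qdrantNames
    .filter((name) => name.startsWith(COLLECTION_PREFIX))
    .map((name) => name.slice(COLLECTION_PREFIX.length));
}

console.log(qdrantCollectionName("api-docs")); // docs-api-docs
console.log(qdrantCollectionName()); // docs-default
```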

Multi-Source Documentation Support

Signal Stash handles multiple documentation sources intelligently:

Context-Aware Embeddings: When you specify a source and context during ingestion, this information is included in the embedding text. For example, a chunk about "user authentication" from Vendure docs will be embedded with context like "Vendure (e-commerce backend) > API Reference > Authentication > Login :: The login mutation..."
Automatic Relevance: Because the source and context are part of each embedding, queries that mention a product or domain tend to favor results from the matching documentation set without explicit filters. For example, a search for "Vendure user authentication" or "e-commerce login" should rank Vendure auth docs above React auth docs based on the full context.
Optional Filtering: If needed, you can explicitly filter by source using the source parameter in search queries.
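The context-aware embedding text can be reconstructed from the format shown in the Context-Aware Embeddings example: `Source (context) > Heading > ... :: chunk text`. This is a sketch based only on that example. The handling of a chunk that has a context but no source, or no headings, is an assumption. Note also that the example capitalizes "Vendure", while this sketch uses the source string exactly as given. `ChunkInput` and `buildEmbeddingText` are hypothetical names.

```typescript
// Hypothetical reconstruction of the embedding text format from the example.
interface ChunkInput {
  text: string;
  headingHierarchy: string[];
  source?: string;
  context?: string; // assumed to be ignored when no source is given
}

function buildEmbeddingText(chunk: ChunkInput): string {
  const prefix: string[] = [];
  if (chunk.source) {
    prefix.push(chunk.context ? `${chunk.source} (${chunk.context})` : chunk.source);
  }
  prefix.push(...chunk.headingHierarchy);
  // Without any source or headings, embed the bare paragraph text.
  return prefix.length > 0 ? `${prefix.join(" > ")} :: ${chunk.text}` : chunk.text;
}

const embeddingText = buildEmbeddingText({
  source: "Vendure",
  context: "e-commerce backend",
  headingHierarchy: ["API Reference", "Authentication", "Login"],
  text: "The login mutation allows users to authenticate.",
});
console.log(embeddingText);
// Vendure (e-commerce backend) > API Reference > Authentication > Login :: The login mutation allows users to authenticate.
```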

Importance Scoring

Control which information gets prioritized in search results:

Set importance during ingestion with --importance=<number> (default: 10)
Higher numbers indicate more important content
Useful for prioritizing:
- Critical API documentation (importance: 20)
- Standard documentation (importance: 10)
- Archived or legacy content (importance: 5)
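The README does not specify how importance is combined with similarity scores. The sketch below shows one possible approach and is labeled as an assumption. It multiplies each score by `importance / 10`, so content at the default importance of 10 keeps its raw score, importance 20 doubles it, and importance 5 halves it. `ScoredChunk` and `rankByImportance` are hypothetical.

```typescript
// Hypothetical re-ranking; the actual weighting in Signal Stash is not documented.
interface ScoredChunk {
  id: string;
  score: number; // similarity score from the vector search
  importance?: number; // from --importance at ingest time
}

const DEFAULT_IMPORTANCE = 10;

function rankByImportance(chunks: ScoredChunk[]): ScoredChunk[] {
  const weighted = (c: ScoredChunk) =>
    c.score * ((c.importance ?? DEFAULT_IMPORTANCE) / DEFAULT_IMPORTANCE);
  // Sort a copy so the caller's array is left untouched.
  return [...chunks].sort((a, b) => weighted(b) - weighted(a));
}

const ranked = rankByImportance([
  { id: "archive", score: 0.9, importance: 5 },
  { id: "standard", score: 0.8 },
  { id: "critical", score: 0.6, importance: 20 },
]);
console.log(ranked.map((c) => c.id)); // [ 'critical', 'standard', 'archive' ]
```

A multiplicative weight like this can push combined scores above 1 and can let a weak match with high importance outrank a strong one. An additive boost or a capped weight are common alternatives.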

Automatic Qdrant Replication

Signal Stash automatically detects your Qdrant cluster configuration and sets appropriate replication factors for high availability.
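Qdrant reports cluster state through its `GET /cluster` endpoint and accepts a `replication_factor` when a collection is created. The README does not say which factor Signal Stash chooses, so the policy below is hypothetical. It counts peers and replicates to at most `maxReplicas` nodes, falling back to 1 on a single node. The `ClusterInfo` shape is a simplified assumption about that endpoint's result.

```typescript
// Hypothetical policy; not the project's actual replication logic.
interface ClusterInfo {
  status: "enabled" | "disabled"; // "disabled" on a single-node deployment
  peers?: Record<string, unknown>; // assumed to be keyed by peer id
}

function peerCount(info: ClusterInfo): number {
  return info.status === "enabled" && info.peers ? Object.keys(info.peers).length : 1;
}

// Never ask for more replicas than there are nodes to hold them.
function chooseReplicationFactor(peers: number, maxReplicas = 2): number {
  if (!Number.isInteger(peers) || peers < 1) {
    throw new Error(`Invalid peer count: ${peers}`);
  }
  return Math.min(peers, maxReplicas);
}

const threeNodes: ClusterInfo = { status: "enabled", peers: { "1": {}, "2": {}, "3": {} } };
console.log(chooseReplicationFactor(peerCount(threeNodes))); // 2
console.log(chooseReplicationFactor(peerCount({ status: "disabled" }))); // 1
```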

Architecture

TypeScript for type safety
Express for HTTP server
Unified/Remark for Markdown AST parsing
OpenAI for embeddings
Qdrant for vector storage
Pino for structured logging

License

MIT