signalstash

HamStudy/signalstash

Signal Stash is a platform designed to process Markdown documentation into a searchable vector database, offering REST and MCP APIs for efficient querying.

Tools
  1. searchDocs

    Search documents with semantic similarity.

  2. getSection

    Get all paragraphs under a specific heading.

  3. getFile

    Get all paragraphs from a specific file.

  4. listCollections

    List all available collections.

Signal Stash

Process Markdown documentation into a searchable vector database with REST and MCP APIs.

Features

  • 🗂️ Markdown Ingestion: Process .md files with frontmatter support
  • 🔍 Semantic Search: Vector-based search using OpenAI embeddings
  • 🌐 REST API: Query documents via HTTP endpoints
  • 🔌 HTTP MCP API: Model Context Protocol via JSON-RPC 2.0 over HTTP
  • 📊 Qdrant Integration: Efficient vector storage and retrieval with automatic replication
  • 🎯 Multi-Source Support: Handle multiple documentation sources with intelligent relevance
  • Importance Scoring: Prioritize critical information in search results

Quick Start

  1. Install dependencies

    npm install
    
  2. Set up environment

    cp .env.example .env
    # Edit .env with your API keys
    
  3. Start Qdrant (using Docker)

    docker run -p 6333:6333 qdrant/qdrant
    
  4. Ingest documents

    # Basic ingestion (to default collection)
    npm run ingest -- /path/to/markdown/files
    
    # Ingest to specific collection
    npm run ingest -- /path/to/emails --collection=emails
    npm run ingest -- /path/to/documentation --collection=docs
    
    # With source context (improves search relevance)
    npm run ingest -- /path/to/vendure-docs --source="vendure" --context="e-commerce backend"
    npm run ingest -- /path/to/react-docs --source="react" --context="frontend framework"
    
    # With importance scoring (higher = more important, default: 10)
    npm run ingest -- /path/to/critical-docs --importance=20
    npm run ingest -- /path/to/archive-docs --importance=5
    
    # Combine all options
    npm run ingest -- /path/to/emails --collection=emails --source="gmail" --context="personal emails" --importance=15
    
  5. Start the server

    npm run dev
    

Configuration

Environment variables (see .env.example):

  • OPENAI_API_KEY - OpenAI API key for embeddings
  • QDRANT_HOST - Qdrant server URL (default: http://localhost:6333)
  • QDRANT_API_KEY - Qdrant API key (optional)
  • QDRANT_COLLECTION - Collection name (default: docs)
  • PORT - Server port (default: 3000)
  • HOST - Server host/interface to bind to (default: 0.0.0.0)
  • LOG_LEVEL - Logging level (default: info)
  • EMBEDDING_MODEL - Embedding provider to use (openai or huggingface, default: openai)
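
As a rough illustration of how these variables and their defaults fit together, here is a minimal sketch of a config loader. The `StashConfig` shape and the `loadConfig` and `parseProvider` names are hypothetical and not taken from the Signal Stash source; only the variable names and default values come from the list above.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface StashConfig {
  openaiApiKey: string | undefined;
  qdrantHost: string;
  qdrantApiKey: string | undefined;
  qdrantCollection: string;
  port: number;
  host: string;
  logLevel: string;
  embeddingModel: "openai" | "huggingface";
}

type Env = Record<string, string | undefined>;

// Accept only the two provider names listed in the documentation.
function parseProvider(value: string): "openai" | "huggingface" {
  if (value === "openai" || value === "huggingface") {
    return value;
  }
  throw new Error(`Unsupported EMBEDDING_MODEL: ${value}`);
}

// Read each variable, falling back to the documented default when it is unset.
function loadConfig(env: Env): StashConfig {
  const port = Number(env["PORT"] ?? "3000");
  if (isNaN(port) || Math.floor(port) !== port || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    openaiApiKey: env["OPENAI_API_KEY"],
    qdrantHost: env["QDRANT_HOST"] ?? "http://localhost:6333",
    qdrantApiKey: env["QDRANT_API_KEY"],
    qdrantCollection: env["QDRANT_COLLECTION"] ?? "docs",
    port,
    host: env["HOST"] ?? "0.0.0.0",
    logLevel: env["LOG_LEVEL"] ?? "info",
    embeddingModel: parseProvider(env["EMBEDDING_MODEL"] ?? "openai"),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.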

API Endpoints

REST API

Search Endpoints:

  • GET /search/:collection?q=<query> - Semantic search (JSON format)
  • GET /search/:collection?q=<query>&format=markdown - Semantic search (Markdown format)
  • GET /search/:collection?q=<query>&source=<source> - Search within specific documentation source

Search Parameters:

  • q - Search query (required)
  • format - Response format: json (default) or markdown
  • limit - Maximum results (default: 5)
  • source - Filter by documentation source
  • scoreThreshold - Minimum relevance score 0-1 (default: 0.7, markdown only)
  • expandSections - Expand full sections with multiple matches (default: true, markdown only)
  • maxResponseChars - Maximum response size (default: 50000, markdown only)
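
To show how the parameters above combine into a request, here is a small client-side sketch that assembles a search URL. The `SearchOptions` type and `buildSearchUrl` function are hypothetical helpers, not part of Signal Stash; the parameter names and route shape are the ones documented in this section.

```typescript
// Hypothetical client-side options; names mirror the documented query parameters.
interface SearchOptions {
  q: string;
  collection?: string;
  format?: "json" | "markdown";
  limit?: number;
  source?: string;
  scoreThreshold?: number;
  expandSections?: boolean;
  maxResponseChars?: number;
}

// Build a search URL, leaving out any parameter that was not set so the
// server-side defaults apply.
function buildSearchUrl(baseUrl: string, opts: SearchOptions): string {
  if (opts.q.trim() === "") {
    throw new Error("q is required");
  }
  // The collection segment is optional; without it the plain /search route is used.
  const path = opts.collection
    ? `/search/${encodeURIComponent(opts.collection)}`
    : "/search";
  const pairs: Array<[string, string | number | boolean | undefined]> = [
    ["q", opts.q],
    ["format", opts.format],
    ["limit", opts.limit],
    ["source", opts.source],
    ["scoreThreshold", opts.scoreThreshold],
    ["expandSections", opts.expandSections],
    ["maxResponseChars", opts.maxResponseChars],
  ];
  const query = pairs
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
  return `${baseUrl}${path}?${query}`;
}
```

For example, `buildSearchUrl("http://localhost:3000", { q: "authentication", collection: "default", format: "markdown" })` produces the same URL as the Markdown search example later in this document.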

Other Endpoints:

  • GET /sources/:collection - List all ingested documentation sources
  • GET /section/:collection/:hash - Get section by heading hash
  • GET /document/:collection/:filename - Get document by filename
  • GET /health - Health check

Note: The :collection parameter is optional in all routes. If it is omitted, the collection named default is used. For backward compatibility, the older routes without a collection segment (e.g., /search?q=...) continue to work.

MCP API (HTTP-based)

The MCP API is available as an HTTP endpoint at /mcp/:collection and uses the JSON-RPC 2.0 protocol.

Available Methods:

  • searchDocs - Search documents with semantic similarity
  • getSection - Get all paragraphs under a specific heading
  • getFile - Get all paragraphs from a specific file
  • listCollections - List all available collections

Collection Handling:

  • The collection in the URL (e.g., /mcp/emails) serves as the default collection for all tools
  • Each tool accepts an optional collection parameter to override the URL default
  • Use listCollections to discover available collections
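
The collection rules above can be sketched from the client's point of view as follows. `buildMcpCall` and `effectiveCollection` are hypothetical helper names used only for illustration; the actual server-side resolution logic is not shown in this document and may differ.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  method: string;
  params: Record<string, unknown>;
  id: number;
}

// Which collection a call ends up targeting: an explicit "collection"
// parameter wins, otherwise the collection from the URL is used.
function effectiveCollection(
  urlCollection: string,
  params: Record<string, unknown>
): string {
  const override = params["collection"];
  return typeof override === "string" && override !== "" ? override : urlCollection;
}

// Build the URL and JSON-RPC 2.0 body for one MCP call.
function buildMcpCall(
  baseUrl: string,
  urlCollection: string,
  method: string,
  params: Record<string, unknown> = {},
  id: number = 1
): { url: string; body: JsonRpcRequest } {
  return {
    url: `${baseUrl}/mcp/${encodeURIComponent(urlCollection)}`,
    body: { jsonrpc: "2.0", method, params, id },
  };
}
```

The returned `body` can be serialized with `JSON.stringify` and sent as a POST request with a `Content-Type: application/json` header, as in the curl examples in this section.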

Example Requests:

# Search in default collection
curl -X POST http://localhost:3000/mcp/default \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "authentication",
      "limit": 5
    },
    "id": 1
  }'

# Search in emails collection
curl -X POST http://localhost:3000/mcp/emails \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "invoice",
      "limit": 5
    },
    "id": 1
  }'

# List all collections
curl -X POST http://localhost:3000/mcp/default \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "listCollections",
    "params": {},
    "id": 1
  }'

# Search with collection override (search "notes" collection while using emails endpoint)
curl -X POST http://localhost:3000/mcp/emails \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "searchDocs",
    "params": {
      "query": "todo",
      "collection": "notes"
    },
    "id": 1
  }'

Example Response:

{
  "jsonrpc": "2.0",
  "result": [
    {
      "match": {
        "text": "The login mutation allows...",
        "score": 0.89,
        "document": {
          "path": "guides/auth/index.md",
          "source": "vendure",
          "context": "e-commerce backend",
          "title": "Authentication Guide"
        },
        "location": {
          "paragraphIndex": 12,
          "headingHierarchy": ["API Reference", "Authentication", "Login"]
        }
      },
      "context": {
        "before": ["Previous paragraph..."],
        "after": ["Next paragraph..."]
      },
      "section": {
        "headingHash": "a1b2c3d4",
        "chunks": ["Other paragraphs in the same section..."]
      }
    }
  ],
  "id": 1
}
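
For TypeScript clients, the response above could be modeled roughly as follows. The interface names are hypothetical, and the fields are inferred from this single example, so the real payload may contain optional or additional fields. Using `headingHash` as the `:hash` segment of the section route is also an assumption.

```typescript
// Types inferred from the example response; not an official schema.
interface SearchResult {
  match: {
    text: string;
    score: number;
    document: { path: string; source: string; context: string; title: string };
    location: { paragraphIndex: number; headingHierarchy: string[] };
  };
  context: { before: string[]; after: string[] };
  section: { headingHash: string; chunks: string[] };
}

interface SearchResponse {
  jsonrpc: "2.0";
  result: SearchResult[];
  id: number;
}

// One line per result: score, document title, heading trail, and file path.
function summarize(response: SearchResponse): string[] {
  return response.result.map((r) => {
    const trail = r.match.location.headingHierarchy.join(" > ");
    return `${r.match.score.toFixed(2)} ${r.match.document.title} (${trail}) [${r.match.document.path}]`;
  });
}

// Assumption: the section's headingHash is what GET /section/:collection/:hash expects.
function sectionPath(collection: string, result: SearchResult): string {
  return `/section/${encodeURIComponent(collection)}/${encodeURIComponent(result.section.headingHash)}`;
}
```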

Example: Markdown Search Results

# Search with markdown format (default collection)
curl "http://localhost:3000/search/default?q=authentication&format=markdown"

# Search in specific collection
curl "http://localhost:3000/search/emails?q=invoice&format=markdown"

The response is a formatted Markdown document:

# Search Results

**Query:** "authentication"

---

## Authentication Guide
**Source:** vendure | **Context:** e-commerce backend | **Path:** `guides/auth/index.md`

### Authentication > Login

The login mutation allows users to authenticate with the system...

### Authentication > JWT Tokens

JWT tokens are used for maintaining session state...

*[Section expanded - 3 relevant matches, scores: 0.92, 0.89, 0.87]*

---

*Found 5 relevant results*

Development

npm run dev       # Start dev server with hot reload
npm run build     # Build TypeScript
npm run test      # Run tests
npm run lint      # Lint code
npm run format    # Format code

Advanced Features

Multiple Collections (Namespaces)

Signal Stash supports multiple collections to organize different types of content:

  • Collection Naming: All collections use the pattern docs-<collection> internally
  • Default Collection: If no collection is specified, default is used
  • Isolation: Each collection is completely isolated from others
  • Use Cases:
    • Separate documentation from emails
    • Isolate different projects or domains
    • Create test vs production collections

# Ingest different content types to separate collections
npm run ingest -- /docs/api --collection=api-docs
npm run ingest -- /emails/archive --collection=emails
npm run ingest -- /notes/personal --collection=notes

# Search within specific collections
curl "http://localhost:3000/search/api-docs?q=authentication"
curl "http://localhost:3000/search/emails?q=invoice"
curl "http://localhost:3000/search/notes?q=todo"
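
The naming rule described in this section (a docs- prefix, with default as the fallback) can be sketched as a small mapping function. `internalCollectionName` is a hypothetical name, and the prefix is hard-coded here only for illustration; whether the actual prefix is derived from QDRANT_COLLECTION is not stated in this document.

```typescript
// Assumed constants, taken from the description in this section.
const COLLECTION_PREFIX = "docs-";
const DEFAULT_COLLECTION = "default";

// Map a user-facing collection name to the internal docs-<collection> name.
function internalCollectionName(collection?: string): string {
  const trimmed = collection ? collection.trim() : "";
  const name = trimmed !== "" ? trimmed : DEFAULT_COLLECTION;
  return `${COLLECTION_PREFIX}${name}`;
}
```

Under these assumptions, the emails collection maps to docs-emails and an omitted collection maps to docs-default.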

Multi-Source Documentation Support

Signal Stash handles multiple documentation sources intelligently:

  1. Context-Aware Embeddings: When you specify a source and context during ingestion, this information is included in the embedding text. For example, a chunk about "user authentication" from Vendure docs will be embedded with context like "Vendure (e-commerce backend) > API Reference > Authentication > Login :: The login mutation..."

  2. Automatic Relevance: Because the source and context are part of each embedded chunk, searches tend to favor results from the relevant documentation set without requiring explicit filters. For example, a search for "user authentication" phrased in an e-commerce context will tend to rank Vendure auth docs above React auth docs.

  3. Optional Filtering: If needed, you can explicitly filter by source using the source parameter in search queries.
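
To make the context-aware embedding text from item 1 concrete, here is a minimal sketch that reproduces the example string given there. The `ChunkContext` type and `buildEmbeddingText` function are hypothetical; the exact separators and capitalization used by Signal Stash are assumed from that single example.

```typescript
// Hypothetical description of one chunk before it is embedded.
interface ChunkContext {
  source?: string;
  context?: string;
  headingHierarchy: string[];
  text: string;
}

// Prefix the chunk text with "<source> (<context>) > <headings...> :: ".
function buildEmbeddingText(chunk: ChunkContext): string {
  let origin: string | undefined;
  if (chunk.source) {
    origin = chunk.context ? `${chunk.source} (${chunk.context})` : chunk.source;
  }
  const trail: string[] = [];
  if (origin) {
    trail.push(origin);
  }
  for (const heading of chunk.headingHierarchy) {
    if (heading !== "") {
      trail.push(heading);
    }
  }
  // Without any source or heading information, embed the bare text.
  return trail.length > 0 ? `${trail.join(" > ")} :: ${chunk.text}` : chunk.text;
}
```

The resulting string, rather than the bare paragraph, would then be passed to the embedding model, which is how source and context can influence similarity scores.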

Importance Scoring

Control which information gets prioritized in search results:

  • Set importance during ingestion with --importance=<number> (default: 10)
  • Higher numbers indicate more important content
  • Useful for prioritizing:
    • Critical API documentation (importance: 20)
    • Standard documentation (importance: 10)
    • Archived or legacy content (importance: 5)
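
This document does not say how importance is combined with the similarity score. Purely as an illustration, one possible approach is to scale each score by its importance relative to the default of 10. The `ScoredChunk` type, the `rankByImportance` function, and the weighting formula are all assumptions, not the Signal Stash implementation.

```typescript
// Hypothetical search hit carrying the importance set at ingestion time.
interface ScoredChunk {
  text: string;
  score: number;       // similarity score, 0 to 1
  importance?: number; // value passed via --importance, if any
}

const DEFAULT_IMPORTANCE = 10;

// Assumed weighting: similarity multiplied by importance / default importance.
function weightedScore(chunk: ScoredChunk): number {
  const importance = chunk.importance ?? DEFAULT_IMPORTANCE;
  return chunk.score * (importance / DEFAULT_IMPORTANCE);
}

// Return a new array sorted from highest to lowest weighted score.
function rankByImportance(chunks: ScoredChunk[]): ScoredChunk[] {
  return chunks.slice().sort((a, b) => weightedScore(b) - weightedScore(a));
}
```

With this weighting, a critical chunk (importance 20) with a similarity of 0.8 would outrank an archived chunk (importance 5) with a similarity of 0.9. Other schemes, such as an additive boost or a tie-breaker, are equally plausible.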

Automatic Qdrant Replication

Signal Stash automatically detects your Qdrant cluster configuration and sets appropriate replication factors for high availability.

Architecture

  • TypeScript for type safety
  • Express for HTTP server
  • Unified/Remark for Markdown AST parsing
  • OpenAI for embeddings
  • Qdrant for vector storage
  • Pino for structured logging

License

MIT