HamStudy/signalstash
Signal Stash is a platform designed to process Markdown documentation into a searchable vector database, offering REST and MCP APIs for efficient querying.
searchDocs
Search documents with semantic similarity.
getSection
Get all paragraphs under a specific heading.
getFile
Get all paragraphs from a specific file.
listCollections
List all available collections.
Signal Stash
Process Markdown documentation into a searchable vector database with REST and MCP APIs.
Features
- 🗂️ Markdown Ingestion: Process .md files with frontmatter support
- 🔍 Semantic Search: Vector-based search using OpenAI embeddings
- 🌐 REST API: Query documents via HTTP endpoints
- 🔌 HTTP MCP API: Model Context Protocol via JSON-RPC 2.0 over HTTP
- 📊 Qdrant Integration: Efficient vector storage and retrieval with automatic replication
- 🎯 Multi-Source Support: Handle multiple documentation sources with intelligent relevance
- ⚡ Importance Scoring: Prioritize critical information in search results
Quick Start
- Install dependencies
npm install
- Set up environment
cp .env.example .env # Edit .env with your API keys
- Start Qdrant (using Docker)
docker run -p 6333:6333 qdrant/qdrant
- Ingest documents
# Basic ingestion (to default collection)
npm run ingest -- /path/to/markdown/files
# Ingest to specific collection
npm run ingest -- /path/to/emails --collection=emails
npm run ingest -- /path/to/documentation --collection=docs
# With source context (improves search relevance)
npm run ingest -- /path/to/vendure-docs --source="vendure" --context="e-commerce backend"
npm run ingest -- /path/to/react-docs --source="react" --context="frontend framework"
# With importance scoring (higher = more important, default: 10)
npm run ingest -- /path/to/critical-docs --importance=20
npm run ingest -- /path/to/archive-docs --importance=5
# Combine all options
npm run ingest -- /path/to/emails --collection=emails --source="gmail" --context="personal emails" --importance=15
- Start the server
npm run dev
Configuration
Environment variables (see .env.example):
- OPENAI_API_KEY - OpenAI API key for embeddings
- QDRANT_HOST - Qdrant server URL (default: http://localhost:6333)
- QDRANT_API_KEY - Qdrant API key (optional)
- QDRANT_COLLECTION - Collection name (default: docs)
- PORT - Server port (default: 3000)
- HOST - Server host/interface to bind to (default: 0.0.0.0)
- LOG_LEVEL - Logging level (default: info)
- EMBEDDING_MODEL - Model to use (openai or huggingface, default: openai)
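The defaults listed above can be captured in a small typed loader. The following is only a sketch: `AppConfig`, `loadConfig`, and the field names are hypothetical and are not taken from the Signal Stash source. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical config shape; field names are illustrative assumptions.
interface AppConfig {
  openaiApiKey?: string;
  qdrantHost: string;
  qdrantApiKey?: string;
  qdrantCollection: string;
  port: number;
  host: string;
  logLevel: string;
  embeddingModel: "openai" | "huggingface";
}

type Env = { [name: string]: string | undefined };

// Accept only the two values the README documents; fall back to "openai".
function parseEmbeddingModel(value: string | undefined): "openai" | "huggingface" {
  if (value === undefined || value === "openai") return "openai";
  if (value === "huggingface") return "huggingface";
  throw new Error("EMBEDDING_MODEL must be 'openai' or 'huggingface', got: " + value);
}

// Apply the documented defaults to whatever variables are set.
function loadConfig(env: Env): AppConfig {
  const port = Number(env.PORT ?? "3000");
  if (isNaN(port)) {
    throw new Error("PORT must be a number, got: " + env.PORT);
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    qdrantHost: env.QDRANT_HOST ?? "http://localhost:6333",
    qdrantApiKey: env.QDRANT_API_KEY,
    qdrantCollection: env.QDRANT_COLLECTION ?? "docs",
    port,
    host: env.HOST ?? "0.0.0.0",
    logLevel: env.LOG_LEVEL ?? "info",
    embeddingModel: parseEmbeddingModel(env.EMBEDDING_MODEL),
  };
}

const exampleConfig = loadConfig({ OPENAI_API_KEY: "sk-example", PORT: "8080" });
```

In a Node.js process the same function could be called as `loadConfig(process.env)`.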
API Endpoints
REST API
Search Endpoints:
- GET /search/:collection?q=<query> - Semantic search (JSON format)
- GET /search/:collection?q=<query>&format=markdown - Semantic search (Markdown format)
- GET /search/:collection?q=<query>&source=<source> - Search within a specific documentation source
Search Parameters:
- q - Search query (required)
- format - Response format: json (default) or markdown
- limit - Maximum results (default: 5)
- source - Filter by documentation source
- scoreThreshold - Minimum relevance score 0-1 (default: 0.7, markdown only)
- expandSections - Expand full sections with multiple matches (default: true, markdown only)
- maxResponseChars - Maximum response size in characters (default: 50000, markdown only)
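As an illustration of how these parameters combine into a request, the sketch below assembles a search URL. `SearchOptions` and `buildSearchUrl` are hypothetical client-side helpers, not part of Signal Stash; only the parameter names and the `/search/:collection` route come from the lists above.

```typescript
// Hypothetical client-side helper; mirrors the documented query parameters.
interface SearchOptions {
  q: string;
  format?: "json" | "markdown";
  limit?: number;
  source?: string;
  scoreThreshold?: number;
  expandSections?: boolean;
  maxResponseChars?: number;
}

function buildSearchUrl(baseUrl: string, collection: string, options: SearchOptions): string {
  const candidates: Array<[string, string | number | boolean | undefined]> = [
    ["q", options.q],
    ["format", options.format],
    ["limit", options.limit],
    ["source", options.source],
    ["scoreThreshold", options.scoreThreshold],
    ["expandSections", options.expandSections],
    ["maxResponseChars", options.maxResponseChars],
  ];
  // Drop unset parameters so the server-side defaults apply.
  const query = candidates
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
  return `${baseUrl}/search/${encodeURIComponent(collection)}?${query}`;
}

const searchUrl = buildSearchUrl("http://localhost:3000", "emails", {
  q: "unpaid invoice",
  format: "markdown",
  limit: 3,
});
// searchUrl === "http://localhost:3000/search/emails?q=unpaid%20invoice&format=markdown&limit=3"
```

The resulting string can be passed to any HTTP client, for example `fetch(searchUrl)`.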
Other Endpoints:
- GET /sources/:collection - List all ingested documentation sources
- GET /section/:collection/:hash - Get section by heading hash
- GET /document/:collection/:filename - Get document by filename
- GET /health - Health check
Note: The :collection parameter is optional in all routes. If omitted, the default collection (default) is used. For backward compatibility, the old routes without a collection (e.g., /search?q=...) continue to work.
MCP API (HTTP-based)
The MCP API is available as an HTTP endpoint at /mcp/:collection using the JSON-RPC 2.0 protocol:
Available Methods:
- searchDocs - Search documents with semantic similarity
- getSection - Get all paragraphs under a specific heading
- getFile - Get all paragraphs from a specific file
- listCollections - List all available collections
Collection Handling:
- The collection in the URL (e.g., /mcp/emails) serves as the default collection for all tools
- Each tool accepts an optional collection parameter to override the URL default
- Use listCollections to discover available collections
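The precedence described above (tool parameter first, then the URL, then the global default) can be written as a one-line resolver. This is a sketch of the rule as stated here, not the server's actual code; `resolveCollection` is a hypothetical name.

```typescript
// Hypothetical helper: picks the collection a tool call should use.
// Order of precedence: explicit `collection` param, URL segment, then "default".
function resolveCollection(urlCollection?: string, paramCollection?: string): string {
  return paramCollection ?? urlCollection ?? "default";
}

const overridden = resolveCollection("emails", "notes"); // param wins
const fromUrl = resolveCollection("emails");             // URL default
const fallback = resolveCollection();                    // global default
```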
Example Request:
# Search in default collection
curl -X POST http://localhost:3000/mcp/default \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "searchDocs",
"params": {
"query": "authentication",
"limit": 5
},
"id": 1
}'
# Search in emails collection
curl -X POST http://localhost:3000/mcp/emails \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "searchDocs",
"params": {
"query": "invoice",
"limit": 5
},
"id": 1
}'
# List all collections
curl -X POST http://localhost:3000/mcp/default \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "listCollections",
"params": {},
"id": 1
}'
# Search with collection override (search "notes" collection while using emails endpoint)
curl -X POST http://localhost:3000/mcp/emails \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "searchDocs",
"params": {
"query": "todo",
"collection": "notes"
},
"id": 1
}'
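The same request bodies can be built programmatically. The sketch below only constructs the JSON-RPC 2.0 envelope used in the curl examples; `JsonRpcRequest`, `SearchDocsParams`, and `buildRequest` are illustrative names, and the parameter set is limited to the fields shown above.

```typescript
// Envelope fields follow JSON-RPC 2.0; the type names are hypothetical.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P;
  id: number;
}

// Only the searchDocs parameters that appear in the examples above.
interface SearchDocsParams {
  query: string;
  limit?: number;
  collection?: string;
}

function buildRequest<P>(method: string, params: P, id: number): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", method, params, id };
}

// Equivalent to the "collection override" curl example.
const requestBody = JSON.stringify(
  buildRequest<SearchDocsParams>("searchDocs", { query: "todo", collection: "notes" }, 1)
);
```

`requestBody` would then be sent as the body of a POST to /mcp/emails with the header Content-Type: application/json.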
Example Response:
{
"jsonrpc": "2.0",
"result": [
{
"match": {
"text": "The login mutation allows...",
"score": 0.89,
"document": {
"path": "guides/auth/index.md",
"source": "vendure",
"context": "e-commerce backend",
"title": "Authentication Guide"
},
"location": {
"paragraphIndex": 12,
"headingHierarchy": ["API Reference", "Authentication", "Login"]
}
},
"context": {
"before": ["Previous paragraph..."],
"after": ["Next paragraph..."]
},
"section": {
"headingHash": "a1b2c3d4",
"chunks": ["Other paragraphs in the same section..."]
}
}
],
"id": 1
}
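For clients written in TypeScript, the response shape above can be described with interfaces. The types below are inferred from this single example response, so optional or additional fields may be missing; `SearchResult`, `SearchResponse`, and `summarize` are hypothetical names.

```typescript
// Types inferred from the example response; treat them as an assumption.
interface SearchResult {
  match: {
    text: string;
    score: number;
    document: { path: string; source: string; context: string; title: string };
    location: { paragraphIndex: number; headingHierarchy: string[] };
  };
  context: { before: string[]; after: string[] };
  section: { headingHash: string; chunks: string[] };
}

interface SearchResponse {
  jsonrpc: "2.0";
  result: SearchResult[];
  id: number;
}

// One line per hit: score, file path, and heading trail.
function summarize(response: SearchResponse): string[] {
  return response.result.map((r) => {
    const trail = r.match.location.headingHierarchy.join(" > ");
    return `${r.match.score.toFixed(2)} ${r.match.document.path} (${trail})`;
  });
}

const sampleResponse: SearchResponse = {
  jsonrpc: "2.0",
  result: [
    {
      match: {
        text: "The login mutation allows...",
        score: 0.89,
        document: {
          path: "guides/auth/index.md",
          source: "vendure",
          context: "e-commerce backend",
          title: "Authentication Guide",
        },
        location: {
          paragraphIndex: 12,
          headingHierarchy: ["API Reference", "Authentication", "Login"],
        },
      },
      context: { before: ["Previous paragraph..."], after: ["Next paragraph..."] },
      section: { headingHash: "a1b2c3d4", chunks: ["Other paragraphs in the same section..."] },
    },
  ],
  id: 1,
};

const summaryLines = summarize(sampleResponse);
// summaryLines[0] === "0.89 guides/auth/index.md (API Reference > Authentication > Login)"
```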
Example: Markdown Search Results
# Search with markdown format (default collection)
curl "http://localhost:3000/search/default?q=authentication&format=markdown"
# Search in specific collection
curl "http://localhost:3000/search/emails?q=invoice&format=markdown"
Returns a nicely formatted markdown document:
# Search Results
**Query:** "authentication"
---
## Authentication Guide
**Source:** vendure | **Context:** e-commerce backend | **Path:** `guides/auth/index.md`
### Authentication > Login
The login mutation allows users to authenticate with the system...
### Authentication > JWT Tokens
JWT tokens are used for maintaining session state...
*[Section expanded - 3 relevant matches, scores: 0.92, 0.89, 0.87]*
---
*Found 5 relevant results*
Development
npm run dev # Start dev server with hot reload
npm run build # Build TypeScript
npm run test # Run tests
npm run lint # Lint code
npm run format # Format code
Advanced Features
Multiple Collections (Namespaces)
Signal Stash supports multiple collections to organize different types of content:
- Collection Naming: All collections use the pattern docs-<collection> internally
- Default Collection: If no collection is specified, default is used
- Isolation: Each collection is completely isolated from others
- Use Cases:
  - Separate documentation from emails
  - Isolate different projects or domains
  - Create test vs production collections
# Ingest different content types to separate collections
npm run ingest -- /docs/api --collection=api-docs
npm run ingest -- /emails/archive --collection=emails
npm run ingest -- /notes/personal --collection=notes
# Search within specific collections
curl "http://localhost:3000/search/api-docs?q=authentication"
curl "http://localhost:3000/search/emails?q=invoice"
curl "http://localhost:3000/search/notes?q=todo"
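The naming rule above can be sketched as a small mapping from the public collection name to the internal one. `toInternalCollectionName` is a hypothetical helper, and treating `docs` as a configurable prefix is an assumption; the README only states the docs-<collection> pattern and the `default` fallback.

```typescript
// Hypothetical helper: maps a public collection name to its internal name.
// The `prefix` parameter is an assumption; the README only shows "docs-".
function toInternalCollectionName(collection?: string, prefix: string = "docs"): string {
  return `${prefix}-${collection ?? "default"}`;
}

const emailsName = toInternalCollectionName("emails"); // "docs-emails"
const defaultName = toInternalCollectionName();        // "docs-default"
```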
Multi-Source Documentation Support
Signal Stash handles multiple documentation sources intelligently:
- Context-Aware Embeddings: When you specify a source and context during ingestion, this information is included in the embedding text. For example, a chunk about "user authentication" from Vendure docs will be embedded with context like "Vendure (e-commerce backend) > API Reference > Authentication > Login :: The login mutation..."
- Automatic Relevance: When searching, the embeddings naturally favor results from the relevant documentation set without requiring explicit filters. A search for "user authentication" will automatically rank Vendure auth docs higher than React auth docs based on the full context.
- Optional Filtering: If needed, you can explicitly filter by source using the source parameter in search queries.
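To make the context-aware embedding text concrete, the sketch below reproduces the example string quoted above from its parts. The function and type names are hypothetical, and the exact separators are taken only from that one example, so the real ingestion code may differ.

```typescript
// Hypothetical chunk shape; only the output format comes from the README example.
interface ChunkForEmbedding {
  source?: string;
  context?: string;
  headingHierarchy: string[];
  text: string;
}

// Prefix the chunk text with "source (context) > heading > ... :: ".
function buildEmbeddingText(chunk: ChunkForEmbedding): string {
  let parts: string[] = [];
  if (chunk.source) {
    parts.push(chunk.context ? `${chunk.source} (${chunk.context})` : chunk.source);
  }
  parts = parts.concat(chunk.headingHierarchy);
  const prefix = parts.join(" > ");
  return prefix ? `${prefix} :: ${chunk.text}` : chunk.text;
}

const embeddingText = buildEmbeddingText({
  source: "Vendure",
  context: "e-commerce backend",
  headingHierarchy: ["API Reference", "Authentication", "Login"],
  text: "The login mutation...",
});
// "Vendure (e-commerce backend) > API Reference > Authentication > Login :: The login mutation..."
```

Because the source and context become part of the embedded text, a query that mentions e-commerce concepts lands closer to these chunks in vector space, which is what allows ranking without an explicit source filter.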
Importance Scoring
Control which information gets prioritized in search results:
- Set importance during ingestion with --importance=<number> (default: 10)
- Higher numbers indicate more important content
- Useful for prioritizing:
  - Critical API documentation (importance: 20)
  - Standard documentation (importance: 10)
  - Archived or legacy content (importance: 5)
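The README does not say how importance enters the ranking. One plausible scheme, shown here purely as an assumption, scales each similarity score by importance relative to the default of 10 and sorts on the result. All names below are hypothetical.

```typescript
// Assumed weighting scheme, NOT confirmed by the Signal Stash source:
// weighted score = similarity * (importance / 10).
interface ScoredChunk {
  label: string;
  similarity: number;
  importance?: number; // falls back to the documented default of 10
}

const DEFAULT_IMPORTANCE = 10;

function weightedScore(chunk: ScoredChunk): number {
  return chunk.similarity * ((chunk.importance ?? DEFAULT_IMPORTANCE) / DEFAULT_IMPORTANCE);
}

// Returns a new array sorted from highest to lowest weighted score.
function rankByImportance(chunks: ScoredChunk[]): ScoredChunk[] {
  return chunks.slice().sort((a, b) => weightedScore(b) - weightedScore(a));
}

const ranked = rankByImportance([
  { label: "archive", similarity: 0.9, importance: 5 },
  { label: "standard", similarity: 0.85 },
  { label: "critical", similarity: 0.8, importance: 20 },
]);
const rankedLabels = ranked.map((c) => c.label);
```

Under this assumed scheme the critical chunk ranks first despite having the lowest raw similarity, and the archived chunk ranks last despite having the highest.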
Automatic Qdrant Replication
Signal Stash automatically detects your Qdrant cluster configuration and sets appropriate replication factors for high availability.
Architecture
- TypeScript for type safety
- Express for HTTP server
- Unified/Remark for Markdown AST parsing
- OpenAI for embeddings
- Qdrant for vector storage
- Pino for structured logging
License
MIT