documenters-mcp by rajivsinclair - MCP Server

Documenters MCP Server

A Model Context Protocol (MCP) server that provides AI-powered search across all Documenters Network data. It supports semantic, keyword, and hybrid search over thousands of local government meetings from 19+ cities.

Features

Hybrid Search: Combines semantic (Gemini embeddings) and keyword search
Comprehensive Data: All historical meeting notes, assignments, and documents
Real-time Sync: Automated sync from production database every 2 hours
Rich Context: Program/city, agency, and temporal filtering
MCP Compatible: Works with Claude and other MCP-enabled clients

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Production    │    │    Supabase      │    │   MCP Client    │
│   Database      │───▶│   (Sync + RAG)   │───▶│   (Claude)      │
│   (Heroku)      │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │  Ask Docsy Web   │
                       │   (Vercel)       │
                       └──────────────────┘

Quick Start

1. Deploy Infrastructure

# From a local clone of the repository, change into the project directory
cd /path/to/documenters-mcp

# Run the deployment script (requires the Supabase CLI)
./deploy.sh

2. Run MCP Server

# Install dependencies
pip install mcp httpx

# Run the server
python src/server.py

3. Use with Claude

Add to your Claude MCP configuration:

{
  "mcpServers": {
    "documenters": {
      "command": "python",
      "args": ["/path/to/documenters-mcp/src/server.py"],
      "env": {
        "SUPABASE_URL": "https://gvluxfxoxiauztcpkznh.supabase.co",
        "SUPABASE_ANON_KEY": "your_anon_key"
      }
    }
  }
}
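The entry above needs a path to server.py that the client can resolve regardless of its working directory. A minimal sketch for generating the entry with an absolute path follows; the helper name build_mcp_config is hypothetical, and where the resulting JSON must be saved depends on your MCP client.

```python
import json
from pathlib import Path


def build_mcp_config(server_path: str, supabase_url: str, anon_key: str) -> dict:
    """Return a config dict shaped like the JSON entry shown above.

    build_mcp_config is a hypothetical helper, not part of this repository.
    """
    # Resolve to an absolute path so the client can launch the server
    # from any working directory.
    resolved = str(Path(server_path).expanduser().resolve())
    return {
        "mcpServers": {
            "documenters": {
                "command": "python",
                "args": [resolved],
                "env": {
                    "SUPABASE_URL": supabase_url,
                    "SUPABASE_ANON_KEY": anon_key,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config(
        "src/server.py",
        "https://gvluxfxoxiauztcpkznh.supabase.co",
        "your_anon_key",
    )
    # Print the JSON so it can be pasted into the client's configuration.
    print(json.dumps(config, indent=2))
```

Run it from the repository root and merge the printed entry into your existing configuration rather than overwriting other servers.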

MCP Tools Available

`search_documenters`

Search across all Documenters content with filters:

query: Natural language or keyword search
programs: Filter by cities (e.g., ["Chicago", "Detroit"])
agencies: Filter by agency type (e.g., ["City Council"])
date_from/date_to: Time range filters
search_method: "semantic", "keyword", or "hybrid"
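The parameters above can be assembled and sanity-checked on the client side before the tool is called. This is a sketch under assumptions: the helper build_search_arguments is hypothetical, the "hybrid" default is a guess, and ISO 8601 date strings are assumed because the source does not state the expected date format.

```python
from datetime import date
from typing import Optional

# The three search methods listed in the tool description.
ALLOWED_METHODS = {"semantic", "keyword", "hybrid"}


def build_search_arguments(
    query: str,
    programs: Optional[list[str]] = None,
    agencies: Optional[list[str]] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    search_method: str = "hybrid",  # assumed default, not confirmed by the source
) -> dict:
    """Build an arguments dict for search_documenters (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if search_method not in ALLOWED_METHODS:
        raise ValueError(f"search_method must be one of {sorted(ALLOWED_METHODS)}")
    if date_from and date_to and date_from > date_to:
        raise ValueError("date_from must not be later than date_to")

    arguments: dict = {"query": query, "search_method": search_method}
    # Only include optional filters that were actually provided.
    if programs:
        arguments["programs"] = programs
    if agencies:
        arguments["agencies"] = agencies
    if date_from:
        arguments["date_from"] = date_from.isoformat()  # assumed ISO 8601 format
    if date_to:
        arguments["date_to"] = date_to.isoformat()
    return arguments
```

For example, build_search_arguments("housing policy", programs=["Chicago"]) yields a dict that can be passed as the arguments of a search_documenters tool call.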

`get_agency_timeline`

Get chronological timeline of agency activities:

agency: Agency name to track
program: Specific city to filter by
months_back: How far back to search (default: 12)
topic: Optional topic focus
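To make months_back concrete, the sketch below computes the start date of the lookback window. It assumes the window is counted in calendar months from today; how the server actually interprets months_back is not stated in the source, and timeline_start is a hypothetical name.

```python
import calendar
from datetime import date


def timeline_start(today: date, months_back: int = 12) -> date:
    """Return the date months_back calendar months before today.

    Assumption: the window is measured in calendar months. If the target
    month is shorter, the day is clamped to that month's last day.
    """
    if months_back < 0:
        raise ValueError("months_back must be non-negative")
    # Work with a zero-based month index so the year rollover is simple.
    index = today.year * 12 + (today.month - 1) - months_back
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))
```

With the default of 12, a request made on 2024-05-15 would cover activity from 2023-05-15 onward under this interpretation.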

`track_topic_across_cities`

Compare how topics are discussed across cities:

topic: Subject to track (e.g., "housing", "police reform")
programs: Cities to compare (leave empty for all)
time_period: "last_month", "last_3_months", etc.

`get_meeting_insights`

Detailed analysis of specific meetings:

meeting_query: Description to find meeting
analysis_type: "summary", "key_decisions", "public_comments", "action_items"

`compare_policies`

Compare policy approaches across cities:

policy_topic: Policy area to compare
cities: Cities to compare (leave empty for all)

Data Sync

The system automatically syncs data from the production database:

Tables Synced

programs: All Documenters programs/cities
agencies: Government agencies and departments
assignments: Meeting coverage assignments
roles: Individual documenter roles
submissions: Meeting notes and checklist data
documents: Additional documents and content

Sync Schedule

Every 2 hours: Incremental sync of changed data
Daily at 3 AM: Health check and validation
Every 6 hours: Generate embeddings for new content
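The schedule above maps naturally onto cron expressions. The sketch below is illustrative only: the job names and expressions are assumptions rather than the project's actual cron configuration, the time zone for "3 AM" is not stated in the source, and the matcher handles just the minute and hour fields.

```python
from datetime import datetime

# Hypothetical job names with cron expressions matching the schedule above.
SCHEDULE = {
    "incremental_sync": "0 */2 * * *",     # every 2 hours, on the hour
    "health_check": "0 3 * * *",           # daily at 03:00
    "generate_embeddings": "0 */6 * * *",  # every 6 hours, on the hour
}


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', '*/n', and a plain integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def jobs_due(moment: datetime) -> list[str]:
    """Return the jobs whose minute and hour fields match the given time."""
    due = []
    for name, expression in SCHEDULE.items():
        minute_field, hour_field, *_unused = expression.split()
        if _field_matches(minute_field, moment.minute) and _field_matches(
            hour_field, moment.hour
        ):
            due.append(name)
    return due
```

Because embeddings are generated every 6 hours while data syncs every 2 hours, newly synced content may not be reachable by semantic search until the next embedding run.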

Monitoring

Check sync health:

SELECT * FROM sync_health_check();

View recent sync logs:

SELECT * FROM sync_logs ORDER BY started_at DESC LIMIT 10;
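The health check can also be called from a script through Supabase's REST interface, which exposes database functions under /rest/v1/rpc/. The sketch below uses only the standard library. It assumes sync_health_check is exposed through the API and callable with the key you supply; that may not hold for the anon key, in which case the SQL editor remains the fallback. The helper names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional


def build_rpc_request(
    base_url: str,
    api_key: str,
    function_name: str,
    params: Optional[dict] = None,
) -> urllib.request.Request:
    """Build a POST request for a database function exposed over REST."""
    url = f"{base_url.rstrip('/')}/rest/v1/rpc/{function_name}"
    body = json.dumps(params or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "apikey": api_key,
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def call_rpc(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = os.environ.get("SUPABASE_URL")
    key = os.environ.get("SUPABASE_ANON_KEY")
    if url and key:
        # Assumes the function takes no arguments, as in the SQL above.
        print(call_rpc(build_rpc_request(url, key, "sync_health_check")))
    else:
        print("Set SUPABASE_URL and SUPABASE_ANON_KEY first.")
```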

Vector Embeddings

Uses the Gemini embedding model (gemini-embedding-001) for semantic search:

3072-dimensional vectors
Chunk-based text processing (1000 chars with 200 overlap)
Rich metadata for filtering (program, agency, date)
Hybrid search combining semantic similarity and keyword relevance
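Two of the points above can be sketched in a few lines: overlapping chunking with the stated sizes, and one possible way to blend semantic and keyword scores. This is not the project's actual implementation. The function names are hypothetical, and the weighted blend with a 0.7 semantic weight is an assumed technique chosen for illustration; the source does not say how the two scores are combined.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one chunk in most cases.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def hybrid_score(semantic: float, keyword: float, semantic_weight: float = 0.7) -> float:
    """Blend two relevance scores (both assumed to be normalized to 0..1).

    The 0.7 default weight is an illustrative assumption.
    """
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    return semantic_weight * semantic + (1.0 - semantic_weight) * keyword
```

With the stated settings, each chunk starts 800 characters after the previous one, so a 2,500-character document produces three chunks.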

Example Queries

Ask the MCP server:

"What housing policies are being discussed in Chicago?"
"Show me recent police reform discussions across cities"
"What did the Detroit City Council decide about budget cuts?"
"Compare climate change initiatives across all cities"
"Track mentions of 'affordable housing' over the past 6 months"

Development

Project Structure

documenters-mcp/
├── sql/                    # Database migrations and functions
│   ├── 01_comprehensive_schema.sql
│   └── 02_sync_functions.sql
├── supabase/functions/     # Edge functions
│   ├── generate-embeddings/
│   └── mcp-search/
├── src/                    # MCP server code
│   └── server.py
├── deploy.sh              # Deployment script
└── README.md

Local Development

Set up environment variables:

export SUPABASE_URL="https://gvluxfxoxiauztcpkznh.supabase.co"
export SUPABASE_ANON_KEY="your_anon_key"
export GEMINI_API_KEY="your_gemini_key"

Install dependencies:

pip install mcp httpx supabase

Run locally:

python src/server.py

Production Deployment

The system is deployed on Supabase with:

Database: PostgreSQL with pgvector extension
Edge Functions: Deno-based serverless functions
Cron Jobs: Automated sync scheduling
SSL: Secure connections to production database

Troubleshooting

Sync Issues

-- Check for failed syncs
SELECT * FROM sync_logs WHERE status = 'failed' ORDER BY started_at DESC;

-- Reset syncs stuck in 'running' (first confirm that no sync is actually in progress)
UPDATE sync_metadata SET sync_status = 'ready' WHERE sync_status = 'running';

Embedding Issues

-- Check embedding counts
SELECT COUNT(*) FROM document_embeddings;

-- Manually trigger embedding generation (pg_net; named arguments make the url and body explicit)
SELECT net.http_post(
    url := 'https://gvluxfxoxiauztcpkznh.supabase.co/functions/v1/generate-embeddings',
    body := '{"batch_size": 50}'::jsonb
);

Search Issues

Check Supabase logs for edge function errors
Verify Gemini API key is set correctly
Test search function directly via curl
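For the direct test mentioned above, a Python equivalent of a curl call might look like the sketch below. The function path follows the mcp-search directory in the project structure and the URL pattern used for generate-embeddings. The request body is an assumption modeled on the search_documenters parameters; check the edge function source for the real schema. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_search_request(
    base_url: str, api_key: str, query: str, search_method: str = "hybrid"
) -> urllib.request.Request:
    """Build a POST request to the mcp-search edge function."""
    url = f"{base_url.rstrip('/')}/functions/v1/mcp-search"
    # Hypothetical payload shape; the real schema is not documented here.
    payload = {"query": query, "search_method": search_method}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def diagnose(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return a one-line description of the outcome."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"OK {response.status}"
    except urllib.error.HTTPError as err:
        # The response body often carries the edge function's error message.
        detail = err.read().decode("utf-8", "replace")[:200]
        return f"HTTP {err.code}: {detail}"
    except urllib.error.URLError as err:
        return f"Connection problem: {err.reason}"
```

An authorization error usually points at the key, while a server error is a cue to open the edge function logs in Supabase.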

Contributing

Fork the repository
Create a feature branch
Make changes with proper tests
Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues and questions:

Check the troubleshooting section above
Review Supabase project logs
Contact the development team