ca-codes-mcp-server by qter21 - MCP Server

CA Legal Codes MCP Server

A Model Context Protocol (MCP) server that provides Claude with access to California legal codes through MongoDB Atlas and Voyage AI embeddings, with built-in anti-hallucination features.

Features

🎯 Core Capabilities

Semantic Search: Natural language search across 51,802 CA legal code sections using voyage-law-2 embeddings
Exact Retrieval: Direct access to specific code sections by citation
Related Statutes: Discover similar and related legal code sections
Citation Validation: Mandatory validation to prevent hallucination
Document Drafting: AI-assisted legal document creation with verified citations

🛡️ Anti-Hallucination System

Retrieval-First Workflow: Forces retrieval before citation
Citation Tracking: Monitors all retrievals vs citations
Validation Enforcement: Validates every legal claim
Structured Responses: Separates retrieved facts from analysis
System Prompts: Explicit instructions to prevent hallucination

Architecture

ca-codes-mcp-server/
├── server.py                    # Main MCP server with stdio protocol
├── config.py                    # Configuration management
├── tools/
│   ├── retrieval.py             # Semantic search, get_section, find_similar
│   ├── validation.py            # Citation validation
│   ├── drafting.py              # Document drafting (declarations, MPAs, strategy)
│   └── workflows.py             # Orchestrated multi-step workflows
├── anti_hallucination/
│   ├── tracker.py               # Citation tracking
│   ├── validator.py             # Validation logic
│   └── prompts.py               # System prompts for Claude
├── context/
│   ├── session.py               # Session management
│   └── cache.py                 # Result caching
├── db/
│   ├── mongodb.py               # MongoDB Atlas operations
│   └── vector_ops.py            # Vector search
├── utils/
│   ├── voyage.py                # Voyage AI client
│   └── formatters.py            # Response formatting
└── models/
    ├── responses.py             # Structured response models
    └── documents.py             # Document templates

Installation

Prerequisites

Python 3.9+
MongoDB Atlas cluster with CA legal codes data
Voyage AI API key

Setup

Clone the repository, then navigate to the project directory:

cd /path/to/ca-codes-mcp-server

Create virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configure environment:

cp .env.example .env
# Edit .env with your credentials

Required environment variables:

# MongoDB Atlas
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/
DATABASE_NAME=ca_codes_db
COLLECTION_NAME=section_contents
VECTOR_INDEX_NAME=legal_codes_vector_index

# Voyage AI
VOYAGE_API_KEY=pa-your-api-key
VOYAGE_MODEL=voyage-law-2

# Anti-Hallucination Settings
MANDATORY_VALIDATION=true
CITATION_TRACKING=true
MIN_RETRIEVAL_SCORE=0.7

# Agent Configuration
CONTEXT_TIMEOUT=3600
CACHE_TTL=1800
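As a rough illustration of how config.py might read these variables, the sketch below parses them into a typed settings object. The `Settings` class, `load_settings`, and `_as_bool` are hypothetical names, not the project's actual code. Treating only the two credentials as strictly required and falling back to the sample values above for everything else is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'true', '1', or 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    mongodb_uri: str
    voyage_api_key: str
    database_name: str
    collection_name: str
    vector_index_name: str
    voyage_model: str
    mandatory_validation: bool
    citation_tracking: bool
    min_retrieval_score: float
    context_timeout: int  # seconds
    cache_ttl: int  # seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    # Assumption: only the two credentials are treated as strictly required.
    missing = [name for name in ("MONGODB_URI", "VOYAGE_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    # Assumption: defaults mirror the sample .env values shown above.
    return Settings(
        mongodb_uri=env["MONGODB_URI"],
        voyage_api_key=env["VOYAGE_API_KEY"],
        database_name=env.get("DATABASE_NAME", "ca_codes_db"),
        collection_name=env.get("COLLECTION_NAME", "section_contents"),
        vector_index_name=env.get("VECTOR_INDEX_NAME", "legal_codes_vector_index"),
        voyage_model=env.get("VOYAGE_MODEL", "voyage-law-2"),
        mandatory_validation=_as_bool(env.get("MANDATORY_VALIDATION", "true")),
        citation_tracking=_as_bool(env.get("CITATION_TRACKING", "true")),
        min_retrieval_score=float(env.get("MIN_RETRIEVAL_SCORE", "0.7")),
        context_timeout=int(env.get("CONTEXT_TIMEOUT", "3600")),
        cache_ttl=int(env.get("CACHE_TTL", "1800")),
    )
```

Failing fast on missing credentials surfaces a misconfigured .env at startup rather than on the first tool call.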

Usage

Running the Server

python server.py

The server communicates with the client over stdio using MCP.

Available Tools

1. semantic_search

Search legal codes using natural language:

{
  "query": "What are the rules about meal breaks for employees?",
  "code_filter": "LAB",
  "limit": 5
}
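For orientation, a request like the one above could translate into an Atlas Vector Search aggregation pipeline along the following lines. This is a sketch, not the server's actual query: `build_vector_search_pipeline` is a hypothetical helper, and the field names `embedding`, `code`, `section`, and `content` are assumptions about the collection schema. The query vector is assumed to come from embedding the query text with the configured Voyage model.

```python
from typing import Any, Dict, List, Optional


def build_vector_search_pipeline(
    query_vector: List[float],
    index_name: str = "legal_codes_vector_index",
    code_filter: Optional[str] = None,
    limit: int = 5,
    min_score: float = 0.7,
) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline for MongoDB Atlas $vectorSearch."""
    stage: Dict[str, Any] = {
        "index": index_name,
        "path": "embedding",  # assumed name of the vector field
        "queryVector": query_vector,
        # Oversample candidates so the approximate search has room to rank.
        "numCandidates": max(limit * 20, 100),
        "limit": limit,
    }
    if code_filter:
        # Assumed field name; it must be declared as a filter field in the index.
        stage["filter"] = {"code": {"$eq": code_filter}}
    return [
        {"$vectorSearch": stage},
        {
            "$project": {
                "_id": 0,
                "code": 1,
                "section": 1,
                "content": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
        # Drop weak matches, mirroring MIN_RETRIEVAL_SCORE.
        {"$match": {"score": {"$gte": min_score}}},
    ]
```

The resulting list can be passed to a collection's `aggregate` method. Filtering on score after the search means fewer than `limit` results may come back when matches are weak.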

2. get_section

Retrieve a specific code section:

{
  "code": "LAB",
  "section": "2922"
}

3. find_similar

Find related statutes:

{
  "code": "LAB",
  "section": "2922",
  "limit": 5
}

4. validate_citation

Validate a legal citation:

{
  "code": "LAB",
  "section": "2922",
  "claimed_content": "Employment may be terminated at will..."
}
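One simple way such a check could work is to compare the claimed content against the stored section text by token overlap. The sketch below is illustrative only and is not the logic in validator.py. The `ValidationResult` type, the `lookup` callable (a stand-in for database retrieval), and the 0.8 threshold are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


def _tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; punctuation and ellipses are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class ValidationResult:
    code: str
    section: str
    found: bool       # the section exists in the database
    supported: bool   # the claimed content overlaps enough with the stored text
    overlap: float    # share of claimed tokens present in the stored text


def validate_citation(
    code: str,
    section: str,
    claimed_content: str,
    lookup: Callable[[str, str], Optional[str]],
    threshold: float = 0.8,  # hypothetical cutoff
) -> ValidationResult:
    actual = lookup(code, section)
    if actual is None:
        return ValidationResult(code, section, False, False, 0.0)
    claimed = _tokens(claimed_content)
    if not claimed:
        return ValidationResult(code, section, True, False, 0.0)
    actual_tokens = set(_tokens(actual))
    overlap = sum(1 for tok in claimed if tok in actual_tokens) / len(claimed)
    return ValidationResult(code, section, True, overlap >= threshold, overlap)
```

Token overlap catches misattributed or invented quotations but not subtle paraphrase errors; an embedding-based comparison would be a natural alternative.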

5. research_workflow

Comprehensive research with validation:

{
  "question": "What are the statutory requirements for wrongful termination claims?",
  "depth": "thorough",
  "validate_all": true
}

6. draft_declaration

Draft a legal declaration:

{
  "declarant_name": "John Doe",
  "facts": [
    "I was employed by XYZ Corp from 2020 to 2023",
    "I was terminated without cause on March 1, 2023"
  ],
  "purpose": "wrongful termination",
  "party_type": "plaintiff"
}

7. draft_mpa

Draft a Memorandum of Points and Authorities:

{
  "issue": "Whether plaintiff's termination violated Labor Code § 2922",
  "position": "Plaintiff's termination was wrongful",
  "facts": "Plaintiff was employed for 3 years..."
}

8. plan_strategy

Develop a legal strategy:

{
  "situation": "Client was terminated after reporting safety violations",
  "goals": [
    "Obtain compensation for lost wages",
    "Reinstatement to position"
  ]
}

9. get_validation_report

Get a citation accuracy report:

{
  "session_id": "optional-session-id"
}

Anti-Hallucination Strategy

How It Works

1. Retrieval-First Pattern

When the user mentions a specific citation:

User: "What does LAB § 2922 say?"
     ↓
1. Call get_section("LAB", "2922")
2. Retrieve actual text from database
3. Use retrieved text as context
4. Generate response from actual content
✓ Response is grounded in the retrieved text
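The retrieval-first steps can be sketched as follows: pull citations out of the user's message, fetch each one, and hand only the retrieved text to the model. The regular expression, `extract_citations`, and `build_context` are hypothetical, and `get_section` here is a plain callable standing in for the tool of the same name.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Assumed citation shape: an uppercase code abbreviation, "§", then a section number.
CITATION_RE = re.compile(r"\b([A-Z]{2,5})\s*§+\s*(\d+(?:\.\d+)?)")


def extract_citations(text: str) -> List[Tuple[str, str]]:
    """Return (code, section) pairs in the order they appear."""
    return CITATION_RE.findall(text)


def build_context(
    user_message: str,
    get_section: Callable[[str, str], Optional[str]],
) -> Dict[str, object]:
    """Retrieve every cited section before any response is generated."""
    retrieved: Dict[str, str] = {}
    missing: List[str] = []
    for code, section in extract_citations(user_message):
        label = f"{code} § {section}"
        text = get_section(code, section)
        if text is None:
            # Not in the database: report it rather than answer from memory.
            missing.append(label)
        else:
            retrieved[label] = text
    return {"retrieved": retrieved, "missing": missing}
```

Keeping a separate `missing` list lets the caller tell the user that a section could not be found instead of silently answering without it.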

2. Generation-With-Tools Pattern

For open-ended questions:

User: "What are the meal break rules?"
     ↓
1. Call semantic_search("meal breaks employment")
2. Retrieve top sections: LAB § 512, LAB § 226.7
3. Get full text for each
4. Generate response using retrieved content
5. Validate all citations
✓ All claims backed by database

3. Hybrid-Draft Pattern

For document creation:

User: "Draft a declaration for wrongful termination"
     ↓
1. Search for applicable laws
2. Retrieve all relevant sections
3. Draft using ONLY retrieved content
4. Validate every citation
5. Return with validation report
✓ Every claim has database backing

System Prompts

The server provides explicit prompts to Claude:

Key Rules:

NEVER cite without retrieving first
MANDATORY validation before asserting claims
Separate retrieved facts from analysis
Track all citations
Flag uncertain claims

Citation Tracking

Every tool call logs:

Retrieved sections
Citations used
Validation status
Accuracy rate

Request a report at any time with get_validation_report. Example output:

{
  "total_retrievals": 15,
  "total_citations": 15,
  "unvalidated_claims": [],
  "accuracy_rate": 1.0,
  "is_clean": true
}
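A tracker that produces a report of this shape might look like the sketch below. `CitationTracker` and its methods are hypothetical, and defining `accuracy_rate` as the share of citations backed by a prior retrieval is an assumption; the real tracker.py may compute it differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple

Citation = Tuple[str, str]  # (code, section)


@dataclass
class CitationTracker:
    retrieved: Set[Citation] = field(default_factory=set)
    cited: List[Citation] = field(default_factory=list)
    total_retrievals: int = 0

    def record_retrieval(self, code: str, section: str) -> None:
        """Call whenever a section's text is fetched from the database."""
        self.total_retrievals += 1
        self.retrieved.add((code, section))

    def record_citation(self, code: str, section: str) -> None:
        """Call whenever a response cites a section."""
        self.cited.append((code, section))

    def report(self) -> Dict[str, Any]:
        # A citation is unvalidated if its section was never retrieved.
        unvalidated = [
            f"{code} § {section}"
            for code, section in self.cited
            if (code, section) not in self.retrieved
        ]
        total = len(self.cited)
        accuracy = 1.0 if total == 0 else (total - len(unvalidated)) / total
        return {
            "total_retrievals": self.total_retrievals,
            "total_citations": total,
            "unvalidated_claims": unvalidated,
            "accuracy_rate": accuracy,
            "is_clean": not unvalidated,
        }
```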

Integration with Claude

Claude Desktop Configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "ca-legal-codes": {
      "command": "python",
      "args": ["/path/to/ca-codes-mcp-server/server.py"],
      "env": {
        "MONGODB_URI": "your-mongodb-uri",
        "VOYAGE_API_KEY": "your-voyage-key"
      }
    }
  }
}

Example Conversation

User: "I need to understand California's at-will employment doctrine."

Claude (using MCP server):

Calls semantic_search("california at-will employment doctrine")
Retrieves LAB § 2922
Calls get_section("LAB", "2922") to get full text
Responds with actual code content

Result: the response is grounded in actual database content rather than the model's recollection.

Development

Running Tests

pytest tests/ -v

Logging

Structured logging with structlog:

logger.info("tool_called", tool="semantic_search", query="meal breaks")

Adding New Tools

Create tool handler in appropriate module (tools/)
Register tool in server.py TOOLS list
Add route in call_tool() function
Update documentation
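The registration and routing steps could be sketched as below. This uses plain Python rather than the MCP SDK, so the shape of `TOOLS`, the `register_tool` helper, and the `count_words` example tool are all assumptions for illustration. Only the `name`, `description`, and `inputSchema` keys follow the MCP tool definition format.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Awaitable[Any]]

TOOLS: List[Dict[str, Any]] = []      # tool definitions advertised to the client
_HANDLERS: Dict[str, Handler] = {}    # tool name -> async handler


def register_tool(name: str, description: str, input_schema: Dict[str, Any], handler: Handler) -> None:
    """Add a tool definition and remember which handler serves it."""
    if name in _HANDLERS:
        raise ValueError(f"Tool already registered: {name}")
    TOOLS.append({"name": name, "description": description, "inputSchema": input_schema})
    _HANDLERS[name] = handler


async def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Route a tool call to its handler."""
    handler = _HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(arguments)


# Hypothetical example tool, used only to show the flow.
async def count_words(arguments: Dict[str, Any]) -> Dict[str, int]:
    return {"words": len(arguments["text"].split())}


register_tool(
    "count_words",
    "Count whitespace-separated words in a text.",
    {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
    count_words,
)

if __name__ == "__main__":
    print(asyncio.run(call_tool("count_words", {"text": "at will employment"})))
```

A dictionary lookup keeps `call_tool` short as the tool list grows, compared with a chain of `if name == ...` branches.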

Deployment

Docker

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "server.py"]

Docker Compose

ca-codes-mcp-server:
  build:
    context: ./ca-codes-mcp-server
  environment:
    - MONGODB_URI=${MONGODB_URI}
    - VOYAGE_API_KEY=${VOYAGE_API_KEY}
  networks:
    - ca-codes-network

Performance

Vector Search: ~100-200ms per query
Direct Retrieval: ~50-100ms per section
Validation: ~50ms per citation
Caching: 30-minute TTL (CACHE_TTL=1800) for frequent queries
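A time-based cache of the kind described can be small. The `TTLCache` class below is a hypothetical in-memory sketch, not the contents of context/cache.py; the injectable `clock` parameter is an assumption added to make expiry easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float = 1800, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: evict lazily on read.
            del self._store[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)
```

A key such as `("semantic_search", query, code_filter, limit)` would let identical searches within the TTL skip both the embedding call and the vector query.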

Security

Environment variables for sensitive data
Read-only database access recommended
Input validation on all tools
Rate limiting recommended for production
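As an example of what input validation on a tool such as get_section or find_similar could involve, the sketch below rejects arguments that are not plain strings of the expected shape before they reach a database query. The patterns, the `MAX_LIMIT` bound, and `validate_section_args` are assumptions, not the server's actual rules.

```python
import re
from typing import Any, Dict

CODE_RE = re.compile(r"[A-Z]{2,5}")      # assumed shape, e.g. "LAB"
SECTION_RE = re.compile(r"\d+(\.\d+)?")  # assumed shape, e.g. "2922" or "226.7"
MAX_LIMIT = 50                            # hypothetical upper bound


def validate_section_args(args: Dict[str, Any]) -> Dict[str, Any]:
    """Return cleaned arguments or raise ValueError."""
    code = args.get("code")
    section = args.get("section")
    limit = args.get("limit", 5)
    # Requiring str rejects nested objects that could act as query operators.
    if not isinstance(code, str) or not CODE_RE.fullmatch(code):
        raise ValueError("code must be 2-5 uppercase letters")
    if not isinstance(section, str) or not SECTION_RE.fullmatch(section):
        raise ValueError("section must be a number such as '2922' or '226.7'")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be an integer between 1 and {MAX_LIMIT}")
    return {"code": code, "section": section, "limit": limit}
```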

Troubleshooting

MongoDB Connection Issues

# Test connection
python -c "from db import get_db_client; import asyncio; asyncio.run(get_db_client())"

Voyage AI API Issues

# Test API key
python -c "from utils import get_voyage_client; client = get_voyage_client(); print('OK')"

MCP Protocol Issues

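Ensure the server is launched over stdio and that nothing other than protocol messages is written to stdout (send logs to stderr)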
Check Claude Desktop logs
Verify tool schemas match MCP spec

License

MIT License - See LICENSE file

Support

Issues: GitHub Issues
Documentation: This README
Contact: [Your contact info]

Changelog

v1.0.0 (2025-01-XX)

Initial release
10 core tools
Anti-hallucination system
MongoDB Atlas + Voyage AI integration
Session management
Citation tracking