ca-codes-mcp-server

qter21/ca-codes-mcp-server

CA Legal Codes MCP Server

A Model Context Protocol (MCP) server that provides Claude with access to California legal codes through MongoDB Atlas and Voyage AI embeddings, with built-in anti-hallucination features.

Features

🎯 Core Capabilities

  • Semantic Search: Natural language search across 51,802 CA legal code sections using voyage-law-2 embeddings
  • Exact Retrieval: Direct access to specific code sections by citation
  • Related Statutes: Discover similar and related legal code sections
  • Citation Validation: Mandatory validation to prevent hallucination
  • Document Drafting: AI-assisted legal document creation with verified citations

🛡️ Anti-Hallucination System

  • Retrieval-First Workflow: Forces retrieval before citation
  • Citation Tracking: Monitors all retrievals vs citations
  • Validation Enforcement: Validates every legal claim
  • Structured Responses: Separates retrieved facts from analysis
  • System Prompts: Explicit instructions to prevent hallucination

Architecture

ca-codes-mcp-server/
├── server.py                    # Main MCP server with stdio protocol
├── config.py                    # Configuration management
├── tools/
│   ├── retrieval.py             # Semantic search, get_section, find_similar
│   ├── validation.py            # Citation validation
│   ├── drafting.py              # Document drafting (declarations, MPAs, strategy)
│   └── workflows.py             # Orchestrated multi-step workflows
├── anti_hallucination/
│   ├── tracker.py               # Citation tracking
│   ├── validator.py             # Validation logic
│   └── prompts.py               # System prompts for Claude
├── context/
│   ├── session.py               # Session management
│   └── cache.py                 # Result caching
├── db/
│   ├── mongodb.py               # MongoDB Atlas operations
│   └── vector_ops.py            # Vector search
├── utils/
│   ├── voyage.py                # Voyage AI client
│   └── formatters.py            # Response formatting
└── models/
    ├── responses.py             # Structured response models
    └── documents.py             # Document templates

Installation

Prerequisites

  • Python 3.9+
  • MongoDB Atlas cluster with CA legal codes data
  • Voyage AI API key

Setup

  1. Clone the repository, then navigate to the project directory:
cd /path/to/ca-codes-mcp-server
  2. Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Configure the environment:
cp .env.example .env
# Edit .env with your credentials

Required environment variables:

# MongoDB Atlas
MONGODB_URI=mongodb+srv://user:pass@cluster.mongodb.net/
DATABASE_NAME=ca_codes_db
COLLECTION_NAME=section_contents
VECTOR_INDEX_NAME=legal_codes_vector_index

# Voyage AI
VOYAGE_API_KEY=pa-your-api-key
VOYAGE_MODEL=voyage-law-2

# Anti-Hallucination Settings
MANDATORY_VALIDATION=true
CITATION_TRACKING=true
MIN_RETRIEVAL_SCORE=0.7

# Agent Configuration
CONTEXT_TIMEOUT=3600
CACHE_TTL=1800
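A minimal sketch of how config.py might read these variables, assuming plain os.environ lookups and using the example values above as defaults for everything except the two secrets. The Settings class and load_settings function are hypothetical names; the actual module may load .env differently (for example with python-dotenv).

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Treat "true", "1" or "yes" (any case) as True; unset means default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings container; fields mirror the environment variables.
    mongodb_uri: str
    database_name: str
    collection_name: str
    vector_index_name: str
    voyage_api_key: str
    voyage_model: str
    mandatory_validation: bool
    citation_tracking: bool
    min_retrieval_score: float
    context_timeout: int
    cache_ttl: int


def load_settings() -> Settings:
    """Read settings from the environment; the defaults are assumptions."""
    env = os.environ
    return Settings(
        mongodb_uri=env["MONGODB_URI"],  # required: KeyError if unset
        database_name=env.get("DATABASE_NAME", "ca_codes_db"),
        collection_name=env.get("COLLECTION_NAME", "section_contents"),
        vector_index_name=env.get("VECTOR_INDEX_NAME", "legal_codes_vector_index"),
        voyage_api_key=env["VOYAGE_API_KEY"],  # required: KeyError if unset
        voyage_model=env.get("VOYAGE_MODEL", "voyage-law-2"),
        mandatory_validation=_env_bool("MANDATORY_VALIDATION", True),
        citation_tracking=_env_bool("CITATION_TRACKING", True),
        min_retrieval_score=float(env.get("MIN_RETRIEVAL_SCORE", "0.7")),
        context_timeout=int(env.get("CONTEXT_TIMEOUT", "3600")),
        cache_ttl=int(env.get("CACHE_TTL", "1800")),
    )
```

Failing fast on the two secrets surfaces a misconfigured deployment at startup rather than on the first tool call.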

Usage

Running the Server

python server.py

The server communicates via stdio using the MCP protocol.
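Each tool call is a JSON-RPC 2.0 request. In the MCP stdio transport, messages are exchanged as newline-delimited JSON on stdin/stdout, and tools are invoked with the tools/call method. A minimal sketch of framing one request (frame_tool_call is a hypothetical helper; a real client must first complete the initialize handshake, which an MCP SDK handles for you):

```python
import json
from typing import Any, Dict


def frame_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> bytes:
    """Encode one MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent escapes newlines inside strings, so the
    # message stays on one line as the stdio transport requires.
    return (json.dumps(message, ensure_ascii=False) + "\n").encode("utf-8")
```

For example, frame_tool_call(1, "get_section", {"code": "LAB", "section": "2922"}) produces the bytes a client would write to the server's stdin.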

Available Tools

1. semantic_search

Search legal codes using natural language:

{
  "query": "What are the rules about meal breaks for employees?",
  "code_filter": "LAB",
  "limit": 5
}
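semantic_search presumably embeds the query with voyage-law-2 (not shown) and runs a MongoDB Atlas $vectorSearch aggregation against VECTOR_INDEX_NAME. A sketch of building that pipeline, assuming the vector field is named "embedding", the documents carry "code", "section" and "content" fields, and "code" is declared as a filter field in the vector index; build_search_pipeline and the 20× candidate oversampling are illustrative choices, not the project's confirmed code.

```python
from typing import Any, Dict, List, Optional


def build_search_pipeline(
    query_vector: List[float],
    index_name: str = "legal_codes_vector_index",
    code_filter: Optional[str] = None,
    limit: int = 5,
    min_score: float = 0.7,
) -> List[Dict[str, Any]]:
    """Build an Atlas $vectorSearch pipeline (field names are assumptions)."""
    stage: Dict[str, Any] = {
        "index": index_name,
        "path": "embedding",            # hypothetical vector field name
        "queryVector": query_vector,
        "numCandidates": limit * 20,    # oversample candidates to improve recall
        "limit": limit,
    }
    if code_filter:
        # Pre-filter by code abbreviation, e.g. "LAB".
        stage["filter"] = {"code": {"$eq": code_filter}}
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "code": 1, "section": 1, "content": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
        # Drop weak matches, mirroring MIN_RETRIEVAL_SCORE.
        {"$match": {"score": {"$gte": min_score}}},
    ]
```

The pipeline would be passed to the collection's aggregate call. Because the score filter runs after $vectorSearch has applied limit, a query can return fewer than limit sections, which is the intended behavior when nothing relevant clears the threshold.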
2. get_section

Retrieve specific code section:

{
  "code": "LAB",
  "section": "2922"
}
3. find_similar

Find related statutes:

{
  "code": "LAB",
  "section": "2922",
  "limit": 5
}
4. validate_citation

Validate a legal citation:

{
  "code": "LAB",
  "section": "2922",
  "claimed_content": "Employment may be terminated at will..."
}
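This README does not say how claimed_content is compared with the stored section. One possible approach, sketched here as an assumption: tokenize both texts into lowercase words and measure what fraction of the claim's words appear, in order, in the retrieved section using difflib.SequenceMatcher. claim_coverage, check_claim and the 0.8 threshold are hypothetical; the actual validator may use embeddings instead.

```python
import re
from difflib import SequenceMatcher
from typing import List


def _tokens(text: str) -> List[str]:
    # Lowercase word tokens; punctuation and ellipses ("...") are dropped.
    return re.findall(r"\w+", text.lower())


def claim_coverage(claimed_content: str, section_text: str) -> float:
    """Fraction of the claim's words found, in order, in the retrieved section."""
    claim = _tokens(claimed_content)
    if not claim:
        return 0.0
    matcher = SequenceMatcher(None, claim, _tokens(section_text), autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(claim)


def check_claim(claimed_content: str, section_text: str, threshold: float = 0.8) -> bool:
    # Hypothetical threshold: most of the claim must be supported word for word.
    return claim_coverage(claimed_content, section_text) >= threshold
```

Word-level, order-preserving matching tolerates elisions such as "terminated at will" against "terminated at the will", but it rejects paraphrases; that strictness is a deliberate trade-off for a validation step.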
5. research_workflow

Comprehensive research with validation:

{
  "question": "What are the statutory requirements for wrongful termination claims?",
  "depth": "thorough",
  "validate_all": true
}
6. draft_declaration

Draft legal declaration:

{
  "declarant_name": "John Doe",
  "facts": [
    "I was employed by XYZ Corp from 2020 to 2023",
    "I was terminated without cause on March 1, 2023"
  ],
  "purpose": "wrongful termination",
  "party_type": "plaintiff"
}
7. draft_mpa

Draft Memorandum of Points and Authorities:

{
  "issue": "Whether plaintiff's termination violated Labor Code § 2922",
  "position": "Plaintiff's termination was wrongful",
  "facts": "Plaintiff was employed for 3 years..."
}
8. plan_strategy

Develop legal strategy:

{
  "situation": "Client was terminated after reporting safety violations",
  "goals": [
    "Obtain compensation for lost wages",
    "Reinstatement to position"
  ]
}
9. get_validation_report

Get citation accuracy report:

{
  "session_id": "optional-session-id"
}

Anti-Hallucination Strategy

How It Works

1. Retrieval-First Pattern

When user mentions specific citations:

User: "What does LAB § 2922 say?"
     ↓
1. Call get_section("LAB", "2922")
2. Retrieve actual text from database
3. Use retrieved text as context
4. Generate response from actual content
✓ No hallucination possible
2. Generation-With-Tools Pattern

For open-ended questions:

User: "What are the meal break rules?"
     ↓
1. Call semantic_search("meal breaks employment")
2. Retrieve top sections: LAB § 512, LAB § 226.7
3. Get full text for each
4. Generate response using retrieved content
5. Validate all citations
✓ All claims backed by database
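Step 5 requires finding every citation in the generated text. A minimal sketch, assuming citations are written in the "LAB § 512" form used above (long forms such as "Labor Code § 2922" are not matched); extract_citations and unsupported_citations are hypothetical helpers.

```python
import re
from typing import List, Set, Tuple

Citation = Tuple[str, str]  # (code, section), e.g. ("LAB", "226.7")

# Code abbreviation, optional spaces, "§", then a section number such as 512 or 226.7.
CITATION_RE = re.compile(r"\b([A-Z]{2,5})\s*§\s*(\d+(?:\.\d+)*)")


def extract_citations(text: str) -> List[Citation]:
    """Return (code, section) pairs in order of first appearance, without duplicates."""
    seen: Set[Citation] = set()
    found: List[Citation] = []
    for code, section in CITATION_RE.findall(text):
        key = (code, section)
        if key not in seen:
            seen.add(key)
            found.append(key)
    return found


def unsupported_citations(text: str, retrieved: Set[Citation]) -> List[Citation]:
    """Citations in the text whose sections were never retrieved."""
    return [c for c in extract_citations(text) if c not in retrieved]
```

A non-empty result from unsupported_citations means the response cites something that was not pulled from the database and should be retrieved or removed before answering.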
3. Hybrid-Draft Pattern

For document creation:

User: "Draft a declaration for wrongful termination"
     ↓
1. Search for applicable laws
2. Retrieve all relevant sections
3. Draft using ONLY retrieved content
4. Validate every citation
5. Return with validation report
✓ Every claim has database backing

System Prompts

The server provides explicit prompts to Claude:

Key Rules:

  1. NEVER cite without retrieving first
  2. MANDATORY validation before asserting claims
  3. Separate retrieved facts from analysis
  4. Track all citations
  5. Flag uncertain claims

Citation Tracking

Every tool call logs:

  • Retrieved sections
  • Citations used
  • Validation status
  • Accuracy rate

Get a report at any time with get_validation_report, for example:

{
  "total_retrievals": 15,
  "total_citations": 15,
  "unvalidated_claims": [],
  "accuracy_rate": 1.0,
  "is_clean": true
}
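The README does not define how accuracy_rate is computed. A minimal sketch of a tracker that produces a report with the same fields, assuming accuracy_rate is validated citations divided by total citations (1.0 when nothing has been cited) and that, per the retrieval-first rule, a citation only counts as validated if its section was retrieved earlier. CitationTracker and its methods are hypothetical, not the actual tracker.py API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Citation = Tuple[str, str]  # (code, section), e.g. ("LAB", "2922")


@dataclass
class CitationTracker:
    retrievals: List[Citation] = field(default_factory=list)
    # Each cited section with whether it passed validation.
    citations: List[Tuple[Citation, bool]] = field(default_factory=list)

    def record_retrieval(self, code: str, section: str) -> None:
        self.retrievals.append((code, section))

    def record_citation(self, code: str, section: str, validated: bool) -> None:
        cite = (code, section)
        # Retrieval-first rule: an unretrieved section can never count as validated.
        self.citations.append((cite, validated and cite in self.retrievals))

    def report(self) -> Dict[str, Any]:
        unvalidated = [f"{c} § {s}" for (c, s), ok in self.citations if not ok]
        total = len(self.citations)
        return {
            "total_retrievals": len(self.retrievals),
            "total_citations": total,
            "unvalidated_claims": unvalidated,
            "accuracy_rate": (total - len(unvalidated)) / total if total else 1.0,
            "is_clean": not unvalidated,
        }
```

Recording the validation outcome per citation, rather than per section, keeps an early unvalidated use from being retroactively counted as clean.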

Integration with Claude

Claude Desktop Configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "ca-legal-codes": {
      "command": "python",
      "args": ["/path/to/ca-codes-mcp-server/server.py"],
      "env": {
        "MONGODB_URI": "your-mongodb-uri",
        "VOYAGE_API_KEY": "your-voyage-key"
      }
    }
  }
}

Example Conversation

User: "I need to understand California's at-will employment doctrine."

Claude (using the MCP server):

  1. Calls semantic_search("california at-will employment doctrine")
  2. Retrieves LAB § 2922
  3. Calls get_section("LAB", "2922") to get full text
  4. Responds with actual code content

Result: no hallucinated content; the response is based on actual database content.

Development

Running Tests

pytest tests/ -v

Logging

Structured logging with structlog:

logger.info("tool_called", tool="semantic_search", query="meal breaks")

Adding New Tools

  1. Create tool handler in appropriate module (tools/)
  2. Register tool in server.py TOOLS list
  3. Add route in call_tool() function
  4. Update documentation
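The steps above can be sketched with a plain registry: a handler, an entry in TOOLS using the MCP tool-definition fields (name, description, inputSchema as JSON Schema), and a route in call_tool. The normalize_citation tool and the HANDLERS dict are hypothetical examples; the real server.py presumably wires TOOLS and call_tool() into its MCP SDK handlers.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


# Step 1: the handler (would live in tools/). Hypothetical example tool.
async def normalize_citation(code: str, section: str) -> Dict[str, str]:
    return {"code": code.strip().upper(), "section": section.replace("§", "").strip()}


# Step 2: register the tool definition in TOOLS.
TOOLS: List[Dict[str, Any]] = [
    {
        "name": "normalize_citation",
        "description": "Normalize a code abbreviation and section number.",
        "inputSchema": {
            "type": "object",
            "properties": {"code": {"type": "string"}, "section": {"type": "string"}},
            "required": ["code", "section"],
        },
    },
]

# Step 3: route tool names to handlers.
HANDLERS: Dict[str, Callable[..., Awaitable[Any]]] = {
    "normalize_citation": normalize_citation,
}


async def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    # Minimal input validation: reject calls that omit required arguments.
    schema = next(t["inputSchema"] for t in TOOLS if t["name"] == name)
    missing = [k for k in schema.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"Missing arguments for {name}: {missing}")
    return await handler(**arguments)
```

For example, asyncio.run(call_tool("normalize_citation", {"code": " lab", "section": "§ 2922"})) returns {"code": "LAB", "section": "2922"}. Keeping the schema and the route side by side makes it harder to advertise a tool that call_tool cannot dispatch.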

Deployment

Docker

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "server.py"]

Docker Compose

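services:
  ca-codes-mcp-server:
    build:
      context: ./ca-codes-mcp-server
    # Keep stdin open: the server speaks MCP over stdio
    stdin_open: true
    environment:
      - MONGODB_URI=${MONGODB_URI}
      - VOYAGE_API_KEY=${VOYAGE_API_KEY}
    networks:
      - ca-codes-network

networks:
  ca-codes-network: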

Performance

  • Vector Search: ~100-200ms per query
  • Direct Retrieval: ~50-100ms per section
  • Validation: ~50ms per citation
  • Caching: 30min TTL for frequent queries
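The 30-minute TTL corresponds to CACHE_TTL=1800. The contents of context/cache.py are not shown here; a minimal sketch of an in-process TTL cache keyed by tool name and arguments, with an injectable clock so expiry can be tested. TTLCache is a hypothetical name.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float = 1800,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock() + self.ttl, value)
```

A key such as ("semantic_search", "meal breaks", "LAB", 5) captures everything that affects the result. time.monotonic is used so that wall-clock adjustments cannot extend or cut short an entry's lifetime.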

Security

  • Environment variables for sensitive data
  • Read-only database access recommended
  • Input validation on all tools
  • Rate limiting recommended for production

Troubleshooting

MongoDB Connection Issues

# Test connection
python -c "from db import get_db_client; import asyncio; asyncio.run(get_db_client())"

Voyage AI API Issues

# Test API key
python -c "from utils import get_voyage_client; client = get_voyage_client(); print('OK')"

MCP Protocol Issues

  • Ensure stdio communication
  • Check Claude Desktop logs
  • Verify tool schemas match MCP spec

License

MIT License - See LICENSE file

Support

  • Issues: GitHub Issues
  • Documentation: This README

Changelog

v1.0.0 (2025-01-XX)

  • Initial release
  • 10 core tools
  • Anti-hallucination system
  • MongoDB Atlas + Voyage AI integration
  • Session management
  • Citation tracking