ayunis-core/ayunis-legal-mcp
The Legal MCP Server is a Model Context Protocol (MCP) server that lets AI assistants query and interact with German legal texts.
Legal MCP - German Legal Texts Search System
A comprehensive system for searching and analyzing German legal texts using vector embeddings and semantic search, consisting of:
- Store API: FastAPI backend with PostgreSQL, pgvector, and Ollama embeddings
- MCP Server: FastMCP server providing tools for AI assistants to query legal texts
- CLI Tool: Command-line interface for importing and querying legal texts
- Web Scraper: Automatic extraction of legal texts from gesetze-im-internet.de
- XML Parser: Comprehensive parser for German legal XML format (gii-norm.dtd)
Table of Contents
- Features
- Architecture
- Quick Start
- CLI Tool
- Environment Configuration
- API Documentation
- Legal Text Features
- XML Parser
- Development
- Docker Commands
- Troubleshooting
- Project Structure
- Technology Stack
Features
Store API Features
- 🗄️ PostgreSQL + pgvector - Vector database for semantic search
- 🤖 Ollama Integration - Generate embeddings for legal texts
- 🌐 Web Scraping - Automatic extraction from gesetze-im-internet.de
- 📄 XML Parsing - Comprehensive parser for German legal XML format
- 🔍 Semantic Search - Vector-based similarity search for legal texts
- 📊 Metadata Tracking - Full document metadata and versioning
- 📝 RESTful API - FastAPI with automatic documentation
- 🐳 Docker Support - Easy deployment with containerization
MCP Server Features
- 🔧 FastMCP - Modern MCP server implementation
- 🤝 AI Assistant Integration - Provides tools for querying legal texts
- 🔌 HTTP API Client - Connects to Store API for data access
CLI Tool Features
- 📋 List Commands - View imported codes and available catalog
- 📥 Import Commands - Import legal codes with progress indication
- 🔍 Query Commands - Retrieve texts by code, section, and sub-section
- 🔎 Search Commands - Semantic search with similarity scoring
- 📊 Multiple Output Formats - Table view or JSON output
- ⚙️ Configurable - Custom API URL support via flag or environment variable
Architecture
┌──────────────────────────────────────────────────┐
│ │
│ Docker Network: legal-mcp-network │
│ │
│ ┌────────────────┐ │
│ │ MCP Server │ :8001 │
│ │ (FastMCP) │ │
│ └───────┬────────┘ │
│ │ │
│ │ LEGAL_API_BASE_URL │
│ │ http://store-api:8000 │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Store API │ :8000 │
│ │ (FastAPI) │ │
│ └───────┬────────┘ │
│ │ │
│ │ DATABASE_URL │
│ │ postgresql://postgres:5432 │
│ │ OLLAMA_BASE_URL │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ PostgreSQL │ :5432 │
│ │ + pgvector │ │
│ └────────────────┘ │
│ │
└──────────────────────────────────────────────────┘
│
│ External Ollama Service
│ (for embeddings)
▼
┌────────────────┐
│ Ollama API │
│ (Remote/Local)│
└────────────────┘
Quick Start
Prerequisites
- Docker and Docker Compose
- Ollama (local or remote endpoint for embeddings)
- Git
⚠️ Important: Ollama Embedding Model
By default, this project uses the embedding model:
ryanshillington/Qwen3-Embedding-4B:latest
You must pull this model (or your configured alternative) before importing legal texts:
ollama pull ryanshillington/Qwen3-Embedding-4B:latest
You can use a different model by setting the OLLAMA_EMBEDDING_MODEL environment variable, but the model must produce 2560-dimensional vectors. Using a model with different dimensions will cause errors, as the database schema is fixed at 2560 dimensions. Changing to a model with different dimensions would require database schema modifications and re-importing all legal texts.
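Before starting a long import, it can help to confirm that the configured model really returns 2560-dimensional vectors. The following is a minimal sketch and not part of this project: `check_dimension` and `fetch_embedding` are hypothetical helper names, the fallback address `http://localhost:11434` is Ollama's usual local default, and sending `OLLAMA_AUTH_TOKEN` as a bearer token is an assumption about how your endpoint authenticates.

```python
import json
import os
import urllib.request

EXPECTED_DIM = 2560  # fixed by the database schema, per this README


def check_dimension(vector, expected=EXPECTED_DIM):
    """Raise ValueError if the embedding has the wrong number of dimensions."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return True


def fetch_embedding(text, base_url=None, model=None, token=None):
    """Request one embedding from Ollama's /api/embeddings endpoint."""
    base_url = base_url or os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    model = model or os.environ.get(
        "OLLAMA_EMBEDDING_MODEL", "ryanshillington/Qwen3-Embedding-4B:latest"
    )
    token = token or os.environ.get("OLLAMA_AUTH_TOKEN")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the endpoint accepts the token as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/embeddings", data=body, headers=headers
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]
```

With Ollama reachable, `check_dimension(fetch_embedding("Kaufvertrag"))` either returns `True` or raises a `ValueError` naming the dimension the model actually produced.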
1. Clone and Setup
# Clone the repository
git clone <repository-url>
cd legal-mcp
# Copy environment file
cp .env.example .env
# Edit .env with your configuration
# Update OLLAMA_BASE_URL and OLLAMA_AUTH_TOKEN if needed
2. Start All Services
# Build and start all services
docker-compose up -d
# Check service status
docker-compose ps
This will start:
- PostgreSQL (port 5432) - Database with pgvector extension
- Store API (port 8000) - FastAPI backend for legal texts
- MCP Server (port 8001) - FastMCP server for AI assistants
3. Run Database Migrations
# Run Alembic migrations to set up the database
docker-compose exec store-api alembic upgrade head
4. Import Legal Texts
# Import a test legal code (e.g., rag_1)
curl -X POST http://localhost:8000/legal-texts/gesetze-im-internet/rag_1
# Import German Civil Code (BGB)
curl -X POST http://localhost:8000/legal-texts/gesetze-im-internet/bgb
# Import other legal codes
curl -X POST http://localhost:8000/legal-texts/gesetze-im-internet/stgb # Criminal Code
curl -X POST http://localhost:8000/legal-texts/gesetze-im-internet/gg # Constitution
5. Test the API
# Check API health
curl http://localhost:8000/health
# Query legal texts by section
curl "http://localhost:8000/legal-texts/gesetze-im-internet/rag_1?section=%C2%A7%201"
# Semantic search (requires embeddings)
curl "http://localhost:8000/legal-texts/gesetze-im-internet/rag_1/search?q=Versicherung&limit=5"
# Access interactive API documentation
open http://localhost:8000/docs
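The section value in the query above is percent-encoded: `§ 1` becomes `%C2%A7%201`. When calling the API from a script, it is easier to let a library do the encoding. The following sketch uses only the Python standard library; `legal_text_url` is a hypothetical helper name, not something this project ships.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # default Store API address from this README


def legal_text_url(code, base_url=BASE_URL, **params):
    """Build a query URL for a legal code, percent-encoding all parameters."""
    url = f"{base_url.rstrip('/')}/legal-texts/gesetze-im-internet/{quote(code)}"
    if params:
        # quote_via=quote encodes spaces as %20 rather than '+'
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(legal_text_url("rag_1", section="§ 1"))
# http://localhost:8000/legal-texts/gesetze-im-internet/rag_1?section=%C2%A7%201
```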
CLI Tool
The CLI provides a convenient command-line interface for managing legal texts without writing code.
Installation
# Install in development mode (from project root)
pip install -e .
# Verify installation
legal-mcp --help
Prerequisites
The CLI requires the Store API to be running:
# Start all services
docker-compose up -d
# Verify Store API is running
curl http://localhost:8000/health
Available Commands
List Commands
List Imported Codes
# Show all imported legal codes in table format
legal-mcp list codes
# Output as JSON
legal-mcp list codes --json
List Available Catalog
# Show all available legal codes that can be imported
legal-mcp list catalog
# Output as JSON
legal-mcp list catalog --json
Import Command
# Import a single legal code
legal-mcp import --code bgb
# Import multiple legal codes
legal-mcp import --code bgb --code stgb --code gg
# Import with JSON output
legal-mcp import --code bgb --json
The import command displays a spinner while processing and shows progress for each code.
Query Command
# Query all texts for a legal code
legal-mcp query bgb
# Query specific section
legal-mcp query bgb --section "§ 1"
# Query specific sub-section
legal-mcp query bgb --section "§ 1" --sub-section "1"
# Output as JSON
legal-mcp query bgb --section "§ 1" --json
Search Command
# Semantic search in a legal code
legal-mcp search bgb "Kaufvertrag"
# Limit number of results
legal-mcp search bgb "Kaufvertrag" --limit 5
# Set similarity cutoff threshold (0-2, lower = stricter)
legal-mcp search bgb "Kaufvertrag" --cutoff 0.5
# Output as JSON
legal-mcp search bgb "Kaufvertrag" --json
Configuration
Default API URL: http://localhost:8000
Override with environment variable:
export LEGAL_API_BASE_URL=http://custom-host:8000
legal-mcp list codes
Override with command flag:
legal-mcp list codes --api-url http://custom-host:8000
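The three sources of the API URL can be pictured as a simple fallback chain. The sketch below is an illustration only: `resolve_api_url` is a hypothetical name, and the order shown (flag first, then environment variable, then default) is an assumption based on common CLI convention, since this README does not state which override wins.

```python
import os

DEFAULT_API_URL = "http://localhost:8000"


def resolve_api_url(flag_value=None, environ=None):
    """Return the API URL: flag first, then environment, then the default."""
    environ = os.environ if environ is None else environ
    if flag_value:
        # Assumption: an explicit --api-url flag takes precedence.
        return flag_value
    return environ.get("LEGAL_API_BASE_URL") or DEFAULT_API_URL


print(resolve_api_url(environ={}))
# http://localhost:8000
```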
Output Formats
Table Format (default):
- Clean, formatted tables with Rich library
- Text truncation for readability
- Color-coded output
JSON Format:
- Complete data with full text content
- Machine-readable for scripting
- Use the --json flag with any command
Example Workflow
# 1. Check available legal codes
legal-mcp list catalog
# 2. Import desired codes
legal-mcp import --code bgb --code stgb
# 3. Verify imports
legal-mcp list codes
# 4. Query specific sections
legal-mcp query bgb --section "§ 433"
# 5. Perform semantic search
legal-mcp search bgb "Kaufvertrag" --limit 10
Environment Configuration
The application uses a .env file for configuration. See .env.example for a template.
Required Environment Variables
# Ollama Configuration
OLLAMA_BASE_URL=https://your-ollama-endpoint.com
OLLAMA_AUTH_TOKEN=your-auth-token-here
OLLAMA_EMBEDDING_MODEL=ryanshillington/Qwen3-Embedding-4B:latest # Optional, this is the default
# PostgreSQL Configuration
POSTGRES_HOST=postgres # Use 'postgres' in Docker, 'localhost' for local dev
Note: The OLLAMA_EMBEDDING_MODEL variable allows you to use a different embedding model. However, any alternative model must produce 2560-dimensional vectors to be compatible with the database schema. The default model (ryanshillington/Qwen3-Embedding-4B:latest) is recommended.
Additional Configuration (set in docker-compose.yml)
# Database URL (automatically constructed)
DATABASE_URL=postgresql+asyncpg://legal_mcp:legal_mcp_password@postgres:5432/legal_mcp_db
# MCP Server Configuration
LEGAL_API_BASE_URL=http://store-api:8000
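To make the shape of DATABASE_URL concrete, here is a minimal sketch of how such a URL can be assembled from its parts. It is not the project's own code: `build_database_url` is a hypothetical helper, and only the resulting string and POSTGRES_HOST come from this README.

```python
from urllib.parse import quote


def build_database_url(user, password, host, database, port=5432):
    """Assemble a postgresql+asyncpg URL from its parts."""
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# In Docker the host is the service name; for local development use localhost.
print(build_database_url("legal_mcp", "legal_mcp_password", "postgres", "legal_mcp_db"))
# postgresql+asyncpg://legal_mcp:legal_mcp_password@postgres:5432/legal_mcp_db
```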
API Documentation
Once running, access the interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Main Endpoints
Legal Texts
- POST /legal-texts/gesetze-im-internet/{book} - Import legal text with embeddings
- GET /legal-texts/gesetze-im-internet/{code} - Query legal texts by code/section
- GET /legal-texts/gesetze-im-internet/{code}/search - Semantic search with embeddings
System
- GET /health - Health check endpoint
- GET / - API information
MCP Server
The MCP Server provides tools for AI assistants to interact with the legal text database through the Model Context Protocol.
Available Tools
The MCP Server exposes the following tools:
- search_legal_texts - Perform semantic search on legal texts
  - Parameters: query, code, limit (1-20), cutoff (0-2)
  - Returns: List of matching legal text sections with similarity scores
- get_legal_section - Retrieve specific legal text sections
  - Parameters: code, section, sub_section (optional)
  - Returns: List of legal text sections matching the criteria
- import_legal_code - Import a complete legal code from Gesetze im Internet
  - Parameters: code
  - Returns: Success message with import statistics
- get_available_codes - Get all available legal codes in the database
  - Returns: List of legal code identifiers
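A client that prepares arguments for search_legal_texts may want to check them against the ranges listed above before sending a request. The following is a sketch under stated assumptions: `SearchRequest` is a hypothetical class, and the defaults (`limit=10`, `cutoff=0.5`) are borrowed from the REST search endpoint, since the tool's own defaults are not documented here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SearchRequest:
    """Arguments for search_legal_texts, checked against the documented ranges."""

    query: str
    code: str
    limit: int = 10      # assumed default, taken from the REST endpoint
    cutoff: float = 0.5  # assumed default, taken from the REST endpoint

    def __post_init__(self):
        if not self.query.strip():
            raise ValueError("query must not be empty")
        if not 1 <= self.limit <= 20:
            raise ValueError("limit must be between 1 and 20")
        if not 0 <= self.cutoff <= 2:
            raise ValueError("cutoff must be between 0 and 2")

    def to_arguments(self):
        """Return the tool arguments as a plain dict."""
        return asdict(self)


print(SearchRequest("Kaufvertrag", "bgb", limit=5).to_arguments())
# {'query': 'Kaufvertrag', 'code': 'bgb', 'limit': 5, 'cutoff': 0.5}
```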
Using the MCP Server
The MCP Server runs on port 8001 and can be accessed by MCP-compatible clients:
# Check MCP server is running
curl http://localhost:8001/health
# The MCP server automatically connects to the Store API
# using LEGAL_API_BASE_URL environment variable
For AI assistants, configure the MCP client to connect to http://localhost:8001 (or the appropriate host/port for your deployment).
Legal Text Features
Importing Legal Texts
The system automatically:
- Scrapes legal text XML from gesetze-im-internet.de
- Parses the XML into structured legal text sections
- Generates embeddings for each text section using Ollama
- Stores the texts with their embeddings in PostgreSQL with pgvector
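The four steps above can be sketched as a small pipeline. This is an illustration of the flow, not the project's implementation: `Section`, `import_code`, and the four callables are hypothetical names, and the stand-in lambdas replace the real scraper, parser, Ollama client, and database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    code: str
    section: str
    text: str
    embedding: List[float] = field(default_factory=list)


def import_code(code, scrape, parse, embed, store):
    """Run scrape -> parse -> embed -> store and return the number of sections."""
    xml = scrape(code)                    # 1. fetch the XML for this legal code
    sections = list(parse(code, xml))     # 2. split it into structured sections
    for section in sections:
        section.embedding = embed(section.text)  # 3. one embedding per section
    store(sections)                       # 4. persist texts with their embeddings
    return len(sections)


# Stand-ins for the real components, so the flow can be run in isolation.
saved = []
count = import_code(
    "bgb",
    scrape=lambda code: "<xml/>",
    parse=lambda code, xml: [Section(code, "§ 1", "text a"), Section(code, "§ 2", "text b")],
    embed=lambda text: [float(len(text))],
    store=saved.extend,
)
print(count)  # 2
```

Passing the steps in as callables keeps the flow testable without network access or a running database.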
Querying Legal Texts
Query by section identifier:
curl "http://localhost:8000/legal-texts/gesetze-im-internet/bgb?section=%C2%A7%201"
Semantic Search
Search using natural language with vector similarity:
curl "http://localhost:8000/legal-texts/gesetze-im-internet/bgb/search?q=Kaufvertrag&limit=5&cutoff=0.7"
Parameters:
- q - Search query (required)
- limit - Maximum results (1-100, default: 10)
- cutoff - Similarity threshold (0-2, default: 0.5)
  - Lower values = stricter matching
  - 0.3-0.5: Very strict
  - 0.6-0.7: Good balance
  - 0.8-1.0: More permissive
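A cutoff range of 0-2 where lower means stricter is what one would expect from cosine distance (1 minus cosine similarity), which pgvector supports. That the Store API uses exactly this metric is an assumption, as is the inclusive comparison below; the sketch only shows how such a cutoff behaves, with hypothetical function names and toy two-dimensional vectors.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def search(query_vector, items, limit=10, cutoff=0.5):
    """Return up to `limit` (distance, label) pairs within `cutoff`, closest first."""
    scored = [(cosine_distance(query_vector, vector), label) for label, vector in items]
    hits = sorted(pair for pair in scored if pair[0] <= cutoff)
    return hits[:limit]


items = [("same", [1.0, 0.0]), ("orthogonal", [0.0, 1.0]), ("opposite", [-1.0, 0.0])]
print([label for _, label in search([1.0, 0.0], items, cutoff=0.5)])
# ['same']
```

Raising the cutoff to 1.0 in this toy example would also admit the orthogonal vector, which mirrors the "more permissive" end of the scale above.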
XML Parser
The system includes a comprehensive parser for the gii-norm.dtd format used by gesetze-im-internet.de.
Parser Features
- Complete DTD Coverage - All major elements from gii-norm.dtd
- Structured Data - Type-safe dataclasses for all structures
- Text Extraction - Handles complex nested text with formatting
- Table Support - Captures table structures
- Footnote Handling - Extracts footnotes with references
- Metadata Parsing - Complete metadata extraction
Using the Parser
from app.scrapers import GesetzteImInternetScraper
# The scraper automatically uses the XML parser
scraper = GesetzteImInternetScraper()
legal_texts = scraper.scrape('bgb')
for text in legal_texts:
    print(f"Section: {text.section}")
    print(f"Text: {text.text}")
Parsed Metadata
The parser extracts:
- Legal abbreviations (jurabk, amtabk)
- Dates (ausfertigung-datum)
- Citations (fundstelle)
- Titles (kurzue, langue, titel)
- Structural classification (gliederungseinheit)
- Section designations (enbez)
- Version information (standangabe)
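To show where a few of these fields live, the sketch below reads them from a heavily simplified XML fragment using the Python standard library. It is not the project's parser: `NormMeta` and `parse_norms` are hypothetical names, the sample is a hand-written reduction of the gii-norm layout, and real documents contain many more elements and edge cases.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Simplified, hand-written sample; real files are far richer.
SAMPLE = """
<dokumente>
  <norm>
    <metadaten>
      <jurabk>BGB</jurabk>
      <enbez>§ 1</enbez>
      <titel>Beginn der Rechtsfähigkeit</titel>
    </metadaten>
    <textdaten><text><Content>
      <P>Die Rechtsfähigkeit des Menschen beginnt mit der Vollendung der Geburt.</P>
    </Content></text></textdaten>
  </norm>
</dokumente>
"""


@dataclass
class NormMeta:
    jurabk: Optional[str]  # legal abbreviation
    enbez: Optional[str]   # section designation
    titel: Optional[str]   # title
    text: str


def _meta_text(meta, tag):
    """Return the text of a metadata child element, or None if it is absent."""
    return meta.findtext(tag) if meta is not None else None


def parse_norms(xml_text):
    """Extract a few metadata fields and the paragraph text of every <norm>."""
    root = ET.fromstring(xml_text.strip())
    norms = []
    for norm in root.iter("norm"):
        meta = norm.find("metadaten")
        paragraphs = ["".join(p.itertext()).strip() for p in norm.iter("P")]
        norms.append(
            NormMeta(
                jurabk=_meta_text(meta, "jurabk"),
                enbez=_meta_text(meta, "enbez"),
                titel=_meta_text(meta, "titel"),
                text="\n".join(paragraphs),
            )
        )
    return norms


for norm in parse_norms(SAMPLE):
    print(norm.jurabk, norm.enbez, norm.titel)
```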
Development
Local Development (without Docker)
1. Install dependencies:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Install CLI tool in development mode
pip install -e .
2. Set up local database:
# Start only PostgreSQL
docker-compose up postgres -d
# Update .env to use localhost
# POSTGRES_HOST=localhost
3. Run migrations:
cd store
alembic upgrade head
4. Start Store API:
cd store
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
5. Start MCP Server:
cd mcp
export LEGAL_API_BASE_URL=http://localhost:8000
python -m server.main
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=app tests/
# Run specific test file
pytest tests/test_main.py -v
# Run CLI tests specifically
pytest tests/cli/ -v
Contributing
We welcome contributions from the community! Please see our contributing guidelines for details on:
- How to report bugs
- How to suggest features
- How to submit pull requests
- Development setup instructions
- Code style guidelines
Code of Conduct
This project adheres to the Contributor Covenant. By participating, you are expected to uphold this code. Please report unacceptable behavior through the project's reporting mechanisms.
Security
Security is important to us. If you discover a security vulnerability, please follow our security policy for responsible disclosure. Do not open public issues for security vulnerabilities.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Legal texts sourced from Gesetze im Internet
- Built with FastAPI, FastMCP, and Ollama
- Vector similarity search powered by pgvector
Support
- Issues: Open an issue on GitHub for bugs or feature requests
- Discussions: Use GitHub Discussions for questions and community chat
Made with ❤️ for the gov tech community