data-ingestion-contract-generator

JacobJNilsson/data-ingestion-contract-generator


The Data Ingestion Contract Generator is an MCP server designed to automate the generation and validation of contracts for data ingestion pipelines.

Data Ingestion Contract Generator


An MCP (Model Context Protocol) server that automatically generates and validates contracts for data ingestion pipelines. Designed to help AI agents and developers build reliable, type-safe data workflows with automated schema detection and quality assessment.

Overview

Building data ingestion pipelines is complex and error-prone, especially when dealing with:

  • Diverse file formats and encodings (CSV, JSON, etc.)
  • International number formats (European vs US)
  • Data quality issues (UTF-8 BOMs, sparse data, missing values)
  • Schema mismatches between source and destination
  • Transformation logic that's hard to track and maintain

This MCP server solves these problems with a three-contract architecture that separates concerns and makes data pipelines explicit, validated, and maintainable.

Three-Contract Architecture

1. Source Contracts

Describe where data comes from. The generator automatically analyzes and documents:

  • File sources: CSV, JSON, and other file formats with encoding detection
  • Database sources: PostgreSQL, MySQL, SQLite tables and queries
  • Schema inference with data types
  • Quality metrics and data profiling
  • Format-specific handling (UTF-8 BOM, European numbers, etc.)

2. Destination Contracts

Define where data goes. Each contract specifies:

  • Target schema and data types
  • Validation rules and constraints
  • Required fields and uniqueness constraints
  • Data quality requirements
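To make the idea of a destination contract concrete, here is a minimal sketch of required-field, type, and uniqueness checks using only the standard library. FieldRule and validate_rows are hypothetical names chosen for illustration; the project itself lists Pydantic as a dependency and may model these rules quite differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FieldRule:
    """One destination field: its name, expected Python type, and constraints."""
    name: str
    py_type: type
    required: bool = True
    unique: bool = False


def validate_rows(rows: list[dict[str, Any]], rules: list[FieldRule]) -> list[str]:
    """Return human-readable errors; an empty list means every row passed."""
    errors: list[str] = []
    # Track the values already seen for each field that must be unique.
    seen: dict[str, set] = {rule.name: set() for rule in rules if rule.unique}
    for index, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.name)
            if value is None:
                if rule.required:
                    errors.append(f"row {index}: missing required field '{rule.name}'")
                continue
            if not isinstance(value, rule.py_type):
                errors.append(
                    f"row {index}: '{rule.name}' should be {rule.py_type.__name__}"
                )
                continue
            if rule.unique:
                if value in seen[rule.name]:
                    errors.append(f"row {index}: duplicate value for '{rule.name}'")
                seen[rule.name].add(value)
    return errors
```

A rule list such as FieldRule("id", int, unique=True) plus FieldRule("amount", float) plays the role of the target schema, and the returned error list is what a pipeline would inspect before loading.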

3. Transformation Contracts

Map source to destination. Each contract defines:

  • Field mappings between source and destination
  • Transformation logic (type conversions, formatting)
  • Enrichment rules (derived fields, lookups)
  • Execution configuration (batch size, error handling)

See the project documentation for detailed usage and examples.

Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager

Installation

# Install with uv tool
uv tool install https://github.com/JacobJNilsson/data-ingestion-contract-generator/releases/download/v0.1.0/ingestion_contract_mcp-0.1.0-py3-none-any.whl

# Verify
contract-gen --version

Download the latest release from GitHub Releases.

Note: Requires Python 3.13+ and uv (curl -LsSf https://astral.sh/uv/install.sh | sh)

For Developers
# Clone the repository
git clone https://github.com/JacobJNilsson/data-ingestion-contract-generator.git
cd data-ingestion-contract-generator

# Install dependencies
uv sync

# Run tests to verify setup
make test

Command Line Interface (CLI)

The contract-gen CLI provides direct access to contract generation and validation.

For local development, prefix commands with uv run (e.g., uv run contract-gen --help).

# Generate source contract from CSV
contract-gen source csv data/transactions.csv --id transactions --output contracts/source.json

# Generate destination contract
contract-gen destination csv --id output_data --output contracts/destination.json

# Validate contracts
contract-gen validate contracts/source.json

# Get help
contract-gen --help
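For scripts or CI jobs, the validate command above could be wrapped in a few lines of Python. This is only a sketch: it assumes contract-gen is on PATH and that it exits with a non-zero status when a contract is invalid, which the text above does not confirm. The helper names validate_command and validate_all are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def validate_command(contract: str) -> list[str]:
    """Build the argument list for `contract-gen validate <contract>`."""
    return ["contract-gen", "validate", contract]


def validate_all(directory: str = "contracts") -> int:
    """Validate every *.json file in `directory`; return the number of failures."""
    if shutil.which("contract-gen") is None:
        raise RuntimeError("contract-gen is not on PATH")
    failures = 0
    for path in sorted(Path(directory).glob("*.json")):
        # Assumption: a non-zero exit status signals an invalid contract.
        result = subprocess.run(validate_command(str(path)))
        if result.returncode != 0:
            failures += 1
    return failures
```

A CI step could call validate_all() and fail the job when the returned count is greater than zero.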

Key features:

  • Auto-detects CSV encoding and delimiters
  • Multiple output formats (JSON, YAML)
  • Pretty-printed output with syntax highlighting
  • Comprehensive validation with detailed error messages
  • Batch processing support

Using with Cursor (MCP Server)

Add to your .cursor/mcp.json:

{
  "mcpServers": {
    "contract-generator": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/data-ingestion-contract-generator",
        "run",
        "mcp_server/server.py"
      ]
    }
  }
}

Then restart Cursor, and the contract generation tools will be available to AI assistants.
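Because the configuration needs an absolute path, it can be convenient to generate it rather than type it by hand. The sketch below builds the same structure shown above from a checkout directory; build_mcp_config and write_mcp_config are hypothetical helpers, not part of this project, and write_mcp_config overwrites any existing file at the target path.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str) -> dict:
    """Return the Cursor MCP configuration for a local checkout of the server."""
    repo = Path(repo_dir).expanduser().resolve()  # the config needs an absolute path
    return {
        "mcpServers": {
            "contract-generator": {
                "command": "uv",
                "args": ["--directory", str(repo), "run", "mcp_server/server.py"],
            }
        }
    }


def write_mcp_config(repo_dir: str, target: str = ".cursor/mcp.json") -> Path:
    """Write the configuration to `target`, replacing any existing file."""
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_mcp_config(repo_dir), indent=2) + "\n",
                    encoding="utf-8")
    return path
```

If .cursor/mcp.json already lists other servers, merging the "contract-generator" entry into the existing file by hand is the safer route.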

Development

Setup Development Environment

# Install dev dependencies
uv sync --all-extras

# Install pre-commit hooks (optional)
pre-commit install

Available Commands

make check         # Run all checks (lint + format-check + mypy)
make test          # Run pytest test suite
make format        # Format code with Ruff

All code is fully typed with Python 3.13+ type hints. CI runs automatically on pull requests.

Documentation

  • Detailed tool documentation
  • Architecture decisions and conventions
  • Commit guidelines and workflow
  • Coding standards

Use Cases

Command Line / CI/CD:

  • Generate contracts from scripts and pipelines
  • Validate contracts before deployment
  • Integrate with GitHub Actions or GitLab CI

Interactive Development:

  • Use the MCP server with Cursor for AI-assisted contract generation
  • Real-time validation and schema analysis

Data Engineering:

  • Document data sources automatically
  • Track schema changes over time
  • Ensure data quality across pipelines

Requirements

  • Python: 3.13+
  • MCP: 1.0.0+
  • Pydantic: 2.0.0+
  • Typer: 0.12.0+ (for CLI)
  • Rich: 13.0.0+ (for CLI pretty output)

See pyproject.toml for complete dependency list.

License

This project is licensed under AGPL-3.0 for open source use. See the license file in the repository for details.

Commercial licenses are available if you want to use this software in a closed-source product. Contact jacobjnilsson@gmail.com for inquiries.