JacobJNilsson/data-ingestion-contract-generator
Data Ingestion Contract Generator
An MCP (Model Context Protocol) server that automatically generates and validates contracts for data ingestion pipelines. Designed to help AI agents and developers build reliable, type-safe data workflows with automated schema detection and quality assessment.
Overview
Building data ingestion pipelines is complex and error-prone, especially when dealing with:
- Diverse file formats and encodings (CSV, JSON, etc.)
- International number formats (European 1.234,56 vs US 1,234.56)
- Data quality issues (UTF-8 BOMs, sparse data, missing values)
- Schema mismatches between source and destination
- Transformation logic that's hard to track and maintain
This MCP server addresses these problems with a three-contract architecture that separates concerns (source, destination, transformation) and makes data pipelines explicit, validated, and maintainable.
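The number-format problem is easy to underestimate: "1.234" is one thousand two hundred thirty-four in a typical European export but one point two three four in a US one. A minimal stdlib sketch of how a column's convention might be guessed and then applied is shown here. The function names and the "last separator wins" heuristic are illustrative assumptions, not this project's code.

```python
def guess_decimal_separator(samples: list[str]) -> str:
    """Guess ',' (European) or '.' (US) as the decimal separator.

    Heuristic (an assumption, not the project's algorithm): when a value
    contains both separators, the one appearing last is the decimal mark.
    A lone separator followed by exactly three digits, like "1,234", is
    ambiguous and ignored. Ties default to '.'.
    """
    votes = {",": 0, ".": 0}
    for raw in samples:
        s = raw.strip()
        last_comma, last_dot = s.rfind(","), s.rfind(".")
        if last_comma != -1 and last_dot != -1:
            votes["," if last_comma > last_dot else "."] += 1
        elif last_comma != -1 or last_dot != -1:
            pos = max(last_comma, last_dot)
            if len(s) - pos - 1 != 3:  # skip ambiguous "1,234" / "1.234"
                votes[s[pos]] += 1
    return "," if votes[","] > votes["."] else "."


def parse_number(text: str, decimal_sep: str) -> float:
    """Convert a formatted number string to float using the given decimal mark."""
    thousands_sep = "." if decimal_sep == "," else ","
    # Drop grouping characters: the other separator, spaces, NBSPs, apostrophes.
    cleaned = text.strip()
    for ch in (thousands_sep, " ", "\u00a0", "'"):
        cleaned = cleaned.replace(ch, "")
    return float(cleaned.replace(decimal_sep, "."))
```

For example, guess_decimal_separator(["1.234,56", "12,5", "7"]) returns ",", and parsing those values with that separator gives [1234.56, 12.5, 7.0]. Guessing once per column rather than per value avoids mixing conventions inside a single field.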
Three-Contract Architecture
1. Source Contracts
Describe where data comes from. Automatically analyzes and documents:
- File sources: CSV, JSON, and other file formats with encoding detection
- Database sources: PostgreSQL, MySQL, SQLite tables and queries
- Schema inference with data types
- Quality metrics and data profiling
- Format-specific handling (UTF-8 BOM, European numbers, etc.)
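As a rough illustration of what source-contract analysis involves for a CSV file, the sketch below detects a UTF-8 BOM, falls back to Latin-1 when the bytes are not valid UTF-8, sniffs the delimiter with Python's csv.Sniffer, and infers a per-column type and null ratio. The profile_csv and infer_type names and the output dict layout are hypothetical; the project's actual contract format may differ.

```python
import codecs
import csv
import io
import itertools


def infer_type(values: list[str]) -> str:
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
        except ValueError:
            continue  # at least one value does not fit; try the next, wider type
        return name
    return "text"


def profile_csv(raw: bytes, sample_rows: int = 100) -> dict:
    """Build a minimal source profile (encoding, delimiter, column types) from CSV bytes."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        text = raw.decode("utf-8-sig")  # 'utf-8-sig' strips a leading BOM if present
        encoding = "utf-8"
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # Latin-1 maps every byte, so this cannot fail
        encoding = "latin-1"
    # Restrict sniffing to common delimiters to avoid picking letters or digits.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    reader = csv.reader(io.StringIO(text), dialect)
    header = next(reader)
    rows = list(itertools.islice(reader, sample_rows))  # profile only a sample
    columns = {}
    for i, name in enumerate(header):
        values = [row[i] if i < len(row) else "" for row in rows]
        empty = sum(1 for v in values if not v.strip())
        columns[name] = {
            "type": infer_type(values),
            "null_ratio": empty / len(values) if values else 0.0,
        }
    return {
        "encoding": encoding,
        "has_bom": has_bom,
        "delimiter": dialect.delimiter,
        "columns": columns,
    }
```

Because the BOM is stripped during decoding, the first header comes out as "id" rather than "\ufeffid". Values written with a decimal comma (e.g. 12,5) are classified as text here; a fuller profiler would normalize number formats before inferring types.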
2. Destination Contracts
Define where data goes. Specifies:
- Target schema and data types
- Validation rules and constraints
- Required fields and uniqueness constraints
- Data quality requirements
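A destination contract can be thought of as a target schema plus constraints that incoming rows must satisfy. The sketch below models that with plain dataclasses and collects every violation instead of stopping at the first one. FieldSpec, DestinationContract, and validate_rows are hypothetical names; the project itself lists Pydantic as a dependency and may represent contracts quite differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldSpec:
    name: str
    dtype: type  # expected Python type of the value, e.g. int, float, str
    required: bool = True


@dataclass
class DestinationContract:
    fields: list[FieldSpec]
    unique: list[str] = field(default_factory=list)  # fields whose values must not repeat


def validate_rows(contract: DestinationContract, rows: list[dict[str, Any]]) -> list[str]:
    """Check rows against the contract and return all violations (empty list = valid)."""
    errors: list[str] = []
    seen: dict[str, set] = {name: set() for name in contract.unique}
    for i, row in enumerate(rows):
        for spec in contract.fields:
            value = row.get(spec.name)
            if value is None:
                if spec.required:
                    errors.append(f"row {i}: missing required field '{spec.name}'")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(
                    f"row {i}: field '{spec.name}' expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        for name in contract.unique:
            value = row.get(name)
            if value is None:
                continue
            if value in seen[name]:
                errors.append(f"row {i}: duplicate value {value!r} for unique field '{name}'")
            seen[name].add(value)
    return errors
```

Returning the full list of problems, rather than raising on the first, makes it easier to report data quality for a whole batch in one pass.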
3. Transformation Contracts
Map source to destination. Defines:
- Field mappings between source and destination
- Transformation logic (type conversions, formatting)
- Enrichment rules (derived fields, lookups)
- Execution configuration (batch size, error handling)
See the project documentation for detailed usage and examples.
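The transformation contract ties source and destination together. The following sketch, using hypothetical FieldMapping and TransformationContract types, applies per-field converters, computes derived fields from the converted row, and emits rows in fixed-size batches. Depending on on_error, rows that fail conversion are either skipped or abort the run. The field names and the "skip"/"fail" policy values are assumptions for illustration.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldMapping:
    source: str  # column name in the source data
    destination: str  # field name in the destination schema
    convert: Callable[[str], Any] = str  # type conversion / formatting step


@dataclass
class TransformationContract:
    mappings: list[FieldMapping]
    # Derived fields computed from the already-converted row (enrichment).
    derived: dict[str, Callable[[dict[str, Any]], Any]] = field(default_factory=dict)
    batch_size: int = 500
    on_error: str = "skip"  # "skip" drops a bad row, "fail" aborts the run


def transform(
    contract: TransformationContract, rows: Iterable[dict[str, str]]
) -> Iterator[list[dict[str, Any]]]:
    """Apply the contract to source rows and yield destination rows in batches."""
    batch: list[dict[str, Any]] = []
    for i, row in enumerate(rows):
        try:
            out = {m.destination: m.convert(row[m.source]) for m in contract.mappings}
            for name, compute in contract.derived.items():
                out[name] = compute(out)
        except (KeyError, ValueError) as exc:
            if contract.on_error == "fail":
                raise ValueError(f"row {i} could not be transformed: {exc!r}") from exc
            continue  # on_error == "skip"
        batch.append(out)
        if len(batch) >= contract.batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly partial batch
```

Using a generator keeps memory bounded to one batch, which is what a batch-size setting is usually meant to control.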
Quick Start
Prerequisites
- Python 3.13+
- uv package manager
Installation
# Install with uv tool
uv tool install https://github.com/JacobJNilsson/data-ingestion-contract-generator/releases/download/v0.1.0/ingestion_contract_mcp-0.1.0-py3-none-any.whl
# Verify
contract-gen --version
Download the latest release from GitHub Releases.
Note: Requires Python 3.13+ and uv (install uv with: curl -LsSf https://astral.sh/uv/install.sh | sh)
For Developers
# Clone the repository
git clone https://github.com/JacobJNilsson/data-ingestion-contract-generator.git
cd data-ingestion-contract-generator
# Install dependencies
uv sync
# Run tests to verify setup
make test
Command Line Interface (CLI)
The contract-gen CLI provides direct access to contract generation and validation.
For local development: prefix commands with uv run (e.g., uv run contract-gen --help).
# Generate source contract from CSV
contract-gen source csv data/transactions.csv --id transactions --output contracts/source.json
# Generate destination contract
contract-gen destination csv --id output_data --output contracts/destination.json
# Validate contracts
contract-gen validate contracts/source.json
# Get help
contract-gen --help
Key features:
- Auto-detects CSV encoding and delimiters
- Multiple output formats (JSON, YAML)
- Pretty-printed output with syntax highlighting
- Comprehensive validation with detailed error messages
- Batch processing support
Using with Cursor (MCP Server)
Add to your .cursor/mcp.json:
{
  "mcpServers": {
    "contract-generator": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/data-ingestion-contract-generator",
        "run",
        "mcp_server/server.py"
      ]
    }
  }
}
Then restart Cursor, and the contract generation tools will be available to AI assistants.
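If you prefer not to edit .cursor/mcp.json by hand, a small script can add the same entry while preserving any servers already configured. This is a hypothetical helper (add_contract_server is not part of the project); it only writes the command and arguments from the configuration above, with the repository path resolved to an absolute path.

```python
import json
from pathlib import Path


def add_contract_server(
    config_path: Path, repo_dir: Path, name: str = "contract-generator"
) -> dict:
    """Add or update the contract-generator entry in a Cursor mcp.json file.

    Hypothetical helper: other entries under "mcpServers" are left untouched.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "uv",
        # uv needs an absolute --directory, so resolve the repo path first.
        "args": ["--directory", str(repo_dir.resolve()), "run", "mcp_server/server.py"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage might look like add_contract_server(Path(".cursor/mcp.json"), Path("path/to/data-ingestion-contract-generator")), followed by a Cursor restart as described above.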
Development
Setup Development Environment
# Install dev dependencies
uv sync --all-extras
# Install pre-commit hooks (optional)
pre-commit install
Available Commands
make check # Run all checks (lint + format-check + mypy)
make test # Run pytest test suite
make format # Format code with Ruff
All code is fully typed with Python 3.13+ type hints. CI runs automatically on pull requests.
Documentation
- - Detailed tool documentation
- - Architecture decisions and conventions
- - Commit guidelines and workflow
- - Coding standards
Use Cases
Command Line / CI/CD:
- Generate contracts from scripts and pipelines
- Validate contracts before deployment
- Integrate with GitHub Actions or GitLab CI
Interactive Development:
- Use the MCP server with Cursor for AI-assisted contract generation
- Real-time validation and schema analysis
Data Engineering:
- Document data sources automatically
- Track schema changes over time
- Ensure data quality across pipelines
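For tracking schema changes over time, one approach is to keep each generated source contract under version control and diff the column types between versions. The sketch below assumes the column-to-type mapping has already been extracted from two contract versions into plain dicts; that extraction step, the diff_schemas and is_breaking names, and the breaking-change rule are all assumptions, since the contract JSON layout is not described here.

```python
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {column: type} mappings taken from successive contract versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    type_changed = sorted(
        (col, old[col], new[col]) for col in old.keys() & new.keys() if old[col] != new[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


def is_breaking(diff: dict[str, list]) -> bool:
    """Assumed policy: removed columns and type changes break consumers; new columns do not."""
    return bool(diff["removed"] or diff["type_changed"])
```

In a CI job, a breaking diff could fail the build before deployment. Whether a new column is harmless depends on the destination, so treat the policy in is_breaking as a starting point.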
Requirements
- Python: 3.13+
- MCP: 1.0.0+
- Pydantic: 2.0.0+
- Typer: 0.12.0+ (for CLI)
- Rich: 13.0.0+ (for CLI pretty output)
See pyproject.toml for complete dependency list.
License
This project is licensed under AGPL-3.0 for open source use. See the repository's license file for details.
Commercial licenses are available if you want to use this software in a closed-source product. Contact jacobjnilsson@gmail.com for inquiries.