Data Quality Analytics Platform

A comprehensive data quality analysis and validation platform built with Model Context Protocol (MCP) integration. This project provides intelligent data profiling, quality rule generation, and validation capabilities using advanced analytics and the Great Expectations framework.

🎯 Overview

This platform offers a complete data quality solution with:

  • Intelligent Data Profiling: Advanced data analysis using custom algorithms and ydata-profiling
  • Quality Rule Generation: Automated generation of Great Expectations-compatible validation rules
  • Multi-dimensional Quality Assessment: Coverage across all 6 data quality dimensions
  • MCP Integration: Seamless integration with Claude Desktop and other MCP clients
  • Comprehensive Validation: Data validation against generated expectation suites

🏗️ Architecture

data-quality/
├── data-quality-mcp-server/          # Main MCP server application
│   ├── src/                          # Source code
│   │   ├── server.py                 # Main MCP server implementation
│   │   └── data_quality/            # Core data quality modules
│   │       ├── profiler.py          # Advanced data profiling
│   │       ├── rule_generator.py    # Quality rule generation
│   │       └── validator.py         # Data validation engine
│   ├── docs/                        # Documentation
│   ├── examples/                    # Sample data and examples
│   ├── tests/                       # Test suite
│   └── requirements.txt             # Python dependencies
└── README.md                        # This file

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Git

Installation

  1. Clone the repository

    git clone https://github.com/aCuriousProgrammer/data-quality-mcp-server.git
    cd data-quality-mcp-server
    
  2. Set up virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies

    pip install -r requirements.txt
    
  4. Configure Claude Desktop

    • Copy claude_config.template.json to claude_config.json
    • Update the paths in the configuration file
    • Restart Claude Desktop

Start the Server

# For Claude Desktop (stdio transport)
./run_stdio_server.sh

# For HTTP transport
python run_server.py --port 8000

📊 Core Features

1. Data Profiling & Analysis

  • Comprehensive Data Profiling: Detailed analysis of data structure, statistics, and quality metrics
  • Quality Issue Detection: Automatic identification of missing values, outliers, duplicates, and inconsistencies
  • Statistical Analysis: Advanced statistical measures including skewness, kurtosis, and distribution analysis
  • Pattern Recognition: Detection of data patterns, formats, and anomalies
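
The per-column statistics described above can be sketched with pandas. This is an illustrative example only — `profile_column` is a hypothetical helper, not the project's actual `profiler.py` API:

```python
import pandas as pd

def profile_column(series):
    """Collect basic structure and distribution statistics for one column."""
    profile = {
        "dtype": str(series.dtype),
        "count": int(series.count()),          # non-null values
        "missing": int(series.isna().sum()),
        "unique": int(series.nunique()),
    }
    if pd.api.types.is_numeric_dtype(series):
        profile.update(
            mean=float(series.mean()),
            std=float(series.std()),
            skewness=float(series.skew()),
            kurtosis=float(series.kurt()),
        )
    return profile

df = pd.DataFrame({"age": [25, 31, 40, None, 29]})
print(profile_column(df["age"]))
```

A real profiler (e.g. ydata-profiling) layers correlation, pattern, and anomaly detection on top of these basics.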

2. Quality Rule Generation

  • Multi-dimensional Rules: Coverage across all 6 data quality dimensions:
    • Completeness: Null value checks, missing data detection
    • Accuracy: Range validation, outlier detection, statistical accuracy
    • Consistency: Data type consistency, format standardization
    • Validity: Format validation, business rule compliance
    • Uniqueness: Duplicate detection, key uniqueness
    • Timeliness: Data freshness, temporal validation
  • Great Expectations Integration: Automatic generation of GE-compatible expectation suites
  • Confidence-based Filtering: Rules filtered by confidence thresholds
  • Custom Business Rules: Support for domain-specific validation rules
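
As an illustration of confidence-based rule generation, one can emit expectation dictionaries in Great Expectations' serialized format directly from observed data. This is a minimal sketch — `suggest_rules` and its threshold are assumptions, not the project's `rule_generator.py`:

```python
import pandas as pd

def suggest_rules(df, null_threshold=0.05):
    """Emit Great Expectations-style expectation dicts from observed data."""
    rules = []
    for col in df.columns:
        # Completeness: only suggest a not-null rule when the observed
        # null rate is below the confidence threshold.
        if df[col].isna().mean() <= null_threshold:
            rules.append({
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": col},
            })
        # Accuracy: bound numeric columns by their observed range.
        if pd.api.types.is_numeric_dtype(df[col]):
            rules.append({
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {
                    "column": col,
                    "min_value": float(df[col].min()),
                    "max_value": float(df[col].max()),
                },
            })
    return rules

rules = suggest_rules(pd.DataFrame({"score": [10, 20, 30]}))
print(len(rules))  # → 2
```

Serializing such a list under a suite name yields a file that Great Expectations can load as an expectation suite.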

3. Data Validation

  • Expectation Suite Validation: Comprehensive validation against generated rules
  • Detailed Reporting: Success/failure analysis with detailed metrics
  • Quality Scoring: Overall data quality scores and recommendations
  • Export Capabilities: HTML and JSON report generation
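
Conceptually, validation replays each expectation against the data and aggregates the outcomes into a score. A hedged sketch (the actual `validator.py` presumably delegates to Great Expectations; `validate` here handles only two expectation types):

```python
import pandas as pd

def validate(df, rules):
    """Check a dataframe against expectation dicts and compute a quality score."""
    results = []
    for rule in rules:
        col = rule["kwargs"]["column"]
        if rule["expectation_type"] == "expect_column_values_to_not_be_null":
            success = not df[col].isna().any()
        elif rule["expectation_type"] == "expect_column_values_to_be_between":
            k = rule["kwargs"]
            success = df[col].dropna().between(k["min_value"], k["max_value"]).all()
        else:
            continue  # expectation type not covered by this sketch
        results.append({"expectation": rule["expectation_type"],
                        "success": bool(success)})
    score = sum(r["success"] for r in results) / len(results) if results else 1.0
    return {"results": results, "quality_score": score}

df = pd.DataFrame({"score": [10, 20, 99]})
suite = [{"expectation_type": "expect_column_values_to_be_between",
          "kwargs": {"column": "score", "min_value": 0, "max_value": 50}}]
print(validate(df, suite)["quality_score"])  # → 0.0
```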

4. MCP Integration

  • Claude Desktop Support: Direct integration with Claude AI assistant
  • Tool-based Interface: Easy-to-use tools for data quality analysis
  • Resource Management: Persistent storage of analysis results and rules
  • Real-time Analysis: Immediate feedback and recommendations

🛠️ Available Tools

Analysis Tools

  • analyze_data_file(file_path): Comprehensive file-based analysis
  • analyze_sample_data(data_rows): Direct data row analysis
  • get_data_profile(data_source): Retrieve comprehensive data profiles
  • create_comprehensive_profile(file_path, ...): Generate detailed profiles

Rule Generation Tools

  • suggest_quality_rules(analysis_results): Generate initial quality rules
  • generate_expectation_suite(analysis_results, ...): Create Great Expectations suites

Validation Tools

  • validate_data_quality(file_path, expectation_suite, ...): Validate data against rules

Resource Access

  • schema://current: Current data schema
  • profile://summary: Latest profiling results
  • rules://suggested: Suggested quality rules
  • expectations://suite/{suite_name}: Specific expectation suites
  • validation://results/{validation_name}: Validation results

📈 Quality Dimensions

Completeness

  • Null value detection and analysis
  • Missing data pattern identification
  • Required field validation
  • Data coverage assessment
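
A completeness assessment like the one above boils down to per-column fill rates plus a required-field check. A minimal sketch (`completeness_report` is an illustrative name, not part of the server's API):

```python
import pandas as pd

def completeness_report(df, required=()):
    """Per-column fill rates plus violations of required (non-null) fields."""
    fill_rate = {c: float(1 - df[c].isna().mean()) for c in df.columns}
    violations = [c for c in required if df[c].isna().any()]
    return {"fill_rate": fill_rate, "required_field_violations": violations}

df = pd.DataFrame({"id": [1, 2, 3], "email": ["a@x.io", None, "c@x.io"]})
report = completeness_report(df, required=["id", "email"])
print(report["required_field_violations"])  # → ['email']
```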

Accuracy

  • Statistical accuracy validation
  • Outlier detection and analysis
  • Range and boundary validation
  • Cross-field consistency checks
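
One common form of the outlier detection listed above is Tukey's IQR fence, shown here as a standalone sketch (the platform's actual outlier method is not specified):

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return series[(series < lo) | (series > hi)]

s = pd.Series([10, 11, 12, 13, 200])
print(iqr_outliers(s).tolist())  # → [200]
```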

Consistency

  • Data type consistency validation
  • Format standardization checks
  • Cross-dataset consistency
  • Temporal consistency validation

Validity

  • Format validation (email, phone, etc.)
  • Business rule compliance
  • Domain-specific validation
  • Regulatory compliance checks

Uniqueness

  • Duplicate record detection
  • Primary key validation
  • Unique constraint checking
  • Referential integrity validation
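
Duplicate and key-uniqueness checks map directly onto pandas' `duplicated`. A sketch under the assumption of a single candidate-key column list:

```python
import pandas as pd

def uniqueness_check(df, key_columns):
    """Report rows that collide on the candidate key columns."""
    dup_mask = df.duplicated(subset=key_columns, keep=False)
    return {
        "is_unique": not bool(dup_mask.any()),
        "duplicates": df[dup_mask].to_dict(orient="records"),
    }

df = pd.DataFrame({"id": [1, 2, 2], "name": ["a", "b", "c"]})
print(uniqueness_check(df, ["id"])["is_unique"])  # → False
```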

Timeliness

  • Data freshness assessment
  • Temporal pattern analysis
  • Update frequency monitoring
  • Real-time data validation
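
Freshness assessment reduces to comparing the newest timestamp against an allowed staleness window. A hedged sketch (`freshness_check` and its parameters are illustrative assumptions):

```python
import pandas as pd

def freshness_check(df, ts_column, max_age, now=None):
    """Flag the dataset as stale if its newest record is older than max_age."""
    latest = pd.to_datetime(df[ts_column]).max()
    now = now if now is not None else pd.Timestamp.now()
    age = now - latest
    return {"latest_record": latest, "age": age, "is_fresh": bool(age <= max_age)}

df = pd.DataFrame({"updated_at": ["2024-01-01 12:00", "2024-01-01 18:00"]})
result = freshness_check(df, "updated_at",
                         max_age=pd.Timedelta(hours=24),
                         now=pd.Timestamp("2024-01-02 00:00"))
print(result["is_fresh"])  # → True  (latest record is 6 hours old)
```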

🔧 Configuration

Environment Variables

Create a .env file in the data-quality-mcp-server directory:

# API Configuration
ANTHROPIC_API_KEY=your_api_key_here

# Server Configuration
SERVER_HOST=localhost
SERVER_PORT=8000

# Logging Configuration
LOG_LEVEL=INFO

Claude Desktop Configuration

Update claude_config.json with your specific paths:

{
  "mcpServers": {
    "data-quality-analytics-server": {
      "command": "./run_stdio_server.sh",
      "args": [],
      "env": {},
      "cwd": "."
    }
  }
}

📚 Documentation

  • docs/: Detailed usage instructions and complete API reference
  • examples/: Sample data and usage examples

🧪 Testing

# Run the test suite
cd data-quality-mcp-server
python -m pytest tests/

# Test server functionality
python test_server.py

🔒 Security

  • Sensitive File Exclusion: .gitignore excludes sensitive files and paths
  • Template Configuration: Template files provided for configuration
  • Environment Variables: Secure handling of API keys and secrets
  • Path Sanitization: Validation of file paths and data sources

Built with ❤️ for data quality excellence