aCuriousProgrammer/data-quality-mcp-server
Data Quality Analytics Platform
A data quality analysis and validation platform with Model Context Protocol (MCP) integration. It provides data profiling, quality rule generation, and validation, built on custom analysis code, ydata-profiling, and the Great Expectations framework.
🎯 Overview
This platform offers a complete data quality solution with:
- Intelligent Data Profiling: Advanced data analysis using custom algorithms and ydata-profiling
- Quality Rule Generation: Automated generation of Great Expectations-compatible validation rules
- Multi-dimensional Quality Assessment: Coverage across all 6 data quality dimensions
- MCP Integration: Seamless integration with Claude Desktop and other MCP clients
- Comprehensive Validation: Data validation against generated expectation suites
🏗️ Architecture
data-quality/
├── data-quality-mcp-server/          # Main MCP server application
│   ├── src/                          # Source code
│   │   ├── server.py                 # Main MCP server implementation
│   │   └── data_quality/             # Core data quality modules
│   │       ├── profiler.py           # Advanced data profiling
│   │       ├── rule_generator.py     # Quality rule generation
│   │       └── validator.py          # Data validation engine
│   ├── docs/                         # Documentation
│   ├── examples/                     # Sample data and examples
│   ├── tests/                        # Test suite
│   └── requirements.txt              # Python dependencies
└── README.md                         # This file
🚀 Quick Start
Prerequisites
- Python 3.8+
- Git
Installation
- Clone the repository
    git clone https://github.com/aCuriousProgrammer/data-quality-mcp-server.git
    cd data-quality-mcp-server
- Set up virtual environment
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
- Configure Claude Desktop
  - Copy claude_config.template.json to claude_config.json
  - Update the paths in the configuration file
  - Restart Claude Desktop
Start the Server
# For Claude Desktop (stdio transport)
./run_stdio_server.sh
# For HTTP transport
python run_server.py --port 8000
📊 Core Features
1. Data Profiling & Analysis
- Comprehensive Data Profiling: Detailed analysis of data structure, statistics, and quality metrics
- Quality Issue Detection: Automatic identification of missing values, outliers, duplicates, and inconsistencies
- Statistical Analysis: Advanced statistical measures including skewness, kurtosis, and distribution analysis
- Pattern Recognition: Detection of data patterns, formats, and anomalies
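The checks listed above can be pictured with a small standard-library sketch. This is not the project's profiler.py, which is not shown here; the function name `profile_column`, the returned keys, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Summarize one column: missing values, duplicates, outliers, skewness."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int, so exclude it from the numeric checks.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        # Every repeat of a value beyond its first occurrence is a duplicate.
        "duplicates": sum(c - 1 for c in Counter(present).values() if c > 1),
    }
    if len(numeric) >= 4:
        q1, _, q3 = statistics.quantiles(numeric, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        mean = statistics.fmean(numeric)
        std = statistics.pstdev(numeric)
        profile["outliers"] = [v for v in numeric if v < low or v > high]
        # Population skewness: the mean of the cubed standardized values.
        profile["skewness"] = (
            sum(((v - mean) / std) ** 3 for v in numeric) / len(numeric)
            if std else 0.0
        )
    return profile
```

Calling `profile_column([10, 12, 11, 13, 12, None, 500])` reports one missing value, one duplicate, and flags 500 as an outlier.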
2. Quality Rule Generation
- Multi-dimensional Rules: Coverage across all 6 data quality dimensions:
  - Completeness: Null value checks, missing data detection
  - Accuracy: Range validation, outlier detection, statistical accuracy
  - Consistency: Data type consistency, format standardization
  - Validity: Format validation, business rule compliance
  - Uniqueness: Duplicate detection, key uniqueness
  - Timeliness: Data freshness, temporal validation
- Great Expectations Integration: Automatic generation of GE-compatible expectation suites
- Confidence-based Filtering: Rules filtered by confidence thresholds
- Custom Business Rules: Support for domain-specific validation rules
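As a rough illustration of confidence-based rule generation, the sketch below turns a column profile into expectation dictionaries and drops the ones below a threshold. The expectation type names are real Great Expectations expectations, but everything else is an assumption: the `suggest_rules` function, the profile keys, the way confidence is scored, and the fixed 0.9 for range rules are hypothetical, and the key names Great Expectations uses in suite files vary between versions.

```python
def suggest_rules(column, profile, min_confidence=0.8):
    """Build candidate expectations for one column and keep the confident ones."""
    candidates = []
    total = profile["count"]
    if total:
        # Assumption: the share of non-null values doubles as the confidence.
        completeness = 1 - profile["missing"] / total
        candidates.append({
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": column},
            "meta": {"dimension": "completeness", "confidence": completeness},
        })
        # Assumption: the share of non-duplicated values scores uniqueness.
        uniqueness = 1 - profile["duplicates"] / total
        candidates.append({
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": column},
            "meta": {"dimension": "uniqueness", "confidence": uniqueness},
        })
    if "min" in profile and "max" in profile:
        candidates.append({
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {
                "column": column,
                "min_value": profile["min"],
                "max_value": profile["max"],
            },
            # Hypothetical fixed confidence for observed-range rules.
            "meta": {"dimension": "accuracy", "confidence": 0.9},
        })
    return [c for c in candidates if c["meta"]["confidence"] >= min_confidence]
```

With a profile of 100 rows, no missing values, and 30 duplicates, the uniqueness rule scores 0.7 and is filtered out at the default threshold, while the null check and range rule are kept.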
3. Data Validation
- Expectation Suite Validation: Comprehensive validation against generated rules
- Detailed Reporting: Success/failure analysis with detailed metrics
- Quality Scoring: Overall data quality scores and recommendations
- Export Capabilities: HTML and JSON report generation
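To make the validation and scoring idea concrete, here is a minimal sketch that checks rows against a list of expectation dictionaries and reports a pass percentage. It does not call Great Expectations and is not the project's validator.py; the `validate_rows` function, the result layout, and the scoring formula (share of passed expectations) are assumptions for illustration.

```python
def validate_rows(rows, expectations):
    """Check each expectation against the rows and compute a simple score."""
    results = []
    for exp in expectations:
        kind, kwargs = exp["expectation_type"], exp["kwargs"]
        values = [row.get(kwargs["column"]) for row in rows]
        if kind == "expect_column_values_to_not_be_null":
            failed = [v for v in values if v is None]
        elif kind == "expect_column_values_to_be_between":
            lo, hi = kwargs["min_value"], kwargs["max_value"]
            # Nulls are left to the null check; only real values are ranged.
            failed = [v for v in values if v is not None and not lo <= v <= hi]
        else:
            raise ValueError(f"unsupported expectation: {kind}")
        results.append({
            "expectation_type": kind,
            "column": kwargs["column"],
            "success": not failed,
            "unexpected_count": len(failed),
        })
    passed = sum(r["success"] for r in results)
    # Assumed scoring: percentage of expectations that passed.
    score = 100.0 * passed / len(results) if results else 100.0
    return {"results": results, "quality_score": score}
```

The returned dictionary is plain data, so it can be written out with `json.dumps` for a JSON report.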
4. MCP Integration
- Claude Desktop Support: Direct integration with Claude AI assistant
- Tool-based Interface: Easy-to-use tools for data quality analysis
- Resource Management: Persistent storage of analysis results and rules
- Real-time Analysis: Immediate feedback and recommendations
🛠️ Available Tools
Analysis Tools
- analyze_data_file(file_path): Comprehensive file-based analysis
- analyze_sample_data(data_rows): Direct data row analysis
- get_data_profile(data_source): Retrieve comprehensive data profiles
- create_comprehensive_profile(file_path, ...): Generate detailed profiles
Rule Generation Tools
- suggest_quality_rules(analysis_results): Generate initial quality rules
- generate_expectation_suite(analysis_results, ...): Create Great Expectations suites
Validation Tools
- validate_data_quality(file_path, expectation_suite, ...): Validate data against rules
Resource Access
- schema://current: Current data schema
- profile://summary: Latest profiling results
- rules://suggested: Suggested quality rules
- expectations://suite/{suite_name}: Specific expectation suites
- validation://results/{validation_name}: Validation results
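The resource URIs above mix fixed addresses with templates such as {suite_name}. The sketch below shows one way such templates could be matched against a requested URI using only the standard library. It is illustrative only: the server presumably relies on its MCP library for this, and the `match_resource` helper is hypothetical.

```python
import re

RESOURCE_TEMPLATES = [
    "schema://current",
    "profile://summary",
    "rules://suggested",
    "expectations://suite/{suite_name}",
    "validation://results/{validation_name}",
]


def match_resource(uri):
    """Return (template, params) for the first matching template, else None."""
    for template in RESOURCE_TEMPLATES:
        # Splitting on {name} leaves literal text at even indexes and
        # placeholder names at odd indexes.
        parts = re.split(r"\{(\w+)\}", template)
        pattern = "".join(
            re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
            for i, part in enumerate(parts)
        )
        match = re.fullmatch(pattern, uri)
        if match:
            return template, match.groupdict()
    return None
```

For example, `match_resource("expectations://suite/orders")` yields the suite template together with `{"suite_name": "orders"}`.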
📈 Quality Dimensions
Completeness
- Null value detection and analysis
- Missing data pattern identification
- Required field validation
- Data coverage assessment
Accuracy
- Statistical accuracy validation
- Outlier detection and analysis
- Range and boundary validation
- Cross-field consistency checks
Consistency
- Data type consistency validation
- Format standardization checks
- Cross-dataset consistency
- Temporal consistency validation
Validity
- Format validation (email, phone, etc.)
- Business rule compliance
- Domain-specific validation
- Regulatory compliance checks
Uniqueness
- Duplicate record detection
- Primary key validation
- Unique constraint checking
- Referential integrity validation
Timeliness
- Data freshness assessment
- Temporal pattern analysis
- Update frequency monitoring
- Real-time data validation
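Of the six dimensions, timeliness is perhaps the least obvious to turn into a check. A minimal freshness sketch, assuming each record carries a timezone-aware timestamp and that the `freshness_report` function and its `max_age` threshold are hypothetical names, might look like this:

```python
from datetime import datetime, timedelta, timezone


def freshness_report(timestamps, max_age, now=None):
    """Report how old the newest record is and how many records are stale."""
    now = now or datetime.now(timezone.utc)
    latest = max(timestamps)
    age = now - latest
    stale = [t for t in timestamps if now - t > max_age]
    return {
        "latest": latest,
        "age": age,
        # The dataset counts as fresh when its newest record is recent enough.
        "is_fresh": age <= max_age,
        "stale_count": len(stale),
    }
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.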
🔧 Configuration
Environment Variables
Create a .env file in the data-quality-mcp-server directory:
# API Configuration
ANTHROPIC_API_KEY=your_api_key_here
# Server Configuration
SERVER_HOST=localhost
SERVER_PORT=8000
# Logging Configuration
LOG_LEVEL=INFO
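How the server reads these variables is not documented here. As a sketch only, assuming the values have already been exported into the process environment (or loaded from .env by some other means), a settings loader could look like the following. The `load_settings` function and its returned keys are hypothetical, and the defaults simply mirror the example values above.

```python
import os


def load_settings(env=None):
    """Read server settings from an environment mapping, with defaults."""
    env = os.environ if env is None else env
    port = int(env.get("SERVER_PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"SERVER_PORT out of range: {port}")
    return {
        "host": env.get("SERVER_HOST", "localhost"),
        "port": port,
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        # Report only whether a key is present; never log the key itself.
        "api_key_set": bool(env.get("ANTHROPIC_API_KEY")),
    }
```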
Claude Desktop Configuration
Update claude_config.json with your specific paths:
{
  "mcpServers": {
    "data-quality-analytics-server": {
      "command": "./run_stdio_server.sh",
      "args": [],
      "env": {},
      "cwd": "."
    }
  }
}
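Relative paths such as "." depend on the directory the client launches the server from, which is presumably why the paths need updating. One possible way to fill them in is sketched below; the `build_claude_config` helper is hypothetical, and it assumes run_stdio_server.sh sits at the top of the server directory.

```python
import json
from pathlib import Path


def build_claude_config(server_dir):
    """Return the config structure with absolute paths for the given directory."""
    root = Path(server_dir).resolve()
    return {
        "mcpServers": {
            "data-quality-analytics-server": {
                "command": str(root / "run_stdio_server.sh"),
                "args": [],
                "env": {},
                "cwd": str(root),
            }
        }
    }


def render_claude_config(server_dir):
    """Serialize the config as indented JSON, ready to save as claude_config.json."""
    return json.dumps(build_claude_config(server_dir), indent=2)
```

Running `print(render_claude_config("."))` from the server directory prints a configuration whose command and cwd are absolute paths.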
📚 Documentation
- Detailed usage instructions and examples
- Complete API reference
- Sample data and usage examples
🧪 Testing
# Run the test suite
cd data-quality-mcp-server
python -m pytest tests/
# Test server functionality
python test_server.py
🔒 Security
- Sensitive File Exclusion: .gitignore excludes sensitive files and paths
- Template Configuration: Template files provided for configuration
- Environment Variables: Secure handling of API keys and secrets
- Path Sanitization: Validation of file paths and data sources
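The project's own path checks are not shown here. As an illustration of what path sanitization usually involves, the sketch below resolves a user-supplied path against a base directory and rejects anything that escapes it. The `resolve_data_path` function and the allowed suffix list are assumptions, not the server's actual rules.

```python
from pathlib import Path

# Hypothetical allow-list; the formats the server really accepts may differ.
ALLOWED_SUFFIXES = {".csv", ".json", ".parquet"}


def resolve_data_path(user_path, base_dir):
    """Resolve user_path inside base_dir, rejecting traversal and odd types."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the containment
    # check below rejects absolute paths outside base as well.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes data directory: {user_path}") from None
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {candidate.suffix}")
    return candidate
```

Resolving before the containment check matters: it collapses ".." segments and follows symlinks, so the comparison is made on the real target path.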
🔗 Links
- Repository: https://github.com/aCuriousProgrammer/data-quality-mcp-server
- Great Expectations: https://greatexpectations.io/
- ydata-profiling: https://ydata-profiling.ydata.ai/
- Model Context Protocol: https://modelcontextprotocol.io/
Built with ❤️ for data quality excellence