savourylie/eddy
Eddy is an MCP server designed for instant EDA across various data formats, providing a seamless experience directly from your IDE.
Eddy
Instant EDA (Exploratory Data Analysis) MCP Server
Eddy is a Model Context Protocol (MCP) server that provides comprehensive exploratory data analysis for CSV, Parquet, JSON, and Excel files. Point it at your data to get instant insights - preview tables, infer schemas, detect types, handle nulls, compute statistics, and create visualizations right from your AI assistant.
Features
Universal Data Loading
- Auto-detection for CSV, JSON, Parquet, and Excel files
- Smart previews with configurable row limits
- Schema extraction with data types and structure
- File metadata including size and format details
Data Quality Assessment
- Missing data analysis - identify and quantify nulls
- Duplicate detection - find and count duplicate rows
- Completeness metrics - assess data integrity
- Memory usage analysis - optimize performance
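As a rough illustration of what the missing-value, duplicate, and completeness checks amount to, here is a minimal standard-library sketch. Eddy itself is built on pandas (see Acknowledgments); the function name `quality_report` and the report keys are assumptions made for this example, not Eddy's actual output format.

```python
import csv
import io


def quality_report(csv_text: str) -> dict:
    """Count missing cells per column and exact duplicate rows in CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []

    # Treat empty strings as missing values.
    missing = {col: sum(1 for r in rows if r[col] == "") for col in columns}

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(r[col] for col in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Completeness is the share of cells that are filled in.
    total_cells = len(rows) * len(columns)
    filled = total_cells - sum(missing.values())
    completeness = filled / total_cells if total_cells else 1.0

    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_rows": duplicates,
        "completeness": completeness,
    }


sample = "id,price\n1,9.5\n2,\n1,9.5\n"
report = quality_report(sample)
print(report)
```

With pandas, the same quantities would typically come from `DataFrame.isna().sum()` and `DataFrame.duplicated().sum()`.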
Statistical Analysis
- Descriptive statistics - mean, median, std dev, quartiles
- Advanced metrics - skewness, kurtosis, variance
- Correlation analysis - relationship matrices
- Distribution analysis - understand your data's shape
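The metrics listed above can be sketched with Python's `statistics` module. This is an illustrative, dependency-free approximation, not Eddy's implementation; the helper names `describe` and `pearson` are hypothetical, and the skewness here is the simple population estimator, so values may differ slightly from the bias-corrected figures pandas reports.

```python
import math
import statistics


def describe(values):
    """Summarize a numeric column; expects at least two values."""
    mean = statistics.fmean(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    pstd = statistics.pstdev(values)
    n = len(values)
    # Population skewness: mean cubed deviation divided by pstdev cubed.
    skew = sum((v - mean) ** 3 for v in values) / (n * pstd ** 3) if pstd else 0.0
    return {
        "mean": mean,
        "median": median,
        "std": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "q1": q1,
        "q3": q3,
        "skewness": skew,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


stats = describe([1, 2, 3, 4, 5])
r = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(stats, r)
```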
Smart Type Inference
- Automatic optimization - suggest better data types
- Performance recommendations - categorical detection
- DateTime recognition - find hidden temporal data
- Numeric conversion - identify numbers stored as text
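One plausible way to detect numbers and dates stored as text, and to flag low-cardinality columns as categorical candidates, is sketched below. The function `suggest_type`, its return labels, and the 0.5 cardinality threshold are assumptions for illustration; the README does not describe the rules Eddy actually applies.

```python
from datetime import datetime


def suggest_type(values, max_category_ratio=0.5):
    """Suggest a tighter type for a column whose values arrived as strings."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    # Few distinct values relative to the row count suggests a categorical.
    if len(set(non_empty)) / len(non_empty) <= max_category_ratio:
        return "category"
    return "string"


print(suggest_type(["1", "2", "3"]))
print(suggest_type(["2024-01-05", "2024-02-10"]))
print(suggest_type(["a", "b", "a", "a"]))
```

Checking `int` before `float` keeps whole-number columns from being widened unnecessarily.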
Rich Visualizations
- Multiple chart types - scatter, bar, histogram, line, box, heatmap
- Customizable plots - titles, colors, axis selection
- Base64 outputs - ready for display anywhere
- Interactive exploration - analyze any column relationships
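To make the "Base64 outputs" point concrete, the sketch below wraps image bytes in a base64 data URI and decodes them again. It uses placeholder bytes rather than a real chart so that it runs without matplotlib; with matplotlib, the bytes would usually come from saving a figure into an `io.BytesIO` buffer. The helper names are hypothetical, and whether Eddy returns a bare base64 string or a full data URI is not stated in this README.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI that HTML or Markdown can embed."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Recover the raw bytes from a base64 data URI."""
    header, _, payload = uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


# Placeholder bytes standing in for a rendered chart.
fake_png = PNG_SIGNATURE + b"placeholder image body"
uri = to_data_uri(fake_png)
print(uri[:40])
```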
Quick Start
Installation
Using uv (recommended):
git clone <repository-url>
cd eddy
uv sync
Using pip:
git clone <repository-url>
cd eddy
pip install -e .
Local Testing
# Run the MCP server
uv run python main.py
# Or directly
python main.py
# Run tests
uv run pytest
# Or
python tests/test_basic.py
Claude Desktop Integration
Add Eddy to your Claude Desktop MCP configuration:
{
  "mcpServers": {
    "eddy": {
      "command": "python",
      "args": ["/absolute/path/to/eddy/main.py"],
      "env": {}
    }
  }
}
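If a bare `python` command does not resolve to the environment where Eddy's dependencies are installed, it may help to point the configuration at an explicit interpreter. The sketch below generates the same structure with an absolute script path; `build_config` is a hypothetical helper, and the path `eddy/main.py` is only a stand-in for wherever the repository was cloned.

```python
import json
import sys
from pathlib import Path


def build_config(main_py: str, python_exe: str = sys.executable) -> dict:
    """Build the mcpServers entry with an absolute path to main.py."""
    entry = Path(main_py).expanduser().resolve()
    return {
        "mcpServers": {
            "eddy": {
                "command": python_exe,
                "args": [str(entry)],
                "env": {},
            }
        }
    }


# Run from the directory that contains the cloned repository.
config = build_config("eddy/main.py")
print(json.dumps(config, indent=2))
```

The printed JSON can then be merged into the existing Claude Desktop configuration file.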
Then restart Claude Desktop and start analyzing data!
Usage Examples
With Claude Desktop:
"Load my sales_data.csv and give me a quality assessment"
→ Comprehensive data quality report with missing values, duplicates, types
"Create a scatter plot of price vs quantity colored by category"
→ Interactive visualization showing relationships in your data
"What are the statistics for the revenue column?"
→ Detailed numeric analysis with distribution metrics
"Suggest better data types for this dataset"
→ Performance optimization recommendations
"Show me a correlation heatmap of all numeric columns"
→ Visual correlation matrix to identify relationships
MCP Server Capabilities
Tools (Functions AI can call)
Tool | Purpose | Example |
---|---|---|
load_file | Load and preview data | File info, schema, first 10 rows |
analyze_data_quality | Data quality assessment | Missing values, duplicates, completeness |
compute_statistics | Statistical analysis | Mean, std, correlations, distributions |
infer_data_types | Type optimization | Suggest categorical, datetime, numeric types |
create_visualization | Generate charts | Scatter, bar, histogram, box, heatmap plots |
get_column_summary | Column deep-dive | Detailed analysis of specific columns |
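Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `load_file`. The argument name `file_path` is an assumption borrowed from the resource templates in this README; the tool's real parameter names are defined in main.py and may differ.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "file_path" is a guessed parameter name, used here for illustration only.
request = tool_call_request(1, "load_file", {"file_path": "sales_data.csv"})
print(request)
```

In normal use the AI assistant's client constructs these messages, so this is mainly useful for debugging or scripted testing.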
Resources (Data AI can access)
- schema://{file_path} - Direct schema access
- preview://{file_path} - Data preview access
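The resource templates can be filled in and taken apart with plain string handling, as in this sketch. The helpers `make_uri` and `parse_uri` are hypothetical, and how the server matches file paths that contain slashes against the template is not specified here, so treat the round trip below as an illustration of the URI shape only.

```python
RESOURCE_KINDS = ("schema", "preview")


def make_uri(kind: str, file_path: str) -> str:
    """Fill a resource template such as schema://{file_path}."""
    if kind not in RESOURCE_KINDS:
        raise ValueError(f"unknown resource kind: {kind!r}")
    return f"{kind}://{file_path}"


def parse_uri(uri: str):
    """Split a resource URI back into (kind, file_path)."""
    kind, sep, file_path = uri.partition("://")
    if not sep or kind not in RESOURCE_KINDS or not file_path:
        raise ValueError(f"not an Eddy resource URI: {uri!r}")
    return kind, file_path


uri = make_uri("schema", "sales_data.csv")
print(uri, parse_uri(uri))
```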
Prompts (Analysis templates)
- analyze_dataset_prompt - Comprehensive analysis workflows
Project Structure
eddy/
├── main.py              # MCP server implementation
├── pyproject.toml       # Project configuration
├── requirements.txt     # Dependencies
├── .env                 # Environment variables
├── ARCHITECTURE.md      # Technical architecture
├── CLAUDE.md            # Development notes
├── tests/
│   └── test_basic.py    # Test suite
├── tools/               # Extensible tools (future)
└── utils/               # Utility functions (future)
Configuration
Environment variables in .env:
# File Processing
MAX_FILE_SIZE_MB=500
PREVIEW_ROWS_DEFAULT=10
MAX_PREVIEW_ROWS=100
# Supported Formats
SUPPORTED_FORMATS=csv,json,parquet,xlsx,xls
# Visualization
FIGURE_DPI=100
FIGURE_WIDTH=10
FIGURE_HEIGHT=6
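A sketch of how settings like these might be read and applied, using the defaults shown above. It reads from a mapping (the process environment by default) rather than parsing the .env file itself. The `Settings` class, `load_settings`, and the clamping behavior in `clamp_preview_rows` are assumptions for illustration, not code taken from main.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    max_file_size_mb: int
    preview_rows_default: int
    max_preview_rows: int
    supported_formats: tuple
    figure_dpi: int
    figure_width: float
    figure_height: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    formats = env.get("SUPPORTED_FORMATS", "csv,json,parquet,xlsx,xls")
    return Settings(
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "500")),
        preview_rows_default=int(env.get("PREVIEW_ROWS_DEFAULT", "10")),
        max_preview_rows=int(env.get("MAX_PREVIEW_ROWS", "100")),
        supported_formats=tuple(
            f.strip().lower() for f in formats.split(",") if f.strip()
        ),
        figure_dpi=int(env.get("FIGURE_DPI", "100")),
        figure_width=float(env.get("FIGURE_WIDTH", "10")),
        figure_height=float(env.get("FIGURE_HEIGHT", "6")),
    )


def clamp_preview_rows(requested: int, settings: Settings) -> int:
    """Keep a requested preview size between 1 and MAX_PREVIEW_ROWS."""
    return max(1, min(requested, settings.max_preview_rows))


settings = load_settings({"MAX_PREVIEW_ROWS": "50"})
print(settings, clamp_preview_rows(500, settings))
```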
Supported File Formats
Format | Extensions | Features |
---|---|---|
CSV | .csv | Encoding detection, delimiter inference |
JSON | .json | Multiple orientations, nested data |
Parquet | .parquet | Column pruning, metadata preservation |
Excel | .xlsx, .xls | Multi-sheet support, formatting |
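Extension-based format detection for the table above could look like the following sketch. The mapping mirrors the listed extensions; `detect_format` and its error message are hypothetical, and Eddy's real auto-detection may inspect file contents as well.

```python
from pathlib import Path

# Extension-to-format mapping taken from the table above.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".xlsx": "excel",
    ".xls": "excel",
}


def detect_format(file_path: str) -> str:
    """Return the format name for a path, based on its extension alone."""
    suffix = Path(file_path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


print(detect_format("Sales.CSV"))
print(detect_format("reports/q1.xlsx"))
```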
Chart Types
Chart | Use Case | Required Columns | Optional |
---|---|---|---|
scatter | Relationships | x_column, y_column | color_column |
histogram | Distributions | x_column | bins |
bar | Categories | x_column | y_column |
line | Trends over time | x_column, y_column | - |
box | Distribution by group | x_column | y_column |
correlation_heatmap | Feature relationships | (auto) | - |
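The required-column rules in this table can be expressed as a small lookup, which is handy for checking a request before calling create_visualization. This is a sketch under the assumption that the column names in the table are the tool's parameter names; `validate_chart_request` is a hypothetical helper, not part of Eddy.

```python
# Required parameters per chart type, copied from the table above.
REQUIRED_COLUMNS = {
    "scatter": ("x_column", "y_column"),
    "histogram": ("x_column",),
    "bar": ("x_column",),
    "line": ("x_column", "y_column"),
    "box": ("x_column",),
    "correlation_heatmap": (),
}


def validate_chart_request(chart_type: str, **params) -> list:
    """Return a list of problems; an empty list means the request looks valid."""
    required = REQUIRED_COLUMNS.get(chart_type)
    if required is None:
        return [f"unknown chart type: {chart_type}"]
    return [
        f"missing required parameter: {name}"
        for name in required
        if params.get(name) is None
    ]


print(validate_chart_request("scatter", x_column="price", y_column="quantity"))
print(validate_chart_request("line", x_column="date"))
```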
Development
Package Management
- Use uv for dependency management
- Use pytest for testing
Testing
# Run all tests
uv run pytest
# Run specific test
python tests/test_basic.py
# Test with coverage
uv run pytest --cov=main
Adding New Features
- Implement in main.py following existing patterns
- Add tests in tests/
- Update documentation
- Test with Claude Desktop integration
Roadmap
Phase 1: Core MCP Server (Complete)
- Multi-format data loading (CSV, JSON, Parquet, Excel)
- Data quality assessment and missing value analysis
- Statistical analysis and correlation matrices
- Smart data type inference and optimization
- Rich visualization suite (6 chart types)
- Column-level deep dive analysis
- MCP resources and prompts
- Comprehensive test suite
- Architecture documentation
Phase 2: Enhanced Analytics (Next)
- Advanced statistical tests (t-tests, ANOVA, chi-square)
- Outlier detection and anomaly analysis
- Data profiling and distribution testing
- Time series analysis capabilities
- Multi-file dataset support
- Data sampling strategies for large files
Phase 3: Commercial Web Service (Future)
- REST API with FastAPI/Flask
- Authentication and API token management
- Rate limiting and usage tracking
- Cloud storage integration (S3, GCS, Azure)
- Async processing for large datasets
- Caching layer with Redis
- Multi-tenant architecture
- Usage analytics and billing
Phase 4: Advanced Features (Future)
- Machine learning model integration
- Automated insight generation
- Interactive dashboard creation
- Data transformation suggestions
- Export to various formats
- Collaborative analysis features
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make changes and add tests
- Ensure tests pass (uv run pytest)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with Model Context Protocol
- Powered by FastMCP framework
- Data processing via pandas
- Visualizations with matplotlib
Ready to explore your data? Install Eddy and start getting instant insights!