
Eddy 🔍

Instant EDA (Exploratory Data Analysis) MCP Server

Eddy is a Model Context Protocol (MCP) server that provides comprehensive exploratory data analysis for CSV, Parquet, JSON, and Excel files. Point it at your data to get instant insights - preview tables, infer schemas, detect types, handle nulls, compute statistics, and create visualizations right from your AI assistant.

✨ Features

📊 Universal Data Loading

  • Auto-detection for CSV, JSON, Parquet, and Excel files
  • Smart previews with configurable row limits
  • Schema extraction with data types and structure
  • File metadata including size and format details
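
As a rough illustration of the loading pipeline described above, a helper like the following (the name load_any is hypothetical, not Eddy's actual code) could dispatch on file extension with pandas:

import os
import pandas as pd

# Hypothetical helper showing extension-based auto-detection.
def load_any(file_path: str, preview_rows: int = 10) -> pd.DataFrame:
    ext = os.path.splitext(file_path)[1].lower()
    readers = {
        ".csv": pd.read_csv,
        ".json": pd.read_json,
        ".parquet": pd.read_parquet,
        ".xlsx": pd.read_excel,
        ".xls": pd.read_excel,
    }
    if ext not in readers:
        raise ValueError(f"Unsupported format: {ext}")
    df = readers[ext](file_path)
    print(df.head(preview_rows))                   # preview with a row limit
    print(df.dtypes)                               # schema: column names and types
    print(f"{os.path.getsize(file_path)} bytes")   # basic file metadata
    return df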

๐Ÿ” Data Quality Assessment

  • Missing data analysis - identify and quantify nulls
  • Duplicate detection - find and count duplicate rows
  • Completeness metrics - assess data integrity
  • Memory usage analysis - optimize performance
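
The metrics above map directly onto a few pandas calls; a minimal sketch of such a report (field names are illustrative, not Eddy's exact output):

import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    missing = df.isna().sum()                      # nulls per column
    return {
        "rows": len(df),
        "missing_per_column": missing.to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "completeness_pct": round(100 * (1 - missing.sum() / df.size), 2),
        "memory_usage_bytes": int(df.memory_usage(deep=True).sum()),
    }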

📈 Statistical Analysis

  • Descriptive statistics - mean, median, std dev, quartiles
  • Advanced metrics - skewness, kurtosis, variance
  • Correlation analysis - relationship matrices
  • Distribution analysis - understand your data's shape
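
Conceptually, these statistics come straight from pandas; a hedged sketch of the computation (not the server's exact output schema):

import pandas as pd

def numeric_statistics(df: pd.DataFrame) -> dict:
    numeric = df.select_dtypes(include="number")
    return {
        "describe": numeric.describe().to_dict(),   # mean, std dev, quartiles
        "skewness": numeric.skew().to_dict(),
        "kurtosis": numeric.kurt().to_dict(),
        "variance": numeric.var().to_dict(),
        "correlations": numeric.corr().to_dict(),   # relationship matrix
    }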

🧠 Smart Type Inference

  • Automatic optimization - suggest better data types
  • Performance recommendations - categorical detection
  • DateTime recognition - find hidden temporal data
  • Numeric conversion - identify numbers stored as text
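
One plausible set of heuristics for these suggestions (the thresholds and rules below are illustrative assumptions, not Eddy's implementation):

import pandas as pd

def suggest_types(df: pd.DataFrame) -> dict:
    suggestions = {}
    for col in df.select_dtypes(include="object").columns:
        series = df[col].dropna()
        if series.empty:
            continue
        if pd.to_numeric(series, errors="coerce").notna().all():
            suggestions[col] = "numeric"            # numbers stored as text
        elif pd.to_datetime(series, errors="coerce").notna().all():
            suggestions[col] = "datetime"           # hidden temporal data
        elif series.nunique() < 0.5 * len(series):
            suggestions[col] = "category"           # low-cardinality strings
    return suggestions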

🎨 Rich Visualizations

  • Multiple chart types - scatter, bar, histogram, line, box, heatmap
  • Customizable plots - titles, colors, axis selection
  • Base64 outputs - ready for display anywhere
  • Interactive exploration - analyze any column relationships
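
Charts are returned as base64 strings so any client can render them; a minimal sketch of that encoding step with matplotlib (Eddy's plotting code may differ):

import base64
import io

import matplotlib
matplotlib.use("Agg")                  # headless backend for a server process
import matplotlib.pyplot as plt

def scatter_as_base64(df, x_column: str, y_column: str) -> str:
    fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
    ax.scatter(df[x_column], df[y_column])
    ax.set_xlabel(x_column)
    ax.set_ylabel(y_column)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode("ascii")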

🚀 Quick Start

Installation

Using uv (recommended):

git clone <repository-url>
cd eddy
uv sync

Using pip:

git clone <repository-url>
cd eddy
pip install -e .

Local Testing

# Run the MCP server
uv run python main.py

# Or directly
python main.py

# Run tests
uv run pytest
# Or
python tests/test_basic.py

Claude Desktop Integration

Add Eddy to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "eddy": {
      "command": "python",
      "args": ["/absolute/path/to/eddy/main.py"],
      "env": {}
    }
  }
}
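
If you installed with uv, a configuration that launches the server through uv should also work (the uv invocation below is an assumption; adjust the path for your machine):

{
  "mcpServers": {
    "eddy": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/eddy", "python", "main.py"],
      "env": {}
    }
  }
}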

Then restart Claude Desktop and start analyzing data!

💬 Usage Examples

With Claude Desktop:

"Load my sales_data.csv and give me a quality assessment"
โ†’ Comprehensive data quality report with missing values, duplicates, types

"Create a scatter plot of price vs quantity colored by category"  
โ†’ Interactive visualization showing relationships in your data

"What are the statistics for the revenue column?"
โ†’ Detailed numeric analysis with distribution metrics

"Suggest better data types for this dataset"
โ†’ Performance optimization recommendations

"Show me a correlation heatmap of all numeric columns"
โ†’ Visual correlation matrix to identify relationships

๐Ÿ› ๏ธ MCP Server Capabilities

Tools (Functions AI can call)

| Tool | Purpose | Example |
| --- | --- | --- |
| load_file | Load and preview data | File info, schema, first 10 rows |
| analyze_data_quality | Data quality assessment | Missing values, duplicates, completeness |
| compute_statistics | Statistical analysis | Mean, std, correlations, distributions |
| infer_data_types | Type optimization | Suggest categorical, datetime, numeric types |
| create_visualization | Generate charts | Scatter, bar, histogram, box, heatmap plots |
| get_column_summary | Column deep-dive | Detailed analysis of specific columns |
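
For reference, this is roughly how an MCP client calls one of these tools over stdio with the official mcp Python SDK; the file_path argument name is an assumption about Eddy's tool signature:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Assumes you run this from the eddy project directory.
    server = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("load_file", {"file_path": "sales_data.csv"})
            print(result.content)

asyncio.run(main())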

Resources (Data AI can access)

  • schema://{file_path} - Direct schema access
  • preview://{file_path} - Data preview access

Prompts (Analysis templates)

  • analyze_dataset_prompt - Comprehensive analysis workflows
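
Assuming main.py uses the FastMCP helper from the official MCP Python SDK (not verified here), tools, resources, and prompts are registered along these lines:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("eddy")

@mcp.tool()
def load_file(file_path: str, preview_rows: int = 10) -> str:
    """Load a data file and return a preview (body elided in this sketch)."""
    return f"preview of {file_path} ({preview_rows} rows)"

@mcp.resource("schema://{file_path}")
def schema_resource(file_path: str) -> str:
    """Expose a file's schema at schema://{file_path}."""
    return f"schema for {file_path}"

@mcp.prompt()
def analyze_dataset_prompt(file_path: str) -> str:
    """Analysis template guiding a full exploratory workflow."""
    return f"Please run a complete exploratory analysis of {file_path}."

if __name__ == "__main__":
    mcp.run()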

๐Ÿ“ Project Structure

eddy/
├── main.py              # MCP server implementation
├── pyproject.toml       # Project configuration
├── requirements.txt     # Dependencies
├── .env                 # Environment variables
├── ARCHITECTURE.md      # Technical architecture
├── CLAUDE.md            # Development notes
├── tests/
│   └── test_basic.py    # Test suite
├── tools/               # Extensible tools (future)
└── utils/               # Utility functions (future)

🔧 Configuration

Environment variables in .env:

# File Processing
MAX_FILE_SIZE_MB=500
PREVIEW_ROWS_DEFAULT=10
MAX_PREVIEW_ROWS=100

# Supported Formats  
SUPPORTED_FORMATS=csv,json,parquet,xlsx,xls

# Visualization
FIGURE_DPI=100
FIGURE_WIDTH=10
FIGURE_HEIGHT=6
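
Assuming python-dotenv is used to load the file (not confirmed here), reading these settings looks roughly like this, with defaults mirroring the values above:

import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

MAX_FILE_SIZE_MB = int(os.getenv("MAX_FILE_SIZE_MB", "500"))
PREVIEW_ROWS_DEFAULT = int(os.getenv("PREVIEW_ROWS_DEFAULT", "10"))
MAX_PREVIEW_ROWS = int(os.getenv("MAX_PREVIEW_ROWS", "100"))
SUPPORTED_FORMATS = os.getenv("SUPPORTED_FORMATS", "csv,json,parquet,xlsx,xls").split(",")
FIGURE_DPI = int(os.getenv("FIGURE_DPI", "100"))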

📈 Supported File Formats

| Format | Extensions | Features |
| --- | --- | --- |
| CSV | .csv | Encoding detection, delimiter inference |
| JSON | .json | Multiple orientations, nested data |
| Parquet | .parquet | Column pruning, metadata preservation |
| Excel | .xlsx, .xls | Multi-sheet support, formatting |

🎨 Chart Types

| Chart | Use Case | Required Columns | Optional |
| --- | --- | --- | --- |
| scatter | Relationships | x_column, y_column | color_column |
| histogram | Distributions | x_column | bins |
| bar | Categories | x_column | y_column |
| line | Trends over time | x_column, y_column | - |
| box | Distribution by group | x_column | y_column |
| correlation_heatmap | Feature relationships | (auto) | - |

🧪 Development

Package Management

  • Use uv for dependency management
  • Use pytest for testing

Testing

# Run all tests
uv run pytest

# Run specific test
python tests/test_basic.py

# Test with coverage
uv run pytest --cov=main

Adding New Features

  1. Implement in main.py following existing patterns
  2. Add tests in tests/
  3. Update documentation
  4. Test with Claude Desktop integration
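
Following the same assumed FastMCP pattern shown earlier, a new tool amounts to one more decorated function in main.py; the tool below is purely hypothetical:

from mcp.server.fastmcp import FastMCP
import pandas as pd

mcp = FastMCP("eddy")   # in main.py this instance already exists

@mcp.tool()
def detect_outliers(file_path: str, column: str, z_threshold: float = 3.0) -> str:
    """Hypothetical tool: count values beyond z_threshold standard deviations."""
    df = pd.read_csv(file_path)
    series = df[column].dropna()
    z_scores = (series - series.mean()) / series.std()
    outliers = series[z_scores.abs() > z_threshold]
    return f"{len(outliers)} potential outliers in '{column}'"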

๐Ÿ—บ๏ธ Roadmap

✅ Phase 1: Core MCP Server (Complete)

  • Multi-format data loading (CSV, JSON, Parquet, Excel)
  • Data quality assessment and missing value analysis
  • Statistical analysis and correlation matrices
  • Smart data type inference and optimization
  • Rich visualization suite (6 chart types)
  • Column-level deep dive analysis
  • MCP resources and prompts
  • Comprehensive test suite
  • Architecture documentation

🚧 Phase 2: Enhanced Analytics (Next)

  • Advanced statistical tests (t-tests, ANOVA, chi-square)
  • Outlier detection and anomaly analysis
  • Data profiling and distribution testing
  • Time series analysis capabilities
  • Multi-file dataset support
  • Data sampling strategies for large files

🔮 Phase 3: Commercial Web Service (Future)

  • REST API with FastAPI/Flask
  • Authentication and API token management
  • Rate limiting and usage tracking
  • Cloud storage integration (S3, GCS, Azure)
  • Async processing for large datasets
  • Caching layer with Redis
  • Multi-tenant architecture
  • Usage analytics and billing

🌟 Phase 4: Advanced Features (Future)

  • Machine learning model integration
  • Automated insight generation
  • Interactive dashboard creation
  • Data transformation suggestions
  • Export to various formats
  • Collaborative analysis features

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make changes and add tests
  4. Ensure tests pass (uv run pytest)
  5. Commit changes (git commit -m 'Add amazing feature')
  6. Push to branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

📄 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments


Ready to explore your data? Install Eddy and start getting instant insights! 🚀