savourylie/eddy


Eddy 🔍

Instant EDA (Exploratory Data Analysis) MCP Server

Eddy is a Model Context Protocol (MCP) server that provides comprehensive exploratory data analysis for CSV, Parquet, JSON, and Excel files. Point it at your data to get instant insights - preview tables, infer schemas, detect types, handle nulls, compute statistics, and create visualizations right from your AI assistant.

✨ Features

📊 Universal Data Loading

  • Auto-detection for CSV, JSON, Parquet, and Excel files
  • Smart previews with configurable row limits
  • Schema extraction with data types and structure
  • File metadata including size and format details
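Under the hood, dispatching on the file extension is enough to cover all four formats with pandas. A minimal sketch (the reader table and function are illustrative, not Eddy's actual code):

```python
from pathlib import Path

import pandas as pd

# Illustrative extension → reader dispatch (not Eddy's actual implementation).
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
}

def load_file(path: str, preview_rows: int = 10) -> dict:
    """Load a supported file and return a preview plus basic metadata."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported format: {suffix}")
    df = reader(path)
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},
        "preview": df.head(preview_rows).to_dict(orient="records"),
    }
```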

🔍 Data Quality Assessment

  • Missing data analysis - identify and quantify nulls
  • Duplicate detection - find and count duplicate rows
  • Completeness metrics - assess data integrity
  • Memory usage analysis - optimize performance
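The checks above map directly onto a few pandas calls. A sketch of what such a report might compute (field names are illustrative):

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Illustrative quality report: nulls, duplicates, completeness, memory."""
    total_cells = df.shape[0] * df.shape[1]
    missing_total = int(df.isna().sum().sum())
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "completeness_pct": round(100 * (1 - missing_total / total_cells), 2),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }
```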

📈 Statistical Analysis

  • Descriptive statistics - mean, median, std dev, quartiles
  • Advanced metrics - skewness, kurtosis, variance
  • Correlation analysis - relationship matrices
  • Distribution analysis - understand your data's shape
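These metrics are all one-liners in pandas. A sketch of a per-column report and a correlation helper (illustrative names, not Eddy's exact output):

```python
import pandas as pd

def column_stats(df: pd.DataFrame, column: str) -> dict:
    """Descriptive and shape statistics for one numeric column."""
    s = df[column].dropna()
    return {
        "mean": float(s.mean()),
        "median": float(s.median()),
        "std": float(s.std()),
        "skewness": float(s.skew()),
        "kurtosis": float(s.kurt()),
        "q1": float(s.quantile(0.25)),
        "q3": float(s.quantile(0.75)),
    }

def correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations over numeric columns only."""
    return df.select_dtypes("number").corr()
```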

🧠 Smart Type Inference

  • Automatic optimization - suggest better data types
  • Performance recommendations - categorical detection
  • DateTime recognition - find hidden temporal data
  • Numeric conversion - identify numbers stored as text
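Type inference can be approximated by trial conversion: try numeric first, then datetime, then fall back to a cardinality check for categoricals. A sketch (the 50% cardinality threshold is an assumption, not Eddy's actual heuristic):

```python
import pandas as pd

def suggest_types(df: pd.DataFrame, cat_threshold: float = 0.5) -> dict:
    """Heuristic type suggestions for object (string) columns."""
    suggestions = {}
    for col in df.select_dtypes("object"):
        s = df[col].dropna()
        if s.empty:
            continue
        # Numbers stored as text
        if pd.to_numeric(s, errors="coerce").notna().all():
            suggestions[col] = "numeric"
        # Hidden temporal data
        elif pd.to_datetime(s, errors="coerce").notna().all():
            suggestions[col] = "datetime"
        # Low-cardinality strings compress well as categoricals
        elif s.nunique() / len(s) < cat_threshold:
            suggestions[col] = "category"
    return suggestions
```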

🎨 Rich Visualizations

  • Multiple chart types - scatter, bar, histogram, line, box, heatmap
  • Customizable plots - titles, colors, axis selection
  • Base64 outputs - ready for display anywhere
  • Interactive exploration - analyze any column relationships
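Charts can be rendered headlessly and returned as base64 strings so any client can display them. A sketch assuming matplotlib with the Agg backend (Eddy's actual renderer may differ; the figure size and DPI mirror the defaults in the Configuration section below):

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no display needed inside a server
import matplotlib.pyplot as plt

def chart_to_base64(x, y, title: str = "") -> str:
    """Render a scatter plot and return it as a base64-encoded PNG string."""
    fig, ax = plt.subplots(figsize=(10, 6), dpi=100)
    ax.scatter(x, y)
    ax.set_title(title)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure; long-running servers leak memory otherwise
    return base64.b64encode(buf.getvalue()).decode("ascii")
```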

🚀 Quick Start

Installation

Using uv (recommended):

git clone <repository-url>
cd eddy
uv sync

Using pip:

git clone <repository-url>
cd eddy
pip install -e .

Local Testing

# Run the MCP server
uv run python main.py

# Or directly
python main.py

# Run tests
uv run pytest
# Or
python tests/test_basic.py

Claude Desktop Integration

Add Eddy to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "eddy": {
      "command": "python",
      "args": ["/absolute/path/to/eddy/main.py"],
      "env": {}
    }
  }
}
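If you prefer launching through uv so the project's virtual environment is used automatically, an equivalent entry might look like this (a sketch; the path is a placeholder, as above):

```json
{
  "mcpServers": {
    "eddy": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/eddy", "python", "main.py"],
      "env": {}
    }
  }
}
```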

Then restart Claude Desktop and start analyzing data!

💬 Usage Examples

With Claude Desktop:

"Load my sales_data.csv and give me a quality assessment"
→ Comprehensive data quality report with missing values, duplicates, types

"Create a scatter plot of price vs quantity colored by category"  
→ Interactive visualization showing relationships in your data

"What are the statistics for the revenue column?"
→ Detailed numeric analysis with distribution metrics

"Suggest better data types for this dataset"
→ Performance optimization recommendations

"Show me a correlation heatmap of all numeric columns"
→ Visual correlation matrix to identify relationships

🛠️ MCP Server Capabilities

Tools (Functions AI can call)

| Tool | Purpose | Example |
|------|---------|---------|
| load_file | Load and preview data | File info, schema, first 10 rows |
| analyze_data_quality | Data quality assessment | Missing values, duplicates, completeness |
| compute_statistics | Statistical analysis | Mean, std, correlations, distributions |
| infer_data_types | Type optimization | Suggest categorical, datetime, numeric types |
| create_visualization | Generate charts | Scatter, bar, histogram, box, heatmap plots |
| get_column_summary | Column deep-dive | Detailed analysis of specific columns |
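With the official MCP Python SDK, each tool is an ordinary Python function registered via a decorator. A standalone sketch of what get_column_summary might look like (simplified to CSV input with illustrative fields; the registration decorator is shown in a comment so the snippet runs without the SDK installed):

```python
import pandas as pd

# With the official MCP Python SDK this function would be registered roughly as:
#   from mcp.server.fastmcp import FastMCP
#   mcp = FastMCP("eddy")
#   @mcp.tool()
# The decorator is left as a comment so this sketch is self-contained.
def get_column_summary(file_path: str, column: str) -> dict:
    """Detailed analysis of a single column (simplified: CSV input only)."""
    s = pd.read_csv(file_path)[column]
    return {
        "dtype": str(s.dtype),
        "non_null": int(s.notna().sum()),
        "unique": int(s.nunique()),
        "sample": s.dropna().head(5).tolist(),
    }
```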

Resources (Data AI can access)

  • schema://{file_path} - Direct schema access
  • preview://{file_path} - Data preview access

Prompts (Analysis templates)

  • analyze_dataset_prompt - Comprehensive analysis workflows

📁 Project Structure

eddy/
├── main.py              # MCP server implementation
├── pyproject.toml       # Project configuration
├── requirements.txt     # Dependencies
├── .env                 # Environment variables
├── ARCHITECTURE.md      # Technical architecture
├── CLAUDE.md            # Development notes
├── tests/
│   └── test_basic.py    # Test suite
├── tools/               # Extensible tools (future)
└── utils/               # Utility functions (future)

🔧 Configuration

Environment variables in .env:

# File Processing
MAX_FILE_SIZE_MB=500
PREVIEW_ROWS_DEFAULT=10
MAX_PREVIEW_ROWS=100

# Supported Formats  
SUPPORTED_FORMATS=csv,json,parquet,xlsx,xls

# Visualization
FIGURE_DPI=100
FIGURE_WIDTH=10
FIGURE_HEIGHT=6
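Once the .env file has been loaded into the environment (python-dotenv is a common choice for that), these settings can be read with nothing but the standard library. A sketch using the defaults above:

```python
import os

def load_settings() -> dict:
    """Read Eddy's settings from the environment, falling back to defaults."""
    return {
        "max_file_size_mb": int(os.environ.get("MAX_FILE_SIZE_MB", "500")),
        "preview_rows": int(os.environ.get("PREVIEW_ROWS_DEFAULT", "10")),
        "max_preview_rows": int(os.environ.get("MAX_PREVIEW_ROWS", "100")),
        "formats": os.environ.get(
            "SUPPORTED_FORMATS", "csv,json,parquet,xlsx,xls"
        ).split(","),
    }
```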

📈 Supported File Formats

| Format | Extensions | Features |
|--------|------------|----------|
| CSV | .csv | Encoding detection, delimiter inference |
| JSON | .json | Multiple orientations, nested data |
| Parquet | .parquet | Column pruning, metadata preservation |
| Excel | .xlsx, .xls | Multi-sheet support, formatting |

🎨 Chart Types

| Chart | Use Case | Required Columns | Optional |
|-------|----------|------------------|----------|
| scatter | Relationships | x_column, y_column | color_column |
| histogram | Distributions | x_column | bins |
| bar | Categories | x_column | y_column |
| line | Trends over time | x_column, y_column | - |
| box | Distribution by group | x_column | y_column |
| correlation_heatmap | Feature relationships | (auto) | - |

🧪 Development

Package Management

  • Use uv for dependency management
  • Use pytest for testing

Testing

# Run all tests
uv run pytest

# Run specific test
python tests/test_basic.py

# Test with coverage
uv run pytest --cov=main

Adding New Features

  1. Implement in main.py following existing patterns
  2. Add tests in tests/
  3. Update documentation
  4. Test with Claude Desktop integration

🗺️ Roadmap

✅ Phase 1: Core MCP Server (Complete)

  • Multi-format data loading (CSV, JSON, Parquet, Excel)
  • Data quality assessment and missing value analysis
  • Statistical analysis and correlation matrices
  • Smart data type inference and optimization
  • Rich visualization suite (6 chart types)
  • Column-level deep dive analysis
  • MCP resources and prompts
  • Comprehensive test suite
  • Architecture documentation

🚧 Phase 2: Enhanced Analytics (Next)

  • Advanced statistical tests (t-tests, ANOVA, chi-square)
  • Outlier detection and anomaly analysis
  • Data profiling and distribution testing
  • Time series analysis capabilities
  • Multi-file dataset support
  • Data sampling strategies for large files

🔮 Phase 3: Commercial Web Service (Future)

  • REST API with FastAPI/Flask
  • Authentication and API token management
  • Rate limiting and usage tracking
  • Cloud storage integration (S3, GCS, Azure)
  • Async processing for large datasets
  • Caching layer with Redis
  • Multi-tenant architecture
  • Usage analytics and billing

🌟 Phase 4: Advanced Features (Future)

  • Machine learning model integration
  • Automated insight generation
  • Interactive dashboard creation
  • Data transformation suggestions
  • Export to various formats
  • Collaborative analysis features

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make changes and add tests
  4. Ensure tests pass (uv run pytest)
  5. Commit changes (git commit -m 'Add amazing feature')
  6. Push to branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

📄 License

MIT License - see LICENSE file for details.


Ready to explore your data? Install Eddy and start getting instant insights! 🚀