eda-mcp-server

plijtmaer/eda-mcp-server

The EDA MCP Server is a comprehensive Model Context Protocol server that provides Exploratory Data Analysis tools for CSV and structured TXT files, powered by Python's data science stack and an intelligent TypeScript AI agent.

Tools
  1. eda-tool

    Main tool for performing exploratory data analysis

  2. echo

    Testing tool for basic connectivity

EDA MCP Server

A comprehensive Model Context Protocol (MCP) server that provides Exploratory Data Analysis (EDA) tools for CSV and structured TXT files, powered by Python's data science stack and an intelligent TypeScript AI agent.

What This Project Does

This project demonstrates a complete AI-powered data analysis workflow using modern web technologies:

  • MCP Server: TypeScript/Next.js server that hosts intelligent tools
  • Python Integration: Seamlessly execute pandas/matplotlib/seaborn from TypeScript
  • AI Agent: OpenAI-powered agent that plans and executes complex data workflows
  • Advanced EDA: 6 different analysis types with comprehensive statistics
  • Web Interface: Modern Next.js web app with beautiful UI

Current Tech Stack

Backend & MCP Server

  • TypeScript - Type-safe server logic
  • Next.js 15 - Modern React framework for server-side rendering
  • @vercel/mcp-adapter - Official Vercel MCP integration
  • @modelcontextprotocol/sdk - MCP protocol implementation
  • Zod - Runtime type validation

Data Analysis Engine

  • Python 3 - Data processing runtime
  • pandas - Data manipulation and analysis
  • matplotlib - Statistical plotting and visualization
  • seaborn - Advanced statistical visualizations
  • numpy - Numerical computing foundation

AI & Agent System

  • OpenAI API - GPT-4o-mini for intelligent planning
  • TypeScript Agent - Autonomous workflow execution
  • Dynamic Tool Discovery - Runtime MCP tool detection

Development & Testing

  • tsx - Fast TypeScript execution
  • pnpm - Efficient package management
  • dotenv - Environment variable management
  • Node.js Streams - Subprocess communication

Deployment Ready

  • Vercel - Serverless deployment platform
  • Redis (Optional) - SSE transport support only
  • Docker Support - Containerizable architecture

Installation & Prerequisites

Required Software

  • Node.js 18+ - JavaScript runtime
  • Python 3.8+ - Data analysis engine
  • pnpm - Package manager (npm install -g pnpm)

Python Data Science Stack

# Install required Python packages
pip3 install pandas matplotlib seaborn numpy

# Verify installation
python3 -c "import pandas, matplotlib, seaborn, numpy; print('All packages installed')"

Environment Setup

Create .env.local file:

# Required for AI agent
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Redis for SSE transport (communication only, not data storage)
REDIS_URL=redis://localhost:6379

Setup & Start

# 1. Clone and install dependencies
git clone https://github.com/plijtmaer/eda-mcp-server.git
cd eda-mcp-server
pnpm install

# 2. Start the MCP server (localhost:3000 for local testing)
pnpm dev

# 3. Test EDA capabilities - Available analysis types:
#    • basic_info - Dataset overview and structure
#    • statistical_summary - Descriptive statistics
#    • correlation_analysis - Correlation matrix and relationships
#    • distribution_plots - Distribution analysis and outliers
#    • missing_data_analysis - Missing value patterns
#    • custom_analysis - Execute custom Python code

# Quick test shortcuts (use predefined file/analysis combinations):
pnpm test:eda                      # → sample_data.csv + basic_info
pnpm test:eda statistical_summary  # → sample_data.csv + statistical_summary
pnpm test:eda correlation_analysis # → sample_data.csv + correlation_analysis

# Or use full command for custom file/analysis combinations (localhost for local testing):
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/sample_data.csv", "analysis_type": "basic_info"}'

node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/sales_data.csv", "analysis_type": "statistical_summary"}'

node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/financial_data.csv", "analysis_type": "correlation_analysis"}'

# Or test with the deployed Vercel server:
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sample_data.csv", "analysis_type": "basic_info"}'

node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sales_data.csv", "analysis_type": "statistical_summary"}'

# 4. Run the simple AI agent (basic implementation for testing and demos)
pnpm agent
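
The commands above hand a tool name and a JSON argument object to test-mcp.mjs. As a rough sketch, the request that reaches the server is presumably an MCP tools/call message like the one built below. This is Python, not the project's own test script: build_tool_call and ANALYSIS_TYPES are hypothetical names, and the assumption that the JSON argument is forwarded unchanged as the tool's arguments is not confirmed by this README.

```python
import json

# The six analysis types listed in this README.
ANALYSIS_TYPES = {
    "basic_info",
    "statistical_summary",
    "correlation_analysis",
    "distribution_plots",
    "missing_data_analysis",
    "custom_analysis",
}


def build_tool_call(file_path, analysis_type, request_id=1):
    """Return a JSON-RPC 2.0 'tools/call' body for the EDA tool (hypothetical helper)."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis_type: {analysis_type!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            # Tool name as used in the test commands of this README.
            "name": "exploratory-data-analysis",
            "arguments": {"file_path": file_path, "analysis_type": analysis_type},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call("data/sample_data.csv", "basic_info"), indent=2))
```

Rejecting an unknown analysis type on the client side gives a clearer error than waiting for the server to refuse it.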

EDA Tool Deep Dive

The Exploratory Data Analysis tool is the heart of this system, offering 6 comprehensive analysis types:

1. Basic Information (basic_info)

What it analyzes:

  • Dataset shape (rows × columns)
  • Column names and data types
  • Memory usage optimization
  • First 5 rows preview
  • Data loading diagnostics

Perfect for: Initial data exploration and quality assessment
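
The server's actual Python code is not shown in this README. The sketch below only illustrates what a basic_info style report contains (shape, column names, inferred types, a five-row preview). It uses the standard library so that it runs without pandas. basic_info and infer_type are hypothetical helpers, and the sample rows are made up.

```python
import csv
import io


def infer_type(values):
    """Guess a column type by trying int, then float, else falling back to str."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except (ValueError, TypeError):
            continue
    return "str"


def basic_info(csv_text, preview_rows=5):
    """Summarise the structure of a CSV document given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "shape": (len(rows), len(columns)),
        "columns": columns,
        "dtypes": {c: infer_type([r[c] for r in rows]) for c in columns},
        "preview": rows[:preview_rows],
    }


# Made-up rows that reuse three of the sample_data.csv column names.
SAMPLE = "name,age,salary\nAna,31,52000.5\nBen,45,61000\nCal,28,48000\n"

if __name__ == "__main__":
    info = basic_info(SAMPLE)
    print(info["shape"], info["dtypes"])
```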

2. Statistical Summary (statistical_summary)

Numerical columns:

  • Descriptive statistics (mean, median, std, min, max, quartiles)
  • Count of non-null values
  • Distribution insights

Categorical columns:

  • Unique value counts
  • Most frequent values (top 3)
  • Category distribution analysis

Perfect for: Understanding data distributions and central tendencies
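
To make the two halves of this summary concrete, here is a minimal standard-library sketch. It is not the tool's implementation, which is not shown here. numeric_summary and categorical_summary are hypothetical names, and the input lists are assumed to be already stripped of missing values.

```python
import statistics
from collections import Counter


def numeric_summary(values):
    """Descriptive statistics for one numerical column."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": statistics.median(values),
        "75%": q3,
        "max": max(values),
    }


def categorical_summary(values, top_n=3):
    """Unique count and the most frequent values for one categorical column."""
    counts = Counter(values)
    return {"unique": len(counts), "top": counts.most_common(top_n)}


if __name__ == "__main__":
    print(numeric_summary([1, 2, 3, 4, 5]))
    print(categorical_summary(["Sales", "IT", "Sales", "HR", "Sales", "IT"]))
```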

3. Correlation Analysis (correlation_analysis)

Advanced correlation detection:

  • Complete correlation matrix for numerical variables
  • Automatic high correlation identification (|r| > 0.7)
  • Relationship strength interpretation
  • Multicollinearity detection

Perfect for: Feature selection and relationship discovery
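
The |r| > 0.7 rule can be illustrated with a small dependency-free sketch. It assumes the Pearson coefficient, which this README does not state explicitly, and it is not the tool's own code. pearson and high_correlations are hypothetical helpers, and the numbers are made up.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (spread_x * spread_y)


def high_correlations(columns, threshold=0.7):
    """Return (col_a, col_b, r) for every column pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


if __name__ == "__main__":
    # Made-up values for illustration only.
    data = {
        "age": [25, 30, 35, 40],
        "salary": [50, 60, 70, 80],
        "noise": [3, 1, 4, 1],
    }
    print(high_correlations(data))
```

Pairs flagged this way are the usual candidates to inspect for multicollinearity before modelling.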

4. Distribution Analysis (distribution_plots)

Per-column statistical analysis:

  • Central tendency (mean, median)
  • Variability (standard deviation, range)
  • Shape analysis (skewness)
  • Outlier detection using IQR method
  • Distribution normality assessment

Perfect for: Data quality assessment and anomaly detection
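
As an illustration of the IQR method and the skewness measure, here is a minimal sketch using only the standard library. The 1.5 × IQR fence multiplier is the conventional choice and is an assumption here, since the README only says "IQR method". The skewness shown is the simple population moment ratio, which may differ from the estimator the tool uses. iqr_outliers and skewness are hypothetical names.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Population skewness: third central moment over second moment to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Made-up values with one obvious outlier.
    sample = [10, 12, 11, 13, 12, 11, 95]
    print(iqr_outliers(sample), skewness(sample))
```

A positive skewness indicates a longer right tail, which is what a single large outlier such as the 95 above produces.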

5. Missing Data Analysis (missing_data_analysis)

Comprehensive missing data audit:

  • Missing values per column (count & percentage)
  • Total dataset completeness
  • Missing data patterns
  • Data quality scoring

Perfect for: Data cleaning strategy development
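
The per-column counts and the overall completeness figure can be sketched as follows. This is a standard-library illustration, not the tool's code. missing_report is a hypothetical helper, the set of strings treated as missing is an assumption, and the sample rows are made up.

```python
import csv
import io


def missing_report(csv_text, missing_tokens=("", "NA", "NaN")):
    """Count missing cells per column and compute overall completeness in percent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    total_rows = len(rows)

    per_column = {}
    for col in columns:
        # Short rows yield None for absent fields, so normalise to "" first.
        missing = sum(1 for r in rows if (r[col] or "").strip() in missing_tokens)
        percent = round(100 * missing / total_rows, 1) if total_rows else 0.0
        per_column[col] = {"missing": missing, "percent": percent}

    cells = total_rows * len(columns)
    missing_cells = sum(c["missing"] for c in per_column.values())
    completeness = round(100 * (cells - missing_cells) / cells, 1) if cells else 100.0
    return {"per_column": per_column, "completeness": completeness}


# Made-up rows for illustration only.
SAMPLE_WITH_GAPS = "name,age,city\nAna,31,Lima\nBen,,Quito\nCal,45,\nDee,NA,Oslo\n"

if __name__ == "__main__":
    print(missing_report(SAMPLE_WITH_GAPS))
```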

6. Custom Analysis (custom_analysis)

Flexible Python execution:

  • Custom pandas operations
  • Advanced statistical tests
  • Specialized data transformations
  • Domain-specific calculations

Available variables:

  • df - Your loaded DataFrame
  • pd - pandas library
  • np - numpy library
  • plt - matplotlib.pyplot
  • sns - seaborn library

Perfect for: Specialized analysis requirements

Current Data Support

Currently Supported

  • CSV files - Comma-separated values
  • TXT files - Tab/comma/semicolon-separated structured data
  • File-based analysis - Local file processing
  • Python pandas integration - Full pandas ecosystem
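
How the server picks the delimiter for TXT files is not described here. One plausible approach, sketched below under that assumption, is to sniff the separator from the text and restrict the candidates to tab, comma, and semicolon. read_structured_text is a hypothetical helper and the weather rows are made up.

```python
import csv
import io


def read_structured_text(text):
    """Detect the delimiter (comma, semicolon or tab) and parse the rows."""
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


if __name__ == "__main__":
    # Made-up rows that reuse weather_data.txt column names.
    sample = "day;temperature_f;humidity\nMon;70;40\nTue;72;45\n"
    delimiter, rows = read_structured_text(sample)
    print(repr(delimiter), rows[0])
```

csv.Sniffer raises csv.Error when it cannot settle on a delimiter, so a caller would normally catch that and fall back to a default.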

Using Your Own Data

Local Development (Full Flexibility)

# Any local file path works
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/my_data.csv", "analysis_type": "basic_info"}'

node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "/Users/yourname/Documents/sales.csv", "analysis_type": "statistical_summary"}'

# HTTP URLs also work
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "https://raw.githubusercontent.com/yourname/repo/main/data.csv", "analysis_type": "correlation_analysis"}'

Deployed Server (HTTP URLs Only)

# Use the provided sample data
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sample_data.csv", "analysis_type": "basic_info"}'

# Use your own data hosted online
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://raw.githubusercontent.com/yourname/yourrepo/main/yourdata.csv", "analysis_type": "statistical_summary"}'

# Upload to Dropbox/Google Drive and use direct download links
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://www.dropbox.com/s/abc123/data.csv?dl=1", "analysis_type": "correlation_analysis"}'

Ways to Host Your Data Online:

  1. GitHub: Upload CSV to repository, use raw.githubusercontent.com URL
  2. Dropbox: Share file, add ?dl=1 to end of URL for direct download
  3. Google Drive: Share as public, use direct download link
  4. Your own website: Host CSV files in a public directory
  5. Data hosting services: Kaggle, data.world, etc.
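
Because the deployed server accepts only HTTP URLs, a client may want to check a file_path before sending it and to normalise Dropbox share links as described above. The sketch below is one way to do that with the standard library. is_http_url and dropbox_direct_link are hypothetical helpers, not part of this project.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_http_url(file_path):
    """True when file_path is an http(s) URL rather than a local path."""
    return urlsplit(file_path).scheme in ("http", "https")


def dropbox_direct_link(url):
    """Force dl=1 on a Dropbox share link so it downloads the file directly."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(is_http_url("data/my_data.csv"))
    print(dropbox_direct_link("https://www.dropbox.com/s/abc123/data.csv?dl=0"))
```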

Not Yet Supported (Future Development)

  • Database connectivity (PostgreSQL, MySQL, MongoDB)
  • Real-time streaming data
  • Cloud storage integration (S3, GCS, Azure)
  • API data ingestion

AI Agent Features

The TypeScript AI agent automates data analysis workflows. It is a simple implementation intended for testing and demonstration.

Core Capabilities

  • Autonomous Planning - Uses OpenAI to create analysis strategies
  • Tool Discovery - Automatically detects available MCP tools
  • Smart Execution - Executes multi-step analysis workflows
  • Error Recovery - Handles tool failures gracefully
  • Natural Language Interface - Plain English analysis requests

Agent Modes

Interactive Mode

pnpm agent

Features:

  • Natural language queries
  • Real-time analysis execution
  • Iterative exploration
  • Context-aware suggestions

Example queries:

  • "Analyze the financial data and show me correlations between revenue and profit"
  • "Find outliers in the employee salary data"
  • "Compare weather patterns across all days"

Using your own data with the AI agent:

Local development:

  • "Analyze /Users/me/Documents/sales_2024.csv with statistical summary"
  • "Show me basic info for data/customer_data.csv"

With deployed server (use HTTP URLs):

Demo Mode

pnpm agent:demo

Features:

  • Predefined analysis workflows
  • Showcase of all capabilities
  • Automated report generation
  • Multiple dataset analysis

Note: This agent serves as a proof-of-concept for MCP-based data analysis automation. For production use, consider implementing more sophisticated planning, memory, and error handling.

Testing & Development

MCP Server Testing

# List all available tools (local)
pnpm test:mcp

# Test basic connectivity (local)
pnpm test:mcp http://localhost:3000 echo "Hello World"

# Test deployed server
node test-mcp.mjs https://eda-mcp-server.vercel.app echo "Hello from deployed server"

# Verify tool discovery (local)
curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

# Verify deployed server tools
curl -X POST https://eda-mcp-server.vercel.app/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
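
For scripted checks, the same tools/list request can be issued from Python. The sketch below mirrors the curl call above (same path, headers, and body) using only the standard library. tools_list_request and list_tools are hypothetical helpers. The response is returned as raw text because, given the Accept header, the server may answer with either JSON or an event stream.

```python
import json
import urllib.request


def tools_list_request(base_url):
    """Build the POST request that asks an MCP server for its tool list."""
    body = json.dumps({"jsonrpc": "2.0", "method": "tools/list", "id": 1}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/mcp",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def list_tools(base_url, timeout=10):
    """Send the request and return the raw response text (needs a running server)."""
    with urllib.request.urlopen(tools_list_request(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    request = tools_list_request("http://localhost:3000")
    print(request.get_method(), request.full_url)
```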

EDA Tool Testing

# Test each analysis type
pnpm test:eda basic_info
pnpm test:eda statistical_summary  
pnpm test:eda correlation_analysis
pnpm test:eda distribution_plots
pnpm test:eda missing_data_analysis

# Test different datasets
# Edit test-eda.mjs to customize file/analysis combinations

Agent Testing

# Interactive mode
pnpm agent

# Demo mode (automated workflows)
pnpm agent:demo

# Debug mode (with verbose logging)
DEBUG=* pnpm agent

Sample Datasets

The /data folder contains 4 diverse datasets for comprehensive testing:

Employee Data (sample_data.csv)

15 rows × 7 columns

Columns: name, age, salary, department, years_experience, satisfaction_score, city
Data Types: Mixed (string, int, float)
Use Cases: HR analytics, correlation analysis, salary distribution

Sales Data (sales_data.csv)

10 rows × 6 columns

Columns: product, region, sales_amount, quantity, date, sales_rep
Data Types: Mixed (string, int, date)
Use Cases: Revenue analysis, regional performance, sales trends

Weather Data (weather_data.txt)

7 rows × 6 columns

Columns: day, condition, temperature_f, humidity, wind_speed, precipitation  
Data Types: Mixed (string, int, float)
Use Cases: Environmental analysis, pattern detection, forecasting

Financial Data (financial_data.csv)

8 rows × 7 columns

Columns: company, sector, revenue_million, profit_margin, employees, market_cap_billion, debt_ratio
Data Types: Mixed (string, float, int)
Use Cases: Corporate analysis, sector comparison, financial ratios

Adding Your Own Data

  1. Place CSV/TXT files in /data folder
  2. Ensure structured format (comma/semicolon/tab separated)
  3. Use file path: "./data/your_file.csv"

Project Architecture

eda-mcp-server/                     # Project root
├── agent/                          # AI Agent system
│   └── simple-agent.ts             # TypeScript AI agent
├── app/                            # Next.js application
│   ├── [transport]/route.ts        # MCP protocol handler
│   ├── layout.tsx                  # App layout
│   └── page.tsx                    # Web interface
├── tools/                          # MCP Tools
│   ├── eda-tool.ts                 # Main EDA tool
│   ├── echo.ts                     # Testing tool
│   └── index.ts                    # Tool exports
├── data/                           # Sample datasets
│   ├── sample_data.csv             # Employee data
│   ├── sales_data.csv              # Sales data
│   ├── weather_data.txt            # Weather data
│   └── financial_data.csv          # Financial data
├── lib/                            # Utility libraries
│   └── redis.ts                    # Redis configuration (SSE transport only)
├── test-mcp.mjs                    # MCP server tester
├── test-eda.mjs                    # EDA tool tester
├── package.json                    # Dependencies & scripts
└── README.md                       # Documentation

Future Development Roadmap

Planned Features (Not Yet Implemented)

Data Sources

  • Database connectivity (PostgreSQL, MySQL, MongoDB)
  • API data ingestion with authentication
  • Real-time streaming data processing
  • Cloud storage integration (S3, GCS, Azure)

Agent Intelligence

  • Memory for conversation context
  • Multi-step workflow planning with dependencies
  • Custom analysis templates and presets
  • Domain-specific expertise modules

Visualization Enhancements

  • Save matplotlib plots to files and URLs
  • Interactive charts with Plotly integration
  • Dashboard creation with real-time updates
  • Custom chart templates

Advanced Analytics

  • Machine learning model training and evaluation
  • Statistical hypothesis testing suite
  • Time series analysis and forecasting
  • Geospatial data processing capabilities

Contributing

This project is open for contributions! Priority areas:

  1. Database connectors for PostgreSQL/MySQL
  2. Advanced visualization features
  3. Machine learning integrations
  4. Performance optimizations

License & Contact

Built with ❤️ using: TypeScript, Next.js, Python, OpenAI, and the Model Context Protocol.

Repository: https://github.com/plijtmaer/eda-mcp-server

Ready for production deployment and continuous enhancement!