plijtmaer/eda-mcp-server
eda-tool
Main tool for performing exploratory data analysis
echo
Testing tool for basic connectivity
EDA MCP Server
A comprehensive Model Context Protocol (MCP) server that provides Exploratory Data Analysis (EDA) tools for CSV and structured TXT files, powered by Python's data science stack and an intelligent TypeScript AI agent.
What This Project Does
This project demonstrates a complete AI-powered data analysis workflow using modern web technologies:
- MCP Server: TypeScript/Next.js server that hosts intelligent tools
- Python Integration: Seamlessly execute pandas/matplotlib/seaborn from TypeScript
- AI Agent: OpenAI-powered agent that plans and executes complex data workflows
- Advanced EDA: 6 different analysis types with comprehensive statistics
- Web Interface: Modern Next.js web app with beautiful UI
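This README does not show the Python side of the TypeScript-to-Python bridge. As a rough sketch only, assuming a hypothetical wire format in which the TypeScript tool writes one JSON request to the child process's stdin and reads one JSON response from its stdout, the Python entry point could look like this. The field names csv_text, ok, result, and error are illustrative and are not the project's actual protocol.

```python
import io
import json

import pandas as pd


def handle_request(raw_request: str) -> str:
    """Turn one JSON request string into one JSON response string."""
    try:
        request = json.loads(raw_request)
        # Hypothetical field: the CSV content is passed inline as text.
        df = pd.read_csv(io.StringIO(request["csv_text"]))
        if request["analysis_type"] != "basic_info":
            raise ValueError(f"unsupported analysis_type: {request['analysis_type']}")
        result = {"rows": int(df.shape[0]), "columns": list(df.columns)}
        return json.dumps({"ok": True, "result": result})
    except Exception as exc:
        # Report failures as data so the parent process can surface them.
        return json.dumps({"ok": False, "error": str(exc)})


# In a real child process the last step would be:
#   sys.stdout.write(handle_request(sys.stdin.read()))
request = json.dumps({"analysis_type": "basic_info", "csv_text": "a,b\n1,2\n3,4\n"})
response = json.loads(handle_request(request))
print(response)
```

Returning errors as JSON rather than raising keeps the contract with the parent process simple: it always gets exactly one parseable message back.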
Current Tech Stack
Backend & MCP Server
- TypeScript - Type-safe server logic
- Next.js 15 - Modern React framework for server-side rendering
- @vercel/mcp-adapter - Official Vercel MCP integration
- @modelcontextprotocol/sdk - MCP protocol implementation
- Zod - Runtime type validation
Data Analysis Engine
- Python 3 - Data processing runtime
- pandas - Data manipulation and analysis
- matplotlib - Statistical plotting and visualization
- seaborn - Advanced statistical visualizations
- numpy - Numerical computing foundation
AI & Agent System
- OpenAI API - GPT-4o-mini for intelligent planning
- TypeScript Agent - Autonomous workflow execution
- Dynamic Tool Discovery - Runtime MCP tool detection
Development & Testing
- tsx - Fast TypeScript execution
- pnpm - Efficient package management
- dotenv - Environment variable management
- Node.js Streams - Subprocess communication
Deployment Ready
- Vercel - Serverless deployment platform
- Redis (Optional) - SSE transport support only
- Docker Support - Containerizable architecture
Table of Contents
- Installation & Prerequisites
- EDA Tool Deep Dive
- AI Agent Features
- Testing & Development
- Sample Datasets
- Project Architecture
Installation & Prerequisites
Required Software
- Node.js 18+ - JavaScript runtime
- Python 3.8+ - Data analysis engine
- pnpm - Package manager (npm install -g pnpm)
Python Data Science Stack
# Install required Python packages
pip3 install pandas matplotlib seaborn numpy
# Verify installation
python3 -c "import pandas, matplotlib, seaborn, numpy; print('All packages installed')"
Environment Setup
Create a .env.local file:
# Required for AI agent
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Redis for SSE transport (communication only, not data storage)
REDIS_URL=redis://localhost:6379
Setup & Start
# 1. Clone and install dependencies
git clone https://github.com/plijtmaer/eda-mcp-server.git
cd eda-mcp-server
pnpm install
# 2. Start the MCP server (localhost:3000 for local testing)
pnpm dev
# 3. Test EDA capabilities - Available analysis types:
# • basic_info - Dataset overview and structure
# • statistical_summary - Descriptive statistics
# • correlation_analysis - Correlation matrix and relationships
# • distribution_plots - Distribution analysis and outliers
# • missing_data_analysis - Missing value patterns
# • custom_analysis - Execute custom Python code
# Quick test shortcuts (use predefined file/analysis combinations):
pnpm test:eda # → sample_data.csv + basic_info
pnpm test:eda statistical_summary # → sample_data.csv + statistical_summary
pnpm test:eda correlation_analysis # → sample_data.csv + correlation_analysis
# Or use full command for custom file/analysis combinations (localhost for local testing):
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/sample_data.csv", "analysis_type": "basic_info"}'
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/sales_data.csv", "analysis_type": "statistical_summary"}'
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/financial_data.csv", "analysis_type": "correlation_analysis"}'
# Or test with the deployed Vercel server:
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sample_data.csv", "analysis_type": "basic_info"}'
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sales_data.csv", "analysis_type": "statistical_summary"}'
# 4. Run the simple AI agent (basic implementation for testing and demos)
pnpm agent
EDA Tool Deep Dive
The Exploratory Data Analysis tool is the heart of this system, offering 6 comprehensive analysis types:
1. Basic Information (basic_info)
What it analyzes:
- Dataset shape (rows × columns)
- Column names and data types
- Memory usage optimization
- First 5 rows preview
- Data loading diagnostics
Perfect for: Initial data exploration and quality assessment
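The fields listed above map closely onto a few pandas calls. The following is a minimal sketch, not the tool's actual implementation: the function name basic_info, the output keys, and the toy rows are assumptions made for illustration (the rows are not taken from data/sample_data.csv).

```python
import io

import pandas as pd

# Toy rows for illustration only; not the contents of data/sample_data.csv.
CSV_TEXT = """name,age,salary,department
Ana,34,72000,Engineering
Ben,45,65000,Sales
Cleo,29,58000,Sales
"""


def basic_info(df: pd.DataFrame) -> dict:
    """Collect shape, columns, dtypes, memory usage, and a 5-row preview."""
    return {
        "shape": {"rows": int(df.shape[0]), "columns": int(df.shape[1])},
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # deep=True also counts the memory held by string values.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "preview": df.head(5).to_dict(orient="records"),
    }


df = pd.read_csv(io.StringIO(CSV_TEXT))
info = basic_info(df)
print(info["shape"], info["dtypes"])
```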
2. Statistical Summary (statistical_summary)
Numerical columns:
- Descriptive statistics (mean, median, std, min, max, quartiles)
- Count of non-null values
- Distribution insights
Categorical columns:
- Unique value counts
- Most frequent values (top 3)
- Category distribution analysis
Perfect for: Understanding data distributions and central tendencies
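A summary along these lines can be sketched with DataFrame.describe for numerical columns and value_counts for categorical ones. This is an illustrative sketch under assumptions: the function name, the top_n parameter, and the toy data are hypothetical, and every non-numeric column is treated as categorical.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "salary": [50000, 60000, 70000, 80000],
    "department": ["Sales", "Sales", "Engineering", "HR"],
})


def statistical_summary(df: pd.DataFrame, top_n: int = 3) -> dict:
    numeric = df.select_dtypes(include="number")
    # Assumption: every non-numeric column is treated as categorical.
    categorical = df.select_dtypes(exclude="number")
    summary = {"numerical": {}, "categorical": {}}
    if not numeric.empty:
        # describe() yields count, mean, std, min, 25%, 50%, 75%, max.
        stats = numeric.describe().T
        stats["median"] = numeric.median()
        summary["numerical"] = stats.to_dict(orient="index")
    for col in categorical.columns:
        counts = df[col].value_counts()
        summary["categorical"][col] = {
            "unique": int(df[col].nunique()),
            "top": counts.head(top_n).to_dict(),
        }
    return summary


summary = statistical_summary(df)
print(summary["numerical"]["salary"]["mean"], summary["categorical"]["department"])
```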
3. Correlation Analysis (correlation_analysis)
Advanced correlation detection:
- Complete correlation matrix for numerical variables
- Automatic high correlation identification (|r| > 0.7)
- Relationship strength interpretation
- Multicollinearity detection
Perfect for: Feature selection and relationship discovery
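The |r| > 0.7 rule can be illustrated by scanning the upper triangle of a Pearson correlation matrix. The sketch below is an assumption about how such a check might be written, not the tool's code; the function name and the toy numbers are hypothetical.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "years_experience": [1, 3, 5, 7, 9],
    "salary": [40, 52, 61, 75, 88],
    "satisfaction_score": [7, 3, 8, 4, 6],
})


def high_correlations(df: pd.DataFrame, threshold: float = 0.7):
    """Return the correlation matrix and the column pairs with |r| > threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        # Only look at columns after `a` so each pair is reported once.
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) > threshold:
                pairs.append((a, b, round(float(r), 3)))
    return corr, pairs


corr, pairs = high_correlations(df)
print(pairs)
```

Pairs flagged this way are candidates for a multicollinearity check; a high pairwise correlation alone does not establish it.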
4. Distribution Analysis (distribution_plots)
Per-column statistical analysis:
- Central tendency (mean, median)
- Variability (standard deviation, range)
- Shape analysis (skewness)
- Outlier detection using IQR method
- Distribution normality assessment
Perfect for: Data quality assessment and anomaly detection
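For a single numerical column, the statistics above can be sketched as follows. The README only says "IQR method", so the conventional 1.5 × IQR fences used here are an assumption, as are the function name and the toy values. Skewness is reported as a rough shape indicator; no formal normality test is run.

```python
import pandas as pd


def distribution_profile(series: pd.Series) -> dict:
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    # Assumption: conventional Tukey fences at 1.5 * IQR beyond the quartiles.
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = s[(s < lower) | (s > upper)]
    return {
        "mean": float(s.mean()),
        "median": float(s.median()),
        "std": float(s.std()),
        "range": float(s.max() - s.min()),
        "skewness": float(s.skew()),
        "outliers": outliers.tolist(),
    }


# Toy values for illustration only.
salaries = pd.Series([50, 52, 55, 58, 60, 61, 150], name="salary_k")
profile = distribution_profile(salaries)
print(profile["outliers"], profile["skewness"] > 0)
```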
5. Missing Data Analysis (missing_data_analysis)
Comprehensive missing data audit:
- Missing values per column (count & percentage)
- Total dataset completeness
- Missing data patterns
- Data quality scoring
Perfect for: Data cleaning strategy development
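Per-column missing counts and overall completeness reduce to a few lines of pandas. This is a sketch under assumptions: the function name, output keys, and toy data are hypothetical, and "completeness" is taken to mean the share of non-missing cells.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "city": ["Lima", "Quito", None, None],
    "salary": [72000, 65000, 58000, 61000],
})


def missing_data_report(df: pd.DataFrame) -> dict:
    missing = df.isna().sum()
    rows = len(df)
    total_cells = df.size
    per_column = {
        col: {
            "count": int(n),
            "percent": round(100.0 * int(n) / rows, 2) if rows else 0.0,
        }
        for col, n in missing.items()
    }
    # Assumption: completeness is the percentage of non-missing cells.
    completeness = (
        round(100.0 * (1 - int(missing.sum()) / total_cells), 2)
        if total_cells
        else 100.0
    )
    return {"per_column": per_column, "completeness_percent": completeness}


report = missing_data_report(df)
print(report)
```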
6. Custom Analysis (custom_analysis)
Flexible Python execution:
- Custom pandas operations
- Advanced statistical tests
- Specialized data transformations
- Domain-specific calculations
Available variables:
- df - Your loaded DataFrame
- pd - pandas library
- np - numpy library
- plt - matplotlib.pyplot
- sns - seaborn library
Perfect for: Specialized analysis requirements
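One way such a tool could expose these variables is to run the submitted code with exec in a prepared namespace. The sketch below is an assumption, not the project's implementation: the function name and the convention of reading back a variable called result are hypothetical, and plt and sns are left out to keep the example dependency-light.

```python
import numpy as np
import pandas as pd


def run_custom_analysis(df: pd.DataFrame, code: str):
    """Execute user code with df, pd, and np in scope and return `result`."""
    namespace = {"df": df, "pd": pd, "np": np}
    exec(code, namespace)
    # Hypothetical convention: the snippet stores its output in `result`.
    return namespace.get("result")


# Toy data for illustration only.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "HR"],
    "salary": [50, 70, 60],
})
snippet = "result = df.groupby('department')['salary'].mean().to_dict()"
out = run_custom_analysis(df, snippet)
print(out)
```

Because exec runs arbitrary Python, code passed to a tool like this should come only from trusted callers.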
Current Data Support
Currently Supported
- CSV files - Comma-separated values
- TXT files - Tab/comma/semicolon-separated structured data
- File-based analysis - Local file processing
- Python pandas integration - Full pandas ecosystem
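Handling tab-, comma-, and semicolon-separated files with one loader can be done by letting pandas sniff the delimiter. Whether the tool does it this way is not stated here; the sketch below simply shows the technique using read_csv with sep=None, which requires the Python parsing engine. The function name and the toy text are assumptions.

```python
import io

import pandas as pd


def load_structured_text(source) -> pd.DataFrame:
    # sep=None asks pandas to detect the delimiter; this needs engine="python".
    return pd.read_csv(source, sep=None, engine="python")


# Toy content for illustration only.
tab_text = "day\ttemperature_f\thumidity\nMon\t72\t40\nTue\t68\t55\n"
semi_text = "day;temperature_f;humidity\nMon;72;40\nTue;68;55\n"

tab_df = load_structured_text(io.StringIO(tab_text))
semi_df = load_structured_text(io.StringIO(semi_text))
print(list(tab_df.columns), semi_df.shape)
```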
Using Your Own Data
Local Development (Full Flexibility)
# Any local file path works
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "data/my_data.csv", "analysis_type": "basic_info"}'
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "/Users/yourname/Documents/sales.csv", "analysis_type": "statistical_summary"}'
# HTTP URLs also work
node test-mcp.mjs http://localhost:3000 exploratory-data-analysis '{"file_path": "https://raw.githubusercontent.com/yourname/repo/main/data.csv", "analysis_type": "correlation_analysis"}'
Deployed Server (HTTP URLs Only)
# Use the provided sample data
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://eda-mcp-server.vercel.app/data/sample_data.csv", "analysis_type": "basic_info"}'
# Use your own data hosted online
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://raw.githubusercontent.com/yourname/yourrepo/main/yourdata.csv", "analysis_type": "statistical_summary"}'
# Upload to Dropbox/Google Drive and use direct download links
node test-mcp.mjs https://eda-mcp-server.vercel.app exploratory-data-analysis '{"file_path": "https://www.dropbox.com/s/abc123/data.csv?dl=1", "analysis_type": "correlation_analysis"}'
Ways to Host Your Data Online:
- GitHub: Upload CSV to repository, use raw.githubusercontent.com URL
- Dropbox: Share file, add ?dl=1 to end of URL for direct download
- Google Drive: Share as public, use direct download link
- Your own website: Host CSV files in a public directory
- Data hosting services: Kaggle, data.world, etc.
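The Dropbox tip above can be automated with a small helper that sets the dl query parameter to 1. This is a sketch that assumes the share link follows the pattern shown in this README; the function name is hypothetical, and Dropbox link formats may differ from this pattern.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_link(share_url: str) -> str:
    """Return the same URL with the dl query parameter forced to 1."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder link taken from the examples in this README.
print(dropbox_direct_link("https://www.dropbox.com/s/abc123/data.csv?dl=0"))
```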
Not Yet Supported (Future Development)
- Database connectivity (PostgreSQL, MySQL, MongoDB)
- Real-time streaming data
- Cloud storage integration (S3, GCS, Azure)
- API data ingestion
AI Agent Features
The TypeScript AI Agent provides intelligent automation of data analysis workflows. This is a simple implementation designed for testing and demonstration purposes:
Core Capabilities
- Autonomous Planning - Uses OpenAI to create analysis strategies
- Tool Discovery - Automatically detects available MCP tools
- Smart Execution - Executes multi-step analysis workflows
- Error Recovery - Handles tool failures gracefully
- Natural Language Interface - Plain English analysis requests
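The agent itself is written in TypeScript and its code is not shown here. Purely to illustrate the multi-step execution and error recovery ideas above, here is a Python sketch in which a plan is a list of tool calls and a failing step is recorded instead of aborting the run. The plan shape, run_plan, and fake_call_tool are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

Step = Dict[str, Any]


def run_plan(
    steps: List[Step],
    call_tool: Callable[[str, Dict[str, Any]], Any],
) -> List[Dict[str, Any]]:
    """Run each step in order, recording failures without stopping."""
    results = []
    for step in steps:
        try:
            output = call_tool(step["tool"], step["arguments"])
            results.append({"tool": step["tool"], "ok": True, "output": output})
        except Exception as exc:
            # Record the failure and continue so later steps still run.
            results.append({"tool": step["tool"], "ok": False, "error": str(exc)})
    return results


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Stand-in for a real MCP client call; only knows the echo tool."""
    if name == "echo":
        return arguments["message"]
    raise RuntimeError(f"tool not available: {name}")


plan = [
    {"tool": "echo", "arguments": {"message": "hello"}},
    {"tool": "missing-tool", "arguments": {}},
    {"tool": "echo", "arguments": {"message": "still running"}},
]
results = run_plan(plan, fake_call_tool)
print([r["ok"] for r in results])
```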
Agent Modes
Interactive Mode
pnpm agent
Features:
- Natural language queries
- Real-time analysis execution
- Iterative exploration
- Context-aware suggestions
Example queries:
- "Analyze the financial data and show me correlations between revenue and profit"
- "Find outliers in the employee salary data"
- "Compare weather patterns across all days"
Using your own data with the AI agent:
Local development:
- "Analyze /Users/me/Documents/sales_2024.csv with statistical summary"
- "Show me basic info for data/customer_data.csv"
With deployed server (use HTTP URLs):
- "Analyze https://raw.githubusercontent.com/me/myrepo/main/sales.csv with correlation analysis"
- "Show distribution analysis for https://www.dropbox.com/s/abc123/data.csv?dl=1"
Demo Mode
pnpm agent:demo
Features:
- Predefined analysis workflows
- Showcase of all capabilities
- Automated report generation
- Multiple dataset analysis
Note: This agent serves as a proof-of-concept for MCP-based data analysis automation. For production use, consider implementing more sophisticated planning, memory, and error handling.
Testing & Development
MCP Server Testing
# List all available tools (local)
pnpm test:mcp
# Test basic connectivity (local)
pnpm test:mcp http://localhost:3000 echo "Hello World"
# Test deployed server
node test-mcp.mjs https://eda-mcp-server.vercel.app echo "Hello from deployed server"
# Verify tool discovery (local)
curl -X POST http://localhost:3000/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
# Verify deployed server tools
curl -X POST https://eda-mcp-server.vercel.app/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
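The same JSON-RPC requests can be built from Python using only the standard library. This sketch mirrors the curl calls above (the /mcp path and both headers come from them); the helper name build_mcp_request is an assumption, and the tools/call parameters use the name and arguments fields of the MCP protocol with the tool name used elsewhere in this README. The requests are only constructed here; sending them requires a running server.

```python
import json
import urllib.request


def build_mcp_request(base_url, method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 POST request for the server's /mcp endpoint."""
    payload = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


list_req = build_mcp_request("http://localhost:3000", "tools/list")
call_req = build_mcp_request(
    "http://localhost:3000",
    "tools/call",
    params={
        "name": "exploratory-data-analysis",
        "arguments": {
            "file_path": "data/sample_data.csv",
            "analysis_type": "basic_info",
        },
    },
    request_id=2,
)

# To send a request once the server is running:
# with urllib.request.urlopen(list_req) as resp:
#     print(resp.read().decode("utf-8"))
print(list_req.full_url, json.loads(call_req.data)["method"])
```

Because the Accept header allows text/event-stream, the response body may arrive as server-sent events rather than plain JSON, so a client should be ready to parse either.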
EDA Tool Testing
# Test each analysis type
pnpm test:eda basic_info
pnpm test:eda statistical_summary
pnpm test:eda correlation_analysis
pnpm test:eda distribution_plots
pnpm test:eda missing_data_analysis
# Test different datasets
# Edit test-eda.mjs to customize file/analysis combinations
Agent Testing
# Interactive mode
pnpm agent
# Demo mode (automated workflows)
pnpm agent:demo
# Debug mode (with verbose logging)
DEBUG=* pnpm agent
Sample Datasets
The /data folder contains 4 diverse datasets for comprehensive testing:
Employee Data (sample_data.csv)
15 rows × 7 columns
Columns: name, age, salary, department, years_experience, satisfaction_score, city
Data Types: Mixed (string, int, float)
Use Cases: HR analytics, correlation analysis, salary distribution
Sales Data (sales_data.csv)
10 rows × 6 columns
Columns: product, region, sales_amount, quantity, date, sales_rep
Data Types: Mixed (string, int, date)
Use Cases: Revenue analysis, regional performance, sales trends
Weather Data (weather_data.txt)
7 rows × 6 columns
Columns: day, condition, temperature_f, humidity, wind_speed, precipitation
Data Types: Mixed (string, int, float)
Use Cases: Environmental analysis, pattern detection, forecasting
Financial Data (financial_data.csv)
8 rows × 7 columns
Columns: company, sector, revenue_million, profit_margin, employees, market_cap_billion, debt_ratio
Data Types: Mixed (string, float, int)
Use Cases: Corporate analysis, sector comparison, financial ratios
Adding Your Own Data
- Place CSV/TXT files in the /data folder
- Ensure structured format (comma/semicolon/tab separated)
- Use file path: "./data/your_file.csv"
Project Architecture
eda-mcp-server/                  # Project root
├── agent/                       # AI Agent system
│   └── simple-agent.ts          # TypeScript AI agent
├── app/                         # Next.js application
│   ├── [transport]/route.ts     # MCP protocol handler
│   ├── layout.tsx               # App layout
│   └── page.tsx                 # Web interface
├── tools/                       # MCP Tools
│   ├── eda-tool.ts              # Main EDA tool
│   ├── echo.ts                  # Testing tool
│   └── index.ts                 # Tool exports
├── data/                        # Sample datasets
│   ├── sample_data.csv          # Employee data
│   ├── sales_data.csv           # Sales data
│   ├── weather_data.txt         # Weather data
│   └── financial_data.csv       # Financial data
├── lib/                         # Utility libraries
│   └── redis.ts                 # Redis configuration (SSE transport only)
├── test-mcp.mjs                 # MCP server tester
├── test-eda.mjs                 # EDA tool tester
├── package.json                 # Dependencies & scripts
└── README.md                    # Documentation
Future Development Roadmap
Planned Features (Not Yet Implemented)
Data Sources
- Database connectivity (PostgreSQL, MySQL, MongoDB)
- API data ingestion with authentication
- Real-time streaming data processing
- Cloud storage integration (S3, GCS, Azure)
Agent Intelligence
- Memory for conversation context
- Multi-step workflow planning with dependencies
- Custom analysis templates and presets
- Domain-specific expertise modules
Visualization Enhancements
- Save matplotlib plots to files and URLs
- Interactive charts with Plotly integration
- Dashboard creation with real-time updates
- Custom chart templates
Advanced Analytics
- Machine learning model training and evaluation
- Statistical hypothesis testing suite
- Time series analysis and forecasting
- Geospatial data processing capabilities
Contributing
This project is open for contributions! Priority areas:
- Database connectors for PostgreSQL/MySQL
- Advanced visualization features
- Machine learning integrations
- Performance optimizations
License & Contact
Built with ❤️ using: TypeScript, Next.js, Python, OpenAI, and the Model Context Protocol.
Repository: https://github.com/plijtmaer/eda-mcp-server
Ready for production deployment and continuous enhancement!