marlonluo2018/pandas-mcp-server
If you are the rightful owner of pandas-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
The Excel-MCP Server Visualization Project provides visualization capabilities for Excel data through an MCP server interface, generating interactive charts using Pyecharts.
Pandas-MCP Server
A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.
๐ฏ MCP Server Overview
The Pandas-MCP Server is designed as a Model Context Protocol (MCP) server that provides LLMs with powerful data processing capabilities. MCP is a standardized protocol that allows AI models to interact with external tools and services in a secure, structured way.
๐ ๏ธ Installation
Prerequisites
- Python 3.8+
- pip package manager
- Git (for cloning the repository)
Step 1: Clone the Repository
git clone https://github.com/marlonluo2018/pandas-mcp-server.git
cd pandas-mcp-server
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Verify Installation
# Test the CLI interface
python cli.py
# Or test the MCP server directly
python server.py
Dependencies
- pandas>=2.0.0 - Data manipulation and analysis
- fastmcp>=1.0.0 - MCP server framework
- chardet>=5.0.0 - Character encoding detection
- psutil - System monitoring for memory optimization
Claude Desktop Configuration
Add this configuration to your Claude Desktop settings:
{
"mcpServers": {
"pandas-server": {
"type": "stdio",
"command": "python",
"args": ["/path/to/your/pandas-mcp-server/server.py"]
}
}
}
Note: Replace /path/to/your/pandas-mcp-server/server.py
with the actual path where you cloned the repository.
Example paths:
- Windows:
"C:\\Users\\YourName\\pandas-mcp-server\\server.py"
- macOS/Linux:
"/home/username/pandas-mcp-server/server.py"
Configuration File Location
- Windows:
%APPDATA%\Claude\claude_desktop_config.json
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- Linux:
~/.config/Claude/claude_desktop_config.json
Verification
After configuration, restart Claude Desktop. The server should appear in the MCP tools list with three available tools:
read_metadata_tool
- File analysisrun_pandas_code_tool
- Code executiongenerate_chartjs_tool
- Chart generation
๐ Workflow
The pandas MCP server follows a structured three-step workflow for data analysis and visualization:
Step 1: Read File Metadata
LLM calls read_metadata_tool
to understand the file structure:
- Extract file type, size, encoding, and column information
- Get data types, sample values, and statistical summaries
- Receive data quality warnings and suggested operations
- Understand the dataset structure before processing
Step 2: Execute Pandas Operations
LLM calls run_pandas_code_tool
based on metadata analysis:
- Formulate pandas operations using the understood file structure
- Execute data processing, filtering, aggregation, or analysis
- Receive results in DataFrame, Series, or dictionary format
- Get optimized output with memory management
Step 3: Generate Visualizations
LLM calls generate_chartjs_tool
to create interactive charts:
- Transform processed data into Chart.js compatible format
- Generate interactive HTML charts with customization controls
- Create bar, line, or pie charts based on data characteristics
- Output responsive visualizations for analysis presentation
๐ MCP Server Tools
The server exposes three main tools for LLM integration:
1. read_metadata_tool
- File Analysis
Extract comprehensive metadata from Excel and CSV files including:
- File type, size, encoding, and structure
- Column names, data types, and sample values
- Statistical summaries (null counts, unique values, min/max/mean)
- Data quality warnings and suggested operations
- Memory-optimized processing for large files
MCP Tool Usage:
{
"tool": "read_metadata_tool",
"args": {
"file_path": "/path/to/sales_data.xlsx"
}
}
2. run_pandas_code_tool
- Secure Code Execution
Execute pandas operations with:
- Security filtering against malicious code
- Memory optimization for large datasets
- Comprehensive error handling and debugging
- Support for DataFrame, Series, and dictionary results
Forbidden Operations
The following operations are blocked for security reasons:
- System Access:
os.
,sys.
,subprocess.
- Prevents file system and system access - Code Execution:
open()
,exec()
,eval()
- Blocks dynamic code execution - Dangerous Imports:
import os
,import sys
- Prevents specific harmful imports - Browser/DOM Access:
document.
,window.
,XMLHttpRequest
- Blocks browser operations - JavaScript/Remote:
fetch()
,eval()
,Function()
- Prevents remote code execution - Script Injection:
script
,javascript:
- Blocks script injection attempts
Requirements:
- Final result must be assigned to
result
variable - Code should include necessary imports (pandas available as
pd
) - All code goes through security filtering before execution
MCP Tool Usage:
{
"tool": "run_pandas_code_tool",
"args": {
"code": "import pandas as pd\ndf = pd.read_excel('/path/to/data.xlsx')\nresult = df.groupby('Region')['Sales'].sum()"
}
}
3. generate_chartjs_tool
- Interactive Visualizations
Generate interactive charts with Chart.js:
- Bar charts - For categorical comparisons
- Line charts - For trend analysis
- Pie charts - For proportional data
- Interactive HTML templates with customization controls
MCP Tool Usage:
{
"tool": "generate_chartjs_tool",
"args": {
"data": {
"columns": [
{
"name": "Region",
"type": "string",
"examples": ["North", "South", "East", "West"]
},
{
"name": "Sales",
"type": "number",
"examples": [15000, 12000, 18000, 9000]
}
]
},
"chart_types": ["bar"],
"title": "Sales by Region"
}
}
๐ Usage
CLI Interface (Testing & Development)
The cli.py
provides a convenient command-line interface for testing the MCP server functionality without requiring an MCP client:
Interactive Mode
python cli.py
Launches a guided menu system with:
- Step-by-step workflow guidance
- Automatic input validation
- Clear error messages
- Support for file paths with spaces
Command-Line Mode
# Read metadata
python cli.py metadata data.xlsx
# Execute pandas code
python cli.py execute analysis.py
# Generate charts
python cli.py chart data.json --type bar --title "Sales Analysis"
๐ Code Logic & Architecture
Core Components
1. Server Architecture (server.py
)
- FastMCP Integration: Uses FastMCP framework for MCP protocol implementation
- Logging System: Unified logging with rotation and memory tracking
- Tool Registration: Exposes three main tools with proper error handling
- Memory Monitoring: Tracks memory usage before/after operations
2. Metadata Processing (core/metadata.py
)
Key Logic:
- File validation (existence, size limits)
- Encoding detection for CSV files
- Memory-optimized data processing (100-row samples)
- Comprehensive statistical analysis
- Data quality assessment and warnings
Memory Optimization:
- Uses
category
dtype for string columns with low cardinality - Converts float64 to float32 for memory efficiency
- Processes only first 100 rows for metadata extraction
- Forces garbage collection after processing
3. Code Execution (core/execution.py
)
Security Features:
- Blacklist filtering for dangerous operations
- Sandboxed execution environment
- Output capture and error handling
- Memory monitoring for large results
Execution Flow:
- Security check against BLACKLIST patterns
- Syntax validation through compilation
- Code execution in isolated environment
- Result formatting and memory optimization
- Output capture and error reporting
4. Chart Generation (core/visualization.py
)
Architecture:
- Template-based HTML generation
- Chart.js integration via CDN
- Interactive controls for customization
- Automatic file naming and organization
Chart Types:
- Bar Charts: Categorical data with bar width and Y-axis controls
- Line Charts: Trend analysis with line styling options
- Pie Charts: Proportional data with donut hole and percentage display
5. Chart Generators (core/chart_generators/
)
Base Class (base.py
):
- Abstract base class for all chart generators
- Template management and file I/O
- Common chart configuration
Specific Generators:
BarChartGenerator
: Bar charts with interactive controlsLineChartGenerator
: Line charts with tension and stylingPieChartGenerator
: Pie charts with legend and percentage options
Data Flow Architecture
User Input โ Security Check โ Processing โ Result โ Output
โ โ โ โ โ
CLI/MCP โ BLACKLIST โ Memory Opt โ Format โ Log/Display
Memory Management Strategy
- Chunked Processing: Large files processed in 10KB chunks
- Type Optimization: Automatic dtype conversion (float64โfloat32, objectโcategory)
- Limited Sampling: Only first 100 rows processed for metadata
- Garbage Collection: Forced cleanup after major operations
- Memory Monitoring: PSutil integration for tracking usage
๐ Project Structure
pandas-mcp-server/
โโโ server.py # MCP server implementation
โโโ cli.py # CLI interface for testing
โโโ requirements.txt # Python dependencies
โโโ core/ # Core functionality
โ โโโ config.py # Configuration and constants
โ โโโ data_types.py # Data type utilities
โ โโโ metadata.py # File metadata extraction
โ โโโ execution.py # Pandas code execution
โ โโโ visualization.py # Chart generation orchestration
โ โโโ chart_generators/ # Chart-specific implementations
โ โโโ __init__.py
โ โโโ base.py # Base chart generator
โ โโโ bar.py # Bar chart generator
โ โโโ line.py # Line chart generator
โ โโโ pie.py # Pie chart generator
โ โโโ templates/ # HTML templates for charts
โโโ charts/ # Generated chart files
โโโ logs/ # Application logs
โโโ csv_metadata_format.md # CSV metadata documentation
โโโ test_*.py # Test files
๐ง Configuration
Core Configuration (core/config.py
)
- MAX_FILE_SIZE: 100MB file size limit
- BLACKLIST: Security restrictions for code execution
- CHARTS_DIR: Directory for generated charts
- Logging: Comprehensive logging with rotation
Security Features
- Code execution sandboxing
- Blacklisted operations (file system, network, eval)
- Memory usage monitoring
- Input validation and sanitization
๐ Chart Generation Details
Template System
Charts are generated using HTML templates with:
- Chart.js integration via CDN
- Interactive controls for customization
- Responsive design with mobile support
- Real-time parameter adjustment
Chart Types
Bar Charts
- Interactive controls for bar width and Y-axis scaling
- Responsive design with zoom capabilities
- Data labels and tooltips
- Multiple dataset support
Line Charts
- Multiple line series support
- Adjustable line tension and styling
- Point size and style customization
- Stepped line options
Pie Charts
- Interactive donut hole adjustment
- Percentage/value toggle display
- Legend positioning and styling
- Border width and color controls
๐งช Testing
Running Tests
# Test metadata extraction
python test_metadata.py
# Test pandas code execution
python test_execution.py
# Test chart generation
python test_generate_barchart.py
# Test all chart types
python test_generate_pyecharts.py
Test Data Requirements
- Excel files (.xlsx) with multiple sheets
- CSV files with various encodings
- JSON files with structured data for chart generation
๐ Performance Optimization
Memory Management
- Chunked processing for large files
- Automatic garbage collection
- Memory usage logging
- Dataset size limits
File Processing
- Optimized dtype inference
- Category encoding for string columns
- Float32 precision for numeric data
- Streaming CSV reading
๐ Logging
Log Structure
- mcp_server.log: Main application log
- memory_usage: Memory consumption tracking
- metadata: File processing details
Log Levels
- DEBUG: Detailed processing information
- INFO: General operation status
- WARNING: Non-critical issues
- ERROR: Processing failures
๐ Troubleshooting
Common Issues
MCP Connection Issues
- Verify server path in Claude Desktop configuration
- Check Python environment and dependencies
- Ensure server.py is executable
- Review MCP server logs for connection errors
File Not Found
- Verify file path is absolute
- Check file permissions
- Ensure file exists before processing
Memory Issues
- Reduce file size or use chunked processing
- Monitor memory usage in logs
- Consider data sampling for large datasets
Chart Generation Errors
- Verify data structure matches expected format
- Check for required columns (string + numeric)
- Ensure Chart.js CDN accessibility
Debug Mode
Enable debug logging by setting environment variable:
export LOG_LEVEL=DEBUG
python server.py
๐ค Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
๐ Additional Documentation
- CSV Metadata Format: See
csv_metadata_format.md
for detailed CSV processing documentation
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Support
For issues and questions:
- Check the troubleshooting section
- Review log files in the
logs/
directory - Open an issue on GitHub with reproduction steps