pandas-mcp-server

marlonluo2018/pandas-mcp-server

3.4

If you are the rightful owner of pandas-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

The Excel-MCP Server Visualization Project provides visualization capabilities for Excel data through an MCP server interface, generating interactive charts using Pyecharts.

Pandas-MCP Server

A comprehensive Model Context Protocol (MCP) server that enables LLMs to execute pandas code through a standardized workflow for data analysis and visualization.

๐ŸŽฏ MCP Server Overview

The Pandas-MCP Server is designed as a Model Context Protocol (MCP) server that provides LLMs with powerful data processing capabilities. MCP is a standardized protocol that allows AI models to interact with external tools and services in a secure, structured way.

๐Ÿ› ๏ธ Installation

Prerequisites

  • Python 3.8+
  • pip package manager
  • Git (for cloning the repository)

Step 1: Clone the Repository

git clone https://github.com/marlonluo2018/pandas-mcp-server.git
cd pandas-mcp-server

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Verify Installation

# Test the CLI interface
python cli.py

# Or test the MCP server directly
python server.py

Dependencies

  • pandas>=2.0.0 - Data manipulation and analysis
  • fastmcp>=1.0.0 - MCP server framework
  • chardet>=5.0.0 - Character encoding detection
  • psutil - System monitoring for memory optimization

Claude Desktop Configuration

Add this configuration to your Claude Desktop settings:

{
  "mcpServers": {
    "pandas-server": {
      "type": "stdio",
      "command": "python",
      "args": ["/path/to/your/pandas-mcp-server/server.py"]
    }
  }
}

Note: Replace /path/to/your/pandas-mcp-server/server.py with the actual path where you cloned the repository.

Example paths:

  • Windows: "C:\\Users\\YourName\\pandas-mcp-server\\server.py"
  • macOS/Linux: "/home/username/pandas-mcp-server/server.py"

Configuration File Location

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Verification

After configuration, restart Claude Desktop. The server should appear in the MCP tools list with three available tools:

  • read_metadata_tool - File analysis
  • run_pandas_code_tool - Code execution
  • generate_chartjs_tool - Chart generation

๐Ÿ”„ Workflow

The pandas MCP server follows a structured three-step workflow for data analysis and visualization:

Step 1: Read File Metadata

LLM calls read_metadata_tool to understand the file structure:

  • Extract file type, size, encoding, and column information
  • Get data types, sample values, and statistical summaries
  • Receive data quality warnings and suggested operations
  • Understand the dataset structure before processing

Step 2: Execute Pandas Operations

LLM calls run_pandas_code_tool based on metadata analysis:

  • Formulate pandas operations using the understood file structure
  • Execute data processing, filtering, aggregation, or analysis
  • Receive results in DataFrame, Series, or dictionary format
  • Get optimized output with memory management

Step 3: Generate Visualizations

LLM calls generate_chartjs_tool to create interactive charts:

  • Transform processed data into Chart.js compatible format
  • Generate interactive HTML charts with customization controls
  • Create bar, line, or pie charts based on data characteristics
  • Output responsive visualizations for analysis presentation

๐Ÿš€ MCP Server Tools

The server exposes three main tools for LLM integration:

1. read_metadata_tool - File Analysis

Extract comprehensive metadata from Excel and CSV files including:

  • File type, size, encoding, and structure
  • Column names, data types, and sample values
  • Statistical summaries (null counts, unique values, min/max/mean)
  • Data quality warnings and suggested operations
  • Memory-optimized processing for large files

MCP Tool Usage:

{
  "tool": "read_metadata_tool",
  "args": {
    "file_path": "/path/to/sales_data.xlsx"
  }
}

2. run_pandas_code_tool - Secure Code Execution

Execute pandas operations with:

  • Security filtering against malicious code
  • Memory optimization for large datasets
  • Comprehensive error handling and debugging
  • Support for DataFrame, Series, and dictionary results
Forbidden Operations

The following operations are blocked for security reasons:

  • System Access: os., sys., subprocess. - Prevents file system and system access
  • Code Execution: open(), exec(), eval() - Blocks dynamic code execution
  • Dangerous Imports: import os, import sys - Prevents specific harmful imports
  • Browser/DOM Access: document., window., XMLHttpRequest - Blocks browser operations
  • JavaScript/Remote: fetch(), eval(), Function() - Prevents remote code execution
  • Script Injection: script, javascript: - Blocks script injection attempts

Requirements:

  • Final result must be assigned to result variable
  • Code should include necessary imports (pandas available as pd)
  • All code goes through security filtering before execution

MCP Tool Usage:

{
  "tool": "run_pandas_code_tool",
  "args": {
    "code": "import pandas as pd\ndf = pd.read_excel('/path/to/data.xlsx')\nresult = df.groupby('Region')['Sales'].sum()"
  }
}

3. generate_chartjs_tool - Interactive Visualizations

Generate interactive charts with Chart.js:

  • Bar charts - For categorical comparisons
  • Line charts - For trend analysis
  • Pie charts - For proportional data
  • Interactive HTML templates with customization controls

MCP Tool Usage:

{
  "tool": "generate_chartjs_tool",
  "args": {
    "data": {
      "columns": [
        {
          "name": "Region",
          "type": "string",
          "examples": ["North", "South", "East", "West"]
        },
        {
          "name": "Sales",
          "type": "number",
          "examples": [15000, 12000, 18000, 9000]
        }
      ]
    },
    "chart_types": ["bar"],
    "title": "Sales by Region"
  }
}

๐Ÿš€ Usage

CLI Interface (Testing & Development)

The cli.py provides a convenient command-line interface for testing the MCP server functionality without requiring an MCP client:

Interactive Mode
python cli.py

Launches a guided menu system with:

  • Step-by-step workflow guidance
  • Automatic input validation
  • Clear error messages
  • Support for file paths with spaces
Command-Line Mode
# Read metadata
python cli.py metadata data.xlsx

# Execute pandas code
python cli.py execute analysis.py

# Generate charts
python cli.py chart data.json --type bar --title "Sales Analysis"

๐Ÿ” Code Logic & Architecture

Core Components

1. Server Architecture (server.py)
  • FastMCP Integration: Uses FastMCP framework for MCP protocol implementation
  • Logging System: Unified logging with rotation and memory tracking
  • Tool Registration: Exposes three main tools with proper error handling
  • Memory Monitoring: Tracks memory usage before/after operations
2. Metadata Processing (core/metadata.py)

Key Logic:

  • File validation (existence, size limits)
  • Encoding detection for CSV files
  • Memory-optimized data processing (100-row samples)
  • Comprehensive statistical analysis
  • Data quality assessment and warnings

Memory Optimization:

  • Uses category dtype for string columns with low cardinality
  • Converts float64 to float32 for memory efficiency
  • Processes only first 100 rows for metadata extraction
  • Forces garbage collection after processing
3. Code Execution (core/execution.py)

Security Features:

  • Blacklist filtering for dangerous operations
  • Sandboxed execution environment
  • Output capture and error handling
  • Memory monitoring for large results

Execution Flow:

  1. Security check against BLACKLIST patterns
  2. Syntax validation through compilation
  3. Code execution in isolated environment
  4. Result formatting and memory optimization
  5. Output capture and error reporting
4. Chart Generation (core/visualization.py)

Architecture:

  • Template-based HTML generation
  • Chart.js integration via CDN
  • Interactive controls for customization
  • Automatic file naming and organization

Chart Types:

  • Bar Charts: Categorical data with bar width and Y-axis controls
  • Line Charts: Trend analysis with line styling options
  • Pie Charts: Proportional data with donut hole and percentage display
5. Chart Generators (core/chart_generators/)

Base Class (base.py):

  • Abstract base class for all chart generators
  • Template management and file I/O
  • Common chart configuration

Specific Generators:

  • BarChartGenerator: Bar charts with interactive controls
  • LineChartGenerator: Line charts with tension and styling
  • PieChartGenerator: Pie charts with legend and percentage options

Data Flow Architecture

User Input โ†’ Security Check โ†’ Processing โ†’ Result โ†’ Output
    โ†“              โ†“            โ†“         โ†“         โ†“
  CLI/MCP โ†’ BLACKLIST โ†’ Memory Opt โ†’ Format โ†’ Log/Display

Memory Management Strategy

  1. Chunked Processing: Large files processed in 10KB chunks
  2. Type Optimization: Automatic dtype conversion (float64โ†’float32, objectโ†’category)
  3. Limited Sampling: Only first 100 rows processed for metadata
  4. Garbage Collection: Forced cleanup after major operations
  5. Memory Monitoring: PSutil integration for tracking usage

๐Ÿ“ Project Structure

pandas-mcp-server/
โ”œโ”€โ”€ server.py                 # MCP server implementation
โ”œโ”€โ”€ cli.py                    # CLI interface for testing
โ”œโ”€โ”€ requirements.txt          # Python dependencies
โ”œโ”€โ”€ core/                     # Core functionality
โ”‚   โ”œโ”€โ”€ config.py            # Configuration and constants
โ”‚   โ”œโ”€โ”€ data_types.py        # Data type utilities
โ”‚   โ”œโ”€โ”€ metadata.py          # File metadata extraction
โ”‚   โ”œโ”€โ”€ execution.py         # Pandas code execution
โ”‚   โ”œโ”€โ”€ visualization.py     # Chart generation orchestration
โ”‚   โ””โ”€โ”€ chart_generators/    # Chart-specific implementations
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ base.py          # Base chart generator
โ”‚       โ”œโ”€โ”€ bar.py           # Bar chart generator
โ”‚       โ”œโ”€โ”€ line.py          # Line chart generator
โ”‚       โ””โ”€โ”€ pie.py           # Pie chart generator
โ”‚       โ””โ”€โ”€ templates/       # HTML templates for charts
โ”œโ”€โ”€ charts/                  # Generated chart files
โ”œโ”€โ”€ logs/                    # Application logs
โ”œโ”€โ”€ csv_metadata_format.md   # CSV metadata documentation
โ””โ”€โ”€ test_*.py               # Test files

๐Ÿ”ง Configuration

Core Configuration (core/config.py)

  • MAX_FILE_SIZE: 100MB file size limit
  • BLACKLIST: Security restrictions for code execution
  • CHARTS_DIR: Directory for generated charts
  • Logging: Comprehensive logging with rotation

Security Features

  • Code execution sandboxing
  • Blacklisted operations (file system, network, eval)
  • Memory usage monitoring
  • Input validation and sanitization

๐Ÿ“Š Chart Generation Details

Template System

Charts are generated using HTML templates with:

  • Chart.js integration via CDN
  • Interactive controls for customization
  • Responsive design with mobile support
  • Real-time parameter adjustment

Chart Types

Bar Charts
  • Interactive controls for bar width and Y-axis scaling
  • Responsive design with zoom capabilities
  • Data labels and tooltips
  • Multiple dataset support
Line Charts
  • Multiple line series support
  • Adjustable line tension and styling
  • Point size and style customization
  • Stepped line options
Pie Charts
  • Interactive donut hole adjustment
  • Percentage/value toggle display
  • Legend positioning and styling
  • Border width and color controls

๐Ÿงช Testing

Running Tests

# Test metadata extraction
python test_metadata.py

# Test pandas code execution
python test_execution.py

# Test chart generation
python test_generate_barchart.py

# Test all chart types
python test_generate_pyecharts.py

Test Data Requirements

  • Excel files (.xlsx) with multiple sheets
  • CSV files with various encodings
  • JSON files with structured data for chart generation

๐Ÿ“ˆ Performance Optimization

Memory Management

  • Chunked processing for large files
  • Automatic garbage collection
  • Memory usage logging
  • Dataset size limits

File Processing

  • Optimized dtype inference
  • Category encoding for string columns
  • Float32 precision for numeric data
  • Streaming CSV reading

๐Ÿ” Logging

Log Structure

  • mcp_server.log: Main application log
  • memory_usage: Memory consumption tracking
  • metadata: File processing details

Log Levels

  • DEBUG: Detailed processing information
  • INFO: General operation status
  • WARNING: Non-critical issues
  • ERROR: Processing failures

๐Ÿ› Troubleshooting

Common Issues

MCP Connection Issues
  • Verify server path in Claude Desktop configuration
  • Check Python environment and dependencies
  • Ensure server.py is executable
  • Review MCP server logs for connection errors
File Not Found
  • Verify file path is absolute
  • Check file permissions
  • Ensure file exists before processing
Memory Issues
  • Reduce file size or use chunked processing
  • Monitor memory usage in logs
  • Consider data sampling for large datasets
Chart Generation Errors
  • Verify data structure matches expected format
  • Check for required columns (string + numeric)
  • Ensure Chart.js CDN accessibility

Debug Mode

Enable debug logging by setting environment variable:

export LOG_LEVEL=DEBUG
python server.py

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

๐Ÿ“„ Additional Documentation

  • CSV Metadata Format: See csv_metadata_format.md for detailed CSV processing documentation

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ†˜ Support

For issues and questions:

  • Check the troubleshooting section
  • Review log files in the logs/ directory
  • Open an issue on GitHub with reproduction steps