xiaosongz/r-mcp-data-explorer
R MCP Data Explorer
A Model Context Protocol (MCP) server for Claude Desktop that enables data exploration and analysis using R and tidyverse syntax, with support for Arrow and DuckDB for handling large datasets.
Overview
This project is an R implementation of an MCP server, inspired by the JavaScript MCP Data Explorer. It provides Claude with the ability to:
- Load and analyze CSV, Parquet, and other data formats
- Execute R code with tidyverse syntax
- Handle large datasets efficiently using Arrow and DuckDB
- Create visualizations with ggplot2
- Run SQL queries on loaded data
Quick Start
Prerequisites
- R 4.0 or higher
- Claude Desktop application
Installation
- Clone this repository:
git clone https://github.com/xiaosongz/r-mcp-data-explorer.git
cd r-mcp-data-explorer
- Install required R packages:
install.packages(c("jsonlite", "tidyverse", "base64enc", "callr"))
# For full version (optional):
# install.packages(c("arrow", "duckdb"))
- Run the setup script:
# For minimal version (recommended for initial setup):
Rscript setup_minimal.R
# For full version with arrow/duckdb support:
# Rscript inst/setup.R
- Restart Claude Desktop
Usage
Once configured, the R MCP Data Explorer will be available in Claude Desktop. You can use it through the following tools; the third is available only in the full version:
1. load_data
Load CSV files into memory:
Use the load_data tool to load "path/to/your/data.csv" as "mydata"
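As a rough illustration of what a load_data call amounts to, the sketch below reads a CSV file and registers it under a name in an environment. The names `datasets` and `load_dataset()` are hypothetical and not taken from this repository, and base R's `read.csv` is used only to keep the example dependency-free; the server's actual implementation may differ.

```r
# Hypothetical sketch: a named registry of loaded datasets.
# `datasets` and `load_dataset()` are illustrative names, not the server's API.
datasets <- new.env()

load_dataset <- function(path, name) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  data <- utils::read.csv(path, stringsAsFactors = FALSE)
  assign(name, data, envir = datasets)
  # Report the shape so the caller can confirm the load
  sprintf("Loaded '%s': %d rows x %d columns", name, nrow(data), ncol(data))
}

# Usage with a throwaway file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(category = c("a", "b"), value = c(1, 2)), tmp, row.names = FALSE)
msg <- load_dataset(tmp, "mydata")
print(msg)
```

Keeping datasets in a dedicated environment, rather than the global one, is one way to make sure later code only sees data that was loaded explicitly.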
2. run_tidyverse
Execute R code on loaded datasets:
Use run_tidyverse to execute:
mydata %>%
  group_by(category) %>%
  summarise(
    count = n(),
    mean_value = mean(value, na.rm = TRUE)
  )
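To see what this pipeline returns, here is a self-contained version run on a small made-up dataset (the values are invented for illustration). Note that `n()` counts every row in a group, including rows where `value` is missing, while `mean(..., na.rm = TRUE)` skips the missing values.

```r
library(dplyr)

# Made-up stand-in for a loaded dataset
mydata <- tibble(
  category = c("a", "a", "b", "b", "b"),
  value    = c(10, NA, 20, 30, 40)
)

result <- mydata %>%
  group_by(category) %>%
  summarise(
    count = n(),
    mean_value = mean(value, na.rm = TRUE)
  )

# Category "a" has count 2 (the NA row is counted) and mean_value 10;
# category "b" has count 3 and mean_value 30.
print(result)
```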
3. query_duckdb (Full version only)
Run SQL queries on loaded data:
Use query_duckdb to run:
SELECT category, COUNT(*) as count, AVG(value) as avg_value
FROM mydata
GROUP BY category
ORDER BY count DESC
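For reference, the same query can be run directly from R with the DBI and duckdb packages. This is a minimal sketch on made-up data, assuming both packages are installed; it shows the general DuckDB pattern, not necessarily how query_duckdb is implemented in this project.

```r
library(DBI)

# In-memory DuckDB connection
con <- dbConnect(duckdb::duckdb())

# Made-up data for illustration
mydata <- data.frame(
  category = c("a", "a", "b"),
  value    = c(10, 20, 30)
)

# Expose the data frame to SQL as a virtual table named "mydata"
duckdb::duckdb_register(con, "mydata", mydata)

result <- dbGetQuery(con, "
  SELECT category, COUNT(*) AS count, AVG(value) AS avg_value
  FROM mydata
  GROUP BY category
  ORDER BY count DESC
")
print(result)

dbDisconnect(con, shutdown = TRUE)
```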
Examples
Basic Data Analysis
- Load sample data:
Load the file "data/sample_data.csv" as "df"
- Explore the data:
Run this tidyverse code:
# View structure
glimpse(df)
# Summary statistics
df %>%
summary()
- Create visualizations:
Run this code to create a plot:
library(ggplot2)
ggplot(df, aes(x = category, y = value, fill = region)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Value Distribution by Category and Region")
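A plot has to be serialized before it can be returned to Claude. One common approach, sketched here under assumptions, is to render the plot to a temporary PNG and base64-encode the file. `plot_to_base64()` is a hypothetical helper and the data is made up; the project's own plot capture may work differently.

```r
library(ggplot2)

# Hypothetical helper: render a plot to a temporary PNG and base64-encode it.
plot_to_base64 <- function(plot, width = 6, height = 4, dpi = 100) {
  path <- tempfile(fileext = ".png")
  on.exit(unlink(path), add = TRUE)  # clean up the temporary file
  ggsave(path, plot = plot, width = width, height = height, dpi = dpi)
  base64enc::base64encode(path)
}

# Made-up data with the columns used in the example above
df <- data.frame(
  category = rep(c("a", "b"), each = 4),
  region   = rep(c("north", "south"), times = 4),
  value    = c(10, 12, 14, 16, 20, 22, 24, 26)
)

p <- ggplot(df, aes(x = category, y = value, fill = region)) +
  geom_boxplot() +
  theme_minimal()

encoded <- plot_to_base64(p)
print(nchar(encoded) > 0)
```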
Data Transformation
Run this tidyverse code:
df %>%
  filter(value > 50) %>%
  mutate(
    value_squared = value^2,
    month = format(date, "%Y-%m")
  ) %>%
  group_by(category, region, month) %>%
  summarise(
    n = n(),
    mean_value = mean(value),
    mean_squared = mean(value_squared),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_squared))
Architecture
The project uses a three-tier storage system:
- Small data (<100MB): In-memory tibbles for fast access
- Medium data (100MB-1GB): Arrow datasets with memory mapping
- Large data (>1GB): DuckDB for SQL-based operations
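The tier choice above can be sketched as a simple size check. `choose_storage_tier()` is a hypothetical helper, not a function from this repository; treating 1 MB as 1024^2 bytes and the handling of exact boundary values are assumptions.

```r
# Hypothetical helper: pick a storage tier from a size in bytes.
# Thresholds follow the list above; the byte conversion and the
# treatment of exact boundary values are assumptions.
choose_storage_tier <- function(size_bytes) {
  mb <- 1024^2
  if (size_bytes < 100 * mb) {
    "memory"   # in-memory tibble
  } else if (size_bytes <= 1024 * mb) {
    "arrow"    # memory-mapped Arrow dataset
  } else {
    "duckdb"   # SQL-based operations
  }
}

tiers <- vapply(c(5e6, 5e8, 5e9), choose_storage_tier, character(1))
print(tiers)
```

For a real file, the size could come from `file.size("path/to/your/data.csv")`.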
Project Structure
r-mcp-data-explorer/
├── R/
│ ├── server.R # Full MCP server implementation
│ ├── server_minimal.R # Minimal version (no arrow/duckdb)
│ ├── tools/
│ │ ├── data_loader.R # Handles data loading
│ │ ├── script_runner.R # Executes R code
│ │ └── query_runner.R # SQL query execution
│ ├── utils/
│ │ ├── mcp_transport.R # MCP protocol handling
│ │ ├── data_manager.R # Data storage management
│ │ ├── security.R # Sandboxing utilities
│ │ ├── visualization.R # Plot capture
│ │ └── logging.R # Logging utilities
│ └── prompts/
│ └── explore_data.R # Data exploration prompts
├── data/ # Sample data directory
├── logs/ # Server logs
├── inst/ # Installation files
│ ├── setup.R # Full setup script
│ └── config/
│ └── allowed_packages.txt
├── tests/ # Test files
└── setup_minimal.R # Minimal setup script
Minimal vs Full Version
This repository includes two versions:
Minimal Version (server_minimal.R)
- ✅ No dependencies beyond the packages installed above (arrow and duckdb are not needed)
- ✅ Quick to set up and debug
- ✅ Supports CSV files
- ✅ Basic R code execution
- ❌ No support for large files
- ❌ No SQL queries
Full Version (server.R)
- ✅ Supports multiple file formats (CSV, Parquet, Arrow, DuckDB)
- ✅ Three-tier storage for efficient large data handling
- ✅ SQL query support via DuckDB
- ✅ Advanced security sandboxing (listed as in progress under Development Status)
- ❌ Requires arrow and duckdb packages
- ❌ More complex setup
Troubleshooting
Server not appearing in Claude Desktop
- Ensure Claude Desktop is fully closed before running setup
- Check the configuration at:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
- Verify R is in your PATH:
which Rscript
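The checks above can also be run from an R session. The sketch below assumes that Claude Desktop lists its servers under an `mcpServers` key in the config file; `check_setup()` is a hypothetical helper, and the server entry in the usage example is made up.

```r
# Hypothetical helper: report whether Rscript is on the PATH and which
# servers a Claude Desktop config file declares.
check_setup <- function(config_path) {
  rscript <- unname(Sys.which("Rscript"))
  config_found <- file.exists(config_path)
  servers <- character(0)
  if (config_found) {
    config <- jsonlite::fromJSON(config_path, simplifyVector = FALSE)
    # as.character() turns a missing mcpServers key into character(0)
    servers <- as.character(names(config$mcpServers))
  }
  list(
    rscript_found = nzchar(rscript),
    config_found  = config_found,
    servers       = servers
  )
}

# Usage with a throwaway config; "r-data-explorer" and its args are made up
tmp <- tempfile(fileext = ".json")
writeLines(
  '{"mcpServers": {"r-data-explorer": {"command": "Rscript", "args": ["R/server_minimal.R"]}}}',
  tmp
)
status <- check_setup(tmp)
str(status)
```

On macOS, the real file could be checked with `check_setup(path.expand("~/Library/Application Support/Claude/claude_desktop_config.json"))`.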
Package installation issues
- For arrow package compilation issues, see: https://arrow.apache.org/docs/r/articles/install.html
- Consider using the minimal version if you encounter installation problems
Errors when loading data
- Check that file paths are either absolute or relative to the server's working directory
- Ensure CSV files are properly formatted
- Check server logs in R/logs/ (minimal) or logs/ (full version)
Code execution errors
- The error message will indicate which packages need to be loaded
- Check that dataset names match exactly (R is case-sensitive)
- Ensure your tidyverse syntax is correct
Development Status
✅ Implemented:
- Core MCP protocol handling
- CSV data loading
- Tidyverse code execution
- Basic data storage
- Logging system
- Minimal server version
🚧 In Progress:
- Full arrow/duckdb integration
- Advanced sandboxing with callr
- Plot capture and base64 encoding
- Comprehensive test suite
📋 Planned:
- Support for more file formats (Excel, JSON)
- Performance optimizations
- Interactive plot support
- Memory usage monitoring
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - see LICENSE file for details
Acknowledgments
- Built for use with Claude Desktop and the Model Context Protocol
- Inspired by the JavaScript MCP data explorer
- Uses the tidyverse ecosystem for data analysis
- Leverages Apache Arrow and DuckDB for large data handling