pradeepiyer/ray-mcp
Ray MCP Server
Model Context Protocol (MCP) server for Ray distributed computing. Enables LLM agents to programmatically manage Ray clusters, submit jobs, and monitor distributed workloads.
Overview
Ray MCP bridges LLM agents and Ray distributed computing through the Model Context Protocol. It is built with a modular, maintainable architecture that follows Domain-Driven Design principles.
Features
- Cluster Management: Initialize, connect to, and stop Ray clusters
- Job Operations: Submit, monitor, cancel, and inspect distributed jobs
- Worker Node Control: Manage worker nodes with custom resource configurations
- Comprehensive Logging: Retrieve and analyze logs with error detection
- Multi-Node Support: Handle head-only or multi-worker cluster topologies
Installation
# Install with uv (recommended)
git clone https://github.com/pradeepiyer/ray-mcp.git
cd ray-mcp
uv sync
# Or with pip
pip install -e .
Quick Start
1. Configure MCP Client
Add to your MCP client configuration (e.g., Claude Desktop):
{
  "mcpServers": {
    "ray-mcp": {
      "command": "uv",
      "args": ["run", "ray-mcp"],
      "cwd": "/path/to/ray-mcp"
    }
  }
}
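For clients that keep this configuration in a JSON file, the same entry can be merged in programmatically. The following is a minimal sketch and not part of ray-mcp: the helper name `add_ray_mcp_server` is hypothetical, the path is the same placeholder as above, and where the config file lives depends on the client and platform.

```python
import json


def add_ray_mcp_server(config: dict, repo_path: str) -> dict:
    """Return a copy of an MCP client config with a ray-mcp entry added.

    Hypothetical helper for illustration; other configured servers are kept.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["ray-mcp"] = {
        "command": "uv",
        "args": ["run", "ray-mcp"],
        "cwd": repo_path,  # directory of the cloned repository
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Example: a config that already lists another (made-up) server.
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    updated = add_ray_mcp_server(existing, "/path/to/ray-mcp")
    print(json.dumps(updated, indent=2))
```

The helper copies the input instead of mutating it, so the original config can be kept as a backup before the result is written to disk.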
2. Basic Usage
# Initialize a Ray cluster
init_ray()
# Submit a distributed job
submit_job(entrypoint="python my_script.py")
# Monitor cluster status
inspect_ray()
# Retrieve job logs
retrieve_logs(identifier="job_123")
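The calls above are MCP tool invocations that an agent issues through its MCP client, not Python functions to import. As a rough sketch of what such a call looks like on the wire, MCP clients send JSON-RPC 2.0 `tools/call` requests carrying the tool name and its arguments. The helper `tool_call_request` below is hypothetical and only builds the message; a real client also performs the initialization handshake and handles the transport.

```python
import itertools
import json

# Monotonically increasing JSON-RPC request ids.
_request_ids = itertools.count(1)


def tool_call_request(name: str, **arguments) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # The same four steps as in the usage example above.
    requests = [
        tool_call_request("init_ray"),
        tool_call_request("submit_job", entrypoint="python my_script.py"),
        tool_call_request("inspect_ray"),
        tool_call_request("retrieve_logs", identifier="job_123"),
    ]
    for request in requests:
        print(json.dumps(request))
```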
Available Tools
- init_ray: Initialize or connect to Ray cluster
- stop_ray: Stop Ray cluster
- inspect_ray: Get cluster status and information
- submit_job: Submit jobs to the cluster
- list_jobs: List all jobs
- inspect_job: Inspect specific job with logs/debug info
- cancel_job: Cancel running jobs
- retrieve_logs: Get logs with optional pagination and error analysis
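How `retrieve_logs` implements pagination and error analysis is not described here. The sketch below only illustrates the general idea under assumed names: `paginate_lines`, `find_error_lines`, the page size, and the error pattern are all hypothetical and are not taken from ray-mcp.

```python
import math
import re

# Assumed markers of a problem in a log line; a real analyzer may use others.
ERROR_PATTERN = re.compile(r"\b(ERROR|Exception|Traceback)\b")


def paginate_lines(lines: list[str], page: int = 1, page_size: int = 100) -> dict:
    """Return one page of log lines plus basic paging metadata."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = max(1, math.ceil(len(lines) / page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "lines": lines[start:start + page_size],
    }


def find_error_lines(lines: list[str]) -> list[tuple[int, str]]:
    """Return (1-based line number, text) for lines matching ERROR_PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


if __name__ == "__main__":
    log = ["job started", "ERROR: worker died", "retrying", "job finished"]
    print(paginate_lines(log, page=1, page_size=2))
    print(find_error_lines(log))
```

Returning a bounded page instead of the full log keeps responses small, which matters when the consumer is an LLM with a limited context window.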
Architecture
Ray MCP uses a modular architecture with focused components:
┌─────────────────┐    MCP Protocol    ┌─────────────────┐
│   LLM Agent     │◄──────────────────►│   Ray MCP       │
│                 │                    │   Server        │
└─────────────────┘                    └─────────┬───────┘
                                                 │
                                                 ▼
                                       ┌─────────────────┐
                                       │  Core Layer     │
                                       │                 │
                                       │ ┌─────────────┐ │
                                       │ │StateManager │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │ClusterMgr   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │JobManager   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │LogManager   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │PortManager  │ │
                                       │ └─────────────┘ │
                                       └─────────┬───────┘
                                                 │
                                        Ray API  │
                                                 ▼
                                       ┌─────────────────┐
                                       │  Ray Cluster    │
                                       │                 │
                                       │  ┌──────────┐   │
                                       │  │Head Node │   │
                                       │  └──────────┘   │
                                       │  ┌──────────┐   │
                                       │  │Worker 1  │   │
                                       │  └──────────┘   │
                                       │  ┌──────────┐   │
                                       │  │Worker N  │   │
                                       │  └──────────┘   │
                                       └─────────────────┘
Core Components
- StateManager: Thread-safe cluster state management
- ClusterManager: Pure cluster lifecycle operations
- JobManager: Job operations and lifecycle management
- LogManager: Centralized log retrieval with memory protection
- PortManager: Port allocation with race condition prevention
- UnifiedManager: Backward compatibility facade
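To make "thread-safe cluster state management" concrete, here is a minimal sketch of the pattern. It is an illustration under assumptions, not the project's actual StateManager: the method names `update` and `snapshot` and the state keys are hypothetical.

```python
import threading
from typing import Any


class StateManager:
    """Illustrative thread-safe key/value store for cluster state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, Any] = {}

    def update(self, **changes: Any) -> None:
        """Apply several changes atomically with respect to other threads."""
        with self._lock:
            self._state.update(changes)

    def snapshot(self) -> dict[str, Any]:
        """Return a copy so callers never hold a reference to shared state."""
        with self._lock:
            return dict(self._state)


if __name__ == "__main__":
    state = StateManager()
    # Several threads record values concurrently; the lock serializes writes.
    workers = [
        threading.Thread(target=state.update, kwargs={f"worker_{i}": "ready"})
        for i in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    state.update(initialized=True)
    print(state.snapshot())
```

Handing out copies from `snapshot` means a reader can inspect the state without holding the lock, at the cost of seeing a point-in-time view.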
Development
# Run tests
make test # Complete test suite
make test-fast # Unit tests only
make test-smoke # Critical functionality validation
# Code quality
make lint # Linting checks
make format # Code formatting
Documentation
- Setup and configuration options
- Complete tool documentation
- Usage examples and patterns
- Development setup and testing
- Common issues and solutions
Requirements
- Python ≥ 3.10
- Ray ≥ 2.47.0
- MCP ≥ 1.0.0
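These minimums can be checked in an existing environment before installing. The sketch below is not part of ray-mcp: the helper names are hypothetical, the distribution names `ray` and `mcp` are the PyPI package names, and the version parsing is a simplification that compares only the first three numeric components.

```python
import re
import sys
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Take the first three numeric components, padding with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (simplified check)."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements() -> dict[str, bool]:
    """Report whether Python, Ray, and MCP satisfy the listed minimums."""
    results = {"python": sys.version_info >= (3, 10)}
    for dist, minimum in (("ray", "2.47.0"), ("mcp", "1.0.0")):
        try:
            results[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            results[dist] = False  # not installed
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```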
License
Apache-2.0 License
Contributing
Contributions welcome! See the development documentation for setup instructions.