ray-mcp

pradeepiyer/ray-mcp

Ray MCP Server

Model Context Protocol (MCP) server for Ray distributed computing. Enables LLM agents to programmatically manage Ray clusters, submit jobs, and monitor distributed workloads.

Overview

Ray MCP bridges LLM agents and Ray distributed computing through the Model Context Protocol. It is built on a modular architecture that follows Domain-Driven Design principles.

Features

  • Cluster Management: Initialize, connect to, and stop Ray clusters
  • Job Operations: Submit, monitor, cancel, and inspect distributed jobs
  • Worker Node Control: Manage worker nodes with custom resource configurations
  • Comprehensive Logging: Retrieve and analyze logs with error detection
  • Multi-Node Support: Handle head-only or multi-worker cluster topologies

Installation

# Install with uv (recommended)
git clone https://github.com/pradeepiyer/ray-mcp.git
cd ray-mcp
uv sync

# Or with pip
pip install -e .

Quick Start

1. Configure MCP Client

Add the following to your MCP client configuration (e.g., Claude Desktop):

{
  "mcpServers": {
    "ray-mcp": {
      "command": "uv",
      "args": ["run", "ray-mcp"],
      "cwd": "/path/to/ray-mcp"
    }
  }
}
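If you would rather script this step, the entry can be merged into an existing configuration file. The sketch below is a minimal illustration that assumes the client stores its settings as JSON with a top-level `mcpServers` object, as in the snippet above. The helper name `add_ray_mcp_server` is hypothetical, and the location of the config file depends on your MCP client.

```python
import json
from pathlib import Path


def add_ray_mcp_server(config_path: Path, repo_dir: str) -> dict:
    """Merge a ray-mcp entry into a JSON MCP client config and write it back.

    Hypothetical helper: the config file location depends on your MCP client.
    """
    # Start from the existing config if there is one, so other servers are kept.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["ray-mcp"] = {
        "command": "uv",
        "args": ["run", "ray-mcp"],
        "cwd": repo_dir,  # path to your local clone of the repository
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_ray_mcp_server(Path("config.json"), "/path/to/ray-mcp")` leaves any other configured servers untouched and adds or overwrites only the `ray-mcp` entry.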

2. Basic Usage

# Initialize a Ray cluster
init_ray()

# Submit a distributed job
submit_job(entrypoint="python my_script.py")

# Monitor cluster status
inspect_ray()

# Retrieve job logs
retrieve_logs(identifier="job_123")
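The calls above read best as shorthand for MCP tool invocations that an agent issues through its client, rather than Python functions to import. As a rough illustration, MCP carries a tool invocation as a JSON-RPC 2.0 `tools/call` request, and the sketch below builds such a payload. The helper name `build_tool_call` and the request ids are made up for the example; the tool and argument names are taken from the usage snippet above.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, **arguments: Any) -> str:
    """Serialize an MCP `tools/call` request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# Payloads corresponding to two of the calls above.
print(build_tool_call(1, "init_ray"))
print(build_tool_call(2, "submit_job", entrypoint="python my_script.py"))
```

In practice the MCP client constructs and sends these messages for you; the sketch only shows what travels over the wire.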

Available Tools

  • init_ray - Initialize or connect to Ray cluster
  • stop_ray - Stop Ray cluster
  • inspect_ray - Get cluster status and information
  • submit_job - Submit jobs to the cluster
  • list_jobs - List all jobs
  • inspect_job - Inspect specific job with logs/debug info
  • cancel_job - Cancel running jobs
  • retrieve_logs - Get logs with optional pagination and error analysis
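To make "pagination and error analysis" concrete, here is a minimal sketch of the idea in plain Python. It is not the server's implementation: the function names, the page-numbering convention (pages start at 1), and the error patterns are all assumptions chosen for illustration.

```python
import re

# Assumed patterns; a real analyzer would likely be more thorough.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback|Exception)\b")


def paginate(lines: list[str], page: int, page_size: int) -> dict:
    """Return one page of log lines plus paging metadata (pages start at 1)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = max(1, -(-len(lines) // page_size))  # ceiling division
    start = (page - 1) * page_size
    return {
        "lines": lines[start:start + page_size],
        "page": page,
        "total_pages": total_pages,
        "has_next": page < total_pages,
    }


def find_errors(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, text) for every line matching an error pattern."""
    return [(i, line) for i, line in enumerate(lines, start=1)
            if ERROR_PATTERN.search(line)]


log = [
    "INFO job started",
    "WARNING slow task",
    "ERROR worker died",
    "Traceback (most recent call last):",
    "INFO job finished",
]
first = paginate(log, page=1, page_size=2)
print(first["lines"], first["has_next"])
print(find_errors(log))
```

Returning paging metadata alongside the lines lets an agent decide whether to request the next page instead of pulling an entire log into its context at once.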

Architecture

Ray MCP uses a modular architecture with focused components:

┌─────────────────┐    MCP Protocol    ┌─────────────────┐
│   LLM Agent     │◄──────────────────►│   Ray MCP       │
│                 │                    │   Server        │
└─────────────────┘                    └─────────┬───────┘
                                                 │
                                                 ▼
                                       ┌─────────────────┐
                                       │   Core Layer    │
                                       │                 │
                                       │ ┌─────────────┐ │
                                       │ │StateManager │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │ClusterMgr   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │JobManager   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │LogManager   │ │
                                       │ └─────────────┘ │
                                       │ ┌─────────────┐ │
                                       │ │PortManager  │ │
                                       │ └─────────────┘ │
                                       └─────────┬───────┘
                                                 │
                                       Ray API   │
                                                 ▼
                                       ┌─────────────────┐
                                       │   Ray Cluster   │
                                       │                 │
                                       │  ┌──────────┐   │
                                       │  │Head Node │   │
                                       │  └──────────┘   │
                                       │  ┌──────────┐   │
                                       │  │Worker 1  │   │
                                       │  └──────────┘   │
                                       │  ┌──────────┐   │
                                       │  │Worker N  │   │
                                       │  └──────────┘   │
                                       └─────────────────┘

Core Components

  • StateManager: Thread-safe cluster state management
  • ClusterManager: Pure cluster lifecycle operations
  • JobManager: Job operations and lifecycle management
  • LogManager: Centralized log retrieval with memory protection
  • PortManager: Port allocation with race condition prevention
  • UnifiedManager: Backward compatibility facade
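Two of these ideas, thread-safe state and race-aware port allocation, can be illustrated with the standard library alone. The sketch below is not the project's code: `ThreadSafeState`, `reserve_free_port`, and the `dashboard_port` key are hypothetical names, and the techniques shown (a lock around a dictionary, and binding to port 0 so the operating system picks an unused port) are common approaches assumed here for illustration.

```python
import socket
import threading
from typing import Any


class ThreadSafeState:
    """A dictionary of cluster state guarded by a reentrant lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._state: dict[str, Any] = {"initialized": False}

    def update(self, **changes: Any) -> None:
        # Apply all changes atomically with respect to other threads.
        with self._lock:
            self._state.update(changes)

    def snapshot(self) -> dict[str, Any]:
        # Return a copy so callers cannot mutate shared state without the lock.
        with self._lock:
            return dict(self._state)


def reserve_free_port(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Bind to port 0 so the OS chooses a free port, and return the open socket.

    The port stays reserved for as long as the returned socket is open, which
    avoids the usual "check that a port is free, then use it" race. Closing the
    socket to hand the port to another process reopens a short window.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    return sock, sock.getsockname()[1]


# Several threads record node status concurrently, then a port is reserved.
state = ThreadSafeState()
workers = [
    threading.Thread(target=state.update, kwargs={f"node_{i}": "alive"})
    for i in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

sock, port = reserve_free_port()
state.update(initialized=True, dashboard_port=port)
sock.close()
print(state.snapshot())
```

Handing out snapshots rather than the live dictionary keeps every mutation behind the lock, which is the property a state manager needs when cluster and job operations run on different threads.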

Development

# Run tests
make test          # Complete test suite
make test-fast     # Unit tests only
make test-smoke    # Critical functionality validation

# Code quality
make lint          # Linting checks
make format        # Code formatting

Documentation

  • Setup and configuration options
  • Complete tool documentation
  • Usage examples and patterns
  • Development setup and testing
  • Common issues and solutions

Requirements

  • Python ≥ 3.10
  • Ray ≥ 2.47.0
  • MCP ≥ 1.0.0
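A quick way to check an environment against these minimums is sketched below using only the standard library. The helper names are hypothetical, the distribution names `ray` and `mcp` are assumed to match the packages listed above, and the version parsing is deliberately simple: it compares only the leading numeric components, so pre-release suffixes are ignored.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the list above.
MINIMUMS = {"ray": (2, 47, 0), "mcp": (1, 0, 0)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading dotted-integer part of a version string.

    The result is padded with zeros to at least three components,
    so "2.47" compares equal to "2.47.0".
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    return parts + (0,) * max(0, 3 - len(parts))


def check_requirements() -> dict[str, bool]:
    """Report, per requirement, whether the installed version meets the minimum."""
    report = {"python": sys.version_info >= (3, 10)}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = parse_version(metadata.version(name)) >= minimum
        except metadata.PackageNotFoundError:
            report[name] = False  # package is not installed
    return report


print(check_requirements())
```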

License

Apache-2.0 License

Contributing

Contributions are welcome! See the development documentation for setup instructions.