ray-mcp

pradeepiyer/ray-mcp



Ray MCP Server is a Model Context Protocol server designed for Ray distributed computing, enabling LLM agents to manage Ray clusters, submit jobs, and monitor workloads programmatically.

The server bridges LLM agents and Ray distributed computing environments through the Model Context Protocol (MCP). It exposes Ray's cluster management, job submission, and workload monitoring as MCP tools, so an agent can run work across multiple nodes, check cluster health in real time, and retrieve logs for analysis. It supports both head-only and multi-worker cluster topologies.

Features

  • Cluster Management: Initialize, connect to, and stop Ray clusters.
  • Job Operations: Submit, monitor, cancel, and inspect distributed jobs.
  • Worker Node Control: Manage worker nodes with custom resource configurations.
  • Comprehensive Logging: Retrieve and analyze logs with error detection.
  • Resource Monitoring: Real-time cluster health and performance metrics.

Usage

Usage with Claude Desktop

{
  "mcpServers": {
    "ray-mcp": {
      "command": "uv",
      "args": ["run", "ray-mcp"],
      "cwd": "/path/to/ray-mcp"
    }
  }
}
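
As a minimal sketch, the entry above could be merged into an existing Claude Desktop configuration file with a short script. The function name `add_ray_mcp_server` and both of its arguments are hypothetical; the location of the configuration file depends on the platform and is not given here.

```python
import json
from pathlib import Path


def add_ray_mcp_server(config_path: str, repo_dir: str) -> dict:
    """Merge the ray-mcp entry into a Claude Desktop style config file.

    Both arguments are placeholders supplied by the caller (assumptions,
    not paths documented by the project).
    """
    path = Path(config_path)
    # Start from the existing config, if any, so other servers are preserved.
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as the JSON snippet above, with cwd filled in by the caller.
    servers["ray-mcp"] = {
        "command": "uv",
        "args": ["run", "ray-mcp"],
        "cwd": repo_dir,
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_ray_mcp_server("<config file>", "/path/to/ray-mcp")` rewrites the file in place and leaves any other `mcpServers` entries untouched.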

Tools

  1. init_ray

    Initialize or connect to a Ray cluster

  2. stop_ray

    Stop the Ray cluster

  3. inspect_ray

    Get cluster status and metrics

  4. submit_job

    Submit jobs to the cluster

  5. list_jobs

    List all jobs

  6. inspect_job

    Inspect a specific job, including logs and debug info

  7. cancel_job

    Cancel running jobs

  8. retrieve_logs

    Get logs with error analysis

  9. retrieve_logs_paginated

    Get logs with pagination support
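
MCP clients invoke tools like the ones listed above through the protocol's `tools/call` JSON-RPC method. The sketch below shows roughly what such requests look like on the wire. The helper `make_tool_call` is hypothetical, and the `job_id` argument is an assumed parameter name used only for illustration, since the tools' input schemas are not listed here. In practice an MCP client library would handle this framing.

```python
import itertools
import json
from typing import Optional

# Monotonic request ids, as JSON-RPC expects each request to carry an id.
_request_ids = itertools.count(1)


def make_tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string for MCP's tools/call method."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


# List jobs (assumed here to need no arguments), then inspect one of them.
# "job_id" and its value are assumptions, not the server's documented schema.
list_request = make_tool_call("list_jobs")
inspect_request = make_tool_call("inspect_job", {"job_id": "example-job-id"})
print(list_request)
print(inspect_request)
```

The actual argument names for each tool can be discovered at runtime with the protocol's `tools/list` method, which returns each tool's input schema.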