mcp-amdsmi

AMD-melliott/mcp-amdsmi

AMD SMI MCP Server

An intelligent Model Context Protocol (MCP) server that provides conversational access to AMD GPU monitoring capabilities through the FastMCP framework. Designed for infrastructure management, performance analysis, and workshop demonstrations.

Features

  • Six Core Monitoring Tools: Device discovery, status monitoring, performance analysis, memory analysis, power/thermal monitoring, and health assessment
  • Intelligent Health Analysis: AI-powered health scoring with contextual recommendations
  • N/A Value Handling: Robust handling of missing or unavailable metrics without failures
  • FastMCP Integration: Modern MCP implementation with proper tool registration and error handling
  • Demo Mode Support: Works on development systems without enterprise GPUs

Quick Start

Prerequisites

  • Python 3.11+
  • AMD GPU with ROCm/AMD SMI installed (or any system for demo mode)
  • Git

Installation

Option 1: Install as a Package (Recommended)

  1. Clone the repository:

    git clone <repository-url>
    cd mcp-amdsmi
    
  2. Install the package:

    pip install -e .
    
  3. Test the installation:

    mcp-amdsmi --help
    

Option 2: Development Installation

  1. Clone the repository and create a virtual environment:

    git clone <repository-url>
    cd mcp-amdsmi
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  2. Install in development mode:

    pip install -e .
    
  3. Test the installation:

    python test_monitoring.py
    

Running the MCP Server

Add this minimal configuration to your MCP client:

{
  "mcpServers": {
    "mcp-amdsmi": {
      "command": "mcp-amdsmi"
    }
  }
}

To point the client at a specific installation, such as one inside a virtual environment, give the full path to the executable:

{
  "mcpServers": {
    "mcp-amdsmi": {
      "command": "/path/to/venv/bin/mcp-amdsmi"
    }
  }
}
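
If the server needs environment variables (for example the LOG_LEVEL variable described under Logging), the same entry can be generated with a short script. This is a minimal sketch: the helper name build_server_entry is hypothetical, and the "env" key is an assumption about what your MCP client accepts, so check the client's documentation before relying on it.

```python
import json


def build_server_entry(command="mcp-amdsmi", env=None):
    """Return an mcpServers config dict for this server (hypothetical helper)."""
    entry = {"command": command}
    if env:
        # Assumption: the MCP client forwards "env" to the server process.
        entry["env"] = dict(env)
    return {"mcpServers": {"mcp-amdsmi": entry}}


if __name__ == "__main__":
    config = build_server_entry(
        command="/path/to/venv/bin/mcp-amdsmi",
        env={"LOG_LEVEL": "DEBUG"},
    )
    # Print JSON that can be pasted into the client's configuration file.
    print(json.dumps(config, indent=2))
```

Called with no arguments, the helper reproduces the minimal configuration shown first.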

Available Tools

1. get_gpu_discovery

Discovers and enumerates all available AMD GPU devices.

  • Returns device information, driver versions, and hardware specifications
  • Works in both real hardware and demo modes

2. get_gpu_status

Provides comprehensive current status of a specific GPU.

  • Temperature, power, utilization, memory, clock speeds, and fan data
  • Includes overall health score (0-100)
  • Parameters: device_id (string, default: "0")

3. get_gpu_performance

Analyzes GPU performance metrics and efficiency.

  • Performance analysis with efficiency scoring
  • Utilization patterns and bottleneck identification
  • Parameters: device_id (string, default: "0")

4. analyze_gpu_memory

Detailed GPU memory usage analysis.

  • Memory health assessment
  • Usage patterns and recommendations
  • Parameters: device_id (string, default: "0")

5. monitor_power_thermal

Monitors GPU power consumption and thermal status.

  • Real-time power and temperature data
  • Thermal warnings and power efficiency metrics
  • Parameters: device_id (string, default: "0")

6. check_gpu_health

Comprehensive GPU health assessment with recommendations.

  • Overall health status and scoring
  • Issue detection and actionable recommendations
  • Parameters: device_id (string, default: "0")
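
Because every per-device tool takes device_id as a string, the value has to be converted and range-checked before any hardware query. The sketch below shows one way this could be done; resolve_device_index and its error messages are hypothetical and are not taken from the project's code.

```python
def resolve_device_index(device_id: str = "0", device_count: int = 1) -> int:
    """Convert a string device_id into a validated integer index.

    Hypothetical helper: the real server may validate differently.
    """
    try:
        index = int(device_id.strip())
    except (ValueError, AttributeError):
        # Non-numeric strings and non-string inputs both end up here.
        raise ValueError(
            f"device_id must be a numeric string, got {device_id!r}"
        ) from None
    if not 0 <= index < device_count:
        raise ValueError(
            f"device_id {index} out of range; {device_count} device(s) available"
        )
    return index
```

Rejecting bad identifiers early lets a tool return a clear error message instead of a low-level failure from the GPU library.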

Example Usage

Once integrated with Claude Code, you can use natural language queries:

  • "What GPUs are available in the system?"
  • "Check the health of GPU 0"
  • "Show me the performance metrics for all GPUs"
  • "Is GPU 0 running too hot?"
  • "Analyze memory usage patterns"

Architecture

The system consists of three main layers:

  1. AMD SMI Interface Layer (AMDSMIManager) - Abstracts AMD SMI Python API with robust error handling
  2. Business Logic Layer (HealthAnalyzer, PerformanceInterpreter) - Provides intelligent analysis and recommendations
  3. MCP Server Layer (FastMCP-based) - Exposes functionality as conversational tools
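
The layering can be illustrated with a small, dependency-free sketch. Only the class names AMDSMIManager and HealthAnalyzer come from the list above; the method names, the GpuMetrics fields, the scoring rule, and the plain function standing in for a FastMCP tool are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuMetrics:
    temperature_c: float
    power_w: float


class AMDSMIManager:
    """Interface layer: would wrap the AMD SMI Python API (stubbed here)."""

    def get_metrics(self, device_id: str) -> GpuMetrics:
        # Stub values; a real implementation would query the hardware.
        return GpuMetrics(temperature_c=65.0, power_w=150.0)


class HealthAnalyzer:
    """Business logic layer: turns raw metrics into a 0-100 score."""

    def score(self, metrics: GpuMetrics) -> float:
        # Illustrative rule only: lose 2 points per degree above 70 C.
        penalty = max(0.0, metrics.temperature_c - 70.0) * 2.0
        return max(0.0, 100.0 - penalty)


def check_gpu_health(device_id: str = "0") -> dict:
    """Server layer: in the real project this would be a registered tool."""
    metrics = AMDSMIManager().get_metrics(device_id)
    return {
        "device_id": device_id,
        "temperature_c": metrics.temperature_c,
        "health_score": HealthAnalyzer().score(metrics),
    }
```

Keeping the hardware wrapper separate from the analysis code means the scoring logic can be tested without a GPU present.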

Demo Mode

The server automatically falls back to demo mode when:

  • No AMD GPUs are detected
  • AMD SMI library is unavailable
  • Hardware access fails

Demo mode provides realistic mock data for development and testing.
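
The three fallback conditions amount to a guarded initialization. The following is a minimal sketch, assuming the AMD SMI Python package is importable as amdsmi and provides amdsmi_init, amdsmi_get_processor_handles, and amdsmi_shut_down. The function name detect_mode and its "hardware"/"demo" return values are hypothetical, and the injectable import_module argument exists only so the logic can be exercised without a GPU.

```python
import importlib


def detect_mode(import_module=importlib.import_module) -> str:
    """Return "hardware" if AMD GPUs are usable, otherwise "demo"."""
    try:
        amdsmi = import_module("amdsmi")
    except (ImportError, OSError):
        return "demo"  # AMD SMI library is unavailable
    try:
        amdsmi.amdsmi_init()
        try:
            handles = amdsmi.amdsmi_get_processor_handles()
        finally:
            # This sketch only probes; a real server would stay initialized.
            amdsmi.amdsmi_shut_down()
    except Exception:
        return "demo"  # hardware access failed
    # An empty handle list means no AMD GPUs were detected.
    return "hardware" if handles else "demo"
```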

N/A Value Handling

The server gracefully handles missing or "N/A" values common in:

  • Development environments
  • Limited hardware access scenarios
  • Partial metric availability

Missing values receive neutral health scores (80.0) and don't cause failures.
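
As a rough illustration of this behavior, the sketch below scores a temperature reading and returns the neutral 80.0 when the reading is missing. Only the 80.0 value comes from this README; the function names, the set of strings treated as missing, and the temperature thresholds are assumptions.

```python
NEUTRAL_SCORE = 80.0  # neutral score for missing metrics, as stated above


def parse_metric(raw):
    """Return a float, or None when the value is missing or "N/A"."""
    if raw is None:
        return None
    if isinstance(raw, str) and raw.strip().upper() in {"", "N/A", "NA"}:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None  # unparseable values are treated as missing


def temperature_score(raw_temp_c) -> float:
    """Score a temperature reading on a 0-100 scale (illustrative thresholds)."""
    temp = parse_metric(raw_temp_c)
    if temp is None:
        return NEUTRAL_SCORE
    # Hypothetical rule: full score up to 70 C, minus 2 points per degree above.
    return max(0.0, 100.0 - max(0.0, temp - 70.0) * 2.0)
```

Returning a neutral score rather than raising keeps one unavailable metric from failing an entire health report.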

Development

Project Structure

mcp-amdsmi/
├── src/amd_smi_mcp/
│   ├── server.py              # FastMCP server with tool definitions
│   ├── amd_smi_wrapper.py     # AMD SMI library abstraction
│   └── business_logic.py      # Health analysis and performance interpretation
├── tests/                     # Unit and integration tests
├── test_monitoring.py         # Comprehensive test script
├── requirements.txt           # Python dependencies
└── README.md                  # This file

Running Tests

source venv/bin/activate
python test_monitoring.py      # Comprehensive functionality test
pytest                         # Unit tests (if available)

Code Quality

source venv/bin/activate
black src/                     # Code formatting
flake8 src/                    # Linting
mypy src/                      # Type checking

Workshop Integration

Designed for PEARC25 workshop demonstrations:

  • 30-second response times for single GPU queries
  • Support for 30 concurrent users
  • Educational explanations and visual indicators
  • Fallback modes for demonstration reliability

Troubleshooting

Common Issues

1. "amdsmi library not available"

  • Install ROCm and AMD SMI library
  • Server will automatically use demo mode if unavailable

2. "No AMD GPU devices found"

  • Check GPU hardware installation
  • Verify driver installation
  • Server continues in demo mode

3. "Permission denied" errors

  • Ensure user has GPU access permissions
  • May require adding user to appropriate groups
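
One quick way to check access is to test the GPU device nodes directly. This sketch assumes a typical Linux ROCm setup, where /dev/kfd and the /dev/dri/renderD* nodes are the relevant devices; those paths are an assumption and may differ on your system. The helper name is hypothetical.

```python
import glob
import os


def check_gpu_device_access(paths=None) -> dict:
    """Map each device node to True/False for read-write access."""
    if paths is None:
        # Assumption: typical ROCm device nodes on Linux.
        paths = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    # os.access also returns False when the path does not exist.
    return {p: os.access(p, os.R_OK | os.W_OK) for p in paths}


if __name__ == "__main__":
    for path, ok in check_gpu_device_access().items():
        print(f"{path}: {'ok' if ok else 'no access (or missing)'}")
```

If a node exists but reports no access, compare its owning group (shown by ls -l) with the groups of the user running the server.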

4. Import errors in Claude Code

  • Verify cwd and PYTHONPATH in MCP configuration
  • Ensure virtual environment activation if using venv

Logging

Enable debug logging by setting these environment variables before starting the server:

export PYTHONPATH=/path/to/mcp-amdsmi
export LOG_LEVEL=DEBUG
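
How the server reads LOG_LEVEL is not shown in this README. A minimal sketch of the usual pattern with the standard logging module follows, assuming the variable holds a standard level name and that unrecognized values fall back to INFO; configure_logging is a hypothetical name.

```python
import logging
import os

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL and return the numeric level."""
    name = os.environ.get("LOG_LEVEL", default).strip().upper()
    # Assumption: unknown level names fall back to INFO instead of failing.
    level = _LEVELS.get(name, logging.INFO)
    logging.basicConfig(level=level, force=True)
    return level
```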

License

[License information to be added]

Contributing

[Contributing guidelines to be added]