AMD-melliott/mcp-amdsmi
AMD SMI MCP Server
An intelligent Model Context Protocol (MCP) server that provides conversational access to AMD GPU monitoring capabilities through the FastMCP framework. Designed for infrastructure management, performance analysis, and workshop demonstrations.
Features
- Six Core Monitoring Tools: Device discovery, status monitoring, performance analysis, memory analysis, power/thermal monitoring, and health assessment
- Intelligent Health Analysis: AI-powered health scoring with contextual recommendations
- N/A Value Handling: Robust handling of missing or unavailable metrics without failures
- FastMCP Integration: Modern MCP implementation with proper tool registration and error handling
- Demo Mode Support: Works on development systems without enterprise GPUs
Quick Start
Prerequisites
- Python 3.11+
- AMD GPU with ROCm/AMD SMI installed (or any system for demo mode)
- Git
Installation
Option 1: Install as a Package (Recommended)
1. Clone the repository:
   git clone <repository-url>
   cd mcp-amdsmi
2. Install the package:
   pip install -e .
3. Test the installation:
   mcp-amdsmi --help
Option 2: Development Installation
1. Clone and create a virtual environment:
   git clone <repository-url>
   cd mcp-amdsmi
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
2. Install in development mode:
   pip install -e .
3. Test the installation:
   python test_monitoring.py
Running the MCP Server
Add this simplified configuration to your MCP client:
{
  "mcpServers": {
    "mcp-amdsmi": {
      "command": "mcp-amdsmi"
    }
  }
}
Or if you want to use a specific installation:
{
  "mcpServers": {
    "mcp-amdsmi": {
      "command": "/path/to/venv/bin/mcp-amdsmi"
    }
  }
}
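The choice between the two configurations above depends on whether the `mcp-amdsmi` executable is on the `PATH` seen by the MCP client. The following is a minimal sketch of generating the entry programmatically; `build_mcp_config` is a hypothetical helper, not part of this project, and it uses only the standard library.

```python
import json
import shutil


def build_mcp_config(command: str = "mcp-amdsmi") -> dict:
    """Return an mcpServers entry, preferring an absolute path when one is found."""
    # shutil.which returns None when the executable is not on PATH;
    # in that case the bare command name is kept unchanged.
    resolved = shutil.which(command)
    return {"mcpServers": {"mcp-amdsmi": {"command": resolved or command}}}


if __name__ == "__main__":
    # Print JSON that can be pasted into the MCP client configuration.
    print(json.dumps(build_mcp_config(), indent=2))
```

Run it inside the virtual environment where the package was installed, so the resolved path points at that installation.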
Available Tools
1. get_gpu_discovery
Discovers and enumerates all available AMD GPU devices.
- Returns device information, driver versions, and hardware specifications
- Works in both real hardware and demo modes
2. get_gpu_status
Provides comprehensive current status of a specific GPU.
- Temperature, power, utilization, memory, clock speeds, and fan data
- Includes overall health score (0-100)
- Parameters: device_id (string, default: "0")
3. get_gpu_performance
Analyzes GPU performance metrics and efficiency.
- Performance analysis with efficiency scoring
- Utilization patterns and bottleneck identification
- Parameters: device_id (string, default: "0")
4. analyze_gpu_memory
Detailed GPU memory usage analysis.
- Memory health assessment
- Usage patterns and recommendations
- Parameters: device_id (string, default: "0")
5. monitor_power_thermal
Monitors GPU power consumption and thermal status.
- Real-time power and temperature data
- Thermal warnings and power efficiency metrics
- Parameters: device_id (string, default: "0")
6. check_gpu_health
Comprehensive GPU health assessment with recommendations.
- Overall health status and scoring
- Issue detection and actionable recommendations
- Parameters: device_id (string, default: "0")
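MCP clients invoke tools like these through the protocol's `tools/call` method, carried as a JSON-RPC 2.0 request. The sketch below shows the shape of such a request for the tools listed above. `tool_call_request` is a hypothetical helper for illustration; in practice an MCP client library builds this message for you.

```python
import json
from typing import Optional


def tool_call_request(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Tools without parameters (such as get_gpu_discovery) get an empty object.
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # device_id is passed as a string, matching the parameter type documented above.
    request = tool_call_request("get_gpu_status", {"device_id": "0"})
    print(json.dumps(request, indent=2))
```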
Example Usage
Once integrated with Claude Code, you can use natural language queries:
- "What GPUs are available in the system?"
- "Check the health of GPU 0"
- "Show me the performance metrics for all GPUs"
- "Is GPU 0 running too hot?"
- "Analyze memory usage patterns"
Architecture
The system consists of three main layers:
- AMD SMI Interface Layer (AMDSMIManager): Abstracts the AMD SMI Python API with robust error handling
- Business Logic Layer (HealthAnalyzer, PerformanceInterpreter): Provides intelligent analysis and recommendations
- MCP Server Layer (FastMCP-based): Exposes functionality as conversational tools
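To illustrate how the three layers might fit together, here is a simplified sketch. The class names `AMDSMIManager` and `HealthAnalyzer` come from this README, but every method, field, and threshold below is an assumption made for illustration and does not reflect the project's actual code. The server layer is shown as a plain function; in the real project it would be registered as a FastMCP tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuMetrics:
    """Raw readings for one device; None stands for an unavailable metric."""
    temperature_c: Optional[float]
    power_w: Optional[float]


class AMDSMIManager:
    """Interface layer: stands in for the wrapper around the AMD SMI Python API."""

    def get_metrics(self, device_id: str) -> GpuMetrics:
        # Hypothetical fixed values in place of a real hardware query.
        return GpuMetrics(temperature_c=62.0, power_w=None)


class HealthAnalyzer:
    """Business logic layer: turns raw readings into a short assessment."""

    def summarize(self, metrics: GpuMetrics) -> dict:
        # The 85 C threshold is illustrative only.
        if metrics.temperature_c is None:
            thermal = "unknown"
        elif metrics.temperature_c >= 85.0:
            thermal = "hot"
        else:
            thermal = "ok"
        return {"thermal": thermal, "power_available": metrics.power_w is not None}


def check_gpu_health(device_id: str = "0") -> dict:
    """Server layer: the kind of function a tool registration would expose."""
    metrics = AMDSMIManager().get_metrics(device_id)
    return {"device_id": device_id, **HealthAnalyzer().summarize(metrics)}
```

Keeping hardware access, analysis, and tool exposure separate means the analysis code can be tested with synthetic metrics and no GPU present.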
Demo Mode
The server automatically falls back to demo mode when:
- No AMD GPUs are detected
- AMD SMI library is unavailable
- Hardware access fails
Demo mode provides realistic mock data for development and testing.
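The three fallback conditions above can be sketched as a single mode-selection function. This is an illustration under stated assumptions, not the project's implementation: `select_mode` and its injectable `import_module` parameter are hypothetical, while `amdsmi_init`, `amdsmi_get_processor_handles`, and `amdsmi_shut_down` are calls from the AMD SMI Python package.

```python
import importlib
from typing import Any, Callable


def select_mode(import_module: Callable[[str], Any] = importlib.import_module) -> str:
    """Return "hardware" when AMD SMI reports at least one device, else "demo"."""
    try:
        amdsmi = import_module("amdsmi")
    except (ImportError, OSError):
        return "demo"  # AMD SMI library is unavailable
    try:
        amdsmi.amdsmi_init()
        try:
            handles = amdsmi.amdsmi_get_processor_handles()
        finally:
            # Always release the library, even if enumeration fails.
            amdsmi.amdsmi_shut_down()
    except Exception:
        return "demo"  # hardware access failed
    return "hardware" if handles else "demo"  # empty list: no AMD GPUs detected


if __name__ == "__main__":
    print(select_mode())
```

Passing the importer as a parameter lets the fallback paths be exercised in tests on machines without ROCm installed.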
N/A Value Handling
The server gracefully handles missing or "N/A" values common in:
- Development environments
- Limited hardware access scenarios
- Partial metric availability
Missing values receive neutral health scores (80.0) and don't cause failures.
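A minimal sketch of this behavior follows. Only the neutral score of 80.0 comes from this README; the function names and the linear temperature scale are hypothetical and chosen purely to show missing values flowing through a scoring function without raising.

```python
from typing import Optional, Union

NEUTRAL_SCORE = 80.0  # neutral score for missing values, as stated above


def parse_metric(raw: Union[str, float, int, None]) -> Optional[float]:
    """Convert a raw reading to float; None, "N/A", and unparsable text become None."""
    if raw is None:
        return None
    if isinstance(raw, str) and raw.strip().upper() in {"", "N/A"}:
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def temperature_score(raw: Union[str, float, int, None]) -> float:
    """Score a temperature reading from 0 to 100, or return the neutral score."""
    value = parse_metric(raw)
    if value is None:
        return NEUTRAL_SCORE
    # Hypothetical linear scale: 100 at or below 50 C, 0 at or above 100 C.
    return max(0.0, min(100.0, (100.0 - value) * 2.0))
```

With this approach a missing reading neither raises an exception nor drags the overall score toward zero.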
Development
Project Structure
mcp-amdsmi/
├── src/amd_smi_mcp/
│   ├── server.py            # FastMCP server with tool definitions
│   ├── amd_smi_wrapper.py   # AMD SMI library abstraction
│   └── business_logic.py    # Health analysis and performance interpretation
├── tests/                   # Unit and integration tests
├── test_monitoring.py       # Comprehensive test script
├── requirements.txt         # Python dependencies
└── README.md                # This file
Running Tests
source venv/bin/activate
python test_monitoring.py # Comprehensive functionality test
pytest # Unit tests (if available)
Code Quality
source venv/bin/activate
black src/ # Code formatting
flake8 src/ # Linting
mypy src/ # Type checking
Workshop Integration
Designed for PEARC25 workshop demonstrations:
- 30-second response times for single GPU queries
- Support for 30 concurrent users
- Educational explanations and visual indicators
- Fallback modes for demonstration reliability
Troubleshooting
Common Issues
1. "amdsmi library not available"
- Install ROCm and AMD SMI library
- Server will automatically use demo mode if unavailable
2. "No AMD GPU devices found"
- Check GPU hardware installation
- Verify driver installation
- Server continues in demo mode
3. "Permission denied" errors
- Ensure user has GPU access permissions
- May require adding user to appropriate groups
4. Import errors in Claude Code
- Verify cwd and PYTHONPATH in MCP configuration
- Ensure virtual environment activation if using venv
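For the "Permission denied" case, a quick check of device-node access can help narrow things down. The sketch below assumes a Linux system where ROCm exposes GPUs through `/dev/kfd` and `/dev/dri/renderD*`, with access typically granted via membership in the `render` and `video` groups. `gpu_device_access` is a hypothetical helper, not part of this project.

```python
import glob
import os
from typing import Dict, List, Optional


def gpu_device_access(paths: Optional[List[str]] = None) -> Dict[str, bool]:
    """Map each GPU device node to whether the current user can read and write it."""
    if paths is None:
        # Assumed ROCm device nodes on Linux; adjust for other setups.
        paths = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    # os.access returns False for paths that do not exist.
    return {path: os.access(path, os.R_OK | os.W_OK) for path in paths}


if __name__ == "__main__":
    for path, allowed in gpu_device_access().items():
        print(f"{path}: {'ok' if allowed else 'no access'}")
```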
Logging
Enable debug logging by setting these environment variables:
export PYTHONPATH=/path/to/mcp-amdsmi
export LOG_LEVEL=DEBUG
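As a rough illustration of how a Python server could honor `LOG_LEVEL`, consider the sketch below. `configure_logging` is a hypothetical function and the fallback to INFO is an assumption; the project's actual handling may differ.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL and return the numeric level used."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    # getLevelName maps a known level name to its integer value;
    # for an unknown name it returns a string instead.
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        level = logging.INFO
    # basicConfig logs to stderr by default, which keeps stdout free
    # for servers that speak MCP over stdio.
    logging.basicConfig(level=level, force=True)
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger(__name__).debug("debug logging is enabled")
```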
License
[License information to be added]
Contributing
[Contributing guidelines to be added]