huggingface-mcp-server

bui21x/huggingface-mcp-server

A Model Context Protocol (MCP) server that provides HuggingFace model integration for AI agents.

The HuggingFace MCP Server is designed to integrate HuggingFace models into AI agent workflows. It supports GPU-accelerated model inference for tasks such as text generation and classification, and it caches loaded models to avoid redundant load times. Custom parameters let users tailor model behavior to specific needs, and built-in health monitoring provides real-time insight into GPU status and overall server health. Together these make the server a robust option for production deployments where reliability and performance are critical; a sketch of how the pieces fit together follows the feature list.

Features

  • Model inference with GPU support
  • Model caching for improved performance
  • Multiple task support
  • Custom parameter configuration
  • Health monitoring with GPU status
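
To make the features above concrete, here is a minimal sketch of what such a server could look like, assuming the official mcp Python SDK plus transformers and torch. The tool names (generate, health) and the lru_cache-based model cache are illustrative, not this server's actual API.

python
import torch
from functools import lru_cache
from transformers import pipeline
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("huggingface-mcp-server")

@lru_cache(maxsize=4)  # model caching: reuse loaded pipelines across calls
def load_model(task: str, model_id: str):
    device = 0 if torch.cuda.is_available() else -1  # use the GPU when present
    return pipeline(task, model=model_id, device=device)

@mcp.tool()
def generate(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Run text generation on a cached HuggingFace model."""
    pipe = load_model("text-generation", model_id)
    return pipe(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

@mcp.tool()
def health() -> dict:
    """Report server health, including GPU status."""
    return {"status": "ok", "gpu_available": torch.cuda.is_available()}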

Usage

local integration with stdio

python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("huggingface-mcp-server")
mcp.run(transport='stdio')  # Tools defined via @mcp.tool() decorator
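
An MCP client then launches the server as a subprocess and talks to it over stdio. A typical client configuration entry looks like the following; the server name and script path are placeholders:

json
{
  "mcpServers": {
    "huggingface": {
      "command": "python",
      "args": ["server.py"]
    }
  }
}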

remote integration with sse

python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("huggingface-mcp-server", host="0.0.0.0", port=8000)
mcp.run(transport='sse')  # SSE endpoint configured on the server instance
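
On the client side, the official Python SDK can connect to the SSE endpoint. A minimal sketch, assuming the server runs locally on port 8000 and uses the SDK's default /sse path:

python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # enumerate the server's tools

asyncio.run(main())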

remote integration with streamable http

yaml
paths:
  /mcp:
    post:
      x-ms-agentic-protocol: mcp-streamable-1.0  # Copilot Studio integration
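
On the server side, the same FastMCP app can serve this /mcp path with the SDK's streamable HTTP transport. A minimal sketch, with host and port assumed:

python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("huggingface-mcp-server", host="0.0.0.0", port=8000)
mcp.run(transport="streamable-http")  # serves the MCP endpoint at /mcp by default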

platform integration with github

{"command": "docker", "args": ["run", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "ghcr.io/github/github-mcp-server"]}

development framework with fastmcp

python
from mcp.server.fastmcp import FastMCP

app = FastMCP('demo')

@app.tool()
async def query(prompt: str) -> str:
    """Echo tool (illustrative body)."""
    return prompt
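
FastMCP derives each tool's name, description, and input schema from the function's signature and docstring, so the type hints on query's parameters are what connected MCP clients see as the tool schema.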