VLLM MCP Server

License: MIT · Python 3.11+ · uv

A Model Context Protocol (MCP) server that enables text models to call multimodal models. This server supports both OpenAI and Dashscope (Alibaba Cloud) multimodal models, allowing text-only models to process images and other media formats through standardized MCP tools.

GitHub Repository: https://github.com/StanleyChanH/vllm-mcp

Features

  • Multi-Provider Support: OpenAI GPT-4 Vision and Dashscope Qwen-VL models
  • Multiple Transport Options: STDIO, HTTP, and Server-Sent Events (SSE)
  • Flexible Deployment: Docker, Docker Compose, and local development
  • Easy Configuration: JSON configuration files and environment variables
  • Comprehensive Tooling: MCP tools for model interaction, validation, and provider management

Quick Start

Prerequisites

  • Python 3.11+
  • uv package manager
  • API keys for OpenAI and/or Dashscope (Alibaba Cloud)

Installation & Setup

  1. Clone the repository:

    git clone https://github.com/StanleyChanH/vllm-mcp.git
    cd vllm-mcp
    
  2. Set up environment:

    cp .env.example .env
    # Edit .env with your API keys
    nano .env  # or use your preferred editor
    
  3. Configure API keys (in .env file):

    # Dashscope (Alibaba Cloud) - Required for basic functionality
    DASHSCOPE_API_KEY=sk-your-dashscope-api-key
    
    # OpenAI - Optional
    OPENAI_API_KEY=sk-your-openai-api-key
    
  4. Install dependencies:

    uv sync
    
  5. Verify setup:

    uv run python test_simple.py
    

Running the Server

  1. Start the server (STDIO transport - default):

    ./scripts/start.sh
    
  2. Start with HTTP transport:

    ./scripts/start.sh --transport http --host 0.0.0.0 --port 8080
    
  3. Development mode with hot reload:

    ./scripts/start-dev.sh
    

Testing & Verification

  1. List available models:

    uv run python examples/list_models.py
    
  2. Run basic tests:

    uv run python test_simple.py
    
  3. Test MCP tools:

    uv run python examples/client_example.py
    

Docker Deployment

  1. Build and run with Docker Compose:

    # Create .env file with your API keys
    cp .env.example .env
    
    # Start the service
    docker-compose up -d
    
  2. Build manually:

    docker build -t vllm-mcp .
    docker run -p 8080:8080 --env-file .env vllm-mcp
    

Configuration

Environment Variables

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo,gpt-4-vision-preview

# Dashscope Configuration
DASHSCOPE_API_KEY=your_dashscope_api_key
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max,qwen-vl-chat,qwen2-vl-7b-instruct,qwen2-vl-72b-instruct

# Server Configuration (optional)
VLLM_MCP_HOST=localhost
VLLM_MCP_PORT=8080
VLLM_MCP_TRANSPORT=stdio
VLLM_MCP_LOG_LEVEL=INFO

Configuration File

Create a config.json file:

{
  "host": "localhost",
  "port": 8080,
  "transport": "stdio",
  "log_level": "INFO",
  "providers": [
    {
      "provider_type": "openai",
      "api_key": "${OPENAI_API_KEY}",
      "base_url": "${OPENAI_BASE_URL}",
      "default_model": "gpt-4o",
      "max_tokens": 4000,
      "temperature": 0.7
    },
    {
      "provider_type": "dashscope",
      "api_key": "${DASHSCOPE_API_KEY}",
      "default_model": "qwen-vl-plus",
      "max_tokens": 4000,
      "temperature": 0.7
    }
  ]
}
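
The ${...} placeholders refer to environment variables. If you load a config like this in your own tooling, the placeholders can be expanded with Python's standard library; the snippet below is a minimal sketch only, and the server's own config loader may resolve them differently:

import json
import os

# Read config.json and expand ${VAR} placeholders from the environment.
# Illustrative only -- vllm-mcp's loader may handle this differently.
with open("config.json") as f:
    config = json.loads(os.path.expandvars(f.read()))

print(config["providers"][0]["default_model"])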

MCP Tools

The server provides the following MCP tools:

generate_multimodal_response

Generate responses from multimodal models.

Parameters:

  • model (string): Model name to use
  • prompt (string): Text prompt
  • image_urls (array, optional): List of image URLs
  • file_paths (array, optional): List of file paths
  • system_prompt (string, optional): System prompt
  • max_tokens (integer, optional): Maximum tokens to generate
  • temperature (number, optional): Generation temperature
  • provider (string, optional): Provider name (auto-detected if not specified)

Example:

result = await session.call_tool("generate_multimodal_response", {
    "model": "gpt-4o",
    "prompt": "Describe this image",
    "image_urls": ["https://example.com/image.jpg"],
    "max_tokens": 500
})

list_available_providers

List available model providers and their supported models.

Example:

result = await session.call_tool("list_available_providers", {})

validate_multimodal_request

Validate whether a multimodal request is supported by the specified provider.

Parameters:

  • model (string): Model name to validate
  • image_count (integer, optional): Number of images
  • file_count (integer, optional): Number of files
  • provider (string, optional): Provider name
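
Example (illustrative values; the call pattern mirrors the other tools):

result = await session.call_tool("validate_multimodal_request", {
    "model": "qwen-vl-plus",
    "image_count": 2,
    "provider": "dashscope"
})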

Supported Models

OpenAI

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • gpt-4-vision-preview

Dashscope

  • qwen-vl-plus
  • qwen-vl-max
  • qwen-vl-chat
  • qwen2-vl-7b-instruct
  • qwen2-vl-72b-instruct

Model Selection

Using Environment Variables

You can configure default models and supported models through environment variables:

# OpenAI
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo

# Dashscope
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max

Listing Available Models

Use the list_available_providers tool to see all available models:

result = await session.call_tool("list_available_providers", {})
print(result.content[0].text)

Model Selection Examples

# Use specific OpenAI model
result = await session.call_tool("generate_multimodal_response", {
    "model": "gpt-4o-mini",  # Specify exact model
    "prompt": "Analyze this image",
    "image_urls": ["https://example.com/image.jpg"]
})

# Use specific Dashscope model
result = await session.call_tool("generate_multimodal_response", {
    "model": "qwen-vl-max",  # Specify exact model
    "prompt": "Describe what you see",
    "image_urls": ["https://example.com/image.jpg"]
})

# Auto-detect provider based on model name
# OpenAI models (gpt-*) will use OpenAI provider
# Dashscope models (qwen-*) will use Dashscope provider
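
The prefix rule above can be expressed as a small helper. This is purely illustrative and not part of the vllm-mcp API; the server's actual detection logic lives in src/vllm_mcp/server.py and may differ:

def detect_provider(model: str) -> str:
    # Illustrative helper: infer the provider from the model name prefix.
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("qwen"):
        return "dashscope"
    raise ValueError(f"Cannot infer provider for model: {model}")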

Model Configuration File

You can also configure models in config.json:

{
  "providers": [
    {
      "provider_type": "openai",
      "api_key": "${OPENAI_API_KEY}",
      "default_model": "gpt-4o-mini",
      "supported_models": ["gpt-4o-mini", "gpt-4-turbo"],
      "max_tokens": 4000,
      "temperature": 0.7
    },
    {
      "provider_type": "dashscope",
      "api_key": "${DASHSCOPE_API_KEY}",
      "default_model": "qwen-vl-max",
      "supported_models": ["qwen-vl-plus", "qwen-vl-max"],
      "max_tokens": 4000,
      "temperature": 0.7
    }
  ]
}

Client Integration

Python Client

import asyncio
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

async def main():
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "python", "-m", "vllm_mcp.server"],
        env={"PYTHONPATH": "src"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Generate multimodal response
            result = await session.call_tool("generate_multimodal_response", {
                "model": "gpt-4o",
                "prompt": "Analyze this image",
                "image_urls": ["https://example.com/image.jpg"]
            })

            print(result.content[0].text)

asyncio.run(main())

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "vllm-mcp": {
      "command": "uv",
      "args": ["run", "python", "-m", "vllm_mcp.server"],
      "env": {
        "PYTHONPATH": "src",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "DASHSCOPE_API_KEY": "${DASHSCOPE_API_KEY}"
      }
    }
  }
}

Development

Project Structure

vllm-mcp/
├── src/vllm_mcp/
│   ├── __init__.py
│   ├── server.py          # Main MCP server
│   ├── models.py          # Data models
│   └── providers/
│       ├── __init__.py
│       ├── openai_provider.py
│       └── dashscope_provider.py
├── scripts/
│   ├── start.sh           # Production startup
│   └── start-dev.sh       # Development startup
├── examples/
│   ├── client_example.py  # Example client
│   └── mcp_client_config.json
├── docker-compose.yml
├── Dockerfile
├── config.json
└── README.md

Adding New Providers

  1. Create a new provider class in src/vllm_mcp/providers/
  2. Implement the required methods (a rough skeleton is sketched after this list):
    • generate_response()
    • is_model_supported()
    • validate_request()
  3. Register the provider in src/vllm_mcp/server.py
  4. Update configuration schema
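
A rough provider skeleton is shown below. The method names follow the list above, but the exact signatures are assumptions; check the existing providers in src/vllm_mcp/providers/ for the real interface.

# Hypothetical skeleton only -- signatures are assumptions, not the actual interface.
class ExampleProvider:
    def __init__(self, api_key: str, default_model: str,
                 max_tokens: int = 4000, temperature: float = 0.7):
        self.api_key = api_key
        self.default_model = default_model
        self.max_tokens = max_tokens
        self.temperature = temperature

    async def generate_response(self, model, prompt, image_urls=None,
                                file_paths=None, system_prompt=None,
                                max_tokens=None, temperature=None):
        # Call the underlying multimodal API and return the generated text.
        raise NotImplementedError

    def is_model_supported(self, model: str) -> bool:
        # Return True if this provider can serve the given model name.
        return model in {"example-vl-model"}

    def validate_request(self, model: str, image_count: int = 0, file_count: int = 0) -> bool:
        # Check that the request fits the provider's limits (model support, media counts).
        return self.is_model_supported(model)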

Running Tests

# Install development dependencies
uv add --dev pytest pytest-asyncio

# Run tests
uv run pytest

Deployment Options

STDIO Transport (Default)

Best for MCP client integrations and local development.

vllm-mcp --transport stdio

HTTP Transport

Suitable for web service deployments.

vllm-mcp --transport http --host 0.0.0.0 --port 8080

SSE Transport

For real-time streaming responses.

vllm-mcp --transport sse --host 0.0.0.0 --port 8080

Troubleshooting

Common Issues

  1. Import Error: No module named 'vllm_mcp'

    # Make sure you're in the project root and run:
    uv sync
    export PYTHONPATH="src:$PYTHONPATH"
    
  2. API Key Not Found

    # Ensure your .env file is properly configured:
    cp .env.example .env
    # Edit .env with your actual API keys
    
  3. Dashscope API Errors

    • Verify your API key is valid and active
    • Check if you have sufficient quota
    • Ensure network connectivity to Dashscope services
  4. Server Startup Issues

    # Check for port conflicts:
    lsof -i :8080
    
    # Try a different port:
    ./scripts/start.sh --port 8081
    
  5. Docker Issues

    # Rebuild Docker image:
    docker-compose down
    docker-compose build --no-cache
    docker-compose up -d
    

Debug Mode

Enable debug logging for troubleshooting:

./scripts/start.sh --log-level DEBUG

Getting Help

  • Check the GitHub repository for detailed setup instructions
  • Run uv run python test_simple.py to verify basic functionality
  • Review logs for error messages and warnings

License

MIT License

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Support

Acknowledgments