VLLM MCP Server
A Model Context Protocol (MCP) server that enables text models to call multimodal models. This server supports both OpenAI and Dashscope (Alibaba Cloud) multimodal models, allowing text-only models to process images and other media formats through standardized MCP tools.
GitHub Repository: https://github.com/StanleyChanH/vllm-mcp
Features
- Multi-Provider Support: OpenAI GPT-4 Vision and Dashscope Qwen-VL models
- Multiple Transport Options: STDIO, HTTP, and Server-Sent Events (SSE)
- Flexible Deployment: Docker, Docker Compose, and local development
- Easy Configuration: JSON configuration files and environment variables
- Comprehensive Tooling: MCP tools for model interaction, validation, and provider management
Quick Start
Prerequisites
- Python 3.11+
- uv package manager
- API keys for OpenAI and/or Dashscope (Alibaba Cloud)
Installation & Setup
- Clone the repository:
git clone https://github.com/StanleyChanH/vllm-mcp.git
cd vllm-mcp
- Set up environment:
cp .env.example .env
# Edit .env with your API keys
nano .env  # or use your preferred editor
- Configure API keys (in .env file):
# Dashscope (Alibaba Cloud) - Required for basic functionality
DASHSCOPE_API_KEY=sk-your-dashscope-api-key
# OpenAI - Optional
OPENAI_API_KEY=sk-your-openai-api-key
- Install dependencies:
uv sync
- Verify setup:
uv run python test_simple.py
Running the Server
- Start the server (STDIO transport - default):
./scripts/start.sh
- Start with HTTP transport:
./scripts/start.sh --transport http --host 0.0.0.0 --port 8080
- Development mode with hot reload:
./scripts/start-dev.sh
Testing & Verification
- List available models:
uv run python examples/list_models.py
- Run basic tests:
uv run python test_simple.py
- Test MCP tools:
uv run python examples/client_example.py
Docker Deployment
- Build and run with Docker Compose:
# Create .env file with your API keys
cp .env.example .env
# Start the service
docker-compose up -d
- Build manually:
docker build -t vllm-mcp .
docker run -p 8080:8080 --env-file .env vllm-mcp
Configuration
Environment Variables
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1 # Optional
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo,gpt-4-vision-preview
# Dashscope Configuration
DASHSCOPE_API_KEY=your_dashscope_api_key
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max,qwen-vl-chat,qwen2-vl-7b-instruct,qwen2-vl-72b-instruct
# Server Configuration (optional)
VLLM_MCP_HOST=localhost
VLLM_MCP_PORT=8080
VLLM_MCP_TRANSPORT=stdio
VLLM_MCP_LOG_LEVEL=INFO
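For illustration, here is a short sketch of how these variables might be read and the comma-separated model lists split into Python lists. The load_openai_settings helper below is hypothetical and not part of the server:
import os

def load_openai_settings() -> dict:
    """Hypothetical helper: read the OpenAI variables documented above."""
    return {
        "api_key": os.getenv("OPENAI_API_KEY", ""),
        "base_url": os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "default_model": os.getenv("OPENAI_DEFAULT_MODEL", "gpt-4o"),
        # Comma-separated string -> list of model names, ignoring empty entries
        "supported_models": [
            m.strip()
            for m in os.getenv("OPENAI_SUPPORTED_MODELS", "").split(",")
            if m.strip()
        ],
    }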
Configuration File
Create a config.json file:
{
"host": "localhost",
"port": 8080,
"transport": "stdio",
"log_level": "INFO",
"providers": [
{
"provider_type": "openai",
"api_key": "${OPENAI_API_KEY}",
"base_url": "${OPENAI_BASE_URL}",
"default_model": "gpt-4o",
"max_tokens": 4000,
"temperature": 0.7
},
{
"provider_type": "dashscope",
"api_key": "${DASHSCOPE_API_KEY}",
"default_model": "qwen-vl-plus",
"max_tokens": 4000,
"temperature": 0.7
}
]
}
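The ${...} placeholders refer to environment variables. A minimal sketch of how such a file could be loaded with placeholder expansion (this helper is an illustration under that assumption, not necessarily how the server implements it):
import json
import os
import re

def load_config(path: str = "config.json") -> dict:
    """Read config.json and substitute ${VAR} placeholders from the environment."""
    raw = open(path, encoding="utf-8").read()
    # Replace each ${VAR} with os.environ["VAR"], or "" if the variable is unset
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    return json.loads(expanded)

config = load_config()
print([p["provider_type"] for p in config["providers"]])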
MCP Tools
The server provides the following MCP tools:
generate_multimodal_response
Generate responses from multimodal models.
Parameters:
- model (string): Model name to use
- prompt (string): Text prompt
- image_urls (array, optional): List of image URLs
- file_paths (array, optional): List of file paths
- system_prompt (string, optional): System prompt
- max_tokens (integer, optional): Maximum tokens to generate
- temperature (number, optional): Generation temperature
- provider (string, optional): Provider name (auto-detected if not specified)
Example:
result = await session.call_tool("generate_multimodal_response", {
"model": "gpt-4o",
"prompt": "Describe this image",
"image_urls": ["https://example.com/image.jpg"],
"max_tokens": 500
})
list_available_providers
List available model providers and their supported models.
Example:
result = await session.call_tool("list_available_providers", {})
validate_multimodal_request
Validate if a multimodal request is supported by the specified provider.
Parameters:
- model (string): Model name to validate
- image_count (integer, optional): Number of images
- file_count (integer, optional): Number of files
- provider (string, optional): Provider name
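Example (the argument values below are illustrative; the response format depends on the server):
# Check whether qwen-vl-plus can handle a request with two images
result = await session.call_tool("validate_multimodal_request", {
    "model": "qwen-vl-plus",
    "image_count": 2,
    "provider": "dashscope"
})
print(result.content[0].text)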
Supported Models
OpenAI
gpt-4o
gpt-4o-mini
gpt-4-turbo
gpt-4-vision-preview
Dashscope
qwen-vl-plus
qwen-vl-max
qwen-vl-chat
qwen2-vl-7b-instruct
qwen2-vl-72b-instruct
Model Selection
Using Environment Variables
You can configure default models and supported models through environment variables:
# OpenAI
OPENAI_DEFAULT_MODEL=gpt-4o
OPENAI_SUPPORTED_MODELS=gpt-4o,gpt-4o-mini,gpt-4-turbo
# Dashscope
DASHSCOPE_DEFAULT_MODEL=qwen-vl-plus
DASHSCOPE_SUPPORTED_MODELS=qwen-vl-plus,qwen-vl-max
Listing Available Models
Use the list_available_providers tool to see all available models:
result = await session.call_tool("list_available_providers", {})
print(result.content[0].text)
Model Selection Examples
# Use specific OpenAI model
result = await session.call_tool("generate_multimodal_response", {
"model": "gpt-4o-mini", # Specify exact model
"prompt": "Analyze this image",
"image_urls": ["https://example.com/image.jpg"]
})
# Use specific Dashscope model
result = await session.call_tool("generate_multimodal_response", {
"model": "qwen-vl-max", # Specify exact model
"prompt": "Describe what you see",
"image_urls": ["https://example.com/image.jpg"]
})
# Auto-detect provider based on model name
# OpenAI models (gpt-*) will use OpenAI provider
# Dashscope models (qwen-*) will use Dashscope provider
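The auto-detection above amounts to a simple prefix match on the model name. A rough illustrative sketch (not necessarily the server's exact routing logic):
def detect_provider(model: str) -> str:
    """Map a model name to a provider by prefix (illustrative sketch)."""
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("qwen"):
        return "dashscope"
    raise ValueError(f"Cannot infer provider for model: {model}")

assert detect_provider("gpt-4o-mini") == "openai"
assert detect_provider("qwen-vl-max") == "dashscope"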
Model Configuration File
You can also configure models in config.json:
{
"providers": [
{
"provider_type": "openai",
"api_key": "${OPENAI_API_KEY}",
"default_model": "gpt-4o-mini",
"supported_models": ["gpt-4o-mini", "gpt-4-turbo"],
"max_tokens": 4000,
"temperature": 0.7
},
{
"provider_type": "dashscope",
"api_key": "${DASHSCOPE_API_KEY}",
"default_model": "qwen-vl-max",
"supported_models": ["qwen-vl-plus", "qwen-vl-max"],
"max_tokens": 4000,
"temperature": 0.7
}
]
}
Client Integration
Python Client
import asyncio
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client
async def main():
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "python", "-m", "vllm_mcp.server"],
        env={"PYTHONPATH": "src"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Generate multimodal response
            result = await session.call_tool("generate_multimodal_response", {
                "model": "gpt-4o",
                "prompt": "Analyze this image",
                "image_urls": ["https://example.com/image.jpg"]
            })
            print(result.content[0].text)

asyncio.run(main())
MCP Client Configuration
Add to your MCP client configuration:
{
"mcpServers": {
"vllm-mcp": {
"command": "uv",
"args": ["run", "python", "-m", "vllm_mcp.server"],
"env": {
"PYTHONPATH": "src",
"OPENAI_API_KEY": "${OPENAI_API_KEY}",
"DASHSCOPE_API_KEY": "${DASHSCOPE_API_KEY}"
}
}
}
}
Development
Project Structure
vllm-mcp/
├── src/vllm_mcp/
│   ├── __init__.py
│   ├── server.py              # Main MCP server
│   ├── models.py              # Data models
│   └── providers/
│       ├── __init__.py
│       ├── openai_provider.py
│       └── dashscope_provider.py
├── scripts/
│   ├── start.sh               # Production startup
│   └── start-dev.sh           # Development startup
├── examples/
│   ├── client_example.py      # Example client
│   └── mcp_client_config.json
├── docker-compose.yml
├── Dockerfile
├── config.json
└── README.md
Adding New Providers
- Create a new provider class in src/vllm_mcp/providers/
- Implement the required methods: generate_response(), is_model_supported(), validate_request() (see the sketch below)
- Register the provider in src/vllm_mcp/server.py
- Update the configuration schema
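A minimal skeleton of such a provider class, using the method names from the list above; the file name, constructor, and exact signatures are assumptions and may differ from the project's actual base interface:
# src/vllm_mcp/providers/my_provider.py (hypothetical file name)
class MyProvider:
    """Skeleton for a new provider; adapt signatures to the real interface."""

    def __init__(self, api_key: str, default_model: str, supported_models: list[str]):
        self.api_key = api_key
        self.default_model = default_model
        self.supported_models = supported_models

    def is_model_supported(self, model: str) -> bool:
        return model in self.supported_models

    def validate_request(self, model: str, image_count: int = 0, file_count: int = 0) -> bool:
        # Reject unsupported models; a real provider may also enforce media limits
        return self.is_model_supported(model)

    async def generate_response(self, model: str, prompt: str, image_urls=None, **kwargs) -> str:
        # Call the upstream multimodal API here and return its text output
        raise NotImplementedError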
Running Tests
# Install development dependencies
uv add --dev pytest pytest-asyncio
# Run tests
uv run pytest
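As a hedged example, an async test could exercise the server over STDIO using the client API shown earlier; the test file name and assertion below are illustrative:
# tests/test_list_providers.py (hypothetical)
import pytest
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

@pytest.mark.asyncio
async def test_list_available_providers():
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "python", "-m", "vllm_mcp.server"],
        env={"PYTHONPATH": "src"},
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_available_providers", {})
            # At least one provider should be reported when an API key is configured
            assert result.content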
Deployment Options
STDIO Transport (Default)
Best for MCP client integrations and local development.
vllm-mcp --transport stdio
HTTP Transport
Suitable for web service deployments.
vllm-mcp --transport http --host 0.0.0.0 --port 8080
SSE Transport
For real-time streaming responses.
vllm-mcp --transport sse --host 0.0.0.0 --port 8080
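When the server runs with the SSE transport, an MCP client can connect over HTTP instead of STDIO. A minimal sketch using the MCP Python SDK's SSE client; the /sse endpoint path is an assumption, so check the server's actual mount point:
import asyncio
from mcp.client.session import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Assumed endpoint; adjust host, port, and path to your deployment
    async with sse_client("http://localhost:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_available_providers", {})
            print(result.content[0].text)

asyncio.run(main())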
Troubleshooting
Common Issues
- Import Error: No module named 'vllm_mcp'
# Make sure you're in the project root and run:
uv sync
export PYTHONPATH="src:$PYTHONPATH"
- API Key Not Found
# Ensure your .env file is properly configured:
cp .env.example .env
# Edit .env with your actual API keys
- Dashscope API Errors
  - Verify your API key is valid and active
  - Check if you have sufficient quota
  - Ensure network connectivity to Dashscope services
- Server Startup Issues
# Check for port conflicts:
lsof -i :8080
# Try a different port:
./scripts/start.sh --port 8081
- Docker Issues
# Rebuild Docker image:
docker-compose down
docker-compose build --no-cache
docker-compose up -d
Debug Mode
Enable debug logging for troubleshooting:
./scripts/start.sh --log-level DEBUG
Getting Help
- Check for detailed setup instructions
- Run uv run python test_simple.py to verify basic functionality
- Review logs for error messages and warnings
License
MIT License
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
- Issues: GitHub Issues
- Documentation: Wiki