
🚀 ViperMCP: A Model Context Protocol Server for Viper

Mixture-of-Experts VQA, streaming-ready, and MCP-native.


ViperMCP is a mixture-of-experts (MoE) visual question‑answering (VQA) server that exposes streamable MCP tools for:

  • 🔎 Visual grounding
  • 🧩 Compositional image QA
  • 🌐 External knowledge‑dependent image QA

It’s built on the shoulders of 🐍 ViperGPT and delivered as a FastMCP HTTP server, so it works with all FastMCP client tooling.


✨ Highlights

  • MCP-native JSON‑RPC 2.0 endpoint (/mcp/) with streaming
  • 🧠 MoE routing across classic and modern VLMs/LLMs
  • 🧰 Two tools out of the box: viper_query (text) & viper_task (crops/masks)
  • 🐳 One‑command Docker or pure‑Python install
  • 🔐 Secure key handling via env var or secret mount

⚙️ Setup

🔑 OpenAI API Key

An OpenAI API key is required. Provide it via one of the following:

  • OPENAI_API_KEY (environment variable)
  • OPENAI_API_KEY_PATH (path to a file containing the key)
  • ?apiKey=... HTTP query parameter (for quick local testing)
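
For quick local tests, the apiKey parameter can be set with a plain GET against /mcp (see the Endpoints section below). A minimal sketch using only the Python standard library:

# Quick-test sketch only; prefer OPENAI_API_KEY or OPENAI_API_KEY_PATH in real use.
from urllib.request import urlopen

resp = urlopen('http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXX')
print(resp.read().decode())  # 'Query parameters set successfully.'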

🌐 Ngrok (Optional)

Use ngrok to expose your local server:

# Install the ngrok CLI (https://ngrok.com/download); pyngrok also provides a wrapper:
pip install pyngrok
ngrok http 8000

Use the ngrok URL anywhere you see http://0.0.0.0:8000 below.
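
If you would rather manage the tunnel from Python, a sketch using the pyngrok package (an assumption; any tunneling tool works):

from pyngrok import ngrok  # pip install pyngrok

tunnel = ngrok.connect(8000, 'http')
print(tunnel.public_url)  # substitute this URL wherever http://0.0.0.0:8000 appears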


🛠️ Installation

🐳 Option A: Dockerized FastMCP Server (GPU‑ready)

  1. Save your key to api.key, then run:
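# Optional: add --gpus all below to expose NVIDIA GPUs to the container
# (requires the NVIDIA Container Toolkit); otherwise the server falls back to CPU.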
docker run -i --rm \
  --mount type=bind,source=/path/to/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 \
  rsherby/vipermcp:latest

This starts a CUDA‑enabled container serving MCP at:

http://0.0.0.0:8000/mcp/

💡 Prefer building from source? Use the included docker-compose.yaml. By default it reads api.key from the project root. If your platform injects env vars, you can also set OPENAI_API_KEY directly.


🐍 Option B: Pure FastMCP Server (dev‑friendly)

git clone --recurse-submodules https://github.com/ryansherby/ViperMCP.git
cd ViperMCP
bash download-models.sh

# Store your key for local dev
echo YOUR_OPENAI_API_KEY > api.key

# (recommended) activate a virtualenv / conda env
pip install -r requirements.txt
pip install -e .

# run the server
python run_server.py

Your server should be live at:

http://0.0.0.0:8000/mcp/

To use OpenAI‑backed models via query param:

http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX

🧪 Usage

🤝 FastMCP Client Example

Pass images as base64 (shown) or as URLs:

import asyncio
import base64
import io

from fastmcp import Client
from PIL import Image  # pip install pillow

client = Client("http://0.0.0.0:8000/mcp/")

# Load a local image and encode it as base64
image_path = './your_image.png'
image = Image.open(image_path)
img_byte_arr = io.BytesIO()
image.save(img_byte_arr, format='PNG')
img_byte_arr.seek(0)
img_b64_string = base64.b64encode(img_byte_arr.read()).decode('utf-8')

async def main():
    async with client:
        await client.ping()

        tools = await client.list_tools()  # optional: inspect the available tools

        query = await client.call_tool(
            "viper_query",
            {
                "query": "how many muffins can each kid have for it to be fair?",
                "image": f"data:image/png;base64,{img_b64_string}",
            },
        )

        task = await client.call_tool(
            "viper_task",
            {
                "task": "return a mask of all the people in the image",
                "image": f"data:image/png;base64,{img_b64_string}",
            },
        )

asyncio.run(main())
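
Result handling depends on your FastMCP version; recent releases return an object whose .content attribute holds MCP content blocks (an assumption here; check your client's docs). A hedged sketch for unpacking the answers:

# Place this after the call_tool calls, inside main() above.
# Assumes MCP content blocks: TextContent has .text, ImageContent has base64 .data.
for block in query.content:
    if getattr(block, 'type', '') == 'text':
        print(block.text)  # textual answer from viper_query

for i, block in enumerate(task.content):
    if getattr(block, 'type', '') == 'image':
        with open(f'mask_{i}.png', 'wb') as f:
            f.write(base64.b64decode(block.data))  # save each returned mask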

🧵 OpenAI API (MCP Integration)

The OpenAI MCP integration currently accepts image URLs (not raw base64). Send the URL as type: "input_text".

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from your environment

# The server must be reachable by OpenAI's servers (e.g., your ngrok URL)
server_url = "https://<your-ngrok-domain>"
img_url = "https://<your-image-host>/your_image.png"  # publicly reachable image URL

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "ViperMCP",
            "server_url": f"{server_url}/mcp/",
            "require_approval": "never",
        },
    ],
    input=[
        {"role": "system", "content": "Forward any queries or tasks relating to an image directly to the ViperMCP server."},
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "based on this image, how many muffins can each kid have for it to be fair?"},
                {"type": "input_text", "text": img_url},
            ],
        },
    ],
)

print(response.output_text)  # final text answer

🌐 Endpoints

🔓 HTTP GET Endpoints

GET /health      => 'OK' (200)
GET /device      => {"device": "cuda"|"mps"|"cpu"}
GET /mcp?apiKey= => 'Query parameters set successfully.'

🧠 MCP Client Endpoints (JSON‑RPC 2.0)

POST /mcp/
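
Tool calls ride standard JSON-RPC 2.0. For illustration only (FastMCP clients build these messages for you, including the required initialize handshake and session headers), a tools/call request body has roughly this shape, sketched from the MCP spec rather than anything ViperMCP-specific:

# Illustrative payload shape only; real clients negotiate a session first.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "viper_query",
        "arguments": {
            "query": "how many muffins can each kid have for it to be fair?",
            "image": "data:image/png;base64,...",
        },
    },
}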

🔨 MCP Client Functions

viper_query(query, image) -> str
# Returns a text answer to your query.

viper_task(task, image) -> list[Image]
# Returns a list of images (e.g., masks) satisfying the task.

🧩 Models (Default MoE Pool)

  • 🐊 Grounding DINO
  • ✂️ Segment Anything (SAM)
  • 🤖 GPT‑4o‑mini (LLM)
  • 👀 GPT‑4o‑mini (VLM)
  • 🧠 GPT‑4.1
  • 🔭 X‑VLM
  • 🌊 MiDaS (depth)
  • 🐝 BERT

🧭 The MoE router picks from these based on the tool & prompt.


⚠️ Security & Production Notes

This package may generate and execute code on the host. We include basic injection guards, but you must harden for production. A recommended architecture separates concerns:

MCP Server (Query + Image)
  => Client Server (Generate Code Request)
    => Backend Server (Generates Code)
      => Client Server (Executes Wrapper Functions)
        => Backend Server (Executes Underlying Functions)
          => Client Server (Return Result)
            => MCP Server (Respond)

  • 🧱 Isolate codegen & execution.
  • 🔒 Lock down secrets & file access.
  • 🧪 Add unit/integration tests around wrappers.
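
One illustrative hardening sketch (not part of ViperMCP): execute generated code in a separate interpreter process with a timeout, rather than exec()ing it inside the server. Real deployments need stronger isolation (containers, seccomp, no network, read-only filesystem, etc.).

import os
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout: float = 10.0) -> str:
    # Write the generated code to a temp file so it runs in a fresh process
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # -I runs Python in isolated mode (ignores env vars and user site dirs)
        result = subprocess.run(
            [sys.executable, '-I', path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        os.unlink(path)
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout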

📚 Citations

Huge thanks to the ViperGPT team:

@article{surismenon2023vipergpt,
    title={ViperGPT: Visual Inference via Python Execution for Reasoning},
    author={Dídac Surís and Sachit Menon and Carl Vondrick},
    journal={arXiv preprint arXiv:2303.08128},
    year={2023}
}

🤝 Contributions

PRs welcome! Please:

  1. ✅ Ensure all tests in /tests pass
  2. 🧪 Add coverage for new features
  3. 📦 Keep docs & examples up to date

🧭 Quick Commands Cheat‑Sheet

# Run with Docker (mount key file)
docker run -i --rm \
  --mount type=bind,source=$(pwd)/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 rsherby/vipermcp:latest

# From source (after setup)
python run_server.py

# Hit health
curl http://0.0.0.0:8000/health

# List device
curl http://0.0.0.0:8000/device

# Use query param key (local only)
curl "http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXX..."

💬 Questions?

Open an issue or start a discussion. We ❤️ feedback and ambitious ideas!