DINO-X-MCP

IDEA-Research/DINO-X-MCP

3.5

If you are the rightful owner of DINO-X-MCP and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

DINO-X MCP enables large language models to perform fine-grained object detection and image understanding, powered by DINO-X and Grounding DINO 1.6 API.

Tools
4
Resources
0
Prompts
0

DINO-X MCP

License npm version npm downloads PRs Welcome MCP Badge GitHub stars

English |

DINO-X Official MCP Server — powered by the DINO-X and Grounding DINO models — brings fine-grained object detection and image understanding to your multimodal applications.

Why DINO-X MCP?

With DINO-X MCP, you can:

  • Fine-Grained Understanding: Full image detection, object detection, and region-level descriptions.

  • Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.

  • Composable: Works seamlessly with other MCP servers to build end-to-end visual agents or automation pipelines.

Transport Modes

DINO-X MCP supports two transport modes:

FeatureSTDIO (default)Streamable HTTP
RuntimeLocalLocal or Cloud
TransportStandard I/OHTTP (streaming responses)
Input sourcefile:// and https://https:// only
VisualizationSupported (saves annotated images locally)Not supported (for now)

Quick Start

1. Prepare an MCP client

Any MCP-compatible client works, e.g.:

2. Get your API key

Apply on the DINO-X platform: Request API Key (new users get free quota).

3. Configure MCP

Option A: Official Hosted Streamable HTTP (Recommended)

Add to your MCP client config and replace with your API key:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
    }
  }
}
Option B: Use the NPM package locally (STDIO)

Install Node.js first

  • Download the installer from nodejs.org

  • Or use command:

# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# load nvm into current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc  || true

# install and use LTS Node.js
nvm install --lts
nvm use --lts

# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in admin PowerShell)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": ["-y", "@deepdataspace/dinox-mcp"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

Note: Replace your-api-key-here with your real key.

Option C: Run from source locally

Make sure Node.js is installed (see Option B), then:

# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP

# install deps
npm install

# build
npm run build

Configure your MCP client:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "node",
      "args": ["/path/to/DINO-X-MCP/build/index.js"],
      "env": {
        "DINOX_API_KEY": "your-api-key-here",
        "IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
      }
    }
  }
}

CLI Flags & Environment Variables

  • Common flags

    • --http: start in Streamable HTTP mode (otherwise STDIO by default)
    • --stdio: force STDIO mode
    • --dinox-api-key=...: set API key
    • --enable-client-key: allow API key via URL ?key= (Streamable HTTP only)
    • --port=8080: HTTP port (default 3020)
  • Environment variables

    • DINOX_API_KEY (required/conditionally required): DINO-X platform API key
    • IMAGE_STORAGE_DIRECTORY (optional, STDIO): directory to save annotated images
    • AUTH_TOKEN (optional, HTTP): if set, client must send Authorization: Bearer <token>

    Examples:

# STDIO (local)
node build/index.js --dinox-api-key=your-api-key

# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key

# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080

# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key

Client config when using ?key=:

{
  "mcpServers": {
    "dinox-mcp": {
      "url": "http://localhost:3020/mcp?key=your-api-key"
    }
  }
}

Using AUTH_TOKEN with a gateway that injects Authorization: Bearer <token>:

AUTH_TOKEN=my-token node build/index.js --http --enable-client-key

Client example with supergateway:

{
  "mcpServers": {
    "dinox-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "supergateway",
        "--streamableHttp",
        "http://localhost:3020/mcp?key=your-api-key",
        "--oauth2Bearer",
        "my-token"
      ]
    }
  }
}

Tools

CapabilityTool IDTransportInputOutput
Full-scene object detectiondetect-all-objectsSTDIO / HTTPImage URLCategory + bbox + (optional) captions
Text-prompted object detectiondetect-objects-by-textSTDIO / HTTPImage URL + English nouns (dot-separated for multiple, e.g., person.car)Target object bbox + (optional) captions
Human pose estimationdetect-human-pose-keypointsSTDIO / HTTPImage URL17 keypoints + bbox + (optional) captions
Visualizationvisualize-detection-resultSTDIO onlyImage URL + detection results arrayLocal path to annotated image

🎬 Use Cases

🎯 Scenario📝 Input✨ Output
Detection & Localization💬 Prompt:
Detect and visualize the
fire areas in the forest

🖼️ Input Image:
1-1
1-2
Object Counting💬 Prompt:
Please analyze this
warehouse image, detect
all the cardboard boxes,
count the total number

🖼️ Input Image:
2-1
Feature Detection💬 Prompt:
Find all red cars
in the image

🖼️ Input Image:
4-1
4-2
Attribute Reasoning💬 Prompt:
Find the tallest person
in the image, describe
their clothing

🖼️ Input Image:
5-1
5-2
Full Scene Detection💬 Prompt:
Find the fruit with
the highest vitamin C
content in the image

🖼️ Input Image:
6-1
6-3

Answer: Kiwi fruit (93mg/100g)
Pose Analysis💬 Prompt:
Please analyze what
yoga pose this is

🖼️ Input Image:
3-1
3-3

FAQ

  • Supported image sources?
    • STDIO: file:// and https://
    • Streamable HTTP: https:// only
  • Supported image formats?
    • jpg, jpeg, webp, png

Development & Debugging

Use watch mode to auto-rebuild during development:

npm run watch

Use MCP Inspector for debugging:

npm run inspector

License

Apache License 2.0