landing-ai/vision-agent-mcp

VisionAgent MCP Server is a lightweight sidecar server that bridges MCP-compatible clients and Landing AI’s VisionAgent REST APIs, so clients can issue computer-vision and document-analysis commands in natural language.

Tools

Functions exposed to the LLM to take actions

agentic-document-analysis

Parse PDFs/images to extract text, tables, charts, and diagrams.

text-to-object-detection

Detects objects from free-form text prompts and outputs bounding boxes.

text-to-instance-segmentation

Provides pixel-perfect masks for images.

activity-recognition

Recognizes multiple activities in video with start/end timestamps.

depth-pro

High-resolution monocular depth estimation for single images.
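
MCP clients invoke tools like the ones listed above by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The following is a minimal sketch of how such a request might be assembled. The helper `build_tool_call` and the argument names `imagePath` and `prompts` are hypothetical and not taken from this listing; the actual input schema should be read from the server's `tools/list` response.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that ids be unique per session.
_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names below are assumptions for illustration only.
request = build_tool_call(
    "text-to-object-detection",
    {"imagePath": "/path/to/image.jpg", "prompts": ["forklift"]},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds and sends this message for you; the sketch only shows the shape of what goes over the wire.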

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

Author

landing-ai

Version

v0.1.17

Repository

https://github.com/landing-ai/vision-agent-mcp

Last publish date

2025-06-06

Last update date

2025-12-17

Server configs

Via npx in VS Code

{
  "mcpServers": {
    "VisionAgent": {
      "command": "npx",
      "args": ["vision-tools-mcp"],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "/path/to/output/directory",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}

Via a local build in VS Code

{
  "mcpServers": {
    "VisionAgent": {
      "command": "node",
      "args": [
        "/path/to/build/index.js"
      ],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "../../output",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}
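
Both configurations differ only in `command`, `args`, and the output path, while the three environment variables stay the same. As a rough sketch, a config block of this shape could be generated programmatically. The helper `make_config` is hypothetical, and writing `"false"` to disable `IMAGE_DISPLAY_ENABLED` is an assumption that this listing does not confirm.

```python
import json


def make_config(api_key, output_dir, command="npx",
                args=("vision-tools-mcp",), image_display=True):
    """Return an MCP client config dict for the VisionAgent server (hypothetical helper)."""
    return {
        "mcpServers": {
            "VisionAgent": {
                "command": command,
                "args": list(args),
                "env": {
                    "VISION_AGENT_API_KEY": api_key,
                    "OUTPUT_DIRECTORY": output_dir,
                    # Assumption: the server expects the strings "true"/"false".
                    "IMAGE_DISPLAY_ENABLED": "true" if image_display else "false",
                },
            }
        }
    }


# npx variant, matching the first config.
npx_config = make_config("<YOUR_API_KEY>", "/path/to/output/directory")

# Local-build variant, matching the second config.
local_config = make_config(
    "<YOUR_API_KEY>", "../../output",
    command="node", args=["/path/to/build/index.js"],
)

print(json.dumps(npx_config, indent=2))
```

Keeping the API key as a parameter rather than hard-coding it makes it easier to read the value from the environment or a secrets store before writing the file.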