landing-ai/vision-agent-mcp
VisionAgent MCP Server is a lightweight, side-car server that facilitates communication between MCP-compatible clients and Landing AI’s VisionAgent REST APIs, enabling natural-language computer-vision and document-analysis commands.
The VisionAgent MCP Server connects LLM agents to external tools through the Model Context Protocol (MCP). It runs as a local server and translates tool calls from MCP-compatible clients into authenticated HTTPS requests to Landing AI’s VisionAgent REST APIs. Users can therefore issue natural-language commands for computer vision and document analysis directly from their editors, without writing custom REST code or installing additional SDKs. Supported use cases include document analysis, object detection, instance segmentation, activity recognition, and depth estimation. The server communicates with the client over STDIN/STDOUT, and outputs such as JSON responses and images are streamed back to the client for further processing or visualization.
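The translation step described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual implementation: the base URL, the one-endpoint-per-tool layout, the `Bearer` authorization scheme, and the names `ToolCall`, `OutgoingRequest`, and `buildRequest` are all assumptions made for the example. The function only builds a description of the request and sends nothing.

```typescript
// Shape of a tool call as an MCP client would send it (name plus arguments).
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Plain description of the outgoing HTTPS request; nothing is sent here.
interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// ASSUMPTION: placeholder base URL, not the real VisionAgent endpoint.
const HYPOTHETICAL_BASE_URL = "https://api.example.invalid/v1/tools";

// Translate a tool call into an authenticated request description.
// ASSUMPTION: the per-tool URL layout and the Bearer header are illustrative only.
function buildRequest(call: ToolCall, apiKey: string): OutgoingRequest {
  if (!apiKey) {
    throw new Error("VISION_AGENT_API_KEY is not set");
  }
  return {
    url: `${HYPOTHETICAL_BASE_URL}/${encodeURIComponent(call.name)}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The tool arguments become the JSON request body.
    body: JSON.stringify(call.arguments),
  };
}
```

Keeping the translation in a pure function like this makes it easy to inspect what would be sent before any network code is involved.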
Features
- Supports natural-language commands for computer vision and document analysis.
- Runs as a local side-car server that communicates with the client over STDIN/STDOUT.
- Translates MCP-compatible client calls into authenticated HTTPS requests.
- Streams JSON responses and images back to the client for visualization.
- Integrates with MCP-compatible clients such as Claude Desktop and Cursor.
Usage
npx with VS Code
{
  "mcpServers": {
    "VisionAgent": {
      "command": "npx",
      "args": ["vision-tools-mcp"],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "/path/to/output/directory",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}
node with VS Code
{
  "mcpServers": {
    "VisionAgent": {
      "command": "node",
      "args": ["/path/to/build/index.js"],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "../../output",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}
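Both configurations pass the same three environment variables to the server process. A server could read and validate them along the lines of the sketch below. The `ServerConfig` type, the `loadConfig` function, the `./output` fallback, and the rule that only the string `"true"` enables image display are assumptions for illustration; the actual server may handle these values differently.

```typescript
// Parsed view of the three environment variables from the config above.
interface ServerConfig {
  apiKey: string;
  outputDirectory: string;
  imageDisplayEnabled: boolean;
}

// Hypothetical loader; pass in process.env (or any plain object) as `env`.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env["VISION_AGENT_API_KEY"];
  // Reject a missing key and the unedited placeholder from the sample config.
  if (!apiKey || apiKey === "<YOUR_API_KEY>") {
    throw new Error("Set VISION_AGENT_API_KEY to a real API key");
  }
  return {
    apiKey,
    // ASSUMPTION: the "./output" fallback is illustrative, not a documented default.
    outputDirectory: env["OUTPUT_DIRECTORY"] ?? "./output",
    // ASSUMPTION: only the (case-insensitive) string "true" enables image display.
    imageDisplayEnabled:
      (env["IMAGE_DISPLAY_ENABLED"] ?? "false").toLowerCase() === "true",
  };
}
```

Failing fast on a missing or placeholder key surfaces configuration mistakes at startup rather than on the first tool call.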
Tools
agentic-document-analysis
Parses PDFs and images to extract text, tables, charts, and diagrams.
text-to-object-detection
Detects objects from free-form prompts and outputs bounding boxes.
text-to-instance-segmentation
Provides pixel-perfect masks for images.
activity-recognition
Recognizes multiple activities in video with start/end timestamps.
depth-pro
High-resolution monocular depth estimation for single images.
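For orientation, the sketch below shows roughly what an MCP client sends over STDIN when it invokes one of these tools. MCP uses JSON-RPC 2.0, tools are invoked with the `tools/call` method, and the stdio transport carries one JSON message per line. The argument names `imagePath` and `prompts` are hypothetical, since the tools' input schemas are not listed here, and `makeToolCall` and `frame` are helper names invented for the example.

```typescript
// JSON-RPC 2.0 request used by MCP clients to invoke a tool.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a "tools/call" request for the given tool name and arguments.
function makeToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Frame a message for the stdio transport: one JSON document per line.
function frame(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

const request = makeToolCall(1, "text-to-object-detection", {
  // ASSUMPTION: argument names are hypothetical; check the tool's input schema.
  imagePath: "/path/to/image.jpg",
  prompts: ["forklift"],
});

// The framed string is what would be written to the server's STDIN.
const wireMessage = frame(request);
```

In practice an MCP client such as Claude Desktop or Cursor constructs these messages itself; the sketch is only meant to show what crosses the STDIN/STDOUT boundary.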