ollama-vision-mcp by xkiranj - MCP Server

The Ollama Vision MCP Server provides local image analysis using vision models served by Ollama. AI assistants and development tools can run image processing tasks directly on the user's machine, with no cloud-based APIs and no associated costs. The server supports multiple vision models, including llava-phi3, llava:7b, llava:13b, and bakllava, so users can trade off performance against resource requirements. Because images are processed locally, sensitive data stays on the machine. The server runs on Windows, macOS, and Linux, and accepts local files, URLs, and base64-encoded images as input. It is designed to integrate with tools such as Claude Desktop and Cursor IDE, and exposes tools for image analysis, description, object detection, and text extraction.

Features

Local Processing: All image analysis runs on the user's machine, which keeps images private and avoids cloud API costs.
Multiple Vision Models: Supports llava-phi3, llava:7b, llava:13b, and bakllava, among others, to cover different performance and resource needs.
Comprehensive Tools: Offers tools for image analysis, description, object detection, and text extraction.
Flexible Input: Accepts local files, URLs, and base64-encoded images.
Cross-Platform: Runs on Windows, macOS, and Linux.
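
The listing does not say how the server tells the three input types apart. As an illustration only, a minimal Python sketch of one way to classify an input string might look like the following. The function name classify_image_input and the order of the checks are assumptions, not the server's actual code.

```python
import base64
import binascii
import os
from urllib.parse import urlparse


def classify_image_input(value: str) -> str:
    """Return 'url', 'file', or 'base64' for an image reference.

    Hypothetical helper for illustration; not taken from ollama-vision-mcp.
    """
    # 1. An http(s) URL has both a scheme and a network location.
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"

    # 2. A path that exists on disk is treated as a local file.
    if os.path.isfile(value):
        return "file"

    # 3. Otherwise try base64, tolerating an optional data-URI prefix
    #    such as "data:image/png;base64,....".
    payload = value.split(",", 1)[1] if value.startswith("data:") and "," in value else value
    if not payload:
        raise ValueError("empty image input")
    try:
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        raise ValueError("input is not a URL, an existing file, or valid base64")
    return "base64"
```

Checking for a URL and an existing file before attempting base64 decoding avoids misreading a short file name that happens to be made of valid base64 characters.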

Tools

analyze_image
Runs a custom analysis of an image, guided by an optional prompt.
describe_image
Provides detailed image descriptions.
identify_objects
Detects and lists objects within an image.
read_text
Extracts text from images, similar to OCR.
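
To show how tools like these could map onto a local Ollama instance, here is a hedged Python sketch that builds a request for Ollama's /api/generate endpoint, which accepts base64-encoded images in an "images" list. The prompt strings, the default model choice, and the function names build_payload and run_tool are assumptions made for illustration; the listing does not describe the server's internal prompts or implementation.

```python
import base64
import json
import urllib.request
from typing import Optional

# Ollama's default local endpoint; adjust if your instance listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompts: the server's real prompts are not given in the listing.
TOOL_PROMPTS = {
    "describe_image": "Describe this image in detail.",
    "identify_objects": "List the objects visible in this image.",
    "read_text": "Transcribe any text that appears in this image.",
}


def build_payload(tool: str, image_bytes: bytes,
                  model: str = "llava:7b",
                  prompt: Optional[str] = None) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    if tool == "analyze_image":
        # analyze_image takes an optional custom prompt.
        text = prompt or "Analyze this image."
    elif tool in TOOL_PROMPTS:
        text = TOOL_PROMPTS[tool]
    else:
        raise ValueError(f"unknown tool: {tool}")
    return {
        "model": model,
        "prompt": text,
        # Ollama expects each image as a base64-encoded string.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def run_tool(tool: str, image_bytes: bytes, **kwargs) -> str:
    """Send the request to a running Ollama instance and return its text reply."""
    body = json.dumps(build_payload(tool, image_bytes, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With Ollama running and a vision model pulled, a call such as run_tool("read_text", open("receipt.png", "rb").read()) would return the model's transcription; the file name here is only an example.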
