ollama-vision-mcp

Ollama Vision MCP Server provides local computer vision capabilities using vision models served by Ollama, keeping images private and avoiding per-request cloud API costs.

The Ollama Vision MCP Server performs image analysis locally using vision models run through Ollama. It lets AI assistants and development tools process images directly on the user's machine, so no cloud API or usage fees are involved. The server supports several vision models, including llava-phi3, llava:7b, llava:13b, and bakllava, letting users trade off performance against resource requirements. Because images are processed locally, sensitive data is not sent to a third-party service. The server runs on Windows, macOS, and Linux, accepts images as local file paths, URLs, or base64-encoded strings, and is designed to integrate with MCP clients such as Claude Desktop and Cursor IDE.
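This server's source is not reproduced here, but as a rough sketch of what "local processing" involves: Ollama exposes a REST API on the local machine, and a vision model can be queried by POSTing a prompt plus a base64-encoded image to its `/api/generate` endpoint. The sketch assumes Ollama's default address (`http://localhost:11434`); the helper names `build_generate_request` and `generate` are hypothetical and not taken from this project.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a non-streaming /api/generate request carrying one image (hypothetical helper)."""
    payload = {
        "model": model,  # e.g. "llava-phi3" or "llava:7b"
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, image_bytes: bytes, timeout: float = 120.0) -> str:
    """Send the request to the local Ollama server and return the generated text."""
    req = build_generate_request(model, prompt, image_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With streaming disabled, the model output is in the "response" field.
    return body["response"]
```

For example, calling `generate("llava-phi3", "Describe this image.", data)` with `data` read in binary mode from a placeholder file such as `photo.jpg` would return the model's description, provided the model has first been downloaded with `ollama pull llava-phi3`.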

Features

  • Local Processing: All image analysis runs on the user's machine, which keeps data private and avoids cloud API costs.
  • Multiple Vision Models: Supports several models, including llava-phi3, llava:7b, llava:13b, and bakllava, to suit different use cases and hardware.
  • Comprehensive Tools: Provides tools for custom image analysis, description, object identification, and text extraction.
  • Flexible Input: Accepts local file paths, URLs, and base64-encoded images.
  • Cross-Platform: Runs on Windows, macOS, and Linux.
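One way to support the three input formats is to normalize each of them to the base64 string that Ollama expects. The helpers `load_image_bytes` and `to_ollama_image` below are hypothetical, and the detection order (http(s) URL, then existing file path, then base64 text, including `data:` URLs) is an illustrative assumption rather than this server's documented behavior.

```python
import base64
import binascii
import os
import urllib.request


def load_image_bytes(source: str, timeout: float = 30.0) -> bytes:
    """Return raw image bytes from an http(s) URL, a local path, or base64 text.

    Hypothetical helper; the order of checks below is an assumption.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source, timeout=timeout) as resp:
            return resp.read()
    if os.path.isfile(source):
        with open(source, "rb") as f:
            return f.read()
    # Accept data URLs such as "data:image/png;base64,..." by dropping the prefix.
    if source.startswith("data:") and "," in source:
        source = source.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(source, validate=True)
    except binascii.Error as exc:
        raise ValueError("source is not a URL, file path, or valid base64") from exc


def to_ollama_image(source: str) -> str:
    """Normalize any supported input to a base64 string for Ollama's "images" field."""
    return base64.b64encode(load_image_bytes(source)).decode("ascii")
```

Checking for an existing file before attempting base64 decoding avoids misreading a short path that happens to consist only of base64 characters.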

Tools

  1. analyze_image

    Analyzes an image, optionally guided by a custom prompt.

  2. describe_image

    Generates a detailed description of an image.

  3. identify_objects

    Detects and lists objects within an image.

  4. read_text

    Extracts text from images, similar to OCR.
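The four tools differ mainly in the instruction sent to the model. The sketch below assumes each tool maps to a fixed prompt and builds the JSON body for Ollama's `/api/generate` endpoint; the prompt wording, the `TOOL_PROMPTS` table, the helper names, and the `llava-phi3` default are all hypothetical. Only `analyze_image` accepts a caller-supplied prompt, matching its description above.

```python
from typing import Optional

# Hypothetical default prompts; the server's actual wording may differ.
TOOL_PROMPTS = {
    "describe_image": "Describe this image in detail.",
    "identify_objects": "List every distinct object you can see in this image, one per line.",
    "read_text": "Transcribe all text visible in this image exactly as written.",
}
DEFAULT_ANALYZE_PROMPT = "Analyze this image."


def prompt_for_tool(tool: str, custom_prompt: Optional[str] = None) -> str:
    """Pick the prompt for a tool; a missing or empty custom prompt falls back to a default."""
    if tool == "analyze_image":
        return custom_prompt or DEFAULT_ANALYZE_PROMPT
    try:
        return TOOL_PROMPTS[tool]
    except KeyError:
        raise ValueError(f"unknown tool: {tool}") from None


def build_payload(tool: str, image_b64: str, model: str = "llava-phi3",
                  custom_prompt: Optional[str] = None) -> dict:
    """Assemble a non-streaming request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt_for_tool(tool, custom_prompt),
        "images": [image_b64],  # base64-encoded image data
        "stream": False,
    }
```

Keeping the prompts in one table means adding a tool only requires a new entry, while the request shape stays the same for every tool.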