image_mcp

jettoblack/image_mcp

The Image Summarization MCP Server is a Model Context Protocol server designed to process image files and generate detailed descriptions using an OpenAI-compatible chat completion endpoint.

Image Summarization MCP Server

A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.

Use Case

Many LLMs used for agentic coding are text-only and cannot accept image inputs. This tool lets you delegate image description and analysis to a secondary model, so your primary model does not need to be multimodal. It works with both cloud and local LLMs through any server that supports the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc.).

For local models, gemma3:4b-it-qat works well: it has a relatively small footprint and runs fast, even on CPU-only machines.

Features

  • Accepts images via a unified image_url parameter with multiple input formats
  • Supports custom_prompt for specific tasks beyond general description
  • Sends images to OpenAI-compatible chat completion endpoints
  • Returns detailed image descriptions
  • Configurable endpoint URL, API key, and model
  • Command-line interface for configuration
  • Comprehensive error handling

Quick install from NPM

Add this to your global mcp_settings.json or project mcp.json:

  "image_summarization": {
    "command": "npx",
    "args": [
      "-y",
      "@jettoblack/image_mcp",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:8080/v1",
      "--model",
      "gemma3:4b-it-qat"
    ]
  }

At a minimum, configure the base URL, API key, and model to point to your choice of server.

For slow local models, you may also need to increase the timeout and max retries settings (--timeout and --max-retries, or OPENAI_TIMEOUT and OPENAI_MAX_RETRIES).

Configuration

The MCP server can be configured using environment variables or command-line arguments.

Environment Variables

  • OPENAI_API_KEY: Your API key for the OpenAI-compatible service
  • OPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default: http://localhost:9292/v1)
  • OPENAI_MODEL: The model to use for image analysis
  • OPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models you may need to increase this.
  • OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)

Command Line Arguments

npx -y @jettoblack/image_mcp \
  --api-key your-api-key \
  --base-url https://api.openai.com/v1 \
  --model gpt-4-vision-preview \
  --timeout 60000 \
  --max-retries 5

Configuration Priority

  1. Command-line arguments
  2. Environment variables
  3. Default values
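
The priority order above can be sketched as a small resolver. This is a minimal illustration, not the project's actual config.ts: the ServerConfig shape and the parseFlags and resolveConfig names are hypothetical, while the flag names, environment variable names, and default values are the ones documented in this section.

```typescript
// Hypothetical sketch: resolve settings as CLI flags > environment > defaults.
interface ServerConfig {
  apiKey: string | undefined;
  baseUrl: string;
  model: string | undefined;
  timeoutMs: number;
  maxRetries: number;
}

// Defaults as listed under "Environment Variables".
const DEFAULTS = {
  baseUrl: "http://localhost:9292/v1",
  timeoutMs: 60000,
  maxRetries: 3,
};

// Collect "--flag value" pairs from an argv-style array.
function parseFlags(argv: string[]): Record<string, string | undefined> {
  const flags: Record<string, string | undefined> = {};
  for (let i = 0; i + 1 < argv.length; i++) {
    if (argv[i].startsWith("--")) {
      flags[argv[i].slice(2)] = argv[i + 1];
      i++; // skip the value that was just consumed
    }
  }
  return flags;
}

function resolveConfig(
  argv: string[],
  env: Record<string, string | undefined>,
): ServerConfig {
  const flags = parseFlags(argv);
  // The first defined value wins: flag first, then environment variable.
  const pick = (flag: string, envName: string) => flags[flag] ?? env[envName];
  return {
    apiKey: pick("api-key", "OPENAI_API_KEY"),
    baseUrl: pick("base-url", "OPENAI_BASE_URL") ?? DEFAULTS.baseUrl,
    model: pick("model", "OPENAI_MODEL"),
    timeoutMs: Number(pick("timeout", "OPENAI_TIMEOUT") ?? DEFAULTS.timeoutMs),
    maxRetries: Number(pick("max-retries", "OPENAI_MAX_RETRIES") ?? DEFAULTS.maxRetries),
  };
}
```

In a Node.js process this would typically be called as resolveConfig(process.argv.slice(2), process.env).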

Usage

MCP Tools

The server provides two tools for image analysis:

summarize_image

Analyzes and describes a single image in detail.

Parameters
  • image_url (string): URL to the image file to analyze. Supports:
    • Absolute file paths
    • file:// URLs
    • HTTP/HTTPS URLs (will be downloaded and converted to base64)
    • Data URLs with base64 encoded image files
  • custom_prompt (string, optional): Custom prompt to use instead of the default image description prompt
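
As a rough illustration of how one image_url string can cover all four input formats, the sketch below classifies a value by its prefix. The classifyImageUrl name and the ImageSource type are hypothetical, and accepting Windows drive paths is an assumption; the server's real detection logic may differ.

```typescript
// Hypothetical sketch: decide how an image_url value should be loaded.
type ImageSource = "data-url" | "http" | "file-url" | "file-path";

function classifyImageUrl(imageUrl: string): ImageSource {
  // Data URL that already carries a base64-encoded image.
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(imageUrl)) return "data-url";
  // Remote image that must be downloaded and converted to base64.
  if (/^https?:\/\//i.test(imageUrl)) return "http";
  // Local file referenced as a file:// URL.
  if (/^file:\/\//i.test(imageUrl)) return "file-url";
  // Absolute POSIX path, or a Windows drive path (assumption: also accepted).
  if (imageUrl.startsWith("/") || /^[a-zA-Z]:[\\/]/.test(imageUrl)) {
    return "file-path";
  }
  throw new Error(`Unsupported image_url: ${imageUrl.slice(0, 40)}`);
}
```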

Example Usage

Using file path:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg"
  }
}

Using file:// URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "file:///path/to/your/image.jpg"
  }
}

Using HTTP/HTTPS URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "https://example.com/image.jpg"
  }
}

Using data URL with base64:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."
  }
}

With custom prompt:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg",
    "custom_prompt": "What objects are visible in this image?"
  }
}

compare_images

Compares 2 or more images and describes their similarities and differences.

Parameters
  • image_urls (array of strings): Array of image URLs to compare (minimum 2 images required). Each URL supports:
    • Absolute file paths
    • file:// URLs
    • HTTP/HTTPS URLs (will be downloaded and converted to base64)
    • Data URLs with base64 encoded image files
  • custom_prompt (string, optional): Custom prompt to use instead of the default image comparison prompt

Example Usage

Comparing two images:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "/path/to/image1.jpg",
      "/path/to/image2.jpg"
    ]
  }
}

Comparing multiple images with custom prompt:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "custom_prompt": "Compare these UI screenshots and describe the differences in color themes."
  }
}
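
The two-image minimum could be enforced before any request is sent. The following is a minimal sketch under that assumption; validateCompareArgs and the error messages are hypothetical and not taken from the project source.

```typescript
// Hypothetical sketch: validate compare_images arguments before use.
interface CompareImagesArgs {
  image_urls: string[];
  custom_prompt?: string;
}

function validateCompareArgs(args: unknown): CompareImagesArgs {
  if (typeof args !== "object" || args === null) {
    throw new Error("arguments must be an object");
  }
  const { image_urls, custom_prompt } = args as Record<string, unknown>;
  if (!Array.isArray(image_urls) || !image_urls.every((u) => typeof u === "string")) {
    throw new Error("image_urls must be an array of strings");
  }
  if (image_urls.length < 2) {
    throw new Error("compare_images requires at least 2 images");
  }
  const result: CompareImagesArgs = { image_urls: image_urls as string[] };
  // custom_prompt is optional, but must be a string when present.
  if (typeof custom_prompt === "string") {
    result.custom_prompt = custom_prompt;
  } else if (custom_prompt !== undefined) {
    throw new Error("custom_prompt must be a string");
  }
  return result;
}
```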

Dev Setup

  1. Clone the repository:
git clone https://github.com/jettoblack/image_mcp.git
cd image_mcp
  2. Install dependencies:
npm install
  3. Build the project:
npm run build
  4. Start the server:
node build/index.js

The server will start and listen on stdio for MCP protocol communications.
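
Over the MCP stdio transport, each JSON-RPC message is written as a single line terminated by a newline. The sketch below frames a tools/call request that way; frameToolCall is a hypothetical helper, and a real client must also complete the MCP initialize handshake before calling any tool.

```typescript
// Hypothetical sketch: frame a tools/call request for the stdio transport.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function frameToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  // JSON.stringify without indentation emits no raw newlines,
  // so each message occupies exactly one line.
  return JSON.stringify(request) + "\n";
}
```

A client would write the returned string to the server process's stdin and read the response line from its stdout.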

MCP Tool Installation (local dev build)

Add this to your global mcp_settings.json or project mcp.json:

  "image_summarizer": {
    "command": "node",
    "args": [
      "/path/to/image_mcp/build/index.js",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:9292/v1",
      "--model",
      "gemma3:4b-it-qat"
    ]
  }

Testing

Running Tests

Run the test suite:

npm test

The test suite includes:

  • Unit tests for image processing functionality
  • Integration tests that require a mock server
  • Tests for both summarize_image and compare_images tools

Mock Server Testing

The project includes a mock OpenAI-compatible server for testing purposes.

  1. Start the mock server in a separate terminal:
node tests/mock-server.js

The mock server will start on http://localhost:9293 and provides endpoints for:

  • GET /v1/models - Lists available models
  • POST /v1/chat/completions - Mock chat completions with image support
  • POST /v1/test/image-process - Test endpoint for image processing validation
  2. Set environment variables for the mock server:
export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision
  3. Run the integration tests:
npm test tests/integration.test.ts

Real OpenAI-Compatible Server Testing

To test with a real OpenAI-compatible endpoint:

  1. Set up your environment variables:
export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview

Or for other OpenAI-compatible services:

export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model
  2. Start the MCP server:
node build/index.js
  3. Send test requests using an MCP client or test the tools directly.

Manual Testing

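You can manually test the MCP server with an MCP client. Because the server communicates over stdio, the curl request below only applies if you expose the server through an HTTP/SSE bridge; http://localhost:8080/sse is a placeholder for that bridge's address: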

# Test with a local image file
curl -X POST http://localhost:8080/sse \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "summarize_image",
      "arguments": {
        "image_url": "/path/to/your/test/image.jpg"
      }
    }
  }'

API Reference

OpenAI-Compatible API Integration

The server sends requests to the OpenAI-compatible chat completion endpoint with the following structure:

{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in detail, including all text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ],
  "stream": false
}
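
A request body with this structure could be assembled as follows. This is a sketch rather than the project's openai-client.ts: buildChatRequest and its types are hypothetical, the default prompt is the one shown above, and passing several images in a single user message (as compare_images would need) is an assumption.

```typescript
// Hypothetical sketch: build the chat completion body shown above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  stream: false;
}

const DEFAULT_PROMPT = "Describe this image in detail, including all text.";

function buildChatRequest(
  model: string,
  imageDataUrls: string[],
  customPrompt?: string,
): ChatRequest {
  const content: ContentPart[] = [
    // The prompt comes first, followed by one part per image.
    { type: "text", text: customPrompt ?? DEFAULT_PROMPT },
    ...imageDataUrls.map(
      (url): ContentPart => ({ type: "image_url", image_url: { url } }),
    ),
  ];
  return { model, messages: [{ role: "user", content }], stream: false };
}
```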

Supported Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • WebP (.webp)
  • SVG (.svg)
  • BMP (.bmp)
  • TIFF (.tiff)

Error Handling

The server includes comprehensive error handling for:

  • Invalid image files
  • Unsupported image formats
  • Missing API keys
  • Network connectivity issues
  • API response errors

Development

Project Structure

src/
├── config.ts           # Configuration management
├── image-processor.ts  # Image processing utilities
├── index.ts            # Main MCP server
└── openai-client.ts    # OpenAI-compatible API client

Building

npm run build

Testing

npm test

License

This project is licensed under the MIT License.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Support

For issues and questions, please open an issue on the GitHub repository.

Tips

Tips and donations are always appreciated and help fund future development.