image_mcp by jettoblack - MCP Server

Image Summarization MCP Server

A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.

Use Case

Many LLMs used for agentic coding are text-only and cannot accept image inputs. This tool lets you use a secondary model dedicated to describing and analyzing images, so your primary model does not need to be multi-modal. It supports both cloud and local LLMs through any server that implements the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc.).

For local models, gemma3:4b-it-qat works well: it has a relatively small footprint and runs quickly, even on CPU-only systems.

Features

Accepts images through a unified image_url parameter (image_urls for comparisons) that supports multiple input formats
Supports a custom_prompt for specific tasks beyond general description
Sends images to any OpenAI-compatible chat completion endpoint
Returns detailed image descriptions and comparisons
Configurable endpoint URL, API key, model, timeout, and retry count
Command-line interface for configuration
Error handling for invalid or unsupported images, missing API keys, and network or API failures

Quick install from NPM

Add this entry to the mcpServers object in your global mcp_settings.json or project mcp.json:

  "image_summarization": {
    "command": "npx",
    "args": [
      "-y",
      "@jettoblack/image_mcp",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:8080/v1",
      "--model",
      "gemma3:4b-it-qat"
    ]
  }

At a minimum, set the base URL, API key, and model to match your chosen server.

With slow local models, you may also need to increase the timeout and max retries settings (--timeout and --max-retries).

Configuration

The MCP server can be configured using environment variables or command-line arguments.

Environment Variables

OPENAI_API_KEY: Your API key for the OpenAI-compatible service
OPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default: http://localhost:9292/v1)
OPENAI_MODEL: The model to use for image analysis
OPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models, you may need to increase this.
OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)

Command Line Arguments

npx -y @jettoblack/image_mcp \
  --api-key your-api-key \
  --base-url https://api.openai.com/v1 \
  --model gpt-4-vision-preview \
  --timeout 60000 \
  --max-retries 5

Configuration Priority

Settings are resolved in the following order, from highest to lowest priority:

1. Command-line arguments
2. Environment variables
3. Default values
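The priority order can be illustrated with a small resolver. This is a minimal sketch, not the project's actual config.ts: the `resolveConfig` and `parseFlags` functions, the `Config` shape, and the simple "--flag value" parsing are hypothetical. Only the flag names, environment variable names, and defaults come from this README.

```typescript
// Hypothetical sketch of the documented priority:
// command-line arguments > environment variables > defaults.
interface Config {
  apiKey?: string;
  baseUrl: string;
  model?: string;
  timeout: number; // milliseconds
  maxRetries: number;
}

// Defaults documented above; API key and model have no documented default.
const DEFAULTS = { baseUrl: "http://localhost:9292/v1", timeout: 60000, maxRetries: 3 };

// Collect "--flag value" pairs (assumes every flag is followed by a value).
function parseFlags(argv: string[]): Map<string, string> {
  const flags = new Map<string, string>();
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith("--") && i + 1 < argv.length) {
      flags.set(arg.slice(2), argv[i + 1]);
      i++; // skip the value just consumed
    }
  }
  return flags;
}

// First defined, non-empty value wins.
function pick(...values: (string | undefined)[]): string | undefined {
  return values.find((v) => v !== undefined && v !== "");
}

function resolveConfig(argv: string[], env: Record<string, string | undefined>): Config {
  const flags = parseFlags(argv);
  const timeout = pick(flags.get("timeout"), env.OPENAI_TIMEOUT);
  const maxRetries = pick(flags.get("max-retries"), env.OPENAI_MAX_RETRIES);
  return {
    apiKey: pick(flags.get("api-key"), env.OPENAI_API_KEY),
    baseUrl: pick(flags.get("base-url"), env.OPENAI_BASE_URL) ?? DEFAULTS.baseUrl,
    model: pick(flags.get("model"), env.OPENAI_MODEL),
    timeout: timeout !== undefined ? Number.parseInt(timeout, 10) : DEFAULTS.timeout,
    maxRetries: maxRetries !== undefined ? Number.parseInt(maxRetries, 10) : DEFAULTS.maxRetries,
  };
}
```

In a Node entry point this would be called as `resolveConfig(process.argv.slice(2), process.env)`.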

Usage

MCP Tools

The server provides two tools for image analysis:

`summarize_image`

Analyzes and describes a single image in detail.

Parameters

image_url (string): Location of the image to analyze. Supports:
- Absolute file paths
- file:// URLs
- HTTP/HTTPS URLs (will be downloaded and converted to base64)
- Data URLs with base64 encoded image files
custom_prompt (string, optional): Custom prompt to use instead of the default image description prompt
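To show how one parameter can carry these four forms, here is a minimal sketch of a classifier. The `classifyImageUrl` function and `ImageSource` type are hypothetical, and the detection rules (regex checks, Windows drive paths, file:// URLs with an empty host) are assumptions rather than the server's confirmed logic.

```typescript
// Hypothetical sketch: sort an image_url value into the four documented forms.
// A real implementation would then read the file, download the URL, or use the
// data URL as-is before sending it to the model.
type ImageSource =
  | { kind: "data-url"; value: string }
  | { kind: "http-url"; value: string }
  | { kind: "file-url"; path: string }
  | { kind: "file-path"; path: string };

function classifyImageUrl(input: string): ImageSource {
  const s = input.trim();
  // Data URL carrying base64-encoded image bytes, e.g. data:image/png;base64,...
  if (/^data:image\/[\w.+-]+;base64,/i.test(s)) {
    return { kind: "data-url", value: s };
  }
  // Remote image that would be downloaded and converted to base64.
  if (/^https?:\/\//i.test(s)) {
    return { kind: "http-url", value: s };
  }
  // file:// URL; assumes an empty host (file:///abs/path) and decodes %XX escapes.
  if (/^file:\/\//i.test(s)) {
    return { kind: "file-url", path: decodeURIComponent(s.slice("file://".length)) };
  }
  // Absolute POSIX path (/...) or Windows drive path (C:\... or C:/...).
  if (s.startsWith("/") || /^[A-Za-z]:[\\/]/.test(s)) {
    return { kind: "file-path", path: s };
  }
  throw new Error("Unsupported image_url: expected an absolute path, file://, http(s):// or data: URL");
}
```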

Example Usage

Using file path:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg"
  }
}

Using file:// URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "file:///path/to/your/image.jpg"
  }
}

Using HTTP/HTTPS URL:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "https://example.com/image.jpg"
  }
}

Using data URL with base64:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg..."
  }
}

With custom prompt:

{
  "name": "summarize_image",
  "arguments": {
    "image_url": "/path/to/your/image.jpg",
    "custom_prompt": "What objects are visible in this image?"
  }
}

`compare_images`

Compares two or more images and describes their similarities and differences.

Parameters

image_urls (array of strings): Array of image locations to compare (at least 2 required). Each entry supports:
- Absolute file paths
- file:// URLs
- HTTP/HTTPS URLs (will be downloaded and converted to base64)
- Data URLs with base64 encoded image files
custom_prompt (string, optional): Custom prompt to use instead of the default image comparison prompt
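The documented constraints on these arguments can be expressed as a small validation step. This is a hedged sketch: `validateCompareArgs` and `CompareImagesArgs` are hypothetical names, and the exact error messages the real server returns are not documented here.

```typescript
// Hypothetical argument check for compare_images, mirroring the documented
// constraints: image_urls must be an array of at least 2 strings, and
// custom_prompt is optional.
interface CompareImagesArgs {
  image_urls: string[];
  custom_prompt?: string;
}

function validateCompareArgs(args: unknown): CompareImagesArgs {
  if (typeof args !== "object" || args === null) {
    throw new Error("arguments must be an object");
  }
  const { image_urls, custom_prompt } = args as Record<string, unknown>;
  if (!Array.isArray(image_urls) || !image_urls.every((u) => typeof u === "string" && u.length > 0)) {
    throw new Error("image_urls must be an array of non-empty strings");
  }
  if (image_urls.length < 2) {
    throw new Error(`compare_images needs at least 2 images, got ${image_urls.length}`);
  }
  if (custom_prompt !== undefined && typeof custom_prompt !== "string") {
    throw new Error("custom_prompt must be a string when provided");
  }
  return { image_urls: image_urls as string[], custom_prompt: custom_prompt as string | undefined };
}
```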

Example Usage

Comparing two images:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "/path/to/image1.jpg",
      "/path/to/image2.jpg"
    ]
  }
}

Comparing images with a custom prompt:

{
  "name": "compare_images",
  "arguments": {
    "image_urls": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "custom_prompt": "Compare these UI screenshots and describe the differences in color themes."
  }
}

Dev Setup

Clone the repository:

git clone https://github.com/jettoblack/image_mcp.git
cd image_mcp

Install dependencies:

npm install

Build the project:

npm run build

Starting the Server

node build/index.js

The server will start and listen on stdio for MCP protocol communications.

MCP Tool Installation (local dev build)

Add this entry to the mcpServers object in your global mcp_settings.json or project mcp.json:

  "image_summarizer": {
    "command": "node",
    "args": [
      "/path/to/image_mcp/build/index.js",
      "--api-key",
      "key",
      "--base-url",
      "http://localhost:9292/v1",
      "--model",
      "gemma3:4b-it-qat"
    ]
  }

Testing

Running Tests

Run the test suite:

npm test

The test suite includes:

Unit tests for image processing functionality
Integration tests that require a mock server
Tests for both summarize_image and compare_images tools

Mock Server Testing

The project includes a mock OpenAI-compatible server for testing purposes.

Start the mock server in a separate terminal:

node tests/mock-server.js

The mock server listens on http://localhost:9293 and provides the following endpoints:

GET /v1/models - Lists available models
POST /v1/chat/completions - Mock chat completions with image support
POST /v1/test/image-process - Test endpoint for image processing validation

Point the MCP server at the mock server by setting these environment variables:

export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision

Run the integration tests:

npm test tests/integration.test.ts

Real OpenAI-Compatible Server Testing

To test with a real OpenAI-compatible endpoint:

Set up your environment variables:

export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview

Or for other OpenAI-compatible services:

export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model

Start the MCP server:

node build/index.js

Then send test requests from an MCP client, or call the tools directly as described under Manual Testing.

Manual Testing

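Because the server communicates over stdio, the simplest way to test it manually is with an MCP client such as the MCP Inspector (@modelcontextprotocol/inspector). The curl example below only applies if the server has been exposed over HTTP through a separate proxy; adjust the URL to match that proxy: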

# Test with a local image file
curl -X POST http://localhost:8080/sse \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "summarize_image",
      "arguments": {
        "image_url": "/path/to/your/test/image.jpg"
      }
    }
  }'
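Alternatively, since the server reads MCP messages from standard input, you can drive it over stdio directly. Under MCP's stdio transport each JSON-RPC message is a single line, and a client sends `initialize` followed by a `notifications/initialized` notification before calling tools. The sketch below builds that sequence; the `buildStdioSession` helper, the client name, and the protocol version string "2024-11-05" are assumptions, so match the version to what your MCP SDK supports.

```typescript
// Hypothetical helper: builds the newline-delimited JSON-RPC messages an MCP
// client sends over stdio to call summarize_image. Each message must fit on a
// single line, so JSON.stringify is used without indentation.
function buildStdioSession(imageUrl: string, customPrompt?: string): string {
  const args: Record<string, string> = { image_url: imageUrl };
  if (customPrompt !== undefined) args.custom_prompt = customPrompt;

  const messages = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // assumption: use a version your SDK supports
        capabilities: {},
        clientInfo: { name: "manual-test", version: "0.0.1" },
      },
    },
    // Notification (no id) signalling that initialization is complete.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    {
      jsonrpc: "2.0",
      id: 2,
      method: "tools/call",
      params: { name: "summarize_image", arguments: args },
    },
  ];
  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
}
```

You could write this string to the stdin of `node build/index.js` (for example from a spawned child process) and read one JSON response per line from its stdout; an interactive client is usually more convenient.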

API Reference

OpenAI-Compatible API Integration

The server sends a POST request to the chat completions endpoint (/chat/completions under the configured base URL) with the following structure:

{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in detail, including all text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ],
  "stream": false
}
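A request body of this shape can be assembled with a few typed helpers. This is a minimal sketch, not the project's openai-client.ts: `buildChatRequest` and the type names are hypothetical, and the assumption that compare_images simply adds one image_url part per image is not confirmed by this README.

```typescript
// Hypothetical builder for the request body shown above: one text part with
// the prompt, followed by one image_url part per image (each a data URL).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatCompletionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  stream: false;
}

function buildChatRequest(model: string, prompt: string, imageDataUrls: string[]): ChatCompletionRequest {
  const content: ContentPart[] = [{ type: "text", text: prompt }];
  for (const url of imageDataUrls) {
    content.push({ type: "image_url", image_url: { url } });
  }
  return { model, messages: [{ role: "user", content }], stream: false };
}
```

Following the usual OpenAI convention, the serialized body would be POSTed with a `Content-Type: application/json` header and an `Authorization: Bearer <API key>` header.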

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)
SVG (.svg)
BMP (.bmp)
TIFF (.tiff)
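Each supported extension corresponds to a MIME type, which is what goes into the data URL prefix (data:image/png;base64,...). The sketch below maps the listed extensions and wraps already-encoded base64 data; `mimeTypeFor`, `toDataUrl`, and the extension-based detection are hypothetical and may differ from how image-processor.ts actually identifies formats.

```typescript
// Hypothetical helper: map the extensions listed above to MIME types and build
// the data URL placed in an image_url content part. Expects a file name or
// path without a query string.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
  ".svg": "image/svg+xml",
  ".bmp": "image/bmp",
  ".tiff": "image/tiff",
};

function mimeTypeFor(fileName: string): string {
  // Look only at the last path segment so dots in directory names are ignored.
  const base = fileName.slice(Math.max(fileName.lastIndexOf("/"), fileName.lastIndexOf("\\")) + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot >= 0 ? base.slice(dot).toLowerCase() : "";
  const mime: string | undefined = MIME_BY_EXTENSION[ext];
  if (mime === undefined) {
    throw new Error(`Unsupported image format: ${ext || "(no extension)"}`);
  }
  return mime;
}

// base64Data is assumed to be computed elsewhere, e.g. in Node with
// Buffer.from(bytes).toString("base64").
function toDataUrl(fileName: string, base64Data: string): string {
  return `data:${mimeTypeFor(fileName)};base64,${base64Data}`;
}
```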

Error Handling

The server includes comprehensive error handling for:

Invalid image files
Unsupported image formats
Missing API keys
Network connectivity issues
API response errors

Development

Project Structure

src/
├── config.ts          # Configuration management
├── image-processor.ts # Image processing utilities
├── index.ts           # Main MCP server
└── openai-client.ts   # OpenAI-compatible API client

Building

npm run build

Testing

npm test

License

This project is licensed under the MIT License.

Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

Support

For issues and questions, please open an issue on the GitHub repository.

Tips

Tips and donations are always appreciated and help fund future development.

PayPal: paypal.me/jettoblack
Venmo: venmo.com/u/jettoblack
BTC: bc1qa76jrsvyglxq7t5fxnvfkekjtmp4z82wtm6ywf
ETH: 0x47fc11F09A427540d10a45491d464F02177EAc66