
AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

Features

  • Dual Provider Support: Choose between Google Gemini API and Vertex AI
  • Multimodal Analysis: Support for both image and video content analysis
  • Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
  • Storage Integration: Built-in Google Cloud Storage support
  • Comprehensive Validation: Zod-based data validation throughout
  • Error Handling: Robust error handling with retry logic and circuit breakers
  • TypeScript: Full TypeScript support with strict type checking

Installation

npm install ai-vision-mcp

Quick Start

Using Google Gemini API

  1. Set your environment variables:
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
  2. Start the MCP server:
npx ai-vision-mcp
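Alternatively, most MCP clients can spawn the server for you from their configuration file. A typical entry might look like the following (the exact file name and schema depend on your client; the server name "ai-vision" is just a label you choose):

```json
{
  "mcpServers": {
    "ai-vision": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```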

Using Vertex AI

  1. Set your environment variables:
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CREDENTIALS="/path/to/service-account.json"
export GCS_BUCKET_NAME="your-gcs-bucket"

  2. Start the MCP server:
npx ai-vision-mcp

MCP Tools

The server provides two main MCP tools:

analyze_image

Analyzes an image using AI and returns a detailed description.

Parameters:

  • imageSource (string): URL, base64 data, or file path to the image
  • prompt (string): Question or instruction for the AI
  • options (object, optional): Analysis options including temperature and max tokens

Examples:

  1. Analyze image from URL:
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
  2. Analyze local image file:
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}
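A request can also tune the generation settings via the optional options object. The exact option key names below (temperature, maxTokens) are illustrative assumptions based on the parameter description above; check the tool's schema for the authoritative names:

```json
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "List the main objects visible in this image.",
  "options": {
    "temperature": 0.4,
    "maxTokens": 300
  }
}
```

Lower temperatures generally produce more deterministic descriptions, which is useful for structured extraction tasks.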

analyze_video

Analyzes a video using AI and returns a detailed description.

Parameters:

  • videoSource (string): YouTube URL, GCS URI, or local file path to the video
  • prompt (string): Question or instruction for the AI
  • options (object, optional): Analysis options including temperature and max tokens

Supported video sources:

  • YouTube URLs (e.g., https://www.youtube.com/watch?v=...)
  • GCS URIs (e.g., gs://bucket-name/video.mp4), when using the Vertex AI provider
  • Local file paths (e.g., C:\Users\username\Downloads\video.mp4)

Examples:

  1. Analyze video from YouTube URL:
{
  "videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
  "prompt": "What is this video about? Describe what you see in detail."
}
  2. Analyze local video file:
{
  "videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
  "prompt": "What is this video about? Describe what you see in detail."
}

Note: YouTube URLs are the only publicly hosted video URLs currently supported; other public video URLs will be rejected.
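Since the videoSource parameter also accepts GCS URIs, a video already uploaded to Google Cloud Storage can be referenced directly (the bucket and object names here are placeholders):

```json
{
  "videoSource": "gs://your-gcs-bucket/videos/demo.mp4",
  "prompt": "Summarize the key events in this video."
}
```

This avoids re-uploading large files and is the natural path when the Vertex AI provider is configured with GCS_BUCKET_NAME.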

Configuration

Environment Variables

| Variable | Required | Description | Default |
|---|---|---|---|
| **Provider Selection** | | | |
| IMAGE_PROVIDER | Yes | Provider for image analysis (google or vertex_ai) | (none) |
| VIDEO_PROVIDER | Yes | Provider for video analysis (google or vertex_ai) | (none) |
| **Model Selection** | | | |
| IMAGE_MODEL | No | Model for image analysis | gemini-2.5-flash-lite |
| VIDEO_MODEL | No | Model for video analysis | gemini-2.5-flash |
| FALLBACK_IMAGE_MODEL | No | Fallback model for image analysis | gemini-2.5-flash-lite |
| FALLBACK_VIDEO_MODEL | No | Fallback model for video analysis | gemini-2.5-flash |
| **Google Gemini API** | | | |
| GEMINI_API_KEY | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = google | Google Gemini API key | (none) |
| GEMINI_BASE_URL | No | Gemini API base URL | https://generativelanguage.googleapis.com |
| **Vertex AI** | | | |
| VERTEX_CREDENTIALS | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = vertex_ai | Path to GCP service account JSON | (none) |
| VERTEX_PROJECT_ID | No | Google Cloud project ID | Auto-derived from credentials |
| VERTEX_LOCATION | No | Vertex AI region | us-central1 |
| VERTEX_ENDPOINT | No | Vertex AI endpoint URL | https://aiplatform.googleapis.com |
| **Google Cloud Storage (Vertex AI)** | | | |
| GCS_BUCKET_NAME | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = vertex_ai | GCS bucket name for Vertex AI uploads | (none) |
| GCS_CREDENTIALS | No | Path to GCS credentials | Defaults to VERTEX_CREDENTIALS |
| GCS_PROJECT_ID | No | GCS project ID | Auto-derived from VERTEX_CREDENTIALS |
| GCS_REGION | No | GCS region | Defaults to VERTEX_LOCATION |
| **API Configuration** | | | |
| TEMPERATURE | No | AI response temperature (0.0-2.0) | 0.8 |
| TOP_P | No | Top-p sampling parameter (0.0-1.0) | 0.6 |
| MAX_TOKENS_FOR_IMAGE | No | Maximum tokens for image analysis | 500 |
| MAX_TOKENS_FOR_VIDEO | No | Maximum tokens for video analysis | 2000 |
| **File Processing** | | | |
| MAX_IMAGE_SIZE | No | Maximum image size in bytes | 20971520 (20 MB) |
| MAX_VIDEO_SIZE | No | Maximum video size in bytes | 2147483648 (2 GB) |
| MAX_VIDEO_DURATION | No | Maximum video duration in seconds | 3600 (1 hour) |
| ALLOWED_IMAGE_FORMATS | No | Comma-separated image formats | png,jpg,jpeg,webp,gif,bmp,tiff |
| ALLOWED_VIDEO_FORMATS | No | Comma-separated video formats | mp4,mov,avi,mkv,webm,flv,wmv,3gp |
| **Development** | | | |
| LOG_LEVEL | No | Logging level | info |
| NODE_ENV | No | Environment mode | development |
| GEMINI_FILES_API_THRESHOLD | No | Size threshold for Gemini Files API (bytes) | 10485760 (10 MB) |
| VERTEX_AI_FILES_API_THRESHOLD | No | Size threshold for Vertex AI uploads (bytes) | 0 |
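Putting the table together, a complete Vertex AI configuration might look like this (bucket name and credential path are placeholders; the values marked as defaults only need to be set when overriding them):

```shell
# Required for the Vertex AI provider
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CREDENTIALS="/path/to/service-account.json"
export GCS_BUCKET_NAME="your-gcs-bucket"

# Optional overrides (defaults shown)
export VERTEX_LOCATION="us-central1"
export TEMPERATURE="0.8"
export MAX_TOKENS_FOR_VIDEO="2000"
```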

Supported Formats

Images: PNG, JPG, JPEG, WebP, GIF, BMP, TIFF, HEIC, HEIF
Videos: MP4, MOV, AVI, MKV, WebM, FLV, WMV, 3GP, M4V

Development

Prerequisites

  • Node.js 18+
  • npm or yarn

Setup

# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Start development server
npm run dev

Scripts

  • npm run build - Build the TypeScript project
  • npm run dev - Start development server with watch mode
  • npm test - Run the test suite
  • npm run lint - Run ESLint
  • npm run format - Format code with Prettier
  • npm start - Start the built server

Architecture

The project follows a modular architecture:

src/
├── providers/          # AI provider implementations
│   ├── gemini/         # Google Gemini provider
│   ├── vertexai/       # Vertex AI provider
│   └── factory/        # Provider factory
├── services/           # Core services
│   ├── ConfigService.ts
│   └── FileService.ts
├── storage/            # Storage implementations
├── file-upload/        # File upload strategies
├── types/              # TypeScript type definitions
├── utils/              # Utility functions
└── server.ts           # Main MCP server
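The provider factory in the layout above selects an implementation based on the IMAGE_PROVIDER / VIDEO_PROVIDER setting. A minimal sketch of that pattern, with hypothetical names that do not claim to match the repo's actual API:

```typescript
// Illustrative provider-factory sketch; interface and class names are
// hypothetical, and the real providers call the Gemini / Vertex AI APIs.
interface VisionProvider {
  analyzeImage(source: string, prompt: string): Promise<string>;
}

class GeminiProvider implements VisionProvider {
  async analyzeImage(source: string, prompt: string): Promise<string> {
    // Placeholder for a Gemini API call
    return `gemini: ${prompt} (${source})`;
  }
}

class VertexAiProvider implements VisionProvider {
  async analyzeImage(source: string, prompt: string): Promise<string> {
    // Placeholder for a Vertex AI call
    return `vertex_ai: ${prompt} (${source})`;
  }
}

// Map the provider name from the environment to a concrete instance.
function createProvider(name: string): VisionProvider {
  switch (name) {
    case "google":
      return new GeminiProvider();
    case "vertex_ai":
      return new VertexAiProvider();
    default:
      throw new Error(`Unknown provider: ${name}`);
  }
}
```

Centralizing the choice in one factory keeps the MCP tool handlers provider-agnostic: they only depend on the VisionProvider interface.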

Error Handling

The server includes comprehensive error handling:

  • Validation Errors: Input validation using Zod schemas
  • Network Errors: Automatic retries with exponential backoff
  • Authentication Errors: Clear error messages for API key issues
  • File Errors: Handling for file size limits and format restrictions

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google for the Gemini and Vertex AI APIs
  • The Model Context Protocol team for the MCP framework
  • All contributors and users of this project