AI Vision MCP Server
A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.
Features
- Dual Provider Support: Choose between Google Gemini API and Vertex AI
- Multimodal Analysis: Support for both image and video content analysis
- Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
- Storage Integration: Built-in Google Cloud Storage support
- Comprehensive Validation: Zod-based data validation throughout
- Error Handling: Robust error handling with retry logic and circuit breakers
- TypeScript: Full TypeScript support with strict type checking
Installation
npm install ai-vision-mcp
Quick Start
Using Google Gemini API
- Set your environment variables:
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
- Start the MCP server:
npx ai-vision-mcp
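For MCP clients configured through a JSON file (Claude Desktop, Cursor, and similar), an entry along these lines should work; the "ai-vision" key is an arbitrary server name and the env values are placeholders:

```json
{
  "mcpServers": {
    "ai-vision": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```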
Using Vertex AI
- Set your environment variables:
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CREDENTIALS="/path/to/service-account.json"
export GCS_BUCKET_NAME="your-gcs-bucket"
- Start the MCP server:
npx ai-vision-mcp
MCP Tools
The server provides two main MCP tools:
analyze_image
Analyzes an image using AI and returns a detailed description.
Parameters:
- imageSource (string): URL, base64 data, or file path to the image
- prompt (string): Question or instruction for the AI
- options (object, optional): Analysis options including temperature and max tokens
Examples:
- Analyze image from URL:
{
"imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
"prompt": "What is this image about? Describe what you see in detail."
}
- Analyze local image file:
{
"imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
"prompt": "What is this image about? Describe what you see in detail."
}
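Since imageSource accepts three different forms, the server has to tell them apart. A rough sketch of how that disambiguation could work (a hypothetical helper, not the server's actual logic):

```typescript
// Classify an imageSource string into the three accepted forms.
// Heuristic sketch only; the real server's detection may differ.
function classifyImageSource(src: string): "url" | "base64" | "file" {
  if (/^https?:\/\//i.test(src)) return "url"; // http(s) URL
  if (/^data:image\//.test(src)) return "base64"; // data URI with base64 payload
  return "file"; // anything else is treated as a local path
}
```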
analyze_video
Analyzes a video using AI and returns a detailed description.
Parameters:
- videoSource (string): YouTube URL, GCS URI, or local file path to the video
- prompt (string): Question or instruction for the AI
- options (object, optional): Analysis options including temperature and max tokens
Supported video sources:
- YouTube URLs (e.g., https://www.youtube.com/watch?v=...)
- Local file paths (e.g., C:\Users\username\Downloads\video.mp4)
Examples:
- Analyze video from YouTube URL:
{
"videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
"prompt": "What is this video about? Describe what you see in detail."
}
- Analyze local video file:
{
"videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
"prompt": "What is this video about? Describe what you see in detail."
}
Note: YouTube URLs are the only public video URLs currently supported; other public video URLs are rejected.
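The restriction above can be expressed as a simple check: public HTTP(S) URLs pass only if they point at YouTube, while GCS URIs and local paths are always accepted. This is an illustrative helper, not the server's actual code:

```typescript
// Hypothetical pre-flight check mirroring the documented videoSource rules.
function isSupportedVideoSource(src: string): boolean {
  const isYouTube = /^https?:\/\/(www\.)?(youtube\.com\/watch\?v=|youtu\.be\/)/.test(src);
  const isGcsUri = src.startsWith("gs://");
  const isHttpUrl = /^https?:\/\//.test(src);
  // Non-YouTube public HTTP(S) URLs are rejected; GCS URIs and local paths pass.
  return isYouTube || isGcsUri || !isHttpUrl;
}
```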
Configuration
Environment Variables
Variable | Required | Description | Default |
---|---|---|---|
Provider Selection | | | |
IMAGE_PROVIDER | Yes | Provider for image analysis | google or vertex_ai |
VIDEO_PROVIDER | Yes | Provider for video analysis | google or vertex_ai |
Model Selection | | | |
IMAGE_MODEL | No | Model for image analysis | gemini-2.5-flash-lite |
VIDEO_MODEL | No | Model for video analysis | gemini-2.5-flash |
FALLBACK_IMAGE_MODEL | No | Fallback model for image analysis | gemini-2.5-flash-lite |
FALLBACK_VIDEO_MODEL | No | Fallback model for video analysis | gemini-2.5-flash |
Google Gemini API | | | |
GEMINI_API_KEY | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = google | Google Gemini API key | Required for Gemini |
GEMINI_BASE_URL | No | Gemini API base URL | https://generativelanguage.googleapis.com |
Vertex AI | | | |
VERTEX_CREDENTIALS | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = vertex_ai | Path to GCP service account JSON | Required for Vertex AI |
VERTEX_PROJECT_ID | Auto | Google Cloud project ID | Auto-derived from credentials |
VERTEX_LOCATION | No | Vertex AI region | us-central1 |
VERTEX_ENDPOINT | No | Vertex AI endpoint URL | https://aiplatform.googleapis.com |
Google Cloud Storage (Vertex AI) | | | |
GCS_BUCKET_NAME | Yes, if IMAGE_PROVIDER or VIDEO_PROVIDER = vertex_ai | GCS bucket name for Vertex AI uploads | Required for Vertex AI |
GCS_CREDENTIALS | No | Path to GCS credentials | Defaults to VERTEX_CREDENTIALS |
GCS_PROJECT_ID | No | GCS project ID | Auto-derived from VERTEX_CREDENTIALS |
GCS_REGION | No | GCS region | Defaults to VERTEX_LOCATION |
API Configuration | | | |
TEMPERATURE | No | AI response temperature (0.0–2.0) | 0.8 |
TOP_P | No | Top-p sampling parameter (0.0–1.0) | 0.6 |
MAX_TOKENS_FOR_IMAGE | No | Maximum tokens for image analysis | 500 |
MAX_TOKENS_FOR_VIDEO | No | Maximum tokens for video analysis | 2000 |
File Processing | | | |
MAX_IMAGE_SIZE | No | Maximum image size in bytes | 20971520 (20 MB) |
MAX_VIDEO_SIZE | No | Maximum video size in bytes | 2147483648 (2 GB) |
MAX_VIDEO_DURATION | No | Maximum video duration (seconds) | 3600 (1 hour) |
ALLOWED_IMAGE_FORMATS | No | Comma-separated image formats | png,jpg,jpeg,webp,gif,bmp,tiff |
ALLOWED_VIDEO_FORMATS | No | Comma-separated video formats | mp4,mov,avi,mkv,webm,flv,wmv,3gp |
Development | | | |
LOG_LEVEL | No | Logging level | info |
NODE_ENV | No | Environment mode | development |
GEMINI_FILES_API_THRESHOLD | No | Size threshold for Gemini Files API (bytes) | 10485760 (10 MB) |
VERTEX_AI_FILES_API_THRESHOLD | No | Size threshold for Vertex AI uploads (bytes) | 0 |
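An illustrative sketch (not the actual ConfigService) of how the comma-separated format lists in the table could be interpreted, falling back to the documented defaults when a variable is unset:

```typescript
// Parse a comma-separated format list, using the documented default when unset.
function parseFormats(raw: string | undefined, fallback: string): Set<string> {
  return new Set((raw ?? fallback).split(",").map((f) => f.trim().toLowerCase()));
}

// Check a filename's extension against the allowed set (case-insensitive).
function isAllowedFile(filename: string, allowed: Set<string>): boolean {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  return allowed.has(ext);
}

// e.g. with ALLOWED_IMAGE_FORMATS unset, the default list applies:
const defaults = parseFormats(undefined, "png,jpg,jpeg,webp,gif,bmp,tiff");
```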
Supported Formats
Images: PNG, JPG, JPEG, WebP, GIF, BMP, TIFF, HEIC, HEIF
Videos: MP4, MOV, AVI, MKV, WebM, FLV, WMV, 3GP, M4V
Development
Prerequisites
- Node.js 18+
- npm or yarn
Setup
# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Start development server
npm run dev
Scripts
- npm run build - Build the TypeScript project
- npm run dev - Start development server with watch mode
- npm test - Run the test suite
- npm run lint - Run ESLint
- npm run format - Format code with Prettier
- npm start - Start the built server
Architecture
The project follows a modular architecture:
src/
├── providers/      # AI provider implementations
│   ├── gemini/     # Google Gemini provider
│   ├── vertexai/   # Vertex AI provider
│   └── factory/    # Provider factory
├── services/       # Core services
│   ├── ConfigService.ts
│   └── FileService.ts
├── storage/        # Storage implementations
├── file-upload/    # File upload strategies
├── types/          # TypeScript type definitions
├── utils/          # Utility functions
└── server.ts       # Main MCP server
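The provider factory selects an implementation based on the configured provider. A minimal sketch of that pattern, with illustrative names rather than the project's actual exports:

```typescript
// Hypothetical shape of a vision provider; the real interface differs.
interface VisionProvider {
  name: string;
  analyzeImage(source: string, prompt: string): Promise<string>;
}

// Factory: pick a provider implementation from the IMAGE_PROVIDER setting.
function createImageProvider(env: Record<string, string | undefined>): VisionProvider {
  switch (env.IMAGE_PROVIDER) {
    case "google":
      return { name: "google", analyzeImage: async () => "gemini result" };
    case "vertex_ai":
      return { name: "vertex_ai", analyzeImage: async () => "vertex result" };
    default:
      throw new Error(`Unsupported IMAGE_PROVIDER: ${env.IMAGE_PROVIDER}`);
  }
}
```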
Error Handling
The server includes comprehensive error handling:
- Validation Errors: Input validation using Zod schemas
- Network Errors: Automatic retries with exponential backoff
- Authentication Errors: Clear error messages for API key issues
- File Errors: Handling for file size limits and format restrictions
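The retry behavior can be sketched as follows; this is a minimal illustration of retries with exponential backoff, not the server's actual implementation (which also includes circuit breakers):

```typescript
// Retry an async operation, doubling the delay after each failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```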
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Google for the Gemini and Vertex AI APIs
- The Model Context Protocol team for the MCP framework
- All contributors and users of this project