ultimate-image-gen-mcp

anand-92/ultimate-image-gen-mcp


Gemini 3 Pro Image MCP Server is a professional server designed for Google's Gemini 3 Pro Image, offering state-of-the-art image generation capabilities with advanced reasoning and high-resolution output.


Gemini 3 Pro Image MCP Server 🎨

Professional MCP server exclusively for Google's Gemini 3 Pro Image Preview (aka "Nano Banana Pro") - state-of-the-art image generation with advanced reasoning, high-resolution output (1K-4K), up to 14 reference images, Google Search grounding, and automatic thinking mode.

✨ All generated images include invisible SynthID watermarks for authenticity and provenance tracking.

✨ Features

Gemini 3 Pro Image Capabilities

  • High-Resolution Output: Generate images in 1K, 2K, and 4K resolutions
  • Advanced Text Rendering: Create legible, stylized text in infographics, menus, diagrams, and marketing assets
  • Up to 14 Reference Images: Mix up to 14 reference images in total (including at most 6 object images and at most 5 human images) for consistent style and characters
  • Google Search Grounding: Use real-time data from Google Search (weather, stocks, events, maps)
  • Thinking Mode: The model uses a reasoning process to refine the composition before generating the final output

Advanced Capabilities

  • 🤖 AI Prompt Enhancement: Automatically optimize prompts using Gemini Flash for superior results
  • 🔍 Google Search Integration: Generate images based on real-time information
  • 🎨 Reference Images: Use up to 14 images for style consistency and character preservation
  • 📐 Flexible Aspect Ratios: Support for 10 aspect ratios (1:1, 16:9, 9:16, 3:2, 4:3, 4:5, 5:4, 2:3, 3:4, 21:9)
  • 💭 Thought Process Visibility: See the model's thinking process (interim images and reasoning)
  • 🚀 Batch Processing: Generate multiple images efficiently with parallel processing
  • 🎯 Dual Modalities: Get both text explanations and images in responses

Production Ready

  • Comprehensive error handling and validation
  • Configurable settings via environment variables
  • Detailed logging and debugging
  • MCP resources for configuration and model information

🎬 Showcase - Gemini 3 Pro Image Features

Gemini 3 Pro Image - Experience state-of-the-art image generation with advanced reasoning and high-resolution output.

Key Features in Action

All images can be generated at up to 4K resolution, with optional AI prompt enhancement.

Example Use Cases

1. High-Resolution Professional Assets

Generate a 4K image of "modern office interior with natural lighting"
- Model: gemini-3-pro-image-preview
- Image Size: 4K
- Aspect Ratio: 16:9

2. Real-Time Data Visualization

Generate an image with Google Search grounding:
"Visualize the current weather forecast for the next 5 days in San Francisco as a clean, modern weather chart. Add a visual on what I should wear each day"
- Enable Google Search: true
- Aspect Ratio: 16:9

3. Reference Image Consistency

Use reference images to maintain consistent characters:
- Provide up to 5 human reference images
- Provide up to 6 object reference images
- Generate "An office group photo of these people, they are making funny faces"

4. Advanced Text Rendering

Generate infographics, menus, or diagrams with legible text:
"Create a restaurant menu with elegant typography showing appetizers, mains, and desserts"
- Image Size: 2K
- Aspect Ratio: 3:4

🔥 Why Gemini 3 Pro Image Is Powerful

  1. State-of-the-Art Quality: Built-in generation capabilities up to 4K resolution
  2. Advanced Reasoning: Thinking mode refines composition before final output
  3. Real-Time Grounding: Google Search integration for accurate, current data
  4. Character Consistency: Use up to 14 reference images for maintaining style
  5. Professional Features: Advanced text rendering for infographics and marketing

🎨 Prompt Enhancement Showcase

See the power of AI prompt enhancement! When enabled, simple prompts can be transformed into detailed, cinematic descriptions:

Original: "A fierce wolf wearing the black symbiote Spider-Man suit, web-slinging through city at night"

Enhanced: "A powerfully built Alaskan Tundra Wolf, snarling fiercely, wearing the matte black, viscous, wet-looking symbiote suit with exaggerated white spider emblem. Captured mid-air in dramatic web-slinging arc with taut glowing webbing. Extreme low-angle perspective, hyper-detailed neo-noir cityscape at midnight with rain-slicked asphalt. High-contrast cinematic lighting with deep shadows and electric neon rim lighting."

Generated Images (2K, 16:9, Prompt Enhancement: ON)

  • Wolf - Black Symbiote Suit
  • Lion - Classic Red & Blue Suit
  • Black Panther - Symbiote Suit
  • Eagle - Classic Suit in Flight
  • Grizzly Bear - Symbiote Suit
  • Fox - Classic Suit at Dusk

All images generated with enhance_prompt: true, showcasing how simple descriptions can become photorealistic, cinematic masterpieces with dramatic lighting, detailed textures, and professional composition when enhancement is enabled.

📸 Photorealistic Capabilities

Gemini 3 Pro Image excels at creating incredibly realistic images of people in unusual and imaginative scenarios:

Jensen Huang - GPU Surfing

Riding a giant NVIDIA GPU chip through a neon-lit cyberpunk cityscape

Elon Musk - Mars Chess Match

Playing chess with a humanoid robot on the surface of Mars, Earth visible in background

Jensen Huang - GPU Kitchen

Cooking breakfast in a high-tech kitchen where all appliances are GPUs with RGB lighting

Elon Musk - Cybertruck Symphony

Conducting a symphony orchestra made entirely of Tesla Cybertrucks in a concert hall

Jensen Huang - Underwater Data Center

Scuba diving in an underwater data center surrounded by glowing servers and tropical fish

Elon Musk - SpaceX Skateboarding

Skateboarding through the SpaceX factory with a Starship rocket in the background

These images demonstrate the model's exceptional ability to:

  • Generate photorealistic human likenesses
  • Blend reality with creative, surreal concepts
  • Maintain accurate lighting and perspective
  • Create detailed, believable environments
  • Handle complex compositions with multiple elements

🚀 Quick Start

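Prerequisites

  • A Google Gemini API key (set as GEMINI_API_KEY), available from Google AI Studio
  • uv (recommended) or pip to install the package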

Installation

Option 1: Using uv (Recommended)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install and run the server
uvx ultimate-gemini-mcp

Option 2: Using pip

pip install ultimate-gemini-mcp

Option 3: From Source

git clone <repository-url>
cd ultimate-gemini-mcp
uv sync

Configuration

Create a .env file in your project directory:

cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

Or set environment variables directly:

export GEMINI_API_KEY=your_api_key_here

📖 Usage

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Important Notes:

  1. Images are automatically saved to ~/gemini_images (your home directory). You can optionally set OUTPUT_DIR to customize this location:

    • macOS: "OUTPUT_DIR": "/Users/yourusername/custom_folder"
    • Windows: "OUTPUT_DIR": "C:\\Users\\YourUsername\\custom_folder"
  2. uvx path issues on macOS: If you get spawn uvx ENOENT errors, use the full path to uvx:

    "command": "/Users/yourusername/.local/bin/uvx"
    

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
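
If you prefer to generate the config entry rather than type it by hand, the following is a minimal sketch. The helper name `build_server_entry` is hypothetical and not part of this project. It resolves the absolute path to `uvx` with `shutil.which`, which sidesteps the `spawn uvx ENOENT` problem described above, and falls back to the bare command name when `uvx` is not on the current PATH.

```python
import json
import shutil


def build_server_entry(api_key, output_dir=None):
    """Return the "ultimate-gemini" entry for claude_desktop_config.json.

    Hypothetical helper for illustration; it only builds a dictionary and
    does not touch the real config file.
    """
    # Prefer an absolute path so GUI apps with a minimal PATH can find uvx.
    command = shutil.which("uvx") or "uvx"
    env = {"GEMINI_API_KEY": api_key}
    if output_dir:
        # Optional: override the default ~/gemini_images location.
        env["OUTPUT_DIR"] = output_dir
    return {"command": command, "args": ["ultimate-gemini-mcp"], "env": env}


if __name__ == "__main__":
    config = {"mcpServers": {"ultimate-gemini": build_server_entry("your-api-key-here")}}
    print(json.dumps(config, indent=2))
```

Paste the printed JSON into the config file for your platform, merging it with any `mcpServers` entries you already have.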

With Claude Code (VS Code)

# Add MCP server to Claude Code
claude mcp add ultimate-gemini \
  --env GEMINI_API_KEY=your-api-key \
  -- uvx ultimate-gemini-mcp

Note: Images are automatically saved to ~/gemini_images. To customize, add --env OUTPUT_DIR=/your/custom/path.

With Cursor

Add to Cursor's MCP configuration (.cursor/mcp.json):

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Note: Images are automatically saved to ~/gemini_images. Optionally add "OUTPUT_DIR": "/your/custom/path" to customize.

🎯 Supported Model

This MCP server exclusively supports:

Gemini 3 Pro Image Preview (gemini-3-pro-image-preview)

The only model supported - Google's state-of-the-art image generation model (aka "Nano Banana Pro") optimized for professional asset production with:

  • Built-in 1K, 2K, and 4K resolution support (must use uppercase 'K')
  • Advanced text rendering capabilities for infographics, menus, diagrams, logos
  • Up to 14 reference images (max 6 objects + max 5 humans) for style/character consistency
  • Google Search grounding for real-time data (weather, stocks, events, maps)
  • Thinking mode with reasoning process (automatic, cannot be disabled)
  • Support for both TEXT and IMAGE response modalities
  • SynthID watermarking automatically applied to all generated images

🛠️ Tools

generate_image

Generate professional images using Gemini 3 Pro Image with advanced features.

Parameters:

  • prompt (required): Text description of the image to generate
    • Best Practice: Use descriptive paragraphs, not keyword lists. "Describe the scene, don't just list keywords"
  • model: Model to use (default: gemini-3-pro-image-preview - the only supported model)
  • enhance_prompt: Automatically enhance prompt using Gemini Flash (default: false)
    • Enable for simple/vague prompts; transforms them into detailed, cinematic descriptions
  • aspect_ratio: Image proportions (default: 1:1)
    • Options: "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"
  • image_size: Resolution (default: 2K)
    • CRITICAL: Must use uppercase 'K': "1K", "2K", or "4K" (lowercase like "2k" will be rejected!)
    • "1K" - Fast testing (1120 tokens, ~1-2MB)
    • "2K" - Recommended for most use cases (1120 tokens, ~3-5MB)
    • "4K" - Maximum quality for production (2000 tokens, ~8-15MB)
  • output_format: Image file format (default: png)
    • Options: "png" (recommended), "jpeg", "webp"
  • reference_image_paths: List of paths to reference images (up to 14 total)
    • Maximum 6 object images for high-fidelity inclusion of products/items
    • Maximum 5 human images for character/person consistency
    • Use for: character consistency, style transfer, object inclusion, multi-person compositions
  • enable_google_search: Enable Google Search grounding for real-time data (default: false)
    • Use for: current events, weather forecasts, stock data, recent news, real-time maps
    • Adds 1-3 seconds latency and includes grounding_metadata in response
  • response_modalities: Response types (default: ["TEXT", "IMAGE"])
    • Options: ["TEXT", "IMAGE"], ["IMAGE"], ["TEXT"]
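
The constraints listed above can be checked before a request is sent. The sketch below is an illustration only: `validate_generate_image_args` is a hypothetical helper, not the server's own validation code, and it covers just the limits stated in this list (allowed aspect ratios, uppercase 'K' sizes, output formats, and the 14-image cap).

```python
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}
OUTPUT_FORMATS = {"png", "jpeg", "webp"}
MAX_REFERENCE_IMAGES = 14


def validate_generate_image_args(prompt, aspect_ratio="1:1", image_size="2K",
                                 output_format="png", reference_image_paths=None):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not prompt or not prompt.strip():
        problems.append("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if image_size not in IMAGE_SIZES:
        # Point out the most common mistake: a lowercase 'k'.
        lowercase_k = isinstance(image_size, str) and image_size.upper() in IMAGE_SIZES
        hint = " (use an uppercase 'K')" if lowercase_k else ""
        problems.append(f"unsupported image_size: {image_size!r}{hint}")
    if output_format not in OUTPUT_FORMATS:
        problems.append(f"unsupported output_format: {output_format!r}")
    if len(reference_image_paths or []) > MAX_REFERENCE_IMAGES:
        problems.append(f"too many reference images (max {MAX_REFERENCE_IMAGES})")
    return problems


if __name__ == "__main__":
    print(validate_generate_image_args("a cat wearing a space helmet", image_size="2k"))
```

The per-category limits (6 objects, 5 humans) are not checked here because the tool accepts a single list of paths and this sketch has no way to tell the two categories apart.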

Examples:

1. Basic image generation:
   Generate an image of "a serene mountain landscape at sunset with a lake reflection"

2. High-resolution with specific aspect ratio:
   Generate a 4K image of "modern minimalist architecture" with aspect_ratio 16:9

3. With Google Search grounding:
   Generate an image with Google Search enabled: "Current weather map for New York City"

4. With reference images:
   Generate an image with reference_image_paths: ["/path/person1.png", "/path/person2.png"]
   and prompt: "An office group photo of these people making funny faces"
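
The examples above are natural-language requests; an MCP client turns each one into a `tools/call` request. As a rough sketch of what example 2 looks like on the wire, the snippet below builds the JSON-RPC 2.0 message defined by the MCP specification. The helper `build_tool_call` is hypothetical, and the argument names are taken from the parameter list above.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Example 2 from the list above, expressed as explicit tool arguments.
request = build_tool_call(1, "generate_image", {
    "prompt": "modern minimalist architecture",
    "image_size": "4K",
    "aspect_ratio": "16:9",
})

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

In normal use you never write this message yourself; Claude Desktop, Claude Code, or Cursor sends it for you.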

batch_generate

Process multiple prompts efficiently with parallel batch processing.

Parameters:

  • prompts (required): List of text prompts
  • model: Model to use for all images
  • enhance_prompt: Enhance all prompts (default: false) - Enable for simple/vague prompts
  • aspect_ratio: Aspect ratio for all images
  • batch_size: Parallel processing size (default: from config)
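
To illustrate what a parallel batch size means in practice, here is a minimal sketch of bounded concurrency with `asyncio.Semaphore`. It is not the server's implementation: `fake_generate` is a hypothetical stand-in for a real image request, and the result dictionary shape is an assumption made for this example.

```python
import asyncio


async def fake_generate(prompt):
    """Stand-in for a real image request (hypothetical); returns a placeholder file name."""
    await asyncio.sleep(0.01)
    return f"{prompt[:20].strip().replace(' ', '_')}.png"


async def batch_generate(prompts, batch_size=8, generate=fake_generate):
    """Run `generate` for every prompt with at most `batch_size` requests in flight."""
    semaphore = asyncio.Semaphore(batch_size)

    async def run_one(prompt):
        async with semaphore:
            try:
                return {"prompt": prompt, "ok": True, "result": await generate(prompt)}
            except Exception as exc:
                # One failed prompt should not abort the rest of the batch.
                return {"prompt": prompt, "ok": False, "error": str(exc)}

    # asyncio.gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))


if __name__ == "__main__":
    prompts = ["mobile app wireframe", "modern dashboard UI design"]
    for item in asyncio.run(batch_generate(prompts, batch_size=2)):
        print(item)
```

A smaller batch size lowers the chance of hitting rate limits at the cost of a longer total run time.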

Example:

Batch generate images for these prompts:
1. "minimalist logo design for a tech startup"
2. "modern dashboard UI design"
3. "mobile app wireframe"

🎨 Advanced Features

AI Prompt Enhancement

When enabled, the server uses Gemini Flash to automatically enhance your prompts:

Original: a cat wearing a space helmet

Enhanced: A photorealistic portrait of a domestic tabby cat wearing a futuristic space helmet, close-up composition, warm studio lighting, detailed fur texture, reflective helmet visor showing subtle reflections, soft focus background, professional photography style

This significantly improves image quality without requiring you to be a prompt engineering expert!

Google Search Grounding

Generate images based on real-time data:

Generate an image with Google Search enabled:
- prompt: "Visualize the current weather forecast for San Francisco as a modern chart"
- enable_google_search: true

The response will include grounding metadata with search sources used.

Reference Images for Consistency

Maintain consistent characters and objects across generations:

Generate an image with:
- prompt: "An office group photo of these people, they are making funny faces"
- reference_image_paths: ["/path/person1.png", "/path/person2.png", "/path/person3.png"]
- aspect_ratio: "5:4"
- image_size: "2K"

You can provide up to 14 reference images (max 6 objects, max 5 humans).

High-Resolution Assets

Generate professional 4K assets:

Generate a 4K image of "minimalist logo design for a tech startup"
with image_size: "4K" and aspect_ratio: "1:1"

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key (required) | - |
| OUTPUT_DIR | Directory for generated images | ~/gemini_images |
| ENABLE_PROMPT_ENHANCEMENT | Enable AI prompt enhancement | false |
| ENABLE_BATCH_PROCESSING | Enable batch processing | true |
| DEFAULT_MODEL | Default model | gemini-3-pro-image-preview |
| DEFAULT_IMAGE_SIZE | Default resolution | 2K |
| ENABLE_GOOGLE_SEARCH | Enable Google Search grounding | false |
| REQUEST_TIMEOUT | API request timeout (seconds) | 60 |
| MAX_BATCH_SIZE | Maximum parallel batch size | 8 |
| LOG_LEVEL | Logging level | INFO |
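
For reference, this is one way such variables could be read with their documented defaults. It is a sketch under stated assumptions, not the server's actual settings code: the `Settings` class, `load_settings`, and the set of strings treated as "true" are all hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def _env_bool(env, name, default):
    """Parse a boolean variable; the accepted spellings are an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    output_dir: Path
    enable_prompt_enhancement: bool
    enable_batch_processing: bool
    default_model: str
    default_image_size: str
    enable_google_search: bool
    request_timeout: int
    max_batch_size: int
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY not found")
    return Settings(
        gemini_api_key=api_key,
        output_dir=Path(env.get("OUTPUT_DIR", "~/gemini_images")).expanduser(),
        enable_prompt_enhancement=_env_bool(env, "ENABLE_PROMPT_ENHANCEMENT", False),
        enable_batch_processing=_env_bool(env, "ENABLE_BATCH_PROCESSING", True),
        default_model=env.get("DEFAULT_MODEL", "gemini-3-pro-image-preview"),
        default_image_size=env.get("DEFAULT_IMAGE_SIZE", "2K"),
        enable_google_search=_env_bool(env, "ENABLE_GOOGLE_SEARCH", False),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "60")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "8")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try different combinations without changing your shell environment.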

📚 MCP Resources

models://list

View all available models with descriptions and features.

settings://config

View current server configuration.

🎭 Use Cases

Web Development

  • Hero images and banners
  • UI/UX mockups and wireframes
  • Logo and branding assets
  • Placeholder images

App Development

  • App icons and splash screens
  • User interface elements
  • Marketing materials
  • Documentation images

Content Creation

  • Blog post illustrations
  • Social media graphics
  • Presentation visuals
  • Product mockups

Creative Projects

  • Character design iterations
  • Concept art exploration
  • Style variations
  • Scene composition

📊 Gemini 3 Pro Image Features

| Feature | Support | Details |
|---|---|---|
| Resolution Options | ✅ 1K, 2K, 4K | Built-in high-resolution generation (MUST use uppercase 'K') |
| Reference Images | ✅ Up to 14 | 6 objects + 5 humans for consistency |
| Google Search Grounding | ✅ Real-time data | Weather, stocks, events, maps |
| Thinking Mode | ✅ Advanced reasoning | Automatic (cannot be disabled), generates up to 2 interim images |
| Text Rendering | ✅ Advanced | Legible text in infographics, menus, diagrams, logos |
| Aspect Ratios | ✅ 10 options | Full flexibility for any format |
| Response Modalities | ✅ TEXT + IMAGE | Dual output modes |
| Prompt Enhancement | ✅ Built-in | AI-powered optimization using Gemini Flash |
| SynthID Watermarking | ✅ Automatic | Invisible watermark on all generated images |
| Thought Signatures | ✅ Automatic | Preserved across multi-turn interactions (handled by SDK) |

Best For: Professional assets, marketing, real-time visualization, logos, infographics

🐛 Troubleshooting

"spawn uvx ENOENT" error

  • Cause: Claude Desktop cannot find the uvx command in its PATH
  • Solution: Use the full path to uvx in your config:
    "command": "/Users/yourusername/.local/bin/uvx"
    
  • Find your uvx location with: which uvx

Custom output directory

  • Default: Images are automatically saved to ~/gemini_images in your home directory
  • Customize: Set OUTPUT_DIR in your MCP config if you want a different location:
    "env": {
      "GEMINI_API_KEY": "your-key",
      "OUTPUT_DIR": "/your/custom/path"
    }
    

"GEMINI_API_KEY not found"

  • Add your API key to .env or environment variables
  • Get a free key at Google AI Studio

"Content blocked by safety filters"

  • Modify your prompt to comply with content policies
  • Try rephrasing without potentially sensitive content

"Rate limit exceeded"

  • Wait a few moments and try again
  • Consider upgrading your API plan for higher limits
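
If you drive image generation from your own script, "wait a few moments and try again" can be automated with exponential backoff. The sketch below is generic: `RateLimitError` is a hypothetical exception used here as a stand-in for whatever error your client raises, and the delay values are illustrative defaults rather than recommendations from Google.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical marker for a "Rate limit exceeded" failure."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait with exponential backoff and retry.

    The wait before retry n is min(max_delay, base_delay * 2**n) plus up to
    50% random jitter. The last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use leave it at its default.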

Images not saving

  • Check that OUTPUT_DIR exists and is writable
  • Verify you have sufficient disk space
  • Create the directory manually: mkdir -p /path/to/your/images
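
The first and third checks above can be scripted. This is a small diagnostic sketch, not part of the server; `check_output_dir` is a hypothetical helper that optionally creates the directory (the equivalent of `mkdir -p`) and then confirms it is writable by creating a temporary file in it.

```python
import os
import tempfile
from pathlib import Path


def check_output_dir(path, create=True):
    """Return (ok, message) describing whether images can be written to `path`."""
    directory = Path(path).expanduser()
    if not directory.exists():
        if not create:
            return False, f"{directory} does not exist"
        try:
            directory.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
        except OSError as exc:
            return False, f"could not create {directory}: {exc}"
    if not directory.is_dir():
        return False, f"{directory} is not a directory"
    try:
        # Creating a real temporary file is a direct test of write access.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        return False, f"{directory} is not writable: {exc}"
    return True, f"{directory} is writable"


if __name__ == "__main__":
    target = os.environ.get("OUTPUT_DIR", "~/gemini_images")
    ok, message = check_output_dir(target, create=False)
    print(message)
```

Run it with the same OUTPUT_DIR value you put in your MCP config so that it tests the directory the server will actually use.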

🤝 Contributing

Contributions are welcome! This project combines the best features from multiple MCP servers:

  • mcp-image (TypeScript): Prompt enhancement and editing features
  • nanobanana-mcp-server (Python): Architecture and FastMCP integration
  • gemini-imagen-mcp-server (TypeScript): Imagen API support and batch processing

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built on the excellent work of:

🔗 Links


Ready to create amazing AI-generated images? Install now and start generating! 🚀