ultimate-image-gen-mcp by anand-92 - MCP Server


Gemini 3 Pro Image MCP Server 🎨

Professional MCP server exclusively for Google's Gemini 3 Pro Image Preview (aka "Nano Banana Pro") - state-of-the-art image generation with advanced reasoning, high-resolution output (1K-4K), up to 14 reference images, Google Search grounding, and automatic thinking mode.

✨ All generated images include invisible SynthID watermarks for authenticity and provenance tracking.

✨ Features

Gemini 3 Pro Image Capabilities

High-Resolution Output: Generate images in 1K, 2K, and 4K resolutions
Advanced Text Rendering: Create legible, stylized text in infographics, menus, diagrams, and marketing assets
Up to 14 Reference Images: Mix up to 14 reference images in total (at most 6 objects and at most 5 humans) for consistent style and characters
Google Search Grounding: Use real-time data from Google Search (weather, stocks, events, maps)
Thinking Mode: Model uses reasoning process to refine composition before generating final output

Advanced Capabilities

🤖 AI Prompt Enhancement: Automatically optimize prompts using Gemini Flash for superior results
🔍 Google Search Integration: Generate images based on real-time information
🎨 Reference Images: Use up to 14 images for style consistency and character preservation
📐 Flexible Aspect Ratios: Support for 10 aspect ratios (1:1, 16:9, 9:16, 3:2, 4:3, 4:5, 5:4, 2:3, 3:4, 21:9)
💭 Thought Process Visibility: See the model's thinking process (interim images and reasoning)
🚀 Batch Processing: Generate multiple images efficiently with parallel processing
🎯 Dual Modalities: Get both text explanations and images in responses

Production Ready

Comprehensive error handling and validation
Configurable settings via environment variables
Detailed logging and debugging
MCP resources for configuration and model information

🎬 Showcase - Gemini 3 Pro Image Features

Gemini 3 Pro Image - Experience state-of-the-art image generation with advanced reasoning and high-resolution output.

Key Features in Action

All images can be generated at up to 4K resolution, with optional AI prompt enhancement.

Example Use Cases

1. High-Resolution Professional Assets

Generate a 4K image of "modern office interior with natural lighting"
- Model: gemini-3-pro-image-preview
- Image Size: 4K
- Aspect Ratio: 16:9

2. Real-Time Data Visualization

Generate an image with Google Search grounding:
"Visualize the current weather forecast for the next 5 days in San Francisco as a clean, modern weather chart. Add a visual on what I should wear each day"
- Enable Google Search: true
- Aspect Ratio: 16:9

3. Reference Image Consistency

Use reference images to maintain consistent characters:
- Provide up to 5 human reference images
- Provide up to 6 object reference images
- Generate "An office group photo of these people, they are making funny faces"

4. Advanced Text Rendering

Generate infographics, menus, or diagrams with legible text:
"Create a restaurant menu with elegant typography showing appetizers, mains, and desserts"
- Image Size: 2K
- Aspect Ratio: 3:4

🔥 Why Gemini 3 Pro Image Is Powerful

State-of-the-Art Quality: Built-in generation capabilities up to 4K resolution
Advanced Reasoning: Thinking mode refines composition before final output
Real-Time Grounding: Google Search integration for accurate, current data
Character Consistency: Use up to 14 reference images to keep characters and style consistent
Professional Features: Advanced text rendering for infographics and marketing

🎨 Prompt Enhancement Showcase

See the power of AI prompt enhancement! When enabled, simple prompts can be transformed into detailed, cinematic descriptions:

Original: "A fierce wolf wearing the black symbiote Spider-Man suit, web-slinging through city at night"

Enhanced: "A powerfully built Alaskan Tundra Wolf, snarling fiercely, wearing the matte black, viscous, wet-looking symbiote suit with exaggerated white spider emblem. Captured mid-air in dramatic web-slinging arc with taut glowing webbing. Extreme low-angle perspective, hyper-detailed neo-noir cityscape at midnight with rain-slicked asphalt. High-contrast cinematic lighting with deep shadows and electric neon rim lighting."

Generated Images (2K, 16:9, Prompt Enhancement: ON)

Wolf - Black Symbiote Suit
Lion - Classic Red & Blue Suit
Black Panther - Symbiote Suit
Eagle - Classic Suit in Flight
Grizzly Bear - Symbiote Suit
Fox - Classic Suit at Dusk

All images generated with enhance_prompt: true, showcasing how simple descriptions can become photorealistic, cinematic masterpieces with dramatic lighting, detailed textures, and professional composition when enhancement is enabled.

📸 Photorealistic Capabilities

Gemini 3 Pro Image excels at creating incredibly realistic images of people in unusual and imaginative scenarios:

Jensen Huang - GPU Surfing

Riding a giant NVIDIA GPU chip through a neon-lit cyberpunk cityscape

Elon Musk - Mars Chess Match

Playing chess with a humanoid robot on the surface of Mars, Earth visible in background

Jensen Huang - GPU Kitchen

Cooking breakfast in a high-tech kitchen where all appliances are GPUs with RGB lighting

Elon Musk - Cybertruck Symphony

Conducting a symphony orchestra made entirely of Tesla Cybertrucks in a concert hall

Jensen Huang - Underwater Data Center

Scuba diving in an underwater data center surrounded by glowing servers and tropical fish

Elon Musk - SpaceX Skateboarding

Skateboarding through the SpaceX factory with a Starship rocket in the background

These images demonstrate the model's exceptional ability to:

Generate photorealistic human likenesses
Blend reality with creative, surreal concepts
Maintain accurate lighting and perspective
Create detailed, believable environments
Handle complex compositions with multiple elements

🚀 Quick Start

Prerequisites

Python 3.11 or higher
Google Gemini API key (free)

Installation

Option 1: Using uv (Recommended)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install and run the server
uvx ultimate-gemini-mcp

Option 2: Using pip

pip install ultimate-gemini-mcp

Option 3: From Source

git clone <repository-url>
cd ultimate-gemini-mcp
uv sync

Configuration

Create a .env file in your project directory:

cp .env.example .env
# Edit .env and add your GEMINI_API_KEY

Or set environment variables directly:

export GEMINI_API_KEY=your_api_key_here

📖 Usage

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Important Notes:

Images are automatically saved to ~/gemini_images (in your home directory). To change this location, add OUTPUT_DIR to the "env" block of the config above:
- macOS: "OUTPUT_DIR": "/Users/yourusername/custom_folder"
- Windows: "OUTPUT_DIR": "C:\\Users\\YourUsername\\custom_folder"
uvx path issues on macOS: If you get spawn uvx ENOENT errors, use the full path to uvx:
```
"command": "/Users/yourusername/.local/bin/uvx"
```

Config file locations:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
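
If you already have other servers in claude_desktop_config.json, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge in Python. The helper name `merge_server_entry` is hypothetical and not part of this package; it only reproduces the JSON shape shown above.

```python
import json
from typing import Any, Optional


def merge_server_entry(
    config: dict[str, Any],
    api_key: str,
    output_dir: Optional[str] = None,
    command: str = "uvx",
) -> dict[str, Any]:
    """Return a copy of `config` with the ultimate-gemini entry added.

    Hypothetical helper: it mirrors the JSON snippet from this README
    and leaves any other configured servers untouched.
    """
    env = {"GEMINI_API_KEY": api_key}
    if output_dir is not None:
        env["OUTPUT_DIR"] = output_dir  # optional, defaults to ~/gemini_images

    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["ultimate-gemini"] = {
        "command": command,  # pass a full path here if uvx is not on PATH
        "args": ["ultimate-gemini-mcp"],
        "env": env,
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = merge_server_entry(existing, "your-api-key-here", output_dir="/your/custom/path")
print(json.dumps(updated, indent=2))
```

Write the printed JSON back to the config file for your platform and restart Claude Desktop so it picks up the change.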

With Claude Code (VS Code)

# Add MCP server to Claude Code
claude mcp add ultimate-gemini \
  --env GEMINI_API_KEY=your-api-key \
  -- uvx ultimate-gemini-mcp

Note: Images are automatically saved to ~/gemini_images. To customize, add --env OUTPUT_DIR=/your/custom/path.

With Cursor

Add to Cursor's MCP configuration (.cursor/mcp.json):

{
  "mcpServers": {
    "ultimate-gemini": {
      "command": "uvx",
      "args": ["ultimate-gemini-mcp"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Note: Images are automatically saved to ~/gemini_images. Optionally add "OUTPUT_DIR": "/your/custom/path" to customize.

🎯 Supported Model

This MCP server exclusively supports:

Gemini 3 Pro Image Preview (gemini-3-pro-image-preview)

The only model supported - Google's state-of-the-art image generation model (aka "Nano Banana Pro") optimized for professional asset production with:

Built-in 1K, 2K, and 4K resolution support (must use uppercase 'K')
Advanced text rendering capabilities for infographics, menus, diagrams, logos
Up to 14 reference images (max 6 objects + max 5 humans) for style/character consistency
Google Search grounding for real-time data (weather, stocks, events, maps)
Thinking mode with reasoning process (automatic, cannot be disabled)
Support for both TEXT and IMAGE response modalities
SynthID watermarking automatically applied to all generated images

🛠️ Tools

`generate_image`

Generate professional images using Gemini 3 Pro Image with advanced features.

Parameters:

prompt (required): Text description of the image to generate
- Best Practice: Use descriptive paragraphs, not keyword lists. "Describe the scene, don't just list keywords"
model: Model to use (default: gemini-3-pro-image-preview - the only supported model)
enhance_prompt: Automatically enhance prompt using Gemini Flash (default: false)
- Enable for simple/vague prompts; transforms them into detailed, cinematic descriptions
aspect_ratio: Image proportions (default: 1:1)
- Options: "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"
image_size: Resolution (default: 2K)
- CRITICAL: Must use uppercase 'K': "1K", "2K", or "4K" (lowercase like "2k" will be rejected!)
- "1K" - Fast testing (1120 tokens, ~1-2MB)
- "2K" - Recommended for most use cases (1120 tokens, ~3-5MB)
- "4K" - Maximum quality for production (2000 tokens, ~8-15MB)
output_format: Image file format (default: png)
- Options: "png" (recommended), "jpeg", "webp"
reference_image_paths: List of paths to reference images (up to 14 total)
- Maximum 6 object images for high-fidelity inclusion of products/items
- Maximum 5 human images for character/person consistency
- Use for: character consistency, style transfer, object inclusion, multi-person compositions
enable_google_search: Enable Google Search grounding for real-time data (default: false)
- Use for: current events, weather forecasts, stock data, recent news, real-time maps
- Adds 1-3 seconds latency and includes grounding_metadata in response
response_modalities: Response types (default: ["TEXT", "IMAGE"])
- Options: ["TEXT", "IMAGE"], ["IMAGE"], ["TEXT"]
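
The constraints listed above can be checked on the client side before a call is made. This is a rough sketch only: `validate_generate_image_args` is a hypothetical helper, and the server's own validation may differ in wording and behavior. The allowed values are copied from the parameter list above.

```python
from typing import Sequence

# Allowed values as documented in the parameter list above.
ALLOWED_ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
ALLOWED_IMAGE_SIZES = {"1K", "2K", "4K"}
ALLOWED_OUTPUT_FORMATS = {"png", "jpeg", "webp"}
ALLOWED_MODALITIES = {"TEXT", "IMAGE"}


def validate_generate_image_args(
    prompt: str,
    aspect_ratio: str = "1:1",
    image_size: str = "2K",
    output_format: str = "png",
    response_modalities: Sequence[str] = ("TEXT", "IMAGE"),
) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors: list[str] = []
    if not prompt.strip():
        errors.append("prompt must not be empty")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if image_size not in ALLOWED_IMAGE_SIZES:
        message = f"unsupported image_size: {image_size!r}"
        # Catch the common lowercase mistake ("2k" instead of "2K").
        if image_size.upper() in ALLOWED_IMAGE_SIZES:
            message += f" (did you mean {image_size.upper()!r}?)"
        errors.append(message)
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        errors.append(f"unsupported output_format: {output_format!r}")
    if not response_modalities or not set(response_modalities) <= ALLOWED_MODALITIES:
        errors.append(f"unsupported response_modalities: {list(response_modalities)!r}")
    return errors


print(validate_generate_image_args("a serene mountain landscape", image_size="2k"))
```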

Examples:

1. Basic image generation:
   Generate an image of "a serene mountain landscape at sunset with a lake reflection"

2. High-resolution with specific aspect ratio:
   Generate a 4K image of "modern minimalist architecture" with aspect_ratio 16:9

3. With Google Search grounding:
   Generate an image with Google Search enabled: "Current weather map for New York City"

4. With reference images:
   Generate an image with reference_image_paths: ["/path/person1.png", "/path/person2.png"]
   and prompt: "An office group photo of these people making funny faces"

`batch_generate`

Process multiple prompts efficiently with parallel batch processing.

Parameters:

prompts (required): List of text prompts
model: Model to use for all images
enhance_prompt: Enhance all prompts (default: false) - Enable for simple/vague prompts
aspect_ratio: Aspect ratio for all images
batch_size: Parallel processing size (default: from config)

Example:

Batch generate images for these prompts:
1. "minimalist logo design for a tech startup"
2. "modern dashboard UI design"
3. "mobile app wireframe"
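
For intuition about what batch_size controls, here is one common way to cap parallel work in Python, using an asyncio semaphore. This is an illustration under assumptions, not the server's actual implementation: `run_batch` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real API call.

```python
import asyncio
from typing import Awaitable, Callable


async def run_batch(
    prompts: list[str],
    generate: Callable[[str], Awaitable[str]],
    batch_size: int,
) -> list[str]:
    """Run `generate` for every prompt, with at most `batch_size` calls in flight."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    semaphore = asyncio.Semaphore(batch_size)

    async def worker(prompt: str) -> str:
        async with semaphore:  # blocks while batch_size calls are already running
            return await generate(prompt)

    # gather() returns results in the same order as the prompts.
    return await asyncio.gather(*(worker(p) for p in prompts))


stats = {"active": 0, "peak": 0}


async def fake_generate(prompt: str) -> str:
    """Stand-in for an image request; records how many calls overlap."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return f"image for: {prompt}"


prompts = [
    "minimalist logo design for a tech startup",
    "modern dashboard UI design",
    "mobile app wireframe",
]
results = asyncio.run(run_batch(prompts, fake_generate, batch_size=2))
print(results)
```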

🎨 Advanced Features

AI Prompt Enhancement

When enabled, the server uses Gemini Flash to automatically enhance your prompts:

Original: a cat wearing a space helmet

Enhanced: A photorealistic portrait of a domestic tabby cat wearing a futuristic space helmet, close-up composition, warm studio lighting, detailed fur texture, reflective helmet visor showing subtle reflections, soft focus background, professional photography style

This significantly improves image quality without requiring you to be a prompt engineering expert!

Google Search Grounding

Generate images based on real-time data:

Generate an image with Google Search enabled:
- prompt: "Visualize the current weather forecast for San Francisco as a modern chart"
- enable_google_search: true

The response will include grounding metadata with search sources used.

Reference Images for Consistency

Maintain consistent characters and objects across generations:

Generate an image with:
- prompt: "An office group photo of these people, they are making funny faces"
- reference_image_paths: ["/path/person1.png", "/path/person2.png", "/path/person3.png"]
- aspect_ratio: "5:4"
- image_size: "2K"

You can provide up to 14 reference images (max 6 objects, max 5 humans).
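
Because reference_image_paths is a single flat list, it can help to track the per-category limits yourself. The sketch below assumes you know which of your images are objects, humans, or something else (for example style references). `collect_reference_images` and its three arguments are hypothetical and not part of the tool's interface.

```python
from typing import Sequence

MAX_TOTAL = 14   # total reference images per request
MAX_OBJECTS = 6  # object images
MAX_HUMANS = 5   # human images


def collect_reference_images(
    objects: Sequence[str] = (),
    humans: Sequence[str] = (),
    other: Sequence[str] = (),
) -> list[str]:
    """Check the documented limits and return one flat list of paths."""
    if len(objects) > MAX_OBJECTS:
        raise ValueError(f"at most {MAX_OBJECTS} object images, got {len(objects)}")
    if len(humans) > MAX_HUMANS:
        raise ValueError(f"at most {MAX_HUMANS} human images, got {len(humans)}")
    paths = [*objects, *humans, *other]
    if len(paths) > MAX_TOTAL:
        raise ValueError(f"at most {MAX_TOTAL} reference images, got {len(paths)}")
    return paths


reference_image_paths = collect_reference_images(
    humans=["/path/person1.png", "/path/person2.png", "/path/person3.png"],
)
print(reference_image_paths)
```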

High-Resolution Assets

Generate professional 4K assets:

Generate a 4K image of "minimalist logo design for a tech startup"
with image_size: "4K" and aspect_ratio: "1:1"

⚙️ Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key (required) | - |
| `OUTPUT_DIR` | Directory for generated images | `~/gemini_images` |
| `ENABLE_PROMPT_ENHANCEMENT` | Enable AI prompt enhancement | `false` |
| `ENABLE_BATCH_PROCESSING` | Enable batch processing | `true` |
| `DEFAULT_MODEL` | Default model | `gemini-3-pro-image-preview` |
| `DEFAULT_IMAGE_SIZE` | Default resolution | `2K` |
| `ENABLE_GOOGLE_SEARCH` | Enable Google Search grounding | `false` |
| `REQUEST_TIMEOUT` | API request timeout (seconds) | `60` |
| `MAX_BATCH_SIZE` | Maximum parallel batch size | `8` |
| `LOG_LEVEL` | Logging level | `INFO` |
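
To make the defaults concrete, here is a small sketch of how these variables could be read in Python. It is not the server's actual settings code: the `Settings` class, `load_settings`, and the set of strings accepted as "true" are assumptions made for illustration. Only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    output_dir: Path
    enable_prompt_enhancement: bool
    enable_batch_processing: bool
    default_model: str
    default_image_size: str
    enable_google_search: bool
    request_timeout: int
    max_batch_size: int
    log_level: str


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; everything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, applying the defaults from the table."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY not found")
    return Settings(
        gemini_api_key=api_key,
        output_dir=Path(env.get("OUTPUT_DIR", "~/gemini_images")).expanduser(),
        enable_prompt_enhancement=_as_bool(env.get("ENABLE_PROMPT_ENHANCEMENT", "false")),
        enable_batch_processing=_as_bool(env.get("ENABLE_BATCH_PROCESSING", "true")),
        default_model=env.get("DEFAULT_MODEL", "gemini-3-pro-image-preview"),
        default_image_size=env.get("DEFAULT_IMAGE_SIZE", "2K"),
        enable_google_search=_as_bool(env.get("ENABLE_GOOGLE_SEARCH", "false")),
        request_timeout=int(env.get("REQUEST_TIMEOUT", "60")),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", "8")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


settings = load_settings(
    {"GEMINI_API_KEY": "your_api_key_here", "OUTPUT_DIR": "/your/custom/path", "MAX_BATCH_SIZE": "4"}
)
print(settings)
```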

📚 MCP Resources

`models://list`

View all available models with descriptions and features.

`settings://config`

View current server configuration.

🎭 Use Cases

Web Development

Hero images and banners
UI/UX mockups and wireframes
Logo and branding assets
Placeholder images

App Development

App icons and splash screens
User interface elements
Marketing materials
Documentation images

Content Creation

Blog post illustrations
Social media graphics
Presentation visuals
Product mockups

Creative Projects

Character design iterations
Concept art exploration
Style variations
Scene composition

📊 Gemini 3 Pro Image Features

| Feature | Support | Details |
|---|---|---|
| Resolution Options | ✅ 1K, 2K, 4K | Built-in high-resolution generation (MUST use uppercase 'K') |
| Reference Images | ✅ Up to 14 | At most 6 objects and at most 5 humans, for consistency |
| Google Search Grounding | ✅ Real-time data | Weather, stocks, events, maps |
| Thinking Mode | ✅ Advanced reasoning | Automatic (cannot be disabled), generates up to 2 interim images |
| Text Rendering | ✅ Advanced | Legible text in infographics, menus, diagrams, logos |
| Aspect Ratios | ✅ 10 options | Full flexibility for any format |
| Response Modalities | ✅ TEXT + IMAGE | Dual output modes |
| Prompt Enhancement | ✅ Built-in | AI-powered optimization using Gemini Flash |
| SynthID Watermarking | ✅ Automatic | Invisible watermark on all generated images |
| Thought Signatures | ✅ Automatic | Preserved across multi-turn interactions (handled by SDK) |

Best For: Professional assets, marketing, real-time visualization, logos, infographics

🐛 Troubleshooting

"spawn uvx ENOENT" error

Cause: Claude Desktop cannot find the uvx command in its PATH

Solution: Use the full path to uvx in your config:

"command": "/Users/yourusername/.local/bin/uvx"

Find your uvx location with: which uvx
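
On systems where `which` is not available (for example a plain Windows shell), the same lookup can be done from Python with the standard library. The helper name `resolve_command` is only illustrative.

```python
import json
import shutil
from typing import Optional


def resolve_command(name: str = "uvx") -> Optional[str]:
    """Return the path of `name` as found on PATH, or None if it is missing."""
    return shutil.which(name)


path = resolve_command("uvx")
if path is None:
    print("uvx was not found on PATH; install uv or check your shell profile")
else:
    # json.dumps escapes backslashes, so the line can be pasted into the config.
    print(f'"command": {json.dumps(path)}')
```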

Custom output directory

Default: Images are automatically saved to ~/gemini_images in your home directory

Customize: Set OUTPUT_DIR in your MCP config if you want a different location:

"env": {
  "GEMINI_API_KEY": "your-key",
  "OUTPUT_DIR": "/your/custom/path"
}

"GEMINI_API_KEY not found"

Add your API key to .env or environment variables
Get a free key at Google AI Studio

"Content blocked by safety filters"

Modify your prompt to comply with content policies
Try rephrasing without potentially sensitive content

"Rate limit exceeded"

Wait a few moments and try again
Consider upgrading your API plan for higher limits

Images not saving

Check that OUTPUT_DIR exists and is writable
Verify you have sufficient disk space
Create the directory manually: mkdir -p /path/to/your/images
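
The three checks above can also be scripted. This is a minimal sketch using only the Python standard library. `check_output_dir` is a hypothetical helper, and the 50 MB free-space threshold is an arbitrary assumption, chosen to leave room for a few 4K images. The demo runs against a temporary directory so it does not touch your real output folder.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Union


def check_output_dir(
    path: Union[str, Path],
    min_free_bytes: int = 50 * 1024 * 1024,  # assumed threshold, not a server setting
) -> list[str]:
    """Create the directory if needed and return a list of problems found."""
    directory = Path(path).expanduser()
    try:
        # Same effect as `mkdir -p`; fails if the path exists but is a file.
        directory.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return [f"cannot create {directory}: {exc}"]

    problems: list[str] = []
    if not os.access(directory, os.W_OK):
        problems.append(f"{directory} is not writable")
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the disk holding {directory}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "images"
    print(check_output_dir(target, min_free_bytes=0))
```

An empty list means the directory exists, is writable, and has at least the requested free space.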

🤝 Contributing

Contributions are welcome! This project combines the best features from multiple MCP servers:

mcp-image (TypeScript): Prompt enhancement and editing features
nanobanana-mcp-server (Python): Architecture and FastMCP integration
gemini-imagen-mcp-server (TypeScript): Imagen API support and batch processing

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built on the excellent work of:

mcp-image - Prompt enhancement concept
nanobanana-mcp-server - FastMCP architecture
gemini-imagen-mcp-server - Imagen integration

Ready to create amazing AI-generated images? Install now and start generating! 🚀