pavelsukhachev/mcp-server-gpt-image
If you are the rightful owner of mcp-server-gpt-image and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
MCP Server GPT Image provides access to OpenAI's GPT Image-1 model for advanced image generation and editing capabilities.
generate_image
Generate images from text prompts with customizable parameters.
edit_image
Edit existing images using text prompts and optional masks.
MCP Server GPT Image-1
A Model Context Protocol (MCP) server that provides access to OpenAI's latest image generation capabilities using both the GPT Image-1 model and the new Responses API. This server enables AI assistants like Claude to generate and manipulate images using natural language prompts with cutting-edge multimodal AI technology.
๐ Features
- ๐จ Latest Image Generation: Create stunning images using OpenAI's GPT Image-1 (2025's state-of-the-art model)
- ๐ Dual API Support: Choose between Images API (gpt-image-1) and Responses API (gpt-4o with image tools)
- โ๏ธ Advanced Image Editing: Modify existing images with text prompts and optional masks
- ๐ Multiple Transports: Supports both stdio (for Claude Desktop) and HTTP (for remote access)
- โก Real-time Streaming: Server-Sent Events (SSE) for live progress updates and partial previews
- ๐พ Smart Caching: Two-tier caching system (memory + disk) for instant repeated requests
- ๐ผ๏ธ Image Optimization: Automatic compression with up to 80% size reduction
- ๐ Production Ready: Docker support, session management, and comprehensive error handling
- ๐ Secure: API key authentication via environment variables
- ๐ Flexible Options: Support for various sizes, quality levels, and output formats
- ๐ Conversation Context: Multi-turn image editing with conversation history tracking
- ๐ฏ Superior Text Rendering: GPT Image-1's enhanced text-in-image capabilities
๐ Prerequisites
- Node.js 18 or higher
- npm or yarn
- OpenAI API key with access to image generation models
- API Organization Verification completed for image generation access
๐ Quick Start
Installation
Option 1: Use Pre-built Version (Recommended)
# Clone the repository
git clone https://github.com/pavelsukhachev/mcp-server-gpt-image.git
cd mcp-server-gpt-image
# Install dependencies (required for runtime)
npm install --production
Option 2: Build from Source
# Clone the repository
git clone https://github.com/pavelsukhachev/mcp-server-gpt-image.git
cd mcp-server-gpt-image
# Install all dependencies
npm install
# Build the project
npm run build
Configuration
Create a .env
file in the root directory:
# Required
OPENAI_API_KEY=your-openai-api-key-here
# API Configuration
API_MODE=responses # 'responses' (default, latest) or 'images' (legacy)
RESPONSES_MODEL=gpt-4o # Model for Responses API (default: gpt-4o)
# Optional
PORT=3000
CORS_ORIGIN=*
# Cache Configuration
CACHE_DIR=.cache/images
CACHE_TTL=3600
CACHE_MAX_SIZE=100
# Feature Flags
ENABLE_CONVERSATION_CONTEXT=true # Multi-turn conversation support
ENABLE_STREAMING=true # Real-time streaming updates
ENABLE_OPTIMIZATION=true # Image optimization
๐ฏ API Modes & Model Selection
Responses API (Recommended)
Default Mode: API_MODE=responses
- Model: gpt-4o with image_generation tool
- Technology: Latest 2025 Responses API with integrated GPT Image-1 capabilities
- Features:
- Native multimodal understanding
- Better context awareness
- Enhanced prompt following
- Superior text rendering in images
- Real-time streaming with partial previews
- Multi-turn conversation support
Images API (Legacy)
Legacy Mode: API_MODE=images
- Model: gpt-image-1 (dedicated image model)
- Technology: Traditional Images API endpoint
- Features:
- Direct access to GPT Image-1 model
- Simple, focused image generation
- Backward compatibility
Model Comparison
Feature | Responses API (gpt-4o) | Images API (gpt-image-1) |
---|---|---|
Latest Technology | โ 2025 Responses API | โ ๏ธ Legacy API |
Text in Images | โ Superior | โ Good |
Context Awareness | โ Excellent | โ ๏ธ Limited |
Streaming | โ Partial previews | โ ๏ธ Final only |
Multi-turn | โ Full support | โ ๏ธ Basic |
Performance | โ Optimized | โ Fast |
๐ง Usage
Claude Desktop Integration
Add the following to your Claude Desktop MCP settings (~/Library/Application Support/Claude/claude_desktop_config.json
on macOS):
{
"mcpServers": {
"gpt-image": {
"command": "node",
"args": ["/path/to/mcp-server-gpt-image/dist/index.js", "stdio"],
"env": {
"OPENAI_API_KEY": "your-openai-api-key-here"
}
}
}
}
Standalone Server (HTTP Mode)
# Run in HTTP mode for remote access
npm run start:http
# Or use Docker
docker-compose up
The server will be available at:
- Health check:
http://localhost:3000/health
- MCP endpoint:
http://localhost:3000/mcp
- Streaming endpoint:
http://localhost:3000/mcp/stream
๐ ๏ธ Available Tools
1. generate_image
Generate images from text prompts with optional streaming support.
Parameters:
prompt
(required): Text description of the image to generatesize
: Image dimensions1024x1024
(default)1024x1536
(portrait)1536x1024
(landscape)auto
quality
: Rendering qualitylow
(60% compression)medium
(80% compression)high
(95% quality)auto
(default, 85% quality)
format
: Output format (png
,jpeg
,webp
)background
: Background transparency (transparent
,opaque
,auto
)output_compression
: Explicit compression level (0-100)n
: Number of images to generate (1-4)partialImages
: Number of partial images to stream (1-3, enables streaming)stream
: Enable streaming mode for real-time generation updatesconversationId
: ID for conversation context tracking (optional)useContext
: Whether to use conversation context from previous interactions (default: false)maxContextEntries
: Maximum number of context entries to consider (1-10, default: 5)
Example:
{
"prompt": "A serene Japanese garden with cherry blossoms at sunset",
"size": "1536x1024",
"quality": "high",
"format": "png",
"partialImages": 2,
"stream": true
}
2. edit_image
Edit existing images using text prompts and optional masks.
Parameters:
prompt
(required): Text description of the desired editimages
(required): Array of base64-encoded images to editmask
: Base64-encoded mask for inpainting (optional)- Other parameters same as
generate_image
Example:
{
"prompt": "Add a red bridge over the stream",
"images": ["base64_encoded_image_data..."],
"mask": "base64_encoded_mask_data..."
}
API Mode Support:
- Responses API: Editing via conversation with image input (recommended)
- Images API: Direct image editing using traditional endpoint
3. clear_cache
Clear all cached images from memory and disk.
Example:
// No parameters required
{}
4. cache_stats
Get cache statistics including memory entries and disk usage.
Example:
// No parameters required
{}
5. list_conversations
List all active conversation IDs.
Example:
// No parameters required
{}
6. get_conversation
Get the full history of a specific conversation.
Parameters:
conversationId
(required): The conversation ID to retrieve
Example:
{
"conversationId": "design-session-123"
}
7. clear_conversation
Clear the history of a specific conversation.
Parameters:
conversationId
(required): The conversation ID to clear
Example:
{
"conversationId": "design-session-123"
}
๐ Streaming Image Generation
Real-time Generation with SSE
The server supports streaming image generation via Server-Sent Events (SSE) for real-time progress updates and partial image previews.
Endpoint: POST /mcp/stream
Request Body:
{
"prompt": "A beautiful sunset over mountains",
"partialImages": 3,
"size": "1024x1024",
"quality": "high"
}
Response: Server-Sent Events stream
Event Types:
progress
: Generation progress updates with percentage and messagepartial
: Partial image preview (base64 encoded)complete
: Final image with revised prompterror
: Error information if generation fails
API Mode Examples
Responses API Streaming (Recommended):
// Using Responses API with gpt-4o
const response = await fetch('http://localhost:3000/mcp/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: 'A futuristic city with flying cars',
partialImages: 2,
apiMode: 'responses' // Use latest Responses API
})
});
Images API Streaming (Legacy):
// Using traditional Images API with gpt-image-1
const response = await fetch('http://localhost:3000/mcp/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: 'A futuristic city with flying cars',
partialImages: 2,
apiMode: 'images' // Use legacy Images API
})
});
Example Client:
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const events = chunk.split('\n\n');
for (const event of events) {
if (event.startsWith('data: ')) {
const data = JSON.parse(event.slice(6));
console.log('Event:', data.type, data.data?.message);
}
}
}
See examples/streaming-client.ts
for a complete implementation.
๐ Multi-turn Conversation Context
Iterative Image Refinement
The server now supports maintaining conversation context across multiple image generation and editing operations. This allows for iterative refinement where each new prompt can build upon previous results.
How it works:
- Assign a
conversationId
to group related operations - Enable
useContext: true
to enhance prompts with previous context - The system automatically tracks prompts, revised prompts, and image metadata
- Context is persisted to disk for resuming sessions later
Example Workflow:
// Initial generation
{
"prompt": "Create a serene mountain landscape",
"conversationId": "landscape-design-001",
"useContext": false // First prompt doesn't need context
}
// Iterative refinement with context
{
"prompt": "Add a crystal clear lake in the foreground",
"conversationId": "landscape-design-001",
"useContext": true, // Will consider previous "mountain landscape" context
"maxContextEntries": 5
}
// Further editing
{
"prompt": "Make the sky more dramatic with sunset colors",
"images": ["previous_generated_image_base64..."],
"conversationId": "landscape-design-001",
"useContext": true // Considers both previous prompts for consistency
}
Benefits:
- Consistency: Maintains style and elements across iterations
- Context Awareness: Each generation considers previous prompts and results
- Session Persistence: Resume work later with full context preserved
- Flexible History: Control how much context to use with
maxContextEntries
Managing Conversations:
- Use
list_conversations
to see all active sessions - Use
get_conversation
to review the full history of a session - Use
clear_conversation
to start fresh when needed
๐พ Response Caching
Intelligent Cache System
The server includes an intelligent caching system to reduce API calls and improve response times.
Features:
- Memory + Disk Cache: Two-tier caching for optimal performance
- Content-Based Keys: Cache keys based on prompt, size, quality, and other parameters
- TTL Support: Configurable time-to-live for cache entries
- Size Management: Automatic cleanup when cache exceeds size limits
- Cache Tools: Built-in tools for cache management
Configuration (via environment variables):
CACHE_DIR=.cache/images # Cache directory (default: .cache/images)
CACHE_TTL=3600 # Cache TTL in seconds (default: 1 hour)
CACHE_MAX_SIZE=100 # Max cache size in MB (default: 100MB)
Cache Behavior:
- Identical requests return cached results instantly
- Cache hits are logged for monitoring
- Expired entries are automatically cleaned up
- Edit operations cache based on image+mask+prompt combination
๐ผ๏ธ Image Optimization
Automatic Optimization Engine
The server includes an intelligent image optimization engine powered by Sharp.
Features:
- Format Conversion: Automatically convert between PNG, JPEG, and WebP
- Smart Compression: Adaptive quality based on image characteristics
- Size Constraints: Maintain dimensions while reducing file size
- Transparency Handling: Preserve alpha channels when needed
- Progressive Encoding: Better perceived loading performance
Optimization Results:
- Typical size reductions: 30-70% for JPEG, 20-50% for WebP
- Automatic format selection based on content type
- Preserved visual quality with smaller file sizes
- Logged optimization metrics for monitoring
๐ณ Docker Support
Using Docker Compose
# Build and run
docker-compose up -d
# View logs
docker-compose logs -f
# Stop
docker-compose down
Docker Configuration
The included docker-compose.yml
provides:
- Automatic container restart
- Health checks
- Volume mounting for generated images
- Environment variable configuration
๐๏ธ Architecture
The codebase follows SOLID principles and clean architecture patterns for maintainability and testability.
src/
โโโ index.ts # Entry point with transport selection
โโโ server.ts # MCP server setup and tool registration
โโโ types.ts # TypeScript interfaces and Zod schemas
โโโ interfaces/ # Contract definitions (Dependency Inversion)
โ โโโ image-generation.interface.ts # Core interfaces for DI
โโโ services/ # Business logic (Single Responsibility)
โ โโโ image-generator.ts # Main image generation service
โ โโโ streaming-image-generator.ts # Streaming implementation
โ โโโ file-converter.ts # File conversion utilities
โ โโโ openai-client-adapter.ts # OpenAI API adapter
โโโ adapters/ # Interface adapters (Open/Closed)
โ โโโ cache-adapter.ts # Cache interface implementation
โ โโโ optimizer-adapter.ts # Image optimizer interface
โโโ tools/
โ โโโ image-generation.ts # Tool endpoints using services
โ โโโ image-generation-streaming.ts # Streaming endpoints
โโโ transport/
โ โโโ http.ts # HTTP/SSE transport with session management
โโโ utils/
โโโ cache.ts # Two-tier caching system
โโโ image-optimizer.ts # Sharp-based image optimization
Key Design Patterns
- Dependency Injection: All services depend on interfaces, not concrete implementations
- Single Responsibility: Each class has one clear purpose
- Open/Closed Principle: Services are extensible through interfaces
- Interface Segregation: Focused interfaces for specific concerns
- Liskov Substitution: All implementations are interchangeable
๐ Examples
The examples/
directory contains complete, runnable examples:
test-client.ts
: Basic MCP client examplestreaming-client.ts
: Streaming image generation with SSEoptimization-demo.ts
: Image optimization features demonstration
Run examples with:
npx tsx examples/streaming-client.ts
๐ฐ Cost Considerations
GPT Image-1 generates images by producing specialized image tokens. Cost and latency depend on:
Quality | Square (1024ร1024) | Portrait (1024ร1536) | Landscape (1536ร1024) |
---|---|---|---|
Low | 272 tokens | 408 tokens | 400 tokens |
Medium | 1056 tokens | 1584 tokens | 1568 tokens |
High | 4160 tokens | 6240 tokens | 6208 tokens |
Pricing: $5.00/1M text input tokens, $10.00/1M image input tokens, $40.00/1M image output tokens
๐ Security Best Practices
-
API Key Management:
- Never commit API keys to version control
- Use environment variables for sensitive data
- Rotate API keys regularly
-
Network Security:
- Configure CORS appropriately for production
- Use HTTPS in production environments
- Implement rate limiting for public deployments
-
Input Validation:
- All inputs are validated using Zod schemas
- File size limits are enforced
- Content moderation is applied by default
๐ Performance Tips
Optimize Generation Speed
- Use Lower Quality: Start with
quality: "low"
for drafts, then regenerate with higher quality - Enable Caching: Identical requests are served instantly from cache
- Use Streaming: Get partial results faster with
partialImages
parameter
Reduce Costs
- Cache Results: Automatic caching prevents redundant API calls
- Optimize Images: Use compression to reduce storage and bandwidth
- Monitor Usage: Track cache stats to understand usage patterns
Improve Quality
- Detailed Prompts: Include style, mood, lighting, and perspective details
- Reference Styles: Mention specific art styles or artists for consistency
- Iterative Refinement: Use edit_image tool to refine specific areas
๐ง Roadmap
Completed โ
- Basic image generation and editing
- Docker support
- Pre-built distribution
- Streaming Infrastructure (SSE-based)
- Partial Image Simulation (1-3 previews)
- Response Caching (memory + disk)
- Image Optimization (format conversion & compression)
- SOLID principles architecture refactoring
- Comprehensive test suite with 90+ tests
- Test coverage reporting (98%+ for core utilities)
- TDD (Test-Driven Development) practices
- Multi-turn editing with conversation context
- OpenAI Responses API integration with GPT-4o + image_generation tool
- Dual API support (Images API + Responses API) with seamless switching
In Progress ๐
- Batch processing with queue management
Future Plans ๐
- WebSocket transport for bidirectional communication
- File upload support (direct image handling)
- Custom prompts library
- Usage analytics and cost tracking
- Web dashboard for server management
- Plugin system for custom processors
โ ๏ธ Known Limitations
API Limitations
- Generation Time: Complex prompts may take up to 30 seconds
- Text Rendering: Generated text in images may have inconsistencies
- Response Format: Currently returns base64 images only (no URL support)
- Model Access: Requires organization verification for GPT Image-1
Technical Limitations
- Max Image Size: Limited by base64 encoding and transport constraints
- Concurrent Requests: Rate limited by OpenAI API quotas
- Cache Size: Limited by available disk space
๐งช Testing
The project uses Vitest for testing with comprehensive coverage:
# Run all tests
npm test
# Run tests with coverage
npm run test:coverage
# Run tests in watch mode
npm test -- --watch
# Run specific test file
npm test -- src/utils/cache.test.ts
Test Coverage
- Overall: ~50% statements
- Core Services: 78.91% coverage
- Utilities: 98.88% coverage (Cache: 100%, ImageOptimizer: 97.69%)
- Server: 96.08% coverage
Testing Approach
- Unit Tests: Comprehensive tests for all services and utilities
- Integration Tests: MCP server endpoint testing
- TDD Practice: Write tests first, then implementation
- Mocking: Proper dependency mocking for isolated testing
๐ค Contributing
Contributions are welcome! Please follow our development practices:
Development Process
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature
) - Write tests first (TDD approach)
- Implement your feature following SOLID principles
- Ensure all tests pass (
npm test
) - Check test coverage (
npm run test:coverage
) - Commit your changes (
git commit -m 'Add some amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
Code Standards
- Follow TypeScript best practices
- Maintain test coverage above 80%
- Use dependency injection for new services
- Follow existing code patterns and conventions
- Document complex logic with clear comments
๐ License
This project is licensed under the MIT License - see the file for details.
๐ Acknowledgments
- Built with the Model Context Protocol SDK
- Powered by OpenAI's GPT Image-1
- Image optimization by Sharp
- Inspired by the MCP community
๐ Resources
- Model Context Protocol Documentation
- OpenAI Image Generation Guide
- GPT Image-1 Documentation
- MCP Server Examples
Note: This is an unofficial implementation. GPT Image-1 is a product of OpenAI.