sanzaru
A stateless, lightweight MCP server that wraps OpenAI's Sora Video API, Whisper, and GPT-4o Audio APIs via the OpenAI Python SDK.
Features
Video Generation (Sora)
- Create videos with `sora-2` or `sora-2-pro` models
- Use reference images to guide generation
- Remix and refine existing videos
- Download variants (video, thumbnail, spritesheet)
Image Generation
- Generate images with gpt-image-1.5 (recommended) or GPT-5
- Edit and compose images with up to 16 inputs
- Iterative refinement via Responses API
- Automatic resizing for Sora compatibility
Audio Processing
- Transcription: Whisper and GPT-4o models
- Audio Chat: Interactive analysis with GPT-4o
- Text-to-Speech: Multi-voice TTS generation
- Processing: Format conversion, compression, file management
Note: Content guardrails are enforced by OpenAI. This server does not run local moderation.
Requirements
- Python 3.10+
- `OPENAI_API_KEY` environment variable
- Feature-specific paths (set only what you need):
  - `VIDEO_PATH` - enables video generation features
  - `IMAGE_PATH` - enables image generation features
  - `AUDIO_PATH` - enables audio processing features
Quick Start
1. Clone the repository:

   git clone https://github.com/TJC-LP/sanzaru.git
   cd sanzaru

2. Run the setup script:

   ./setup.sh

   The script will:
   - Prompt for your OpenAI API key
   - Create directories and `.env` configuration
   - Install dependencies with `uv sync --all-extras --dev`

3. Start using:

   claude

That's it! Claude Code will connect automatically, and you can start generating videos, generating images, and processing audio.
Installation
Quick Install
# All features
uv add "sanzaru[all]"
# Specific features
uv add "sanzaru[audio]" # With audio support
uv add sanzaru # Base (video + image only)
Alternative Installation Methods
From Source
git clone https://github.com/TJC-LP/sanzaru.git
cd sanzaru
uv sync --all-extras
Claude Desktop
Add to your claude_desktop_config.json:
{
"mcpServers": {
"sanzaru": {
"command": "uvx",
"args": ["sanzaru[all]"],
"env": {
"OPENAI_API_KEY": "your-api-key-here",
"VIDEO_PATH": "/absolute/path/to/videos",
"IMAGE_PATH": "/absolute/path/to/images",
"AUDIO_PATH": "/absolute/path/to/audio"
}
}
}
}
Or from source:
{
"mcpServers": {
"sanzaru": {
"command": "uv",
"args": ["run", "--directory", "/path/to/sanzaru", "sanzaru"]
}
}
}
Codex MCP
# Using uvx (from PyPI)
codex mcp add sanzaru \
--env OPENAI_API_KEY="sk-..." \
--env VIDEO_PATH="$HOME/sanzaru-videos" \
--env IMAGE_PATH="$HOME/sanzaru-images" \
--env AUDIO_PATH="$HOME/sanzaru-audio" \
-- uvx "sanzaru[all]"
# Or from source
cd /path/to/sanzaru
set -a; source .env; set +a
codex mcp add sanzaru \
--env OPENAI_API_KEY="$OPENAI_API_KEY" \
--env VIDEO_PATH="$VIDEO_PATH" \
--env IMAGE_PATH="$IMAGE_PATH" \
--env AUDIO_PATH="$AUDIO_PATH" \
-- uv run --directory "$(pwd)" sanzaru
Manual Setup
uv venv
uv sync
# Set required environment variables
export OPENAI_API_KEY=sk-...
export VIDEO_PATH=~/videos
export IMAGE_PATH=~/images
export AUDIO_PATH=~/audio
# Run server
uv run sanzaru
Feature Auto-Detection: Features are automatically enabled based on configured paths. Set only the paths you need.
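The auto-detection rule above can be sketched as a pure function of the environment. This is a hypothetical illustration of the behavior described (a feature is on only when its path variable is set), not sanzaru's actual implementation:

```python
import os

# Map each feature to the environment variable that enables it.
FEATURE_VARS = {
    "video": "VIDEO_PATH",
    "image": "IMAGE_PATH",
    "audio": "AUDIO_PATH",
}

def enabled_features(env=os.environ):
    """Return the set of feature names whose path variable is set and non-empty."""
    return {name for name, var in FEATURE_VARS.items() if env.get(var)}

# Only video and image paths configured -> audio tools stay disabled.
print(enabled_features({"VIDEO_PATH": "/videos", "IMAGE_PATH": "/images"}))
```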
Available Tools
| Category | Tools | Description |
|---|---|---|
| Video | create_video, get_video_status, download_video, list_videos, delete_video, remix_video | Generate and manage Sora videos with optional reference images |
| Image | generate_image, edit_image, create_image, get_image_status, download_image | Generate with gpt-image-1.5 (sync) or GPT-5 (polling) |
| Reference | list_reference_images, prepare_reference_image | Manage and resize images for Sora compatibility |
| Audio | transcribe_audio, chat_with_audio, create_audio, convert_audio, compress_audio, list_audio_files, get_latest_audio, transcribe_with_enhancement | Transcription, analysis, TTS, and file management |
Full API documentation: see the Documentation section below.
Basic Workflows
Generate a Video
# Create video from text
video = create_video(
prompt="A serene mountain landscape at sunrise",
model="sora-2",
seconds="8",
size="1280x720"
)
# Poll for completion
status = get_video_status(video.id)
# Download when ready
download_video(video.id, filename="mountain_sunrise.mp4")
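The status check above runs once; in practice you poll until generation finishes. A minimal sketch of such a loop, assuming the status object exposes a `.status` attribute that settles on "completed" or "failed" (those field names and values are assumptions, not sanzaru's documented contract):

```python
import time

def poll_until_done(get_status, video_id, interval=5.0, timeout=600.0):
    """Poll `get_status(video_id)` until the job reaches a terminal state.

    `get_status` stands in for a status tool like get_video_status.
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(video_id)
        if status.status in ("completed", "failed"):
            return status
        time.sleep(interval)  # avoid hammering the API
    raise TimeoutError(f"video {video_id} did not finish within {timeout}s")
```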
Generate with Reference Image
# 1. Generate reference image (gpt-image-1.5, synchronous)
generate_image(
prompt="futuristic pilot in mech cockpit",
size="1536x1024",
filename="pilot.png"
)
# 2. Prepare for video (resize to Sora dimensions)
prepare_reference_image("pilot.png", "1280x720", resize_mode="crop")
# 3. Animate
video = create_video(
prompt="The pilot looks up and smiles",
size="1280x720",
input_reference_filename="pilot_1280x720.png"
)
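The `resize_mode="crop"` step above implies an aspect-ratio-preserving center crop before scaling down to the Sora frame size. The geometry can be sketched as follows; this illustrates the general technique, not sanzaru's actual code:

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Compute the centered crop region (left, top, right, bottom) that
    matches the target aspect ratio; the crop is then scaled to dst size."""
    src_ratio = src_w / src_h
    dst_ratio = dst_w / dst_h
    if src_ratio > dst_ratio:
        # Source is wider than the target: trim the sides.
        new_w = round(src_h * dst_ratio)
        left = (src_w - new_w) // 2
        return (left, 0, left + new_w, src_h)
    # Source is taller than the target: trim top and bottom.
    new_h = round(src_w / dst_ratio)
    top = (src_h - new_h) // 2
    return (0, top, src_w, top + new_h)

# 1536x1024 image prepared for a 1280x720 (16:9) video frame.
print(center_crop_box(1536, 1024, 1280, 720))  # → (0, 80, 1536, 944)
```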
Audio Transcription
# List available audio files
files = list_audio_files(format="mp3")
# Transcribe
result = transcribe_audio("interview.mp3")
# Or analyze with GPT-4o
analysis = chat_with_audio(
"meeting.mp3",
user_prompt="Summarize key decisions and action items"
)
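One reason the compression tools exist: OpenAI's transcription endpoint caps uploads at 25 MB, so large recordings need compressing first. A trivial pre-flight check (the helper name is illustrative, not part of sanzaru's API):

```python
# OpenAI's documented upload cap for transcription requests.
WHISPER_MAX_BYTES = 25 * 1024 * 1024

def needs_compression(size_bytes, limit=WHISPER_MAX_BYTES):
    """True if the file must be compressed before sending for transcription."""
    return size_bytes > limit

print(needs_compression(30 * 1024 * 1024))  # → True
```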
Documentation
- Complete tool documentation with parameters and examples
- Working with reference images and resizing
- Generating and editing reference images
- Crafting effective video prompts
- Audio transcription, chat, and TTS
- Technical details and benchmarks
Performance
Fully asynchronous architecture with proven scalability:
- ✅ 32+ concurrent operations verified
- ✅ 8-10x speedup for parallel tasks
- ✅ Non-blocking I/O with `aiofiles` + `anyio`
- ✅ Python 3.14 free-threading ready
See the Documentation section above for technical details.
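The parallel speedup above comes from overlapping API latency rather than CPU work. A self-contained sketch of the idea using the standard library's asyncio (the server itself uses `anyio` and `aiofiles`; `fake_api_call` is a stand-in for a real request):

```python
import asyncio
import time

async def fake_api_call(i, latency=0.05):
    # Stand-in for a non-blocking OpenAI API request.
    await asyncio.sleep(latency)
    return i

async def main(n=8):
    start = time.monotonic()
    # Launch all n calls concurrently; total time ≈ one call's latency.
    results = await asyncio.gather(*(fake_api_call(i) for i in range(n)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(len(results), elapsed < 8 * 0.05)
```

Running the calls sequentially would take roughly `n × latency`; gathering them overlaps the waits, which is where the reported multi-x speedup for parallel tasks comes from.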