NimbleBrainInc/mcp-gemini
Google Gemini MCP Server
MCP server for accessing Google's Gemini multimodal AI models. Generate text, analyze images and videos, process PDFs, create embeddings, and leverage function calling with models that support up to 2 million token context windows.
Features
- Text Generation: Advanced LLMs with up to 2M token context
- Multimodal Analysis: Images, videos, audio, and PDFs
- Chat: Multi-turn conversations with context
- Function Calling: Tool use and structured outputs
- Embeddings: High-quality text embeddings
- Streaming: Real-time response streaming
- JSON Mode: Structured JSON output generation
- Token Counting: Estimate costs before generation
Setup
Prerequisites
- Google Cloud account or Google AI Studio access
- Gemini API key
Environment Variables
GEMINI_API_KEY (required): Your Google Gemini API key
How to get an API key:
- Go to aistudio.google.com/app/apikey
- Click "Create API key"
- Select a Google Cloud project or create a new one
- Copy the API key (it starts with AIza...)
- Store it as GEMINI_API_KEY
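Once you have a key, export it in the environment where the MCP server runs. The variable name GEMINI_API_KEY is the one this server reads; the value below is a placeholder, not a real key:

```shell
# Add to your shell profile, or to the env section of your MCP client config.
# The value is a placeholder; paste your own AIza... key.
export GEMINI_API_KEY="AIza-your-key-here"
```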
Available Models
Text Models
- gemini-1.5-pro-latest - Best quality, 2M token context, multimodal
- gemini-1.5-flash-latest - Fast and efficient, 1M token context
- gemini-2.0-flash-exp - Experimental with latest features
Embedding Models
- text-embedding-004 - 768-dimensional text embeddings
Available Tools
Text Generation
generate_text
Generate text with Gemini models.
Parameters:
- prompt (string, required): Input text prompt
- model (string, optional): Model name (default: 'gemini-1.5-flash-latest')
- temperature (float, optional): Sampling temperature 0-2
- top_p (float, optional): Nucleus sampling 0-1
- top_k (int, optional): Top-k sampling
- max_output_tokens (int, optional): Maximum tokens to generate
- system_instruction (string, optional): System instruction for behavior
Example:
result = await generate_text(
    prompt="Explain quantum computing in simple terms",
    model="gemini-1.5-pro-latest",
    temperature=0.7,
    max_output_tokens=500,
    system_instruction="You are a helpful science educator"
)
Conversation
chat
Multi-turn conversation with context.
Parameters:
- messages (list, required): List of messages with 'role' ('user'/'model') and 'text'
- model (string, optional): Model name
- temperature (float, optional): Sampling temperature
- system_instruction (string, optional): System instruction
Example:
result = await chat(
    messages=[
        {"role": "user", "text": "What is machine learning?"},
        {"role": "model", "text": "Machine learning is..."},
        {"role": "user", "text": "Can you give an example?"}
    ],
    model="gemini-1.5-flash-latest"
)
Multimodal Analysis
analyze_image
Analyze images with vision capabilities.
Parameters:
- prompt (string, required): Question about the image
- image_base64 (string, required): Base64-encoded image
- mime_type (string, optional): Image MIME type (default: 'image/jpeg')
- model (string, optional): Model name
Example:
result = await analyze_image(
    prompt="What objects are in this image?",
    image_base64="base64_encoded_image_data",
    mime_type="image/jpeg"
)
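The image_base64 parameter expects a base64 string, not a file path. A minimal sketch of preparing one from a local file (encode_image is a hypothetical helper name, not part of this server's API):

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read a local image file and return its base64-encoded string."""
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("ascii")

# Round-trip check on sample bytes: decoding recovers the original data.
sample = base64.b64encode(b"\x89PNG...").decode("ascii")
assert base64.b64decode(sample) == b"\x89PNG..."
```

The same pattern applies to video_base64 and pdf_base64 below; only the source file and MIME type change.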
analyze_video
Analyze video content including frames, audio, and transcription.
Parameters:
- prompt (string, required): Question about the video
- video_base64 (string, required): Base64-encoded video
- mime_type (string, optional): Video MIME type (default: 'video/mp4')
- model (string, optional): Model name (use gemini-1.5-pro-latest)
Example:
result = await analyze_video(
    prompt="Summarize the key events in this video",
    video_base64="base64_encoded_video_data",
    model="gemini-1.5-pro-latest"
)
analyze_pdf
Extract and analyze PDF documents.
Parameters:
- prompt (string, required): Question about the PDF
- pdf_base64 (string, required): Base64-encoded PDF
- model (string, optional): Model name
Example:
result = await analyze_pdf(
    prompt="Extract all financial data from this document",
    pdf_base64="base64_encoded_pdf_data"
)
Utility Tools
count_tokens
Estimate token usage before generation.
Parameters:
- text (string, required): Input text
- model (string, optional): Model name
Example:
result = await count_tokens(
    text="Long document text here...",
    model="gemini-1.5-flash-latest"
)
# Returns: {'totalTokens': 1234}
list_models
List available Gemini models.
Example:
models = await list_models()
Advanced Features
generate_with_tools
Function calling and tool use.
Parameters:
- prompt (string, required): Input prompt
- tools (list, required): List of tool definitions
- model (string, optional): Model name
Example:
tools = [{
    "function_declarations": [{
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }]
}]

result = await generate_with_tools(
    prompt="What's the weather in Paris?",
    tools=tools
)
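When the model chooses to call a declared function, the response carries the function name and arguments instead of text; your code runs the function and can feed the result back in a follow-up turn. A minimal dispatch sketch, assuming the call arrives as a dict with 'name' and 'args' keys (the exact response shape depends on the server's output format, so treat this as illustrative):

```python
# Local implementation of the declared function. This stub returns fixed
# data; a real version would call an actual weather API.
def get_weather(location: str) -> dict:
    return {"location": location, "condition": "sunny", "temp_c": 21}

# Map declared function names to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(call: dict) -> dict:
    """Route a model-issued function call to the matching local function."""
    func = TOOL_REGISTRY[call["name"]]
    return func(**call["args"])

# A call the model might emit for the Paris weather prompt above.
result = dispatch_tool_call({"name": "get_weather", "args": {"location": "Paris"}})
```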
stream_generate
Stream text generation responses.
Parameters:
- prompt (string, required): Input prompt
- model (string, optional): Model name
- temperature (float, optional): Sampling temperature
Example:
result = await stream_generate(
    prompt="Write a long story about...",
    model="gemini-1.5-flash-latest"
)
generate_json
Generate structured JSON output.
Parameters:
- prompt (string, required): Input prompt
- json_schema (dict, required): JSON schema for response
- model (string, optional): Model name
Example:
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "skills": {"type": "array", "items": {"type": "string"}}
    }
}

result = await generate_json(
    prompt="Extract person details: John is 30 and knows Python and SQL",
    json_schema=schema
)
embed_text
Generate text embeddings.
Parameters:
- text (string, required): Input text
- model (string, optional): Embedding model (default: 'text-embedding-004')
- task_type (string, optional): Task type
Task types:
- RETRIEVAL_DOCUMENT - For documents in retrieval
- RETRIEVAL_QUERY - For search queries
- SEMANTIC_SIMILARITY - For similarity comparison
- CLASSIFICATION - For classification tasks
- CLUSTERING - For clustering tasks
Example:
result = await embed_text(
    text="This is a sample document",
    task_type="RETRIEVAL_DOCUMENT"
)
# Returns 768-dimensional vector
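Embeddings are typically compared with cosine similarity: values near 1 mean semantically similar texts. A self-contained sketch in plain Python (the toy 3-dimensional vectors stand in for the real 768-dimensional output of embed_text):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for document and query embeddings.
doc = [0.1, 0.8, 0.3]
query = [0.2, 0.7, 0.4]
score = cosine_similarity(doc, query)
```

For semantic search, embed documents with RETRIEVAL_DOCUMENT, embed the query with RETRIEVAL_QUERY, and rank by this score.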
batch_generate
Generate multiple responses in parallel.
Parameters:
- prompts (list, required): List of prompts
- model (string, optional): Model name
- temperature (float, optional): Sampling temperature
Example:
result = await batch_generate(
    prompts=[
        "Translate 'hello' to Spanish",
        "Translate 'goodbye' to French",
        "Translate 'thank you' to German"
    ]
)
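Conceptually, parallel generation amounts to launching the per-prompt calls concurrently and collecting results in input order. A sketch with asyncio.gather, using a stub coroutine in place of the real API call:

```python
import asyncio

async def fake_generate(prompt: str) -> str:
    # Stand-in for a real generate_text call; sleeps to simulate latency.
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"

async def batch(prompts: list[str]) -> list[str]:
    """Run all prompts concurrently; gather preserves input order."""
    return await asyncio.gather(*(fake_generate(p) for p in prompts))

results = asyncio.run(batch(["a", "b", "c"]))
```

Because the calls overlap, total latency is close to the slowest single call rather than the sum of all of them.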
Context Windows
- gemini-1.5-pro: Up to 2 million tokens
- gemini-1.5-flash: Up to 1 million tokens
- gemini-2.0-flash-exp: Up to 1 million tokens
Rate Limits
Free Tier
- 15 requests per minute (RPM)
- 1 million tokens per minute (TPM)
- 1,500 requests per day (RPD)
Paid Tier (Pay-as-you-go)
- 360 RPM (gemini-1.5-pro)
- 2000 RPM (gemini-1.5-flash)
- 10 million TPM
Pricing (as of 2025)
- gemini-1.5-pro: $0.00125 per 1K characters input, $0.00375 per 1K characters output
- gemini-1.5-flash: $0.000125 per 1K characters input, $0.000375 per 1K characters output
- text-embedding-004: $0.00001 per 1K characters
Visit ai.google.dev/pricing for current rates.
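The character-based rates above make cost estimation straightforward. A sketch using the gemini-1.5-flash rates listed here (verify against ai.google.dev/pricing before relying on these numbers):

```python
# Per-1K-character rates for gemini-1.5-flash, taken from the table above.
INPUT_RATE = 0.000125   # dollars per 1K input characters
OUTPUT_RATE = 0.000375  # dollars per 1K output characters

def estimate_cost(input_chars: int, output_chars: int) -> float:
    """Rough dollar cost of one request under character-based pricing."""
    return (input_chars / 1000) * INPUT_RATE + (output_chars / 1000) * OUTPUT_RATE

# e.g. a 10,000-character prompt with a 2,000-character response:
cost = estimate_cost(10_000, 2_000)  # 0.00125 + 0.00075 = 0.002 dollars
```

Pair this with count_tokens when you need token-level rather than character-level estimates.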
Safety Settings
Gemini includes built-in safety filters. You can configure safety settings:
# Add to payload
"safetySettings": [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
]
Categories:
- HARM_CATEGORY_HARASSMENT
- HARM_CATEGORY_HATE_SPEECH
- HARM_CATEGORY_SEXUALLY_EXPLICIT
- HARM_CATEGORY_DANGEROUS_CONTENT
Thresholds:
- BLOCK_NONE - No blocking
- BLOCK_LOW_AND_ABOVE - Block low and above
- BLOCK_MEDIUM_AND_ABOVE - Block medium and above (default)
- BLOCK_ONLY_HIGH - Block only high
Best Practices
- Use appropriate models: Flash for speed, Pro for quality
- Leverage long context: Process entire documents
- Token counting: Estimate costs with count_tokens
- System instructions: Guide model behavior
- Streaming: Use for long responses
- Function calling: Enable tool use
- JSON mode: Get structured outputs
- Batch processing: Process multiple prompts efficiently
Multimodal Capabilities
Supported Formats
Images:
- JPEG, PNG, WebP, HEIC, HEIF
- Max size: 4MB per image
- Up to 16 images per request
Videos:
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM
- Max duration: 2 hours
- Max size: 2GB
Audio:
- WAV, MP3, AIFF, AAC, OGG, FLAC
- Max duration: 9.5 hours
PDFs:
- Up to 3,000 pages
- Text and images extracted
Error Handling
Common errors:
- 400 Bad Request: Invalid parameters or content
- 403 Forbidden: Invalid API key or insufficient permissions
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Gemini service issue
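429s in particular are best handled with exponential backoff rather than immediate retries. A minimal retry wrapper; the failing call here is simulated, and a real version would wrap the actual tool call and trigger only on rate-limit errors:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a 429 Too Many Requests response."""

def with_backoff(func, max_retries: int = 4, base_delay: float = 0.01):
    """Retry func on RateLimitError, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulated call that fails twice with a 429, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

result = with_backoff(flaky_call)
```

With the free tier's 15 RPM limit, a base delay of a few seconds is more realistic than the short value used here for illustration.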
Use Cases
- Content Creation: Blog posts, articles, creative writing
- Code Generation: Programming assistance and debugging
- Document Analysis: Extract insights from PDFs and documents
- Visual Understanding: Image and video analysis
- Chatbots: Conversational AI with context
- Data Extraction: Structured data from unstructured content
- Embeddings: Semantic search and similarity
- Tool Integration: Function calling for external APIs