gemini-mcp-server by mnthe - MCP Server

gemini-mcp-server

An intelligent MCP (Model Context Protocol) server that enables AI assistants to query Google AI (Gemini models) via Vertex AI or Google AI Studio with agentic capabilities - automatic tool selection, multi-turn reasoning, MCP-to-MCP delegation, and multimodal input support.

Purpose

This server provides:

Agentic Loop: Turn-based execution with automatic tool selection and reasoning
Query Gemini: Access Gemini models via Vertex AI or Google AI Studio for cross-validation
Multimodal Support: Send images, audio, video, and code files alongside text prompts
Tool Execution: Built-in WebFetch + integration with external MCP servers
Multi-turn Conversations: Maintain context across queries with session management
Reasoning Traces: File-based logging of AI thinking processes

Key Features

🎭 System Prompt Customization

Customize the AI assistant's behavior and persona:

Domain-Specific Roles: Configure as financial analyst, code reviewer, research assistant, etc.
Environment-Based: Set via GEMINI_SYSTEM_PROMPT environment variable
Multi-Persona Support: Run multiple servers with different personas
100% Backward Compatible: Optional feature - works normally without customization
See for detailed guide and for templates

🎨 Multimodal Input Support

Send images, audio, video, and code files to Gemini:

Images: JPEG, PNG, WebP, HEIC
Videos: MP4, MOV, AVI, WebM, and more
Audio: MP3, WAV, AAC, FLAC, and more
Documents/Code: PDF, text files, code files (Python, JavaScript, etc.)
Support for both base64-encoded inline data and Cloud Storage URIs
See for detailed documentation

🤖 Intelligent Agentic Loop

Inspired by OpenAI Agents SDK, the server operates as an autonomous agent:

Turn-based execution (up to 10 turns per query)
Automatic tool selection based on LLM decisions
Parallel tool execution with retry logic
Smart fallback to Gemini knowledge when tools fail

🛠️ Built-in Tools

WebFetch: Secure HTTPS-only web content fetching with private IP blocking
MCP Integration: Dynamic discovery and execution of external MCP server tools

🔐 Security First

Multi-Layer Defense:

SSRF Protection: HTTPS-only URL fetching, private IP blocking (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x), cloud metadata endpoint blocking (AWS, GCP, Azure)
Prompt Injection Guardrails: External content tagging, trust boundaries, system prompt hardening
File Security: MIME type validation, executable file rejection, path traversal prevention, directory whitelist
Redirect Validation: Manual redirect handling with security checks, maximum 5 redirects, cross-domain blocking
Content Boundaries: 50KB size limits, external content wrapping with security tags

Comprehensive Testing: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.

See for detailed security documentation and best practices.

📝 Observability

File-based logging (logs/general.log, logs/reasoning.log)
Configurable log directory or disable logging for npx/containerized environments
Detailed execution traces for debugging
Turn and tool usage statistics

Prerequisites

Node.js 18 or higher
Google Cloud Platform account (for Vertex AI) OR Google AI Studio account
Google Cloud credentials configured (for Vertex AI mode)

Quick Start

Installation

Option 1: npx (Recommended)

npx -y github:mnthe/gemini-mcp-server

Option 2: From Source

git clone https://github.com/mnthe/gemini-mcp-server.git
cd gemini-mcp-server
npm install
npm run build

Authentication

The gen-ai SDK supports multiple authentication methods. For Vertex AI mode:

Application Default Credentials (Recommended):

gcloud auth application-default login

Or use Service Account:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

For Google AI Studio mode, see the gen-ai SDK documentation.

Configuration

Required Environment Variables:

export GOOGLE_CLOUD_PROJECT="your-gcp-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"

Optional Model Settings:

export GEMINI_MODEL="gemini-2.5-pro"
export GEMINI_TEMPERATURE="1.0"
export GEMINI_MAX_TOKENS="8192"
export GEMINI_TOP_P="0.95"
export GEMINI_TOP_K="40"

Optional Agentic Features:

# System prompt customization
export GEMINI_SYSTEM_PROMPT="You are a specialized financial analyst AI assistant. You have access to the following tools:"

# Multi-turn conversations
export GEMINI_ENABLE_CONVERSATIONS="true"
export GEMINI_SESSION_TIMEOUT="3600"
export GEMINI_MAX_HISTORY="10"

# Logging configuration
# Default: Console logging to stderr (recommended for npx/MCP usage)
export GEMINI_LOG_TO_STDERR="true"         # Default: true (console logging)

# For file-based logging instead:
export GEMINI_LOG_TO_STDERR="false"        # Disable console, use file logging
export GEMINI_LOG_DIR="./logs"             # Log directory (default: ./logs)

# To disable logging completely:
export GEMINI_DISABLE_LOGGING="true"

# File URI support (for CLI environments only)
export GEMINI_ALLOW_FILE_URIS="true"       # Set to 'true' to allow file:// URIs (CLI tools only, NOT for desktop apps)

# External MCP servers (for tool delegation)
export GEMINI_MCP_SERVERS='[
  {
    "name": "filesystem",
    "transport": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
  },
  {
    "name": "web-search",
    "transport": "http",
    "url": "http://localhost:3000/mcp"
  }
]'

MCP Client Integration

Add to your MCP client configuration:

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_MODEL": "gemini-2.5-pro",
        "GEMINI_ENABLE_CONVERSATIONS": "true"
      }
    }
  }
}

Claude Code (.claude.json in project root):

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-gcp-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_MODEL": "gemini-2.5-pro"
      }
    }
  }
}

Other MCP Clients (Generic stdio):

# Command to run
npx -y github:mnthe/gemini-mcp-server

# Or direct execution
node /path/to/gemini-mcp-server/build/index.js

Multi-Persona Setup

You can run multiple Gemini servers with different personas for specialized tasks:

{
  "mcpServers": {
    "gemini-code": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_SYSTEM_PROMPT": "You are a code review specialist. Focus on code quality, security, and best practices. You have access to the following tools:"
      }
    },
    "gemini-research": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GEMINI_SYSTEM_PROMPT": "You are an academic research assistant. Cite sources and provide comprehensive analysis. You have access to the following tools:"
      }
    }
  }
}

See for comprehensive guide and for ready-to-use templates.

Available Tools

query

Main agentic entrypoint that handles multi-turn execution with automatic tool selection and multimodal input support.

Parameters:

prompt (string, required): The text prompt to send
sessionId (string, optional): Conversation session ID
parts (array, optional): Multimodal content parts (images, audio, video, documents)

How It Works:

Analyzes the prompt and conversation history (including multimodal content)
Decides whether to use tools or respond directly
Executes tools in parallel if needed (WebFetch, MCP tools)
Retries failed tools with exponential backoff
Falls back to Gemini knowledge if tools fail
Continues for up to 10 turns until final answer

Examples:

# Simple text query
query: "What is the capital of France?"

# Complex query with tool usage
query: "Fetch the latest news from https://example.com/news and summarize"
→ Automatically uses WebFetch tool
→ Synthesizes content into answer

# Image analysis (multimodal)
query: "What's in this image?"
parts: [{ inlineData: { mimeType: "image/jpeg", data: "<base64>" } }]

# Multi-turn conversation
query: "What is machine learning?" (sessionId auto-created)
query: "Give me an example" (uses sessionId from previous response)

Multimodal Support: See for detailed documentation on:

Parts array structure and field requirements (for agent developers)
Supported file types (images, audio, video, documents)
Base64 inline data vs Cloud Storage URIs
Complete schema and validation rules
Usage examples and code samples
Best practices and limitations
Common mistakes to avoid

Response Includes:

Final answer
Session ID (if conversations enabled)
Statistics: turns used, tool calls, reasoning steps

search

Search for information using Gemini (OpenAI MCP spec).

Parameters:

query (string, required): Search query

Returns:

results: Array of {id, title, url}

fetch

Fetch full content of a search result (OpenAI MCP spec).

Parameters:

id (string, required): Document ID from search results

Returns:

id, title, text, url, metadata

Security

The gemini-mcp-server implements comprehensive security measures to protect against common vulnerabilities. See for complete documentation.

Defense Layers

1. SSRF (Server-Side Request Forgery) Protection

HTTPS-only: HTTP requests are blocked; only HTTPS is allowed for web resources
Private IP blocking: Blocks access to internal networks (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x)
Cloud metadata blocking: Prevents access to AWS, GCP, Azure, and Alibaba Cloud metadata endpoints
Redirect validation: All redirects are manually validated; cross-domain redirects are blocked

2. Prompt Injection Guardrails

Trust boundaries: Clear separation between user input (trusted) and external content (untrusted)
Content tagging: All fetched web content is wrapped in <external_content> tags with security warnings
System prompt hardening: Built-in instructions to ignore malicious commands in external content
Information disclosure protection: Guidelines prevent revealing system prompts or internal details

3. File Security (Multimodal Content)

MIME type validation: Only known safe types (images, video, audio, PDF, code) are allowed
Executable rejection: Blocks .exe, .sh, .dll, and other executable file types
Path traversal prevention: All paths are normalized and validated against a whitelist
Directory whitelist: Local files only allowed in safe directories (cwd, Documents, Downloads, Desktop)
URI scheme validation: Only gs://, https://, and conditionally file:// URIs are allowed

4. Content Boundaries

Size limits: Web content limited to 50KB to prevent resource exhaustion
Content type validation: Basic validation of response content types
Encoding validation: Proper handling of character encodings

Configuration

File Security (Multimodal)

# Default: false (secure) - file:// URIs are disabled
export GEMINI_ALLOW_FILE_URIS="false"

# For CLI environments only - enables local file:// URIs with whitelist validation
export GEMINI_ALLOW_FILE_URIS="true"

Security Note: Never enable GEMINI_ALLOW_FILE_URIS in production or web-facing applications. It's designed for trusted CLI environments only.

Security Monitoring

# Enable logging to monitor security events
export GEMINI_DISABLE_LOGGING="false"
export GEMINI_LOG_DIR="/var/log/gemini-mcp"

# Log to stderr for real-time monitoring
export GEMINI_LOG_TO_STDERR="true"

Best Practices

For Desktop Applications (Recommended)

{
  "mcpServers": {
    "gemini": {
      "env": {
        "GEMINI_ALLOW_FILE_URIS": "false"
      }
    }
  }
}

For CLI Tools (Use with Caution)

export GEMINI_ALLOW_FILE_URIS="true"
export GEMINI_LOG_TO_STDERR="true"

Security Testing

Run comprehensive security test suite:

# All security tests
npx tsx test/url-security-test.ts        # 21 tests - SSRF protection
npx tsx test/file-security-test.ts       # 34 tests - File validation
npx tsx test/webfetch-security-test.ts   # 5 tests - Content tagging
npx tsx test/security-guidelines-test.ts # 3 tests - Prompt injection
npx tsx test/multimodal-security-test.ts # 6 tests - Multimodal files

Total: 69 security-focused tests covering SSRF, path traversal, MIME validation, and prompt injection.

For detailed security information, threat models, and vulnerability reporting, see .

Architecture

Agentic Loop

User Query
  ↓
┌─── Turn 1..10 Loop ───┐
│                        │
│  1. Build Prompt       │
│     + Tool Definitions │
│     + History          │
│                        │
│  2. Gemini Generation  │
│     (with thinking)    │
│                        │
│  3. Parse Response     │
│     - Reasoning?       │
│     - Tool Calls?      │
│     - Final Output?    │
│                        │
│  4. Execute Tools      │
│     (parallel + retry) │
│                        │
│  5. Check MaxTurns     │
│     Continue or Exit?  │
│                        │
└────────────────────────┘
  ↓
Final Result + Stats

Project Structure

src/
├── agentic/           # Core agentic loop
│   ├── AgenticLoop.ts       # Main orchestrator
│   ├── RunState.ts          # Turn-based state management
│   ├── ResponseProcessor.ts # Parse Gemini responses
│   └── Tool.ts              # Tool interface (MCP standard)
│
├── mcp/               # MCP client implementation
│   ├── EnhancedMCPClient.ts # Unified stdio + HTTP client
│   ├── StdioMCPConnection.ts
│   └── HttpMCPConnection.ts
│
├── tools/             # Tool implementations
│   ├── WebFetchTool.ts      # Secure web fetching
│   └── ToolRegistry.ts      # Tool management + parallel execution
│
├── services/          # External services
│   └── GeminiAIService.ts   # Gemini API (with thinkingConfig)
│
├── handlers/          # MCP tool handlers
│   ├── QueryHandler.ts
│   ├── SearchHandler.ts
│   └── FetchHandler.ts
│
├── managers/          # Business logic
│   └── ConversationManager.ts
│
├── errors/            # Custom error types
├── types/             # TypeScript type definitions
├── schemas/           # Zod validation schemas
├── config/            # Configuration loading
├── utils/             # Shared utilities (Logger, security)
│
└── server/            # MCP server bootstrap
    └── GeminiAIMCPServer.ts

See and for details.

Advanced Usage

External MCP Servers

Connect to external MCP servers for extended capabilities:

Stdio (subprocess):

export GEMINI_MCP_SERVERS='[
  {
    "name": "filesystem",
    "transport": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "./workspace"]
  }
]'

HTTP:

export GEMINI_MCP_SERVERS='[
  {
    "name": "api-server",
    "transport": "http",
    "url": "https://api.example.com/mcp",
    "headers": {"Authorization": "Bearer token"}
  }
]'

Tools from external servers are automatically discovered and made available to the agent.

Reasoning Traces

Default: Console Logging

Logs are sent to stderr by default, making them visible in MCP client logs.

For File-Based Logging:

export GEMINI_LOG_TO_STDERR="false"        # Disable console, use files
export GEMINI_LOG_DIR="./logs"             # Log directory (default: ./logs)

Then check logs:

tail -f logs/general.log     # All logs
tail -f logs/reasoning.log   # Gemini thinking process only

To Disable All Logging:

export GEMINI_DISABLE_LOGGING="true"

Custom Tool Development

Tools follow MCP standard:

import { BaseTool, ToolResult, RunContext } from './agentic/Tool.js';

export class MyTool extends BaseTool {
  name = 'my_tool';
  description = 'Description for LLM';
  parameters = {
    type: 'object',
    properties: {
      arg: { type: 'string', description: 'Argument' }
    },
    required: ['arg']
  };

  async execute(args: any, context: RunContext): Promise<ToolResult> {
    // Your implementation
    return {
      status: 'success',
      content: 'Result'
    };
  }
}

Development

Build

npm run build

Watch Mode

npm run watch

Development Mode

npm run dev

Troubleshooting

MCP Server Connection Issues

If the MCP server appears to be "dead" or disconnects unexpectedly:

Check MCP client logs (logs are sent to stderr by default):

macOS: ~/Library/Logs/Claude/mcp*.log
Windows: %APPDATA%\Claude\Logs\mcp*.log

Server logs will appear in these files automatically.

Log Directory Errors

If you encounter errors like ENOENT: no such file or directory, mkdir './logs':

This should not happen with default settings (console logging is default).

If you enabled file logging (GEMINI_LOG_TO_STDERR="false"):

Solution: Use a writable log directory:

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "github:mnthe/gemini-mcp-server"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GEMINI_LOG_TO_STDERR": "false",
        "GEMINI_LOG_DIR": "/tmp/gemini-logs"
      }
    }
  }
}

Authentication Errors

Verify credentials: gcloud auth application-default login
Check project ID: echo $GOOGLE_CLOUD_PROJECT
Enable Vertex AI API: gcloud services enable aiplatform.googleapis.com

Tool Execution Failures

Check logs in logs/general.log (if logging is enabled)
Verify MCP server configurations in GEMINI_MCP_SERVERS
Ensure external servers are running (for HTTP transport)

MaxTurns Exceeded

Agent returns best-effort response after 10 turns
Check if tools are repeatedly failing
Review reasoning logs to understand loop behavior (if logging is enabled)

Documentation

- Security documentation and best practices
- System architecture and agentic loop design
- Code organization
- Implementation details
- Build and release process
- Multimodal content guide
- System prompt customization
- Contribution guidelines