livekit-gemini-mcp-prototype by SaharshPamecha - MCP Server

Voice AI Microservices with LiveKit, Gemini, and MongoDB MCP

A production-ready microservices architecture featuring a real-time voice AI agent powered by LiveKit and Google Gemini, integrated with a Model Context Protocol (MCP) server for MongoDB user data operations.

Architecture Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│  LiveKit Room   │◄───►│  Voice Agent    │◄───►│   MCP Server    │
│   (WebRTC)      │     │ (Gemini + LK)   │     │  (FastAPI)      │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │                 │
                                                │    MongoDB      │
                                                │   (Users DB)    │
                                                │                 │
                                                └─────────────────┘

Components

1. MCP Server (`mcp-server/`)

A FastAPI-based server implementing the Model Context Protocol pattern for MongoDB operations.

Features:

JSON-RPC style API for function calls
User CRUD operations
Preference management
Interaction history tracking
Async MongoDB operations with Motor

Exposed Functions:

get_user_by_id - Fetch user by unique ID
get_user_by_email - Fetch user by email
get_user_preferences - Get user settings
update_user_preferences - Update user settings
get_user_history - Get interaction history
log_interaction - Log user interactions
create_user - Create new user
list_users - List all users with pagination
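
As a rough illustration of how a JSON-RPC style endpoint might route the functions listed above, the sketch below keeps a registry of handlers and maps failures to standard JSON-RPC 2.0 error codes. It is not the project's actual `mcp_functions.py`: the in-memory `_USERS` dictionary, the `dispatch` helper, and the error mapping are assumptions made for the example, and a real server would query MongoDB through Motor instead.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory stand-in for the MongoDB users collection.
_USERS: Dict[str, Dict[str, Any]] = {
    "user_001": {
        "user_id": "user_001",
        "name": "Ada",
        "preferences": {"language": "en"},
    },
}


def get_user_by_id(user_id: str) -> Dict[str, Any]:
    """Return the stored user or raise LookupError if it does not exist."""
    user = _USERS.get(user_id)
    if user is None:
        raise LookupError(f"user not found: {user_id}")
    return user


def get_user_preferences(user_id: str) -> Dict[str, Any]:
    """Return the preferences sub-document of a user."""
    return get_user_by_id(user_id).get("preferences", {})


# Registry mapping JSON-RPC method names to handler callables.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "get_user_by_id": get_user_by_id,
    "get_user_preferences": get_user_preferences,
}


def _error(code: int, message: str, request_id: Any) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": request_id}


def dispatch(request: Dict[str, Any]) -> Dict[str, Any]:
    """Route one JSON-RPC request dict to its handler and wrap the outcome."""
    request_id = request.get("id")
    handler = FUNCTIONS.get(request.get("method"))
    if handler is None:
        return _error(-32601, "Method not found", request_id)
    try:
        result = handler(**request.get("params", {}))
    except TypeError:
        # Missing or unexpected keyword arguments.
        return _error(-32602, "Invalid params", request_id)
    except LookupError as exc:
        # -32000 falls in the range JSON-RPC reserves for server errors.
        return _error(-32000, str(exc), request_id)
    return {"jsonrpc": "2.0", "result": result, "id": request_id}
```

Keeping the registry as a plain dictionary means a `/functions` style listing can be derived from `FUNCTIONS.keys()`, though whether the project does so is not stated here.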

2. Voice Agent (`voice-agent/`)

A LiveKit agent using Google Gemini's real-time multimodal API for voice conversations.

Features:

Real-time voice conversations with Gemini 2.0
Automatic user identification from LiveKit participant identity
MCP function calling for user-specific data
Personalized responses based on user context
Interaction logging
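
To make the identification and personalization features more concrete, here is a minimal sketch of how a participant identity and the user record returned by the MCP server could be turned into session instructions for the model. The function name `build_instructions`, the base prompt, and the `name` and `preferences` field names are assumptions for illustration; the actual agent code may structure this differently.

```python
from typing import Any, Dict, Optional

# Hypothetical base prompt; the real agent's wording is not shown in this README.
BASE_INSTRUCTIONS = "You are a helpful voice assistant."


def build_instructions(identity: str, user: Optional[Dict[str, Any]]) -> str:
    """Combine the base prompt with whatever is known about the caller.

    `identity` is the LiveKit participant identity; `user` is the record
    looked up for that identity, or None when no record exists.
    """
    if not user:
        return (
            f"{BASE_INSTRUCTIONS} The caller (identity '{identity}') "
            "has no stored profile; ask for their name."
        )
    lines = [BASE_INSTRUCTIONS, f"You are speaking with {user.get('name', identity)}."]
    prefs = user.get("preferences") or {}
    if prefs:
        # Sort keys so the generated prompt is deterministic.
        formatted = ", ".join(f"{key}={value}" for key, value in sorted(prefs.items()))
        lines.append(f"Known preferences: {formatted}.")
    return " ".join(lines)
```

In this sketch the lookup itself (for example a `get_user_by_id` call with the participant identity as `user_id`) happens before `build_instructions` is invoked, so the prompt builder stays a pure function that is easy to test.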

Prerequisites

Docker and Docker Compose
Google Cloud API key with Gemini API access
(Optional) LiveKit Cloud account for production

Quick Start

1. Clone and Configure

# After cloning the repository, enter the project directory
cd "CAI with MongoDB MCP"

# Copy environment template
cp .env.example .env

# Edit .env with your credentials
nano .env

2. Set Required Environment Variables

# Required: Google Gemini API Key
GOOGLE_API_KEY=your_google_api_key_here

# Optional: Change MongoDB credentials
MONGO_ROOT_USERNAME=admin
MONGO_ROOT_PASSWORD=your_secure_password

3. Start Services

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Check service health
docker-compose ps

4. Verify Services

# Check MCP Server health
curl http://localhost:8080/health

# List available MCP functions
curl http://localhost:8080/functions

# Test user lookup
curl -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "get_user_by_id", "params": {"user_id": "user_001"}, "id": "1"}'
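
The same checks can be scripted from Python using only the standard library. The sketch below mirrors the `curl` call above; `build_request`, `parse_response`, and `call_mcp` are hypothetical helper names, and the error handling assumes the server replies with a JSON-RPC 2.0 style body containing either `result` or `error`, which this README does not spell out.

```python
import json
import urllib.request
from typing import Any, Dict

MCP_URL = "http://localhost:8080/mcp"


def build_request(method: str, params: Dict[str, Any], request_id: str) -> bytes:
    """Serialize a JSON-RPC 2.0 request body."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(payload).encode("utf-8")


def parse_response(body: bytes) -> Any:
    """Return the `result` field, or raise if the response carries an error."""
    message = json.loads(body)
    # Use .get() so a response with "error": null is treated as success.
    if message.get("error"):
        raise RuntimeError(f"MCP call failed: {message['error']}")
    return message.get("result")


def call_mcp(method: str, params: Dict[str, Any], request_id: str = "1",
             url: str = MCP_URL, timeout: float = 5.0) -> Any:
    """POST one request to the MCP server and return the parsed result."""
    request = urllib.request.Request(
        url,
        data=build_request(method, params, request_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

With the stack running, `call_mcp("get_user_by_id", {"user_id": "user_001"})` would issue the same request as the `curl` example.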

Development

Running Services Individually

MCP Server:

cd mcp-server
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set environment variables
export MONGODB_URI="mongodb://localhost:27017"
export MONGODB_DATABASE="voice_ai_db"

# Run server
uvicorn src.server:app --reload --port 8080

Voice Agent:

cd voice-agent
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set environment variables
export LIVEKIT_URL="ws://localhost:7880"
export LIVEKIT_API_KEY="devkey"
export LIVEKIT_API_SECRET="secret"
export GOOGLE_API_KEY="your_key"
export MCP_SERVER_URL="http://localhost:8080"

# Run agent
python -m src.main start

Testing MCP Functions

# Create a user
curl -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "create_user",
    "params": {
      "user_id": "test_user",
      "name": "Test User",
      "email": "test@example.com"
    },
    "id": "1"
  }'

# Get user preferences
curl -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "get_user_preferences",
    "params": {"user_id": "user_001"},
    "id": "2"
  }'

# Log an interaction
curl -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "log_interaction",
    "params": {
      "user_id": "user_001",
      "interaction_type": "voice_query",
      "content": "What is my account balance?"
    },
    "id": "3"
  }'

Connecting to the Voice Agent

Using LiveKit Meet (Quick Test)

Go to LiveKit Meet
Enter your LiveKit server URL: ws://localhost:7880
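Provide an access token signed with API key devkey and secret secret (LiveKit Meet's custom connection takes a server URL and a token; see Programmatic Connection)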
Join a room - the agent will automatically connect

Programmatic Connection

from livekit import api

# Generate a token for a user. The chained calls are wrapped in parentheses
# so the expression can legally span several lines.
token = (
    api.AccessToken(api_key="devkey", api_secret="secret")
    .with_identity("user_001")  # This ID is used to identify the user
    .with_grants(api.VideoGrants(room_join=True, room="my-room"))
    .to_jwt()
)

Project Structure

CAI with MongoDB MCP/
├── docker-compose.yml          # Docker orchestration
├── .env.example                # Environment template
├── .gitignore
├── README.md
├── scripts/
│   └── mongo-init.js           # MongoDB initialization
├── mcp-server/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env.example
│   └── src/
│       ├── __init__.py
│       ├── config.py           # Configuration management
│       ├── database.py         # MongoDB connection & repos
│       ├── mcp_functions.py    # MCP function implementations
│       └── server.py           # FastAPI server
└── voice-agent/
    ├── Dockerfile
    ├── requirements.txt
    ├── .env.example
    └── src/
        ├── __init__.py
        ├── config.py           # Configuration management
        ├── mcp_client.py       # MCP server client
        ├── function_handler.py # Function call routing
        ├── tools.py            # LiveKit tool definitions
        ├── agent.py            # Agent implementation
        └── main.py             # Entry point

Configuration Reference

MCP Server Environment Variables

| Variable | Description | Default |
|---|---|---|
| `MONGODB_URI` | MongoDB connection string | `mongodb://localhost:27017` |
| `MONGODB_DATABASE` | Database name | `voice_ai_db` |
| `MONGODB_USERS_COLLECTION` | Users collection name | `users` |
| `MCP_SERVER_HOST` | Server bind host | `0.0.0.0` |
| `MCP_SERVER_PORT` | Server port | `8080` |
| `LOG_LEVEL` | Logging level | `INFO` |

Voice Agent Environment Variables

| Variable | Description | Default |
|---|---|---|
| `LIVEKIT_URL` | LiveKit server URL | `ws://localhost:7880` |
| `LIVEKIT_API_KEY` | LiveKit API key | - |
| `LIVEKIT_API_SECRET` | LiveKit API secret | - |
| `GOOGLE_API_KEY` | Google Gemini API key | - |
| `GEMINI_MODEL` | Gemini model name | `gemini-2.0-flash-exp` |
| `GEMINI_VOICE` | Voice for TTS | `Puck` |
| `GEMINI_TEMPERATURE` | Response temperature | `0.7` |
| `MCP_SERVER_URL` | MCP server URL | `http://localhost:8080` |
| `LOG_LEVEL` | Logging level | `INFO` |
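
For readers wiring these variables into their own code, the following sketch shows one way the voice agent settings could be loaded with the defaults from the table. It is not the project's `config.py`: the `VoiceAgentConfig` dataclass and `load_config` function are hypothetical names, and treating every variable with a `-` default as required is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceAgentConfig:
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    google_api_key: str
    gemini_model: str
    gemini_voice: str
    gemini_temperature: float
    mcp_server_url: str
    log_level: str


# Assumption: variables listed without a default must be provided.
REQUIRED = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "GOOGLE_API_KEY")


def load_config(env: Mapping[str, str] = os.environ) -> VoiceAgentConfig:
    """Build a config object from environment variables, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required environment variables: " + ", ".join(missing))
    return VoiceAgentConfig(
        livekit_url=env.get("LIVEKIT_URL", "ws://localhost:7880"),
        livekit_api_key=env["LIVEKIT_API_KEY"],
        livekit_api_secret=env["LIVEKIT_API_SECRET"],
        google_api_key=env["GOOGLE_API_KEY"],
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash-exp"),
        gemini_voice=env.get("GEMINI_VOICE", "Puck"),
        # Environment values are strings, so the temperature is parsed explicitly.
        gemini_temperature=float(env.get("GEMINI_TEMPERATURE", "0.7")),
        mcp_server_url=env.get("MCP_SERVER_URL", "http://localhost:8080"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the environment as a parameter keeps the loader testable without mutating `os.environ`.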

Production Deployment

Security Considerations

MongoDB: Use strong passwords, enable authentication, consider TLS
LiveKit: Use LiveKit Cloud or secure your self-hosted instance
API Keys: Never commit API keys, use secrets management
Network: Use private networks between services

Scaling

MCP Server: Stateless, can be horizontally scaled
Voice Agent: Scale based on concurrent room requirements
MongoDB: Use replica sets for high availability

Troubleshooting

Common Issues

MCP Server can't connect to MongoDB:

# Check MongoDB is running
docker-compose ps mongodb

# Check MongoDB logs
docker-compose logs mongodb

Voice Agent can't connect to MCP Server:

# Verify MCP server is healthy
curl http://localhost:8080/health

# Check network connectivity
docker-compose exec voice-agent curl http://mcp-server:8080/health
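
If the agent starts before the MCP server is ready, polling the health endpoint with a bounded retry can help separate a slow startup from a real connectivity problem. The helper below is a sketch, not part of the project: `http_ok` and `wait_until_healthy` are hypothetical names, and it assumes `/health` answers with a 2xx status once the server is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP error.
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10,
                       delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: http_ok("http://localhost:8080/health"))`, or with `http://mcp-server:8080/health` when run from inside the voice agent container.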

Gemini API errors:

Verify your GOOGLE_API_KEY is valid
Check API quotas in Google Cloud Console
Ensure Gemini API is enabled for your project

License

MIT License - See LICENSE file for details.