
Multi-Agent Voice-Activated Flight Search System

A comprehensive flight search agent with voice input capabilities using Google ADK, Amadeus API, LangGraph, and Model Context Protocol (MCP).

Overview

This project implements a multi-agent system that lets users search for flights using either voice commands or text input. It is designed to improve accessibility for users who have difficulty typing or interacting with keypads, including individuals with motor impairments or visual disabilities, as well as anyone who simply prefers voice interaction.

System Flow

┌─────────────────┐
│  User speaks    │
│  flight query   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Voice-to-Text MCP Server│  ← Rust-based, uses Whisper AI
│ (Rust + Whisper)        │
└────────┬────────────────┘
         │
         ▼ (JSON-RPC)
┌─────────────────────────┐
│ voice_mcp_client.py     │  ← Python MCP client
│ (Python MCP Client)     │
└────────┬────────────────┘
         │
         ▼
┌─────────────────────────┐
│ flight_search_vtt.py    │  ← Enhanced flight search
│ (Interpreter + Executor)│
└────────┬────────────────┘
         │
         ▼
┌─────────────────────────┐
│ LangGraph Flight Search │  ← Existing Amadeus integration
│ (agent_graph.py)        │
└─────────────────────────┘

Key Capabilities

  • 🎤 Voice Input: Speak your flight requirements naturally
  • 💬 Text Input: Traditional text query support
  • 🤖 Dual-Agent System: Interpreter agent + Executor agent
  • ✈️ Real Flight Data: Amadeus API integration
  • 🧠 Smart Parsing: Gemini AI for natural language understanding
  • 🎯 Accurate Results: Structured flight search with IATA codes and ISO dates

Quick Start

Text Input Mode

Prerequisites:

  • Python 3.10+
  • Amadeus API credentials (from the Amadeus for Developers portal)
  • Google API key

Setup:

# 1. Setup environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

# 2. Configure credentials
echo "GOOGLE_API_KEY=your_key_here" >> .env
echo "AMADEUS_API_KEY=your_key_here" >> .env
echo "AMADEUS_API_SECRET=your_secret_here" >> .env

# 3. Run test search
python flight_search.py --query "Find a round-trip flight from ATL to JFK on Dec 02 returning Dec 15 for 2 adults in economy"

Flags: --verbose (stream tool calls), --debug (full timeline)

Voice Input Mode

Additional Prerequisites:

  • Rust
  • Microphone
  • ~200MB disk space for Whisper model

Setup:

# 1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# 2. Build voice-to-text MCP server
cd voice-to-text-mcp
cargo build --release
./scripts/download-models.sh  # Choose ggml-base.en.bin
cd ..

# 3. Verify setup
./test_voice_setup.sh

# 4. Test voice input
python3 flight_search_vtt.py --voice

What to say:

  • "Find a round trip from Atlanta to New York, December first to December fifteenth, two adults, economy"
  • "I need a flight from San Francisco to Chicago on January tenth, business class"

System Architecture

Three-Agent Architecture

Agent 1: Voice Recognition Agent

  • Captures spoken input from users and converts it into text
  • Technology: Whisper AI via MCP server (Rust-based)
  • Hardware acceleration: Metal/CoreML (macOS), CUDA (Linux/Windows)

Agent 2: Information Extraction Agent

  • Processes transcribed text and extracts structured parameters
  • Model: Gemini 2.5 Flash Lite
  • Extracts:
    • Origin and destination cities/airports (IATA codes)
    • Departure and return dates (ISO format)
    • Number of passengers (adults, children, infants)
    • Cabin class preferences
    • Special requirements or preferences

Agent 3: Flight Search Agent

  • Executes flight search based on structured information
  • Framework: LangGraph
  • API: Amadeus (real flight data)
  • Output: Flight options with prices, times, airlines, duration
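
Agents 2 and 3 hand off a small structured payload. A minimal sketch of its shape, using the field names from the interpreter output in the Example Session below (the FlightQuery name itself is illustrative, not a class in this repo):

from typing import TypedDict

class FlightQuery(TypedDict, total=False):
    originLocationCode: str       # IATA code, e.g. "ATL"
    destinationLocationCode: str  # IATA code, e.g. "JFK"
    departureDate: str            # ISO date, e.g. "2025-12-01"
    returnDate: str               # omitted for one-way trips
    adults: int                   # passenger count
    travelClass: str              # e.g. "ECONOMY"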

Component Details

The Amadeus tooling comes from langgraph_travel_agent (vendored under langgraph_travel_agent/backend). We wrap its agent_graph.py primitives so Google ADK can call them directly:

  • agent_graph_module.search_flights → exposed to ADK via async search_flights wrapper
  • agent_graph_module.amadeus client → validated on startup
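
A minimal sketch of what that async wrapper can look like (the import path and the .invoke call are assumptions for illustration; the vendored module may expose the tool slightly differently):

import asyncio

# Hypothetical import path matching the vendored layout described above
from langgraph_travel_agent.backend import agent_graph as agent_graph_module

async def search_flights(**params):
    """Run the synchronous LangGraph search_flights tool without blocking ADK's event loop."""
    return await asyncio.to_thread(agent_graph_module.search_flights.invoke, params)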

Files you'll care about:

  • flight_search.py — main entry; wires Gemini to the LangGraph tool (text input only)
  • flight_search_vtt.py — flight search with voice-to-text integration
  • voice_mcp_client.py — Python client for the voice-to-text MCP server
  • agent_graph.py — search_flights LangChain tool and Amadeus plumbing
  • voice-to-text-mcp/ — Rust-based MCP server for speech recognition

Voice Integration Setup

Step 1: Install Rust

# Install Rust via rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Reload shell configuration
source $HOME/.cargo/env

# Verify installation
cargo --version
# Expected: cargo 1.91.1 (or later)

Step 2: Build MCP Server

cd voice-to-text-mcp

# Build release version (takes 2-3 minutes first time)
cargo build --release

# Verify binary was created
ls -lh target/release/voice-to-text-mcp
# Expected: ~4.5MB binary

cd ..

Step 3: Download Whisper Model

Choose the right model for your needs:

Model               Size     Speed      Accuracy   Use Case
ggml-tiny.en.bin    75MB     Very Fast  Good       Testing, prototyping
ggml-base.en.bin    142MB    Fast       Better     General use (recommended)
ggml-small.en.bin   466MB    Slower     Best       High accuracy needs

Interactive download:

cd voice-to-text-mcp
./scripts/download-models.sh
# Choose: ggml-base.en.bin
cd ..

Manual download:

cd voice-to-text-mcp/models/
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
cd ../..

Step 4: Verify Setup

./test_voice_setup.sh

What it checks:

  • ✓ Rust installation (cargo)
  • ✓ MCP repository presence
  • ✓ Binary build status
  • ✓ Whisper model availability
  • ✓ Python integration files
  • ✓ Environment variables

Step 5: Test Voice Input

# Basic voice input test
python3 flight_search_vtt.py --voice

# Voice with debug output
python3 flight_search_vtt.py --voice --debug --verbose

Expected flow:

  1. 🎤 Listening... (max 30s, silence timeout 2s)
  2. Speak your flight requirement
  3. Recording stops after 2 seconds of silence
  4. Transcription appears
  5. Interpreter parses parameters
  6. Executor searches flights
  7. Results displayed

Usage Guide

Command Reference

flight_search_vtt.py options:

--voice                     Enable voice input
--voice-timeout N           Recording timeout in milliseconds (default: 30000)
--voice-silence-timeout N   Silence timeout in milliseconds (default: 2000)
--mcp-server PATH           Path to MCP server binary
--mcp-model PATH            Path to Whisper model
--query TEXT                Use text query instead of voice
--debug                     Show debug timeline
--verbose                   Show tool calls/responses

Voice Input Best Practices

For best transcription results:

  1. Environment: Speak in a quiet room, reduce background noise
  2. Speaking style: Speak clearly at normal pace, use complete sentences
  3. Dates: State dates explicitly ("December fifteenth" not "12/15")
  4. Pauses: Pause briefly between thoughts (silence detection helps)

Example good inputs:

"Find a round trip from Atlanta to New York,
 departing December 1st, returning December 15th,
 for 2 adults in economy"

"I need a flight from San Francisco to Chicago
 on January 10th, one way, business class"

"Search for flights from LAX to JFK,
 leaving next Friday, returning the following Monday"

Timeout Configuration

# Quick commands (10 seconds)
python3 flight_search_vtt.py --voice --voice-timeout 10000 --voice-silence-timeout 1000

# Normal use (30 seconds) - DEFAULT
python3 flight_search_vtt.py --voice --voice-timeout 30000 --voice-silence-timeout 2000

# Detailed descriptions (60 seconds)
python3 flight_search_vtt.py --voice --voice-timeout 60000 --voice-silence-timeout 3000

Example Session

$ python3 flight_search_vtt.py --voice

🎤 Listening... (max 30s, silence timeout 2s)
   Speak your flight requirement now!

# User says: "Find a round trip from Atlanta to New York,
#             departing December 1st, returning December 15th,
#             for 2 adults in economy"

✓ Transcribed: 'Find a round trip from Atlanta to New York,
departing December 1st, returning December 15th,
for 2 adults in economy'

🧭 Interpreter Agent

✓ Interpreter output:
{
  "originLocationCode": "ATL",
  "destinationLocationCode": "JFK",
  "departureDate": "2025-12-01",
  "returnDate": "2025-12-15",
  "adults": 2,
  "travelClass": "ECONOMY"
}

🛠️  Executor Agent

→ Flight search: ATL → JFK
  Departure: 2025-12-01, Return: 2025-12-15, Adults: 2, Class: ECONOMY

✓ Received 3 flight results

📋 FLIGHT SEARCH RESULTS
[Flight options listed here...]

✅ Search complete!

Technical Details

MCP (Model Context Protocol) Architecture

What is MCP?

  • Language-agnostic communication protocol
  • JSON-RPC 2.0 based
  • Enables tools to work across different languages

Communication:

  • Transport: stdio (stdin/stdout)
  • Protocol: JSON-RPC 2.0
  • Tools exposed: listen, transcribe_file
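
For illustration, the wire format looks roughly like this: a hand-rolled tools/call request to the listen tool over stdio. In practice voice_mcp_client.py also performs the MCP initialize handshake first, so treat this as a sketch rather than a complete session:

import json
import subprocess

# Spawn the Rust server and speak JSON-RPC 2.0 over its stdin/stdout
proc = subprocess.Popen(
    ["voice-to-text-mcp/target/release/voice-to-text-mcp"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "listen",
        "arguments": {"timeout_ms": 30000, "silence_timeout_ms": 2000, "auto_stop": True},
    },
}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()

response = json.loads(proc.stdout.readline())  # JSON-RPC result containing the transcription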

Benefits:

  • Language-agnostic (Rust server, Python client)
  • Standardized protocol
  • Isolated concerns (audio processing separate from business logic)
  • Reusable components

Voice Processing

Whisper AI Integration:

  • OpenAI Whisper (speech recognition)
  • Quantized models (ggml format)
  • English-only variants (.en suffix)

Hardware Acceleration:

  • macOS: Metal GPU + CoreML (Apple Neural Engine)
  • Linux/Windows: CUDA (NVIDIA GPUs)
  • Fallback: CPU-only

Recording format:

  • Sample rate: 16kHz
  • Channels: Mono
  • Format: WAV (PCM)

Auto-stop logic:

  • Records up to timeout_ms milliseconds
  • Stops early if silence_timeout_ms of silence detected
  • Silence threshold: -30dB
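
The detection itself lives in the Rust server, but the idea is simple. A minimal Python sketch of the same check (RMS level of a 16-bit PCM chunk compared against the -30 dB threshold):

import numpy as np

def is_silent(chunk: np.ndarray, threshold_db: float = -30.0) -> bool:
    """Return True if a chunk of int16 mono samples is below threshold_db (dBFS)."""
    if chunk.size == 0:
        return True
    rms = np.sqrt(np.mean((chunk.astype(np.float64) / 32768.0) ** 2))
    return rms == 0 or 20 * np.log10(rms) < threshold_db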

Google ADK Integration

Agent configuration:

# Interpreter Agent
model = "gemini-2.5-flash-lite"
temperature = 0.3  # Low for consistent parsing

# Executor Agent
model = "gemini-2.5-flash-lite"
tools = [search_flights]
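
Putting those settings together, a sketch of how the executor agent could be declared with ADK's Agent class (the name, description, and instruction strings below are placeholders, not the repo's exact configuration):

from google.adk.agents import Agent

executor_agent = Agent(
    name="flight_executor",                      # placeholder name
    model="gemini-2.5-flash-lite",
    description="Searches real flights via the Amadeus-backed search_flights tool.",
    instruction="Call search_flights with the structured parameters you receive.",
    tools=[search_flights],                      # the async wrapper described earlier
)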

LangGraph Integration

Key file: agent_graph.py (vendored under langgraph_travel_agent/backend/)

Enhanced error handling:

from amadeus import ResponseError  # Amadeus SDK error type

try:
    ...  # Amadeus flight-offers search call
except ResponseError as error:
    # Fall back to the exception's own attributes
    error_code = getattr(error, 'code', 'UNKNOWN')
    error_description = getattr(error, 'description', str(error))

    # Prefer the first structured error from the response body, when present
    if hasattr(error, 'response') and error.response:
        error_body = error.response.body
        if isinstance(error_body, dict):
            errors = error_body.get('errors', [])
            if errors:
                first_error = errors[0]
                error_code = first_error.get('code', error_code)
                error_description = first_error.get('detail', error_description)

Troubleshooting

Setup Issues

"cargo: command not found"

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

"MCP server binary not found"

cd voice-to-text-mcp && cargo build --release

"Whisper model not found"

cd voice-to-text-mcp && ./scripts/download-models.sh

Missing API keys

# Check .env file exists
cat .env

# Verify all keys are set
grep GOOGLE_API_KEY .env
grep AMADEUS_API_KEY .env
grep AMADEUS_API_SECRET .env
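
If you prefer to check from Python, a small sketch (uses python-dotenv, which the workflow examples below already rely on):

import os
from dotenv import load_dotenv

# Load .env and report any of the three required keys that are missing
load_dotenv()
missing = [k for k in ("GOOGLE_API_KEY", "AMADEUS_API_KEY", "AMADEUS_API_SECRET") if not os.getenv(k)]
if missing:
    raise SystemExit(f"Missing keys in .env: {', '.join(missing)}")
print("All required API keys are set.")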

Voice Input Issues

"No input device available"

Checklist:

  • Microphone is connected and working
  • Microphone is not in use by another application
  • System has microphone permissions

macOS:

System Settings → Privacy & Security → Microphone
Ensure Terminal has access

Linux:

# Check audio devices
arecord -l

# Test microphone
arecord -d 5 test.wav
aplay test.wav

"Recording cuts off too early"

# Increase silence timeout
python3 flight_search_vtt.py --voice --voice-silence-timeout 5000

# Increase overall timeout
python3 flight_search_vtt.py --voice --voice-timeout 45000

"Poor transcription quality"

Try:

  1. Use a better model (ggml-small.en.bin)
  2. Speak more clearly and slowly
  3. Reduce background noise
  4. Increase silence timeout for longer pauses

Flight Search Issues

"No results returned"

Checklist:

  • Dates are in the future
  • Origin/destination are valid IATA codes or city names
  • Travel dates are realistic

Debug:

python3 flight_search_vtt.py --query "Your query" --debug --verbose

Common Amadeus API error codes:

  • 38194: Invalid origin/destination → Use valid IATA codes (e.g., ATL, JFK)
  • 477: Invalid date format → Use YYYY-MM-DD format
  • 4926: No flights available → Try different dates or route
  • 38187: Invalid passenger count → Check adults/children/infants counts
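
If you surface these errors to end users, a small helper mapping the codes above to friendlier hints can be handy (a sketch, not part of the repo):

AMADEUS_ERROR_HINTS = {
    "38194": "Invalid origin/destination → use valid IATA codes (e.g., ATL, JFK).",
    "477": "Invalid date format → use YYYY-MM-DD.",
    "4926": "No flights available → try different dates or a different route.",
    "38187": "Invalid passenger count → check adults/children/infants.",
}

def hint_for(code) -> str:
    """Return a user-facing hint for an Amadeus error code, defaulting to a debug suggestion."""
    return AMADEUS_ERROR_HINTS.get(str(code), "Unknown error → rerun with --debug for details.")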

Integration Examples

Using the MCP Client in Other Projects

The voice_mcp_client.py module can be used in any Python project:

from voice_mcp_client import VoiceToTextMCPClient

# Initialize client
client = VoiceToTextMCPClient(
    mcp_server_path="voice-to-text-mcp/target/release/voice-to-text-mcp",
    model_path="voice-to-text-mcp/models/ggml-base.en.bin"
)

# Listen for voice input
user_input = client.listen(
    timeout_ms=30000,         # 30 seconds max
    silence_timeout_ms=2000,  # Stop after 2s silence
    auto_stop=True            # Enable auto-stop
)

print(f"User said: {user_input}")

# Transcribe existing audio file
transcript = client.transcribe_file("audio.wav")
print(f"Transcription: {transcript}")

Custom Flight Search Workflow

from voice_mcp_client import VoiceToTextMCPClient
import os
from dotenv import load_dotenv

# Load environment
load_dotenv()

# Initialize voice client
voice_client = VoiceToTextMCPClient(
    "voice-to-text-mcp/target/release/voice-to-text-mcp",
    "voice-to-text-mcp/models/ggml-base.en.bin"
)

# Get voice input
print("🎤 Speak your flight requirement...")
query = voice_client.listen(timeout_ms=30000)
print(f"✓ Heard: {query}")

# Process with your custom agent
# ... your code here ...


Happy flying! ✈️🎤