mcp-youtube-summarizer by Kenneth-Aidan-B - MCP Server

MCP YouTube Audio Summarizer Server

A Model Context Protocol (MCP) server that downloads the audio of a YouTube video, transcribes it with Whisper, summarizes the transcript with Google Gemini, converts the summary to speech with gTTS, and returns Base64-encoded audio for WhatsApp integration via Puch AI.

🚀 Features

✅ YouTube Audio Download - Extract audio from YouTube videos using yt-dlp
✅ Speech-to-Text - Transcribe audio using OpenAI Whisper (local, free)
✅ AI Summarization - Summarize transcripts using Google Gemini 1.5 Flash (free tier)
✅ Text-to-Speech - Convert summaries to natural speech using gTTS
✅ MCP Protocol - Full Model Context Protocol compliance
✅ HTTPS Server - Secure server with self-signed certificates
✅ Base64 Audio - Direct audio embedding for WhatsApp
✅ Error Handling - Comprehensive error handling and logging
✅ Free Hosting - Works on Railway, Render, Replit (free tiers)
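
The features above chain into a single pipeline: download, transcribe, summarize, synthesize, encode. The sketch below shows only that data flow. The function name `summarize_to_data_uri` and the four stage callables are hypothetical and do not come from mcp_server.py; the real stages would wrap yt-dlp, Whisper, Gemini, and gTTS.

```python
import base64
from typing import Callable


def summarize_to_data_uri(
    url: str,
    download: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> str:
    """Run the four stages in order and return the speech as an audio data URI."""
    audio = download(url)            # stage 1: e.g. yt-dlp plus FFmpeg
    transcript = transcribe(audio)   # stage 2: e.g. local Whisper
    summary = summarize(transcript)  # stage 3: e.g. Gemini
    speech = synthesize(summary)     # stage 4: e.g. gTTS, returning MP3 bytes
    encoded = base64.b64encode(speech).decode("ascii")
    return f"data:audio/mpeg;base64,{encoded}"


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without network access or API keys.
    uri = summarize_to_data_uri(
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        download=lambda u: b"raw-audio",
        transcribe=lambda a: "full transcript",
        summarize=lambda t: t.split()[0],
        synthesize=lambda s: s.encode("utf-8"),
    )
    print(uri)
```

Passing the stages in as callables keeps each one replaceable, for example when swapping in a different TTS provider.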

📋 Prerequisites

Python 3.8+
FFmpeg (for audio processing)
Google Gemini API Key (free at https://makersuite.google.com/)

🛠️ Quick Setup

Windows

# 1. Clone/download the project
git clone [your-repo]   # or download the files manually

# 2. Run setup script
setup.bat

# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY

# 5. Start the server
run_server.bat

Linux/macOS

# 1. Clone/download the project
git clone [your-repo]   # or download the files manually

# 2. Make scripts executable and run setup
chmod +x setup.sh run_server.sh
./setup.sh

# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY

# 5. Start the server
./run_server.sh

🔧 Manual Setup

Install Python Dependencies
```
pip install -r requirements.txt
```

Install FFmpeg
- Windows: winget install FFmpeg or download from https://ffmpeg.org/
- macOS: brew install ffmpeg
- Ubuntu/Debian: sudo apt-get install ffmpeg

Configure Environment

cp .env.template .env
# Edit .env and add your GEMINI_API_KEY

Run Server
```
python mcp_server.py
```

🌐 API Endpoints

HTTPS Server: https://localhost:8443
Health Check: https://localhost:8443/health
MCP Endpoint: https://localhost:8443/mcp

📡 MCP Tool Usage

The server exposes one MCP tool: youtube_audio_summary

Example MCP Request

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "method": "tools/call",
  "params": {
    "name": "youtube_audio_summary",
    "arguments": {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  }
}
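
A client written in Python could build the same request as a plain dictionary. This is a minimal sketch; the helper name `build_tool_call` is hypothetical, and only the field names and values shown in the example request are taken from this README.

```python
import json


def build_tool_call(video_url: str, request_id: str = "req-1") -> dict:
    """Build a JSON-RPC 2.0 tools/call request for the youtube_audio_summary tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "youtube_audio_summary",
            "arguments": {"url": video_url},
        },
    }


payload = build_tool_call("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(json.dumps(payload, indent=2))
```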

Example MCP Response

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "YouTube video processed successfully. Audio summary generated."
      },
      {
        "type": "resource",
        "resource": {
          "uri": "data:audio/mpeg;base64,SUQzBAAAAAABEFRYWFgAAAAtAAADY29tbWVudABCaWdTb3VuZEJhbmsuY29tIC8gTGFTb25pYSBHU...",
          "mimeType": "audio/mpeg",
          "text": "Generated audio summary"
        }
      }
    ]
  }
}
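
To turn a response of this shape back into audio bytes, a client can split the data URI at the first comma and Base64-decode the rest. The sketch below assumes the response layout shown above; `extract_audio` is a hypothetical helper name, and the sample response uses placeholder bytes rather than real MP3 data.

```python
import base64


def extract_audio(response: dict) -> bytes:
    """Return the decoded bytes of the first Base64 audio data URI in a tools/call result."""
    if "error" in response:
        raise RuntimeError(f"MCP error: {response['error']}")
    for item in response["result"]["content"]:
        if item.get("type") != "resource":
            continue  # skip the plain-text status message
        uri = item["resource"]["uri"]
        header, _, encoded = uri.partition(",")
        if header.startswith("data:audio/") and header.endswith(";base64"):
            return base64.b64decode(encoded)
    raise ValueError("no Base64 audio resource in response")


# Placeholder response with the same structure as the example above.
sample = {
    "jsonrpc": "2.0",
    "id": "req-1",
    "result": {
        "content": [
            {"type": "text", "text": "Audio summary generated."},
            {
                "type": "resource",
                "resource": {
                    "uri": "data:audio/mpeg;base64,"
                    + base64.b64encode(b"placeholder-bytes").decode("ascii"),
                    "mimeType": "audio/mpeg",
                },
            },
        ]
    },
}
print(len(extract_audio(sample)), "bytes decoded")
```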

🔗 Puch AI Integration

1. Configure Puch AI MCP Connection

In your Puch AI configuration, add the MCP server:

{
  "mcp_servers": {
    "youtube_summarizer": {
      "endpoint": "https://your-server-url:8443/mcp",
      "description": "YouTube video summarization service"
    }
  }
}

2. Usage in WhatsApp

Users can send YouTube links to your Puch AI bot, and it will:

Call the youtube_audio_summary tool
Receive the Base64-encoded audio
Send the audio file directly to WhatsApp

Example chat flow:

User: "Summarize this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Bot: [Processes video and sends audio summary]
User: [Receives audio file in WhatsApp]

🚢 Deployment Options

⚠️ Important: Vercel and Netlify are NOT suitable for this project!

See DEPLOYMENT.md for detailed deployment instructions. Quick options:

🎯 Recommended FREE Hosting Platforms:

Railway (BEST - Most generous free tier)

npm install -g @railway/cli
railway login && railway init
railway variables set GEMINI_API_KEY=your_key
railway up

Replit (Always free, simple)
- Upload to https://replit.com/
- Add GEMINI_API_KEY in Secrets
- Run: python mcp_server.py

Render (Good for production)
- Connect GitHub repo
- Set environment variables
- Auto-deploy with render.yaml

Heroku (Classic option)

heroku create your-app
heroku config:set GEMINI_API_KEY=your_key
git push heroku main

Why not Vercel/Netlify? Their serverless functions time out after roughly 10-60 seconds, while processing a video here can take that long or longer. They also don't provide FFmpeg or the 1-2GB of RAM that Whisper needs.

🔍 Testing the Server

Test Health Check

curl -k https://localhost:8443/health

Test MCP Tool List

curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/list"
  }'

Test YouTube Processing

curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/call",
    "params": {
      "name": "youtube_audio_summary",
      "arguments": {
        "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
      }
    }
  }'
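
The same calls can be made from Python using only the standard library. This is a sketch for local testing: `insecure_context` mirrors `curl -k` by disabling certificate verification, so it should not be used against a production server. The helper names and the 300-second timeout are assumptions, not part of the project.

```python
import json
import ssl
import urllib.request


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like curl -k. Testing only."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    return context


def build_mcp_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON-RPC payload for the /mcp endpoint."""
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_mcp(base_url: str, payload: dict, timeout: float = 300.0) -> dict:
    """Send the payload and return the parsed JSON response."""
    request = build_mcp_request(base_url, payload)
    with urllib.request.urlopen(
        request, timeout=timeout, context=insecure_context()
    ) as response:
        return json.load(response)


# Example (requires the server to be running locally):
# tools = call_mcp(
#     "https://localhost:8443",
#     {"jsonrpc": "2.0", "id": "test", "method": "tools/list"},
# )
```

The generous timeout reflects that a tools/call request stays open while the whole video is processed.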

⚙️ Configuration

Environment Variables (.env)

# Required
GEMINI_API_KEY=your_gemini_api_key_here

# Server Configuration
PORT=8443                    # HTTPS port
HOST=0.0.0.0                 # Listen on all interfaces
SSL_CERT_PATH=./certs/cert.pem
SSL_KEY_PATH=./certs/key.pem

# Optional Settings
TEMP_DIR=./temp              # Temporary file directory
AUDIO_BITRATE=128k           # Output audio bitrate
SAMPLE_RATE=22050            # Audio sample rate

Audio Quality Settings

High Quality: AUDIO_BITRATE=192k, SAMPLE_RATE=44100
Medium Quality (default): AUDIO_BITRATE=128k, SAMPLE_RATE=22050
Low Quality (smaller files): AUDIO_BITRATE=96k, SAMPLE_RATE=16000

🚨 Troubleshooting

Common Issues

"FFmpeg not found"
- Install FFmpeg: Windows: winget install FFmpeg, macOS: brew install ffmpeg, Linux: sudo apt install ffmpeg
"Gemini API quota exceeded"
- Check your API key at https://makersuite.google.com/
- Ensure you're within free tier limits
"SSL certificate error"
- Server auto-generates self-signed certificates
- For production, use valid SSL certificates
"YouTube download failed"
- Some videos may be restricted
- Try different videos or check yt-dlp compatibility
"Import errors"
- Ensure virtual environment is activated
- Run pip install -r requirements.txt
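
For the "FFmpeg not found" case, a quick preflight check can confirm whether the executable is on PATH before starting the server. This is a small sketch using the standard library; `missing_tools` is a hypothetical helper, not something the project ships.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["ffmpeg"])
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("FFmpeg found on PATH")
```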

Logs and Debugging

Server logs show detailed processing steps
Check /health endpoint for server status
Use curl -k to bypass SSL certificate validation during testing

📊 Performance Notes

Processing Time: ~30-60 seconds for a typical video; longer videos can exceed 60 seconds
Memory Usage: ~500MB-2GB (Whisper model + processing)
Storage: Temporary files cleaned automatically
Rate Limits: Respect Gemini API free tier limits
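
One common way to stay within a rate limit is to retry a failed call with exponentially growing delays. The sketch below is a generic illustration and is not taken from this project: `retry_with_backoff` is a hypothetical helper, and it catches every exception for brevity, whereas real code should catch only the quota error raised by the Gemini client in use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # waits 2s, 4s, 8s, ...
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the delays can be replaced in tests without actually waiting.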

🔒 Security Considerations

Uses HTTPS with self-signed certificates (auto-generated)
API keys stored in environment variables
Temporary files cleaned after processing
No persistent storage of video content
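
Cleaning temporary files after processing can be done with a scoped temporary directory, which is removed even when processing raises an error. This is a sketch of that pattern rather than the server's actual cleanup code; `process_in_temp_dir` is a hypothetical helper, and `parent` (for example the TEMP_DIR value) must already exist if given.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def process_in_temp_dir(
    work: Callable[[Path], bytes],
    parent: Optional[str] = None,
) -> bytes:
    """Run `work` inside a fresh temporary directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(dir=parent) as tmp:
        # Everything written under `tmp` is removed when this block exits,
        # whether `work` returns normally or raises.
        return work(Path(tmp))


def example_work(directory: Path) -> bytes:
    """Stand-in for download and conversion: write a file, then return its bytes."""
    audio_path = directory / "audio.mp3"
    audio_path.write_bytes(b"placeholder-bytes")
    return audio_path.read_bytes()


print(process_in_temp_dir(example_work))
```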

📝 License

This project is provided as-is for hackathon and educational purposes. Ensure compliance with:

YouTube Terms of Service
Google Gemini API Terms
Whisper License (MIT)
All other service provider terms

🤝 Contributing

This is a hackathon project template. Feel free to:

Fork and modify for your needs
Add additional TTS providers
Implement different summarization strategies
Add more audio processing features

📞 Support

For issues during setup or deployment:

Check this README thoroughly
Verify all prerequisites are installed
Check server logs for specific error messages
Ensure API keys are correctly configured

Ready to use with Puch AI on WhatsApp! 🎉