mcp-youtube-summarizer

Kenneth-Aidan-B/mcp-youtube-summarizer



This server processes YouTube videos: it downloads and transcribes each video, summarizes the transcript, and converts the summary to audio for WhatsApp integration.

Tools: 1 | Resources: 0 | Prompts: 0

MCP YouTube Audio Summarizer Server

A complete Model Context Protocol (MCP) server that downloads YouTube videos, transcribes them with Whisper, summarizes the transcript using Google Gemini, converts the summary to speech with gTTS, and returns Base64-encoded audio for WhatsApp integration via Puch AI.

🚀 Features

  • YouTube Audio Download - Extract audio from YouTube videos using yt-dlp
  • Speech-to-Text - Transcribe audio using OpenAI Whisper (local, free)
  • AI Summarization - Summarize transcripts using Google Gemini 1.5 Flash (free tier)
  • Text-to-Speech - Convert summaries to natural speech using gTTS
  • MCP Protocol - Model Context Protocol tool server over JSON-RPC 2.0 (tools/list, tools/call)
  • HTTPS Server - Serves over HTTPS with auto-generated self-signed certificates (use valid certificates in production)
  • Base64 Audio - Direct audio embedding for WhatsApp
  • Error Handling - Comprehensive error handling and logging
  • Free Hosting - Works on Railway, Render, Replit (free tiers)

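📋 Prerequisites

  • Python 3 with pip
  • FFmpeg
  • A Google Gemini API key
  • Git (optional, for cloning the repository)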

🛠️ Quick Setup

Windows

# 1. Clone the project (or download and extract the files), then enter its folder
git clone [your-repo]
cd mcp-youtube-summarizer

# 2. Run setup script
setup.bat

# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY

# 5. Start the server
run_server.bat

Linux/macOS

# 1. Clone the project (or download and extract the files), then enter its folder
git clone [your-repo]
cd mcp-youtube-summarizer

# 2. Make scripts executable and run setup
chmod +x setup.sh run_server.sh
./setup.sh

# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY

# 5. Start the server
./run_server.sh

🔧 Manual Setup

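  1. Install Python Dependencies (inside a virtual environment)

    python -m venv venv
    source venv/bin/activate        # Windows: venv\Scripts\activate
    pip install -r requirements.txt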
    
  2. Install FFmpeg

    • Windows: winget install FFmpeg or download from https://ffmpeg.org/
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt-get install ffmpeg
  3. Configure Environment

    cp .env.template .env           # Windows: copy .env.template .env
    # Edit .env and add your GEMINI_API_KEY
    
  4. Run Server

    python mcp_server.py
    

🌐 API Endpoints

  • HTTPS Server: https://localhost:8443
  • Health Check: https://localhost:8443/health
  • MCP Endpoint: https://localhost:8443/mcp

📡 MCP Tool Usage

The server exposes one MCP tool: youtube_audio_summary
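For orientation, here is a minimal sketch of how such a tool is typically described in an MCP tools/list response: a name, a description, and a JSON Schema (inputSchema) for its arguments. Only the tool name and the "url" argument come from this README; the description text, the additionalProperties setting, and the check_arguments helper are hypothetical.

```python
import json

# Hypothetical tool definition in the shape MCP's tools/list returns.
# Only "youtube_audio_summary" and the "url" argument come from the README.
YOUTUBE_AUDIO_SUMMARY_TOOL = {
    "name": "youtube_audio_summary",
    "description": "Download a YouTube video, summarize it, and return the summary as MP3 audio.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Full YouTube video URL"},
        },
        "required": ["url"],
        "additionalProperties": False,
    },
}


def check_arguments(tool: dict, arguments: dict) -> list:
    """Return a list of problems with `arguments` (empty if they look valid).

    Only checks required keys, unexpected keys, and string types; a real
    server would use a full JSON Schema validator.
    """
    schema = tool["inputSchema"]
    properties = schema.get("properties", {})
    problems = []
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = properties.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected argument: {key}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {key} must be a string")
    return problems


if __name__ == "__main__":
    print(json.dumps({"tools": [YOUTUBE_AUDIO_SUMMARY_TOOL]}, indent=2))
```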

Example MCP Request

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "method": "tools/call",
  "params": {
    "name": "youtube_audio_summary",
    "arguments": {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  }
}
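If you build this request programmatically, a small helper like the one below produces the same JSON-RPC 2.0 payload. It is a sketch: the method, tool name, and "url" argument come from the example above, while the helper name and the "req-N" id scheme are assumptions (any unique string or number is a valid JSON-RPC id).

```python
import itertools
import json

# Hypothetical id generator: "req-1", "req-2", ... (any unique id is valid JSON-RPC).
_ids = itertools.count(1)


def build_tool_call(video_url: str) -> dict:
    """Build a tools/call request for youtube_audio_summary (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": f"req-{next(_ids)}",
        "method": "tools/call",
        "params": {
            "name": "youtube_audio_summary",
            "arguments": {"url": video_url},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call("https://www.youtube.com/watch?v=dQw4w9WgXcQ"), indent=2))
```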

Example MCP Response

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "YouTube video processed successfully. Audio summary generated."
      },
      {
        "type": "resource",
        "resource": {
          "uri": "data:audio/mpeg;base64,SUQzBAAAAAABEFRYWFgAAAAtAAADY29tbWVudABCaWdTb3VuZEJhbmsuY29tIC8gTGFTb25pYSBHU...",
          "mimeType": "audio/mpeg",
          "text": "Generated audio summary"
        }
      }
    ]
  }
}
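On the server side, a result in this shape could be assembled roughly as follows. This is a sketch, not the server's actual code: the function names are hypothetical, and the error variant follows the MCP convention of reporting tool failures inside the result with "isError": true rather than as a JSON-RPC protocol error.

```python
import base64

# Hypothetical helpers that produce the response shape shown above.


def build_tool_result(mp3_bytes: bytes) -> dict:
    # Embed the MP3 as a Base64 data URI inside a "resource" content item.
    encoded = base64.b64encode(mp3_bytes).decode("ascii")
    return {
        "content": [
            {"type": "text", "text": "YouTube video processed successfully. Audio summary generated."},
            {
                "type": "resource",
                "resource": {
                    "uri": f"data:audio/mpeg;base64,{encoded}",
                    "mimeType": "audio/mpeg",
                    "text": "Generated audio summary",
                },
            },
        ]
    }


def build_tool_error(message: str) -> dict:
    # Tool-level failure (e.g. a download error) reported inside the result.
    return {"content": [{"type": "text", "text": message}], "isError": True}


def wrap_response(request_id, result: dict) -> dict:
    # JSON-RPC 2.0 envelope: echo the caller's request id.
    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```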

🔗 Puch AI Integration

1. Configure Puch AI MCP Connection

In your Puch AI configuration, add the MCP server:

{
  "mcp_servers": {
    "youtube_summarizer": {
      "endpoint": "https://your-server-url:8443/mcp",
      "description": "YouTube video summarization service"
    }
  }
}

2. Usage in WhatsApp

Users can send YouTube links to your Puch AI bot, and it will:

  1. Call the youtube_audio_summary tool
  2. Receive the Base64-encoded audio
  3. Send the audio file directly to WhatsApp
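Steps 2 and 3 amount to pulling the Base64 payload out of the data URI and writing it as an MP3 file. The sketch below assumes the response shape from "Example MCP Response"; the function names and the output filename are hypothetical, and Puch AI may handle this step internally.

```python
import base64
from pathlib import Path


def extract_audio(result: dict) -> bytes:
    """Return the MP3 bytes embedded in a youtube_audio_summary result.

    Assumes a content item of type "resource" whose "uri" is a
    data:audio/mpeg;base64,... URI, as in the example response.
    """
    content = result.get("content", [])
    if result.get("isError"):
        texts = [item.get("text", "") for item in content if item.get("type") == "text"]
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    for item in content:
        if item.get("type") != "resource":
            continue
        uri = item.get("resource", {}).get("uri", "")
        header, sep, payload = uri.partition(",")
        if sep and header.startswith("data:") and header.endswith(";base64"):
            # validate=True rejects characters outside the Base64 alphabet
            return base64.b64decode(payload, validate=True)
    raise ValueError("no Base64 audio resource found in result")


def save_audio(result: dict, path: str = "summary.mp3") -> Path:
    out = Path(path)
    out.write_bytes(extract_audio(result))
    return out
```

Usage: `save_audio(response["result"])` writes `summary.mp3`, which can then be sent as a WhatsApp audio message.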

Example chat flow:

User: "Summarize this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Bot: [Processes video and sends audio summary]
User: [Receives audio file in WhatsApp]
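Because users send free-form messages, something has to pick the YouTube link out of the text before the tool is called. Here is a minimal sketch of that step; the function name and the set of URL forms handled (watch?v=, youtu.be/, /shorts/) are assumptions, and Puch AI may already do this for you.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical link finder for chat messages.
_URL_RE = re.compile(r"https?://\S+")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def find_youtube_link(message: str):
    """Return (url, video_id) for the first YouTube link in `message`, or None."""
    for match in _URL_RE.finditer(message):
        # Drop punctuation that ends the sentence rather than the URL.
        url = match.group(0).rstrip(".,!?)\"'")
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        video_id = None
        if host == "youtu.be":
            video_id = parsed.path.lstrip("/").split("/")[0]
        elif host in _YOUTUBE_HOSTS:
            if parsed.path == "/watch":
                video_id = parse_qs(parsed.query).get("v", [None])[0]
            elif parsed.path.startswith("/shorts/"):
                video_id = parsed.path.split("/")[2]
        if video_id:
            return url, video_id
    return None
```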

🚢 Deployment Options

⚠️ Important: Vercel and Netlify are NOT suitable for this project!

See DEPLOYMENT.md for detailed deployment instructions. Quick options:

🎯 Recommended FREE Hosting Platforms:

  1. Railway (BEST - Most generous free tier)

    npm install -g @railway/cli
    railway login && railway init
    railway variables --set "GEMINI_API_KEY=your_key"
    railway up
    
  2. Replit (Always free, simple)

  3. Render (Good for production)

    • Connect GitHub repo
    • Set environment variables
    • Auto-deploy with render.yaml
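  4. Heroku (Classic option; paid plans only since its free tier was retired, and its router times out requests after 30 seconds, so long videos may fail)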

    heroku create your-app
    heroku config:set GEMINI_API_KEY=your_key
    git push heroku main
    

Why not Vercel/Netlify? Their serverless functions have 10-60 second timeouts, but processing a video can take a minute or more. They also don't provide FFmpeg or the up to ~2 GB of RAM that Whisper can need.

🔍 Testing the Server

Test Health Check

curl -k https://localhost:8443/health

Test MCP Tool List

curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/list"
  }'

Test YouTube Processing

curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/call",
    "params": {
      "name": "youtube_audio_summary",
      "arguments": {
        "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
      }
    }
  }'
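The same checks can be run from Python with only the standard library. This sketch mirrors the curl commands above: like `curl -k` it disables certificate verification, which is acceptable only against the local self-signed certificate. The helper names are hypothetical, and the 300-second timeout is an assumption chosen to outlast slow video processing.

```python
import json
import ssl
import urllib.request
from typing import Optional

BASE_URL = "https://localhost:8443"  # default from this README; change for a deployed server


def insecure_context() -> ssl.SSLContext:
    # Equivalent of curl -k: skip hostname and certificate checks (testing only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_request(method: str, params: Optional[dict] = None, request_id: str = "test") -> urllib.request.Request:
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        f"{BASE_URL}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300) -> dict:
    # Generous timeout: processing a video can take a minute or more.
    with urllib.request.urlopen(request, context=insecure_context(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health(timeout: float = 10) -> str:
    # Returns the raw body; its exact format is not documented here.
    with urllib.request.urlopen(f"{BASE_URL}/health", context=insecure_context(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(health())
    print(send(build_request("tools/list")))
    print(send(build_request("tools/call", {
        "name": "youtube_audio_summary",
        "arguments": {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"},
    })))
```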

⚙️ Configuration

Environment Variables (.env)

# Required
GEMINI_API_KEY=your_gemini_api_key_here

# Server Configuration
PORT=8443                    # HTTPS port
HOST=0.0.0.0                 # Listen on all interfaces
SSL_CERT_PATH=./certs/cert.pem
SSL_KEY_PATH=./certs/key.pem

# Optional Settings
TEMP_DIR=./temp              # Temporary file directory
AUDIO_BITRATE=128k           # Output audio bitrate
SAMPLE_RATE=22050            # Audio sample rate
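To show how these settings fit together, here is a hypothetical stdlib-only loader for this .env format, including trailing "# comment" text. The defaults mirror the values above; the server itself may load its settings differently (for example with python-dotenv). As with python-dotenv's default behavior, variables already set in the real environment (e.g. by a hosting platform) take precedence over .env.

```python
import os
import re
from typing import Dict, Mapping

# Defaults copied from the .env example in this README.
DEFAULTS = {
    "PORT": "8443",
    "HOST": "0.0.0.0",
    "SSL_CERT_PATH": "./certs/cert.pem",
    "SSL_KEY_PATH": "./certs/key.pem",
    "TEMP_DIR": "./temp",
    "AUDIO_BITRATE": "128k",
    "SAMPLE_RATE": "22050",
}


def parse_env(text: str) -> Dict[str, str]:
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]:
            value = value[1:-1]  # quoted: keep any '#' inside the quotes
        else:
            value = re.split(r"\s+#", value, maxsplit=1)[0].strip()  # drop trailing comment
        values[key.strip()] = value
    return values


def load_settings(text: str, environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    settings = {**DEFAULTS, **parse_env(text)}
    # Real environment variables win over .env values.
    for key in list(settings) + ["GEMINI_API_KEY"]:
        if key in environ:
            settings[key] = environ[key]
    if not settings.get("GEMINI_API_KEY"):
        raise ValueError("GEMINI_API_KEY is required")
    return settings
```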

Audio Quality Settings

  • High Quality: AUDIO_BITRATE=192k, SAMPLE_RATE=44100
  • Medium Quality (default): AUDIO_BITRATE=128k, SAMPLE_RATE=22050
  • Low Quality (smaller files): AUDIO_BITRATE=96k, SAMPLE_RATE=16000
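Because the audio travels inline as Base64 in the JSON response, the bitrate directly sets the payload size. The back-of-the-envelope sketch below assumes constant-bitrate MP3 and ignores ID3/container overhead. Under that assumption, size depends on bitrate and duration, not on SAMPLE_RATE. For a one-minute summary it gives 1,440,000 / 960,000 / 720,000 bytes of MP3 (high / medium / low). Base64 adds one third on top, for 1,920,000 / 1,280,000 / 960,000 characters.

```python
import math

# Bitrates from the presets above, in kilobits per second.
PRESETS_KBPS = {"high": 192, "medium": 128, "low": 96}


def mp3_bytes(kbps: int, seconds: float) -> int:
    # kilobits/s * seconds -> bits -> bytes (constant bitrate assumed)
    return int(kbps * 1000 * seconds / 8)


def base64_chars(n_bytes: int) -> int:
    # Every 3 input bytes become 4 characters; the last group is padded.
    return 4 * math.ceil(n_bytes / 3)


if __name__ == "__main__":
    for name, kbps in PRESETS_KBPS.items():
        raw = mp3_bytes(kbps, 60)  # a one-minute spoken summary
        print(f"{name:>6}: {raw:,} bytes MP3, {base64_chars(raw):,} Base64 characters")
```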

🚨 Troubleshooting

Common Issues

  1. "FFmpeg not found"

    • Install FFmpeg:
      Windows: winget install FFmpeg
      macOS: brew install ffmpeg
      Linux: sudo apt install ffmpeg
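  2. "Gemini API quota exceeded"

    • The free tier is rate-limited; wait for the quota to reset and retry
    • Check the usage and limits for your API key in Google AI Studio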
  3. "SSL certificate error"

    • Server auto-generates self-signed certificates
    • For production, use valid SSL certificates
  4. "YouTube download failed"

    • Some videos may be restricted
    • Try a different video, or update yt-dlp (pip install -U yt-dlp), since YouTube changes often break older versions
  5. "Import errors"

    • Ensure virtual environment is activated
    • Run pip install -r requirements.txt
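Several of these issues can be caught before starting the server. The pre-flight check below is a hypothetical sketch: the module names are the import names of the packages this README mentions (yt-dlp → yt_dlp, openai-whisper → whisper, gTTS → gtts), and requirements.txt remains the authority on what is actually needed. It only inspects the process environment, so a key that exists solely in .env will be reported as missing.

```python
import importlib.util
import os
import shutil

# Import names for yt-dlp, openai-whisper, and gTTS (assumed to match requirements.txt).
REQUIRED_MODULES = ("yt_dlp", "whisper", "gtts")


def preflight(environ=os.environ, which=shutil.which, find_spec=importlib.util.find_spec) -> list:
    """Return a list of human-readable problems; empty means all checks passed.

    `which` and `find_spec` are injectable so the checks can be tested.
    """
    problems = []
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    if not environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set in the environment")
    for name in REQUIRED_MODULES:
        if find_spec(name) is None:
            problems.append(f"Python module '{name}' is not importable (is the virtual environment active?)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "All checks passed")
```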

Logs and Debugging

  • Server logs show detailed processing steps
  • Check /health endpoint for server status
  • Use curl -k to bypass SSL certificate validation during testing

📊 Performance Notes

  • Processing Time: ~30-60 seconds per video (depends on length)
  • Memory Usage: ~500MB-2GB (Whisper model + processing)
  • Storage: Temporary files cleaned automatically
  • Rate Limits: Respect Gemini API free tier limits

🔒 Security Considerations

  • Uses HTTPS with self-signed certificates (auto-generated)
  • API keys stored in environment variables
  • Temporary files cleaned after processing
  • No persistent storage of video content
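One common way to guarantee the cleanup described above is a per-request temporary directory that is removed even when a step fails. This minimal sketch assumes that approach; the `work` callable stands in for the download/transcribe/TTS pipeline and, like the function name, is hypothetical. `base_dir` corresponds to TEMP_DIR.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_in_scratch_dir(work: Callable[[Path], bytes], base_dir: Optional[str] = None) -> bytes:
    """Run `work` inside a fresh temporary directory and always delete it afterwards."""
    if base_dir is not None:
        Path(base_dir).mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=base_dir, prefix="yt-summary-") as tmp:
        # Only the returned bytes survive; the directory and its files are
        # removed on exit, including when `work` raises.
        return work(Path(tmp))
```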

📝 License

This project is provided as-is for hackathon and educational purposes. Ensure compliance with:

  • YouTube Terms of Service
  • Google Gemini API Terms
  • Whisper License (MIT)
  • All other service provider terms

🤝 Contributing

This is a hackathon project template. Feel free to:

  • Fork and modify for your needs
  • Add additional TTS providers
  • Implement different summarization strategies
  • Add more audio processing features

📞 Support

For issues during setup or deployment:

  1. Check this README thoroughly
  2. Verify all prerequisites are installed
  3. Check server logs for specific error messages
  4. Ensure API keys are correctly configured

Ready to use with Puch AI on WhatsApp! 🎉