Kenneth-Aidan-B/mcp-youtube-summarizer
This server downloads YouTube videos, transcribes and summarizes them, and converts the summary to audio for WhatsApp integration.
MCP YouTube Audio Summarizer Server
A complete Model Context Protocol (MCP) server that downloads YouTube videos, transcribes them with Whisper, summarizes using Google Gemini, converts to speech with TTS, and returns Base64-encoded audio for WhatsApp integration via Puch AI.
🚀 Features
- ✅ YouTube Audio Download - Extract audio from YouTube videos using yt-dlp
- ✅ Speech-to-Text - Transcribe audio using OpenAI Whisper (local, free)
- ✅ AI Summarization - Summarize transcripts using Google Gemini 1.5 Flash (free tier)
- ✅ Text-to-Speech - Convert summaries to natural speech using gTTS
- ✅ MCP Protocol - Full Model Context Protocol compliance
- ✅ HTTPS Server - Secure server with self-signed certificates
- ✅ Base64 Audio - Direct audio embedding for WhatsApp
- ✅ Error Handling - Comprehensive error handling and logging
- ✅ Free Hosting - Works on Railway, Render, Replit (free tiers)
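The features above form a linear pipeline: download, transcribe, summarize, synthesize, encode. A minimal sketch of that flow follows, assuming each stage is passed in as a plain callable. The function name `summarize_to_audio` and the stage parameters are hypothetical and only illustrate the order of operations; they are not taken from `mcp_server.py`.

```python
import base64
from typing import Callable


def summarize_to_audio(
    url: str,
    download: Callable[[str], str],       # URL -> path of the downloaded audio file
    transcribe: Callable[[str], str],     # audio path -> transcript text
    summarize: Callable[[str], str],      # transcript -> summary text
    synthesize: Callable[[str], bytes],   # summary -> MP3 bytes
) -> str:
    """Run the stages in order and return the spoken summary as Base64 text."""
    audio_path = download(url)
    transcript = transcribe(audio_path)
    summary = summarize(transcript)
    mp3_bytes = synthesize(summary)
    return base64.b64encode(mp3_bytes).decode("ascii")
```

Passing the stages in as arguments keeps the sketch independent of any particular library, so yt-dlp, Whisper, Gemini, and gTTS wrappers could each be swapped or stubbed out in tests.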
📋 Prerequisites
- Python 3.8+
- FFmpeg (for audio processing)
- Google Gemini API Key (free at https://makersuite.google.com/)
🛠️ Quick Setup
Windows
# 1. Clone/download the project
git clone [your-repo]  # or download the files
# 2. Run setup script
setup.bat
# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY
# 5. Start the server
run_server.bat
Linux/macOS
# 1. Clone/download the project
git clone [your-repo]  # or download the files
# 2. Make scripts executable and run setup
chmod +x setup.sh run_server.sh
./setup.sh
# 3. Get Gemini API key from https://makersuite.google.com/
# 4. Edit .env file and add your GEMINI_API_KEY
# 5. Start the server
./run_server.sh
🔧 Manual Setup
- Install Python Dependencies
  pip install -r requirements.txt
- Install FFmpeg
  - Windows: winget install FFmpeg (or download from https://ffmpeg.org/)
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt-get install ffmpeg
- Configure Environment
  cp .env.template .env
  # Edit .env and add your GEMINI_API_KEY
- Run Server
  python mcp_server.py
🌐 API Endpoints
- HTTPS Server: https://localhost:8443
- Health Check: https://localhost:8443/health
- MCP Endpoint: https://localhost:8443/mcp
📡 MCP Tool Usage
The server exposes one MCP tool: youtube_audio_summary
Example MCP Request
{
  "jsonrpc": "2.0",
  "id": "req-1",
  "method": "tools/call",
  "params": {
    "name": "youtube_audio_summary",
    "arguments": {
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    }
  }
}
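The same request can be built in code. This is a minimal sketch that assumes only the JSON-RPC shape shown in the example above; the helper name `build_summary_request` is hypothetical and not part of the server.

```python
import json


def build_summary_request(video_url: str, request_id: str = "req-1") -> dict:
    """Build a JSON-RPC 2.0 tools/call payload for the youtube_audio_summary tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "youtube_audio_summary",
            "arguments": {"url": video_url},
        },
    }


if __name__ == "__main__":
    payload = build_summary_request("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
    print(json.dumps(payload, indent=2))
```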
Example MCP Response
{
  "jsonrpc": "2.0",
  "id": "req-1",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "YouTube video processed successfully. Audio summary generated."
      },
      {
        "type": "resource",
        "resource": {
          "uri": "data:audio/mpeg;base64,SUQzBAAAAAABEFRYWFgAAAAtAAADY29tbWVudABCaWdTb3VuZEJhbmsuY29tIC8gTGFTb25pYSBHU...",
          "mimeType": "audio/mpeg",
          "text": "Generated audio summary"
        }
      }
    ]
  }
}
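A client has to pull the audio out of the `data:` URI in the `resource` item. The sketch below assumes the response has exactly the shape shown above, with a `data:audio/mpeg;base64,...` URI; the function name `extract_audio` is hypothetical.

```python
import base64


def extract_audio(response: dict) -> bytes:
    """Return decoded audio bytes from the first resource item carrying a Base64 data URI."""
    for item in response.get("result", {}).get("content", []):
        if item.get("type") != "resource":
            continue
        uri = item.get("resource", {}).get("uri", "")
        # A data URI looks like "data:<mime>;base64,<payload>".
        header, sep, payload = uri.partition(",")
        if sep and header.startswith("data:") and header.endswith(";base64"):
            return base64.b64decode(payload)
    raise ValueError("no Base64 audio resource found in response")
```

The returned bytes can be written straight to a file, for example `Path("summary.mp3").write_bytes(extract_audio(response))`.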
🔗 Puch AI Integration
1. Configure Puch AI MCP Connection
In your Puch AI configuration, add the MCP server:
{
  "mcp_servers": {
    "youtube_summarizer": {
      "endpoint": "https://your-server-url:8443/mcp",
      "description": "YouTube video summarization service"
    }
  }
}
2. Usage in WhatsApp
Users can send YouTube links to your Puch AI bot, and it will:
- Call the youtube_audio_summary tool
- Receive the Base64-encoded audio
- Send the audio file directly to WhatsApp
Example chat flow:
User: "Summarize this video: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Bot: [Processes video and sends audio summary]
User: [Receives audio file in WhatsApp]
🚢 Deployment Options
⚠️ Important: Vercel and Netlify are NOT suitable for this project!
See DEPLOYMENT.md for detailed deployment instructions. Quick options:
🎯 Recommended FREE Hosting Platforms:
- Railway (BEST - Most generous free tier)
  npm install -g @railway/cli
  railway login && railway init
  railway variables set GEMINI_API_KEY=your_key
  railway up
- Replit (Always free, simple)
  - Upload to https://replit.com/
  - Add GEMINI_API_KEY in Secrets
  - Run: python mcp_server.py
- Render (Good for production)
  - Connect GitHub repo
  - Set environment variables
  - Auto-deploy with render.yaml
- Heroku (Classic option; Heroku discontinued its free plans in November 2022)
  heroku create your-app
  heroku config:set GEMINI_API_KEY=your_key
  git push heroku main
Why not Vercel/Netlify? Their serverless functions time out after 10-60 seconds, while processing a video here typically takes 30-60 seconds and can run longer for long videos. They also don't provide FFmpeg or the 1-2GB of RAM that Whisper needs.
🔍 Testing the Server
Test Health Check
curl -k https://localhost:8443/health
Test MCP Tool List
curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/list"
  }'
Test YouTube Processing
curl -k -X POST https://localhost:8443/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "test",
    "method": "tools/call",
    "params": {
      "name": "youtube_audio_summary",
      "arguments": {
        "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
      }
    }
  }'
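The curl tests above can be reproduced from Python using only the standard library. This is a sketch, not the project's own client: the helper names (`insecure_context`, `make_request`, `call`) and the 300-second timeout are assumptions. Certificate verification is disabled to mirror `curl -k`, which is only appropriate when testing against the self-signed certificate.

```python
import json
import ssl
import urllib.request

MCP_URL = "https://localhost:8443/mcp"


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks, like curl -k. For local testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def make_request(method, params=None, request_id="test", url=MCP_URL):
    """Build a JSON-RPC 2.0 POST request without sending it."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(method, params=None, timeout=300):
    """Send the request and return the parsed JSON response.

    The generous timeout is an assumption: video processing can take a minute or more.
    """
    req = make_request(method, params)
    with urllib.request.urlopen(req, timeout=timeout, context=insecure_context()) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("tools/list")` corresponds to the first curl test, and `call("tools/call", {"name": "youtube_audio_summary", "arguments": {"url": "..."}})` to the second.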
⚙️ Configuration
Environment Variables (.env)
# Required
GEMINI_API_KEY=your_gemini_api_key_here
# Server Configuration
PORT=8443 # HTTPS port
HOST=0.0.0.0 # Listen on all interfaces
SSL_CERT_PATH=./certs/cert.pem
SSL_KEY_PATH=./certs/key.pem
# Optional Settings
TEMP_DIR=./temp # Temporary file directory
AUDIO_BITRATE=128k # Output audio bitrate
SAMPLE_RATE=22050 # Audio sample rate
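One way to read these variables with the defaults listed above is sketched below. The `ServerConfig` class and `load_config` function are hypothetical; the actual `mcp_server.py` may load its settings differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    gemini_api_key: str
    port: int = 8443
    host: str = "0.0.0.0"
    ssl_cert_path: str = "./certs/cert.pem"
    ssl_key_path: str = "./certs/key.pem"
    temp_dir: str = "./temp"
    audio_bitrate: str = "128k"
    sample_rate: int = 22050


def load_config(env=None) -> ServerConfig:
    """Build a ServerConfig from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # GEMINI_API_KEY is the only required setting.
        raise RuntimeError("GEMINI_API_KEY is required")
    return ServerConfig(
        gemini_api_key=api_key,
        port=int(env.get("PORT", 8443)),
        host=env.get("HOST", "0.0.0.0"),
        ssl_cert_path=env.get("SSL_CERT_PATH", "./certs/cert.pem"),
        ssl_key_path=env.get("SSL_KEY_PATH", "./certs/key.pem"),
        temp_dir=env.get("TEMP_DIR", "./temp"),
        audio_bitrate=env.get("AUDIO_BITRATE", "128k"),
        sample_rate=int(env.get("SAMPLE_RATE", 22050)),
    )
```

Failing fast on a missing API key surfaces the problem at startup rather than on the first summarization request.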
Audio Quality Settings
- High Quality: AUDIO_BITRATE=192k, SAMPLE_RATE=44100
- Medium Quality (default): AUDIO_BITRATE=128k, SAMPLE_RATE=22050
- Low Quality (smaller files): AUDIO_BITRATE=96k, SAMPLE_RATE=16000
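Because the audio is returned inline as Base64, the bitrate directly determines response size. The rough estimate below assumes constant-bitrate audio and ignores container overhead; the function name `estimate_sizes` is hypothetical, and it only understands bitrates written with a `k` suffix as in the settings above.

```python
import math


def estimate_sizes(bitrate: str, seconds: float):
    """Return (raw_bytes, base64_chars) for audio of the given bitrate and duration."""
    bits_per_second = int(bitrate.rstrip("k")) * 1000   # "128k" -> 128000 bits/s
    raw_bytes = math.ceil(bits_per_second * seconds / 8)
    base64_chars = 4 * math.ceil(raw_bytes / 3)         # Base64 emits 4 chars per 3 bytes
    return raw_bytes, base64_chars


if __name__ == "__main__":
    for bitrate in ("192k", "128k", "96k"):
        print(bitrate, estimate_sizes(bitrate, 60))
```

Under these assumptions, a 60-second summary at the default 128k is about 960,000 bytes of audio and 1,280,000 characters once Base64-encoded, so the low-quality preset may be worth considering when response size matters.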
🚨 Troubleshooting
Common Issues
- "FFmpeg not found"
  - Install FFmpeg: Windows: winget install FFmpeg, macOS: brew install ffmpeg, Linux: sudo apt install ffmpeg
- "Gemini API quota exceeded"
  - Check your API key at https://makersuite.google.com/
  - Ensure you're within free tier limits
- "SSL certificate error"
  - Server auto-generates self-signed certificates
  - For production, use valid SSL certificates
- "YouTube download failed"
  - Some videos may be restricted
  - Try different videos or check yt-dlp compatibility
- "Import errors"
  - Ensure virtual environment is activated
  - Run pip install -r requirements.txt
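Two of these issues, a missing FFmpeg binary and a missing API key, can be detected before the server starts. The check below is a sketch with a hypothetical `preflight` helper that is not part of the project; it uses only `shutil.which` and the environment.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    if not env.get("GEMINI_API_KEY", "").strip():
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```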
Logs and Debugging
- Server logs show detailed processing steps
- Check /health endpoint for server status
- Use curl -k to bypass SSL certificate validation during testing
📊 Performance Notes
- Processing Time: ~30-60 seconds per video (depends on length)
- Memory Usage: ~500MB-2GB (Whisper model + processing)
- Storage: Temporary files cleaned automatically
- Rate Limits: Respect Gemini API free tier limits
🔒 Security Considerations
- Uses HTTPS with self-signed certificates (auto-generated)
- API keys stored in environment variables
- Temporary files cleaned after processing
- No persistent storage of video content
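Cleaning up temporary files after processing can be done with a scoped temporary directory. The sketch below shows one common way to get that behavior with the standard library; it is an assumption about technique, not a description of how `mcp_server.py` does it, and `process_in_temp_dir` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(work, base_dir=None):
    """Run work(tmp_path) inside a temporary directory that is deleted afterwards.

    The directory and everything in it are removed even if work() raises.
    base_dir, if given, must already exist (for example the TEMP_DIR setting).
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
        return work(Path(tmp))
```

Downloaded audio and intermediate files written under `tmp_path` then never outlive the request that created them.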
📝 License
This project is provided as-is for hackathon and educational purposes. Ensure compliance with:
- YouTube Terms of Service
- Google Gemini API Terms
- Whisper License (MIT)
- All other service provider terms
🤝 Contributing
This is a hackathon project template. Feel free to:
- Fork and modify for your needs
- Add additional TTS providers
- Implement different summarization strategies
- Add more audio processing features
📞 Support
For issues during setup or deployment:
- Check this README thoroughly
- Verify all prerequisites are installed
- Check server logs for specific error messages
- Ensure API keys are correctly configured
Ready to use with Puch AI on WhatsApp! 🎉