summarize-mcp

FiveOhhWon/summarize-mcp

summarize-mcp is a Model Context Protocol server that converts text summaries to speech using OpenAI's TTS API, enabling audio playback across major platforms.

Tools
  1. play_summary

    Converts text to speech and plays it in the background.

  2. set_voice

    Sets the default voice for text-to-speech conversions.

  3. set_tone

    Sets the default tone for how text should be spoken.

summarize-mcp

🤖 Co-authored with Claude Code - Making AI summaries audible since 2025! 🔊

A Model Context Protocol (MCP) server that converts text summaries to speech using OpenAI's TTS API and plays them in the background across all major platforms (macOS, Windows, Linux).

🌟 Overview

summarize-mcp enables LLMs to convert any text summary into natural-sounding speech using OpenAI's state-of-the-art text-to-speech models. Perfect for creating audio summaries of documents, articles, or any content that benefits from an auditory presentation.

🚀 Key Features

  • 🎯 Simple & Focused: One tool that does one thing exceptionally well
  • 🎤 Multiple Voices: Choose from 10 distinct OpenAI voices (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer)
  • 🎨 Custom Instructions: Control how the text should be spoken
  • 🔧 Background Playback: Audio plays in the background without blocking
  • 🌍 Cross-Platform: Works on macOS, Windows, and Linux
  • 💾 Persistent Preferences: Save your favorite voice and tone settings
  • 🎯 Multiple Tools: Set voice, set tone, and play summaries
  • 🧹 Automatic Cleanup: Temporary files are cleaned up automatically
  • 🛡️ Type-Safe: Full Python type hints with Pydantic validation
  • 📊 Comprehensive Logging: Debug mode for troubleshooting
  • ⚡ Performance Optimized: Efficient file handling and cleanup

📋 Prerequisites

  • Python 3.8 or higher
  • OpenAI API Key with access to TTS models
  • Audio Player (automatically detected):
    • macOS: Built-in afplay (no installation needed)
    • Windows: Built-in Windows Media Player (no installation needed)
    • Linux: One of: mpg123, sox (play), ffmpeg (ffplay), vlc (cvlc), or alsa-utils (aplay)
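On Linux, automatic detection could be as simple as probing PATH for each supported player. The following is a minimal sketch, not the server's actual code: the function name `find_linux_player` and the search order are assumptions, while the executable names come from the list above.

```python
import shutil
from typing import Callable, Optional, Sequence

# Executable names from the prerequisites list; the order is an assumption.
LINUX_PLAYERS: Sequence[str] = ("mpg123", "play", "ffplay", "cvlc", "aplay")


def find_linux_player(
    candidates: Sequence[str] = LINUX_PLAYERS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name):
            return name
    return None
```

Calling `find_linux_player()` with no arguments checks the real PATH; the `which` parameter exists only so the lookup can be replaced in tests.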

📦 Installation

git clone https://github.com/FiveOhhWon/summarize-mcp.git
cd summarize-mcp
pip install -e .

๐Ÿƒ Configuration

Claude Desktop

Add this configuration to your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Configuration:
{
  "mcpServers": {
    "summarize": {
      "command": "python",
      "args": ["/absolute/path/to/summarize-mcp/src/summarize_mcp/server.py"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
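If you prefer to script this step, the sketch below shows one way to pick the config path for a platform and merge the "summarize" entry into an existing config without touching other servers. The helper names `claude_config_path` and `with_summarize_server` are hypothetical; the paths and JSON keys are the ones shown above.

```python
from pathlib import PurePosixPath, PureWindowsPath

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(system: str, home: str, appdata: str = "") -> str:
    """Map a platform name (as returned by platform.system()) to the config path."""
    if system == "Darwin":
        return str(PurePosixPath(home) / "Library/Application Support/Claude" / CONFIG_NAME)
    if system == "Windows":
        # appdata is the expanded value of %APPDATA%
        return str(PureWindowsPath(appdata) / "Claude" / CONFIG_NAME)
    return str(PurePosixPath(home) / ".config/Claude" / CONFIG_NAME)


def with_summarize_server(config: dict, server_path: str, api_key: str) -> dict:
    """Return a copy of config with the "summarize" entry added, keeping other servers."""
    servers = dict(config.get("mcpServers", {}))
    servers["summarize"] = {
        "command": "python",
        "args": [server_path],
        "env": {"OPENAI_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}
```

The result of `with_summarize_server` can be written back with `json.dump`; restart Claude Desktop afterwards so it reloads the file.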

Environment Variables

  • OPENAI_API_KEY (required): Your OpenAI API key
  • DEBUG (optional): Set to "true" for verbose logging
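As an illustration of how these two variables might be consumed at startup, here is a small sketch. The function name `read_settings` and the returned dictionary shape are assumptions; the error text matches the message listed under Troubleshooting.

```python
import logging
import os
from typing import Mapping


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the API key and debug flag, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY environment variable is not set")
    # Only the literal value "true" (case-insensitive) enables verbose logging.
    debug = env.get("DEBUG", "").strip().lower() == "true"
    return {
        "api_key": api_key,
        "log_level": logging.DEBUG if debug else logging.INFO,
    }
```

The returned `log_level` can be passed straight to `logging.basicConfig(level=...)`.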

🛠️ Available Tools

play_summary

Converts text to speech and plays it in the background. Uses saved voice and tone preferences unless overridden.

Parameters:

  • summary (required): The text to convert to speech
  • voice (optional): Voice to use - alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, or shimmer (uses saved preference if not specified)
  • instructions (optional): Instructions for how the text should be spoken (uses saved tone if not specified)

Example:

{
  "summary": "The quick brown fox jumps over the lazy dog. This pangram contains all letters of the alphabet.",
  "voice": "nova",
  "instructions": "Speak slowly and clearly, emphasizing each word."
}
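The "saved preference unless overridden" rule can be sketched as a simple precedence chain: explicit argument first, then the saved value, then the built-in default. This is an illustrative stand-in using plain Python rather than the Pydantic models the server uses; `resolve_options` and the `saved` mapping keys (`voice`, `tone`) are assumptions, and `coral` is the voice marked as default in the Voice Options table.

```python
from typing import Mapping, Optional

VOICES = ("alloy", "ash", "ballad", "coral", "echo",
          "fable", "nova", "onyx", "sage", "shimmer")
DEFAULT_VOICE = "coral"


def resolve_options(arguments: Mapping[str, str], saved: Mapping[str, str]) -> dict:
    """Combine play_summary arguments with saved preferences."""
    summary = arguments.get("summary", "").strip()
    if not summary:
        raise ValueError("summary is required")
    # Explicit argument wins, then the saved preference, then the default.
    voice = arguments.get("voice") or saved.get("voice") or DEFAULT_VOICE
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    instructions: Optional[str] = arguments.get("instructions") or saved.get("tone")
    return {"summary": summary, "voice": voice, "instructions": instructions}
```

Because the override is applied per call and never written back, a one-time voice or instruction leaves the saved defaults unchanged.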

set_voice

Set the default voice for all future text-to-speech conversions.

Parameters:

  • voice (required): The voice to use - alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, or shimmer

Example:

{
  "voice": "nova"
}

set_tone

Set the default tone/instructions for how text should be spoken in all future TTS requests.

Parameters:

  • tone (required): The tone/instructions to use (e.g., "Speak slowly and calmly", "Be enthusiastic and energetic")

Example:

{
  "tone": "Speak in a warm, friendly manner with moderate pacing"
}
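Both set_voice and set_tone persist their value so it survives restarts; the state file is ~/.summarize-mcp-state.json. A minimal sketch of that persistence follows. The function names and the JSON key names (`voice`, `tone`) are assumptions, since the file's actual layout is not documented here.

```python
import json
import os
from pathlib import Path

STATE_FILE = Path(os.path.expanduser("~/.summarize-mcp-state.json"))


def load_state(path: Path = STATE_FILE) -> dict:
    """Return saved preferences, or an empty dict if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_preference(key: str, value: str, path: Path = STATE_FILE) -> dict:
    """Update one preference and write the whole state back to disk."""
    state = load_state(path)
    state[key] = value
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

For example, `save_preference("voice", "nova")` followed by `save_preference("tone", "Speak slowly and calmly")` would leave both keys in the same file.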

📖 Usage Examples

Basic Summary

"Please summarize this article and play it as audio"

The LLM will:

  1. Generate a summary of the content
  2. Use the play_summary tool to convert it to speech
  3. The audio will play in the background with saved preferences
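Under the hood, step 2 is an MCP `tools/call` request sent by the client as a JSON-RPC 2.0 message. The sketch below builds such a message for illustration; the helper name `tool_call_request` is hypothetical, and in practice the MCP client constructs this for you.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)
```

A call such as `tool_call_request(1, "play_summary", {"summary": "Short recap of the article."})` produces the payload a client would send for the basic summary case.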

Set Default Voice

"Set the default voice to nova"

This will save "nova" as your preferred voice for all future summaries.

Set Default Tone

"Set the tone to be warm and conversational with a slower pace"

This will save your tone preference for all future summaries.

Custom Voice (One-time)

"Summarize this document and play it using the 'sage' voice"

This will use "sage" for this summary only, without changing your default.

With Custom Instructions (One-time)

"Create an audio summary of this text. Make it sound enthusiastic and energetic."

This will use custom instructions for this summary only.

🎯 Voice Options

| Voice | Description |
|---|---|
| alloy | Neutral and balanced |
| ash | Warm and engaging |
| ballad | Expressive and dramatic |
| coral | Clear and professional (default) |
| echo | Smooth and reflective |
| fable | Expressive and animated |
| nova | Friendly and upbeat |
| onyx | Deep and authoritative |
| sage | Wise and measured |
| shimmer | Soft and gentle |

🧪 Development

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

# Run the server
python -m summarize_mcp

# Run tests
python test.py

# Run with debug logging
DEBUG=true python -m summarize_mcp

๐Ÿ—๏ธ Architecture

summarize-mcp/
โ”œโ”€โ”€ src/
โ”‚   โ””โ”€โ”€ summarize_mcp/
โ”‚       โ”œโ”€โ”€ __init__.py      # Package initialization
โ”‚       โ”œโ”€โ”€ __main__.py      # Entry point for python -m
โ”‚       โ””โ”€โ”€ server.py        # Main MCP server implementation
โ”œโ”€โ”€ pyproject.toml           # Python project metadata
โ”œโ”€โ”€ requirements.txt         # Python dependencies
โ”œโ”€โ”€ test.py                  # Test script
โ””โ”€โ”€ README.md               # This file

🔧 Technical Details

  • Audio Format: MP3 (OpenAI TTS output format)
  • Temporary Files: Stored in system temp directory
  • File Cleanup: Automatic cleanup after 10 seconds (configurable)
  • Old File Purge: Files older than 1 hour are cleaned on startup
  • Platform Support:
    • macOS: Uses built-in afplay
    • Windows: Uses PowerShell with Windows Media Player
    • Linux: Auto-detects available player (mpg123, sox, ffmpeg, vlc, alsa)
    • Fallback: Opens with system default audio application
  • State Management:
    • Preferences saved to ~/.summarize-mcp-state.json
    • Persists voice and tone settings between sessions
    • Automatic loading on startup
  • Error Handling: Comprehensive error handling with specific error types
  • Validation: Input validation using Pydantic models
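The startup purge of files older than one hour could look roughly like the sketch below. It is an assumption-laden illustration: the function name `purge_old_files` and the file name pattern `summarize-mcp-*.mp3` are hypothetical, and only the one-hour threshold comes from the list above.

```python
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 60 * 60  # one hour, per the technical details


def purge_old_files(
    directory: Path,
    pattern: str = "summarize-mcp-*.mp3",  # hypothetical naming scheme
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete matching files whose modification time is older than max_age seconds."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in directory.glob(pattern):
        try:
            if now - path.stat().st_mtime > max_age:
                path.unlink()
                removed.append(path)
        except OSError:
            # The file may be in use or already gone; skip it.
            continue
    return removed
```

At startup this would be called with `Path(tempfile.gettempdir())`, matching the note that temporary files live in the system temp directory.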

🚨 Troubleshooting

"OPENAI_API_KEY environment variable is not set"

Set your OpenAI API key in the Claude Desktop configuration.

"No audio player available"

Linux users: Install one of the supported audio players:

# Ubuntu/Debian
sudo apt-get install mpg123
# or
sudo apt-get install sox
# or
sudo apt-get install ffmpeg
# or
sudo apt-get install vlc

# Fedora/RHEL
sudo dnf install mpg123
# or similar for other players

# Arch
sudo pacman -S mpg123
# or similar for other players

Windows/macOS: Audio playback should work out of the box.

Audio doesn't play

  1. Check system volume
  2. Ensure no other audio issues on your system
  3. Enable debug logging with DEBUG=true
  4. Check the logs for any errors

๐Ÿ“ Changelog

v2.0.0 (Python Rewrite)

  • ๐Ÿ Complete rewrite in Python for better cross-platform support
  • ๐Ÿ”ง Improved async handling with Python's asyncio
  • ๐Ÿ“ฆ Simplified installation with pip
  • ๐Ÿ›ก๏ธ Enhanced type safety with Pydantic
  • ๐Ÿš€ Better performance and reliability

v1.2.0 (Persistent Preferences)

  • 💾 Added persistent state management for voice and tone preferences
  • 🎯 Added set_voice tool to set default voice
  • 🎯 Added set_tone tool to set default speaking instructions
  • 🎆 Added support for new OpenAI voices: ash, ballad, and sage
  • 🔄 play_summary now uses saved preferences unless overridden
  • 📁 State saved to ~/.summarize-mcp-state.json

v1.1.0 (Cross-Platform Support)

  • ๐ŸŒ Added Windows support using PowerShell/Windows Media Player
  • ๐Ÿง Added Linux support with auto-detection of audio players
  • ๐Ÿ”„ Added fallback to system default audio player
  • ๐Ÿ“ Updated documentation for multi-platform usage

v1.0.0 (Initial Release)

  • 🎉 Initial release
  • ✨ Core TTS functionality with OpenAI integration
  • ✨ Support for 7 different voices
  • ✨ Custom speaking instructions
  • ✨ Background audio playback on macOS
  • ✨ Automatic file cleanup
  • ✨ TypeScript implementation
  • ✨ Comprehensive error handling

💰 Estimated Costs

This tool uses OpenAI's gpt-4o-mini-tts model for text-to-speech conversion. Here's the pricing breakdown:

| Model | Audio Output Price | Estimated Cost |
|---|---|---|
| gpt-4o-mini-tts | $12.00 per 1M tokens | $0.015 per minute of audio |

Cost Examples:

  • 100-word summary (~30 seconds): ~$0.0075
  • 500-word summary (~2.5 minutes): ~$0.0375
  • 1000-word summary (~5 minutes): ~$0.075
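These figures follow from the per-minute estimate and a speaking rate of about 200 words per minute, which is inferred from the examples (100 words in roughly 30 seconds) rather than stated by OpenAI. A small sketch of the arithmetic, with `estimate_cost` as a hypothetical helper:

```python
PRICE_PER_MINUTE = 0.015  # USD per minute of audio, from the pricing table
WORDS_PER_MINUTE = 200    # assumption inferred from the cost examples


def estimate_cost(word_count: int, words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Estimate the TTS cost in USD for a summary of word_count words."""
    minutes = word_count / words_per_minute
    return minutes * PRICE_PER_MINUTE
```

Passing a lower `words_per_minute` models slower delivery, for example when the instructions ask for slow, deliberate speech, and raises the estimate accordingly.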

The actual cost depends on:

  • Length of your summaries
  • Speaking speed (instructions can affect this)
  • How frequently you use the tool

For current pricing details, see OpenAI's pricing page.

🔮 Roadmap

  • Cross-platform audio playback (Windows, Linux) (completed in v1.1.0)
  • Python implementation for better cross-platform support (completed in v2.0.0)
  • Additional TTS providers (ElevenLabs, Amazon Polly)
  • Audio format options (WAV, OGG)
  • Playback control (pause, resume, stop)
  • Queue management for multiple summaries
  • Audio file caching
  • Speed and pitch controls
  • SSML support for advanced speech control

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments