---
title: Ty-talky TTS
emoji: šŸŽ™ļø
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
models:
  - suno/bark
tags:
  - text-to-speech
  - tts
  - bark
  - audio
  - speech-synthesis
---

Ty-talky TTS System

Advanced text-to-speech system built on Bark by Suno AI, with SSML support, long-document processing, and a built-in generation library.

Features

  • Bark TTS Engine: Neural codec-based emotional speech synthesis
  • SSML Processing: Full Speech Synthesis Markup Language support (see the example after this list)
  • Long Document Support: Process 40,000+ word documents with intelligent chunking
  • Multiple Voice Profiles: 11 distinct voice options including elderly-optimized
  • Generation Library: SQLite-based storage and management system
  • MCP Server: Model Context Protocol server for AI agent integration
  • REST API: Secured endpoints with JWT authentication
  • Real-time Updates: WebSocket streaming for progress tracking
  • Web SDK: JavaScript SDK for easy integration
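
With SSML enabled, the input text can carry standard SSML markup. A minimal sketch in Python; the tags shown are standard SSML, and exactly which tags the parser honors is not documented here:

# Standard SSML markup, passed with enable_ssml=True (tag coverage assumed)
ssml_text = """<speak>
  Welcome back. <break time="500ms"/>
  <prosody rate="slow" pitch="low">This sentence is read slowly and low.</prosody>
  <emphasis level="strong">This sentence is emphasized.</emphasis>
</speak>"""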

Quick Start

HuggingFace Space

Visit: https://huggingface.co/spaces/toowired/Ty-talky

Local Installation

# Clone repository
git clone https://github.com/toowired/ty-talky-tts.git
cd ty-talky-tts

# Install dependencies
pip install -r requirements.txt

# Run MCP server
python api/mcp_server.py

# Or run UI directly
python src/app_enhanced.py

Project Structure

ty-talky-tts/
ā”œā”€ā”€ src/                  # Core application code
│   ā”œā”€ā”€ app_enhanced.py   # Main Gradio UI application
│   └── generation_library.py # Generation management system
ā”œā”€ā”€ api/                  # API and server implementations
│   └── mcp_server.py     # MCP server with REST endpoints
ā”œā”€ā”€ sdk/                  # Client SDKs
│   └── ty-talky-sdk.js   # JavaScript SDK
ā”œā”€ā”€ tests/                # Test suites
│   ā”œā”€ā”€ playwright_tests.js
│   └── run_tests.sh
ā”œā”€ā”€ notebooks/            # Jupyter notebooks
│   └── final_bark_notebook.ipynb
ā”œā”€ā”€ deploy/               # Deployment configurations
│   └── docker-compose-test.yml
ā”œā”€ā”€ docs/                 # Documentation
│   └── TEST_SCENARIOS.md
└── config/               # Configuration files

API Usage

Authentication

curl -X POST http://localhost:5000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com"}'
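
The same request from Python. Note the api_key field name below is an assumption about the response shape, not something documented above:

import requests

resp = requests.post(
    "http://localhost:5000/api/v1/auth/register",
    json={"email": "user@example.com"},
)
resp.raise_for_status()
api_key = resp.json().get("api_key")  # field name assumed; check the actual response
print(api_key)  # send this as the X-API-Key header on later calls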

Generate TTS

curl -X POST http://localhost:5000/api/v1/tts/generate \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world",
    "voice_profile": "Professional Male",
    "enable_ssml": false
  }'
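
A Python equivalent of the generation call, as a sketch. The exact response shape (e.g. a job id versus inline audio) is not documented above, so the example only inspects the JSON:

import requests

resp = requests.post(
    "http://localhost:5000/api/v1/tts/generate",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "text": "Hello world",
        "voice_profile": "Professional Male",
        "enable_ssml": False,
    },
)
resp.raise_for_status()
print(resp.json())  # inspect for the job/audio reference returned by the server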

SDK Usage

const TyTalkySDK = require('ty-talky-sdk');
const tts = new TyTalkySDK({ apiKey: 'YOUR_KEY' });

(async () => {
  // Generate and wait for the finished audio
  const audio = await tts.generateAndWait('Hello world!', {
    voice: 'Professional Male'
  });

  // Stream generation with progress updates
  await tts.generateStream('Long text...', {
    onProgress: (progress) => console.log(`${progress.percentage}% complete`)
  });
})();

Voice Profiles

  • Neutral: Standard voice
  • Professional Male/Female: Business presentations
  • Narrator: Audiobooks and stories
  • Podcast: Conversational style
  • Broadcast: News reading
  • Elderly Optimized: Clear, slower pace
  • Emotional Variants: Happy, sad, angry, excited
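
Profiles like these are typically backed by Bark speaker presets (history prompts). A hypothetical mapping: the preset IDs are real Bark presets, but which preset backs which profile here is an assumption, not the project's actual assignment:

# Hypothetical profile-to-preset mapping (assignments are illustrative)
VOICE_PRESETS = {
    "Neutral": "v2/en_speaker_0",
    "Professional Male": "v2/en_speaker_6",
    "Professional Female": "v2/en_speaker_9",
    "Narrator": "v2/en_speaker_3",
}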

Testing

# Run Playwright tests
cd tests
./run_tests.sh

# Run with Docker
docker-compose -f deploy/docker-compose-test.yml up

Development

Requirements

  • Python 3.8+
  • CUDA GPU (optional, for faster processing)
  • 8GB+ RAM
  • 20GB+ storage for models

Environment Variables

SUNO_USE_SMALL_MODELS=1  # Use smaller models for testing
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
HF_TOKEN=your_huggingface_token
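
These can also be set from Python. Bark reads SUNO_USE_SMALL_MODELS when its modules load, so set it before the import (a minimal sketch):

import os

# Must be set before importing bark, which reads it at module load time
os.environ["SUNO_USE_SMALL_MODELS"] = "1"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

from bark import preload_models
preload_models()  # now loads the smaller checkpoints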

Architecture

Processing Pipeline

  1. Text Input: Receive text with optional SSML
  2. SSML Parsing: Extract markup and emotional context
  3. Chunking: Split long texts intelligently
  4. Generation: Process through Bark TTS
  5. Post-processing: Audio enhancement
  6. Storage: Save to generation library
  7. Delivery: Stream or download
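
A condensed sketch of steps 3-5 using Bark's public Python API (generate_audio, preload_models, SAMPLE_RATE). The chunk size and speaker preset are illustrative; the real implementation lives in src/:

import re
import numpy as np
from bark import SAMPLE_RATE, generate_audio, preload_models

def chunk_text(text, max_chars=220):
    """Split on sentence boundaries so each chunk stays short enough for Bark."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

long_text = "Your document text here. It can run to many thousands of words."
preload_models()  # downloads and caches the Bark checkpoints on first call
pieces = [generate_audio(chunk, history_prompt="v2/en_speaker_6")
          for chunk in chunk_text(long_text)]
audio = np.concatenate(pieces)  # one waveform at SAMPLE_RATE (24 kHz)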

Performance

Approximate generation times (hardware-dependent):

  • 50 words: <10 seconds
  • 200 words: <30 seconds
  • 500 words: <60 seconds
  • 40,000+ words: Batch processing with progress

License

MIT License

Credits

  • Bark TTS by Suno AI
  • Built for elderly accessibility
  • Developed by toowired

Support

For issues or questions, open an issue on GitHub: https://github.com/toowired/ty-talky-tts