Revrse committed
Commit 7875858 · verified · 1 parent: d88d0d6

Upload 5 files

Files changed (5):
  1. README.md +78 -5
  2. SPACE_SETUP.md +132 -0
  3. app.py +377 -0
  4. download_models.py +54 -0
  5. requirements.txt +22 -0
README.md CHANGED
@@ -1,12 +1,85 @@
 ---
-title: Sub200
-emoji: 🐠
 colorFrom: purple
-colorTo: green
 sdk: gradio
-sdk_version: 5.49.1
 app_file: app.py
 pinned: false
 ---

-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 ---
+title: sub200
+emoji: 🎙️
 colorFrom: purple
+colorTo: blue
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: mit
+hardware: zero-gpu-h200
 ---

+# sub200 - Ultra Low Latency TTS Hosting
+
+sub200 lets you host a range of open-source TTS (text-to-speech) engines with ultra-low latency.
+
+## Features
+
+- 🚀 **Ultra Low Latency** - Optimized for real-time speech synthesis
+- 🎯 **Multiple Engines** - Supports Piper, Coqui TTS, Edge TTS, eSpeak, gTTS, and pyttsx3
+- 🌐 **Web UI** - Simple, modern Gradio interface
+- ⚡ **Fast** - Built on Gradio for responsive interaction
+- 🎮 **GPU Support** - GPU acceleration for Coqui TTS (H200 dynamic allocation)
+
+## Available TTS Engines
+
+1. **Piper TTS** - Ultra-low latency, offline
+2. **Coqui TTS** - High-quality neural TTS (GPU accelerated)
+3. **Edge TTS** - Microsoft Edge TTS (free, online)
+4. **eSpeak** - Fast, lightweight, offline
+5. **Google TTS (gTTS)** - Online, requires internet
+6. **pyttsx3** - Offline, uses system voices
+
+## Usage
+
+1. Enter your text in the text box
+2. Select a TTS engine from the dropdown
+3. Adjust speed if needed (0.5x to 2.0x)
+4. Click "Generate Speech"
+5. Audio auto-plays when ready
+
+## GPU Support
+
+This Space is configured for **Zero GPU** (H200 dynamic allocation):
+- A GPU is allocated automatically when Coqui TTS is used
+- No GPU is needed for the other engines (Piper, Edge TTS, eSpeak, etc.)
+- Dynamic allocation keeps resource usage efficient
+
+## Model Files
+
+### Piper Models
+- Downloaded automatically at runtime if not present
+- Or include them in the repository (they're ~60 MB each)
+
+### Coqui Models
+- Downloaded automatically on first use
+- Cached in the Space's storage
+
+## Local Development
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run server
+python app.py
+```
+
+Then open http://localhost:7860
+
+## Performance Tips
+
+1. **Use GPU** - Coqui TTS benefits significantly from GPU acceleration
+2. **Choose the Right Engine**:
+   - **Piper** - Lowest latency, offline
+   - **Edge TTS** - Best quality, requires internet
+   - **Coqui** - High quality, GPU accelerated
+   - **eSpeak** - Very fast, basic quality, offline
+
+## Troubleshooting
+
+- **No audio generated**: Check engine status in the accordion
+- **GPU not working**: Ensure Zero GPU is enabled in the Space settings
+- **Model download fails**: Check the internet connection for online engines
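The engine ranking above is easy to verify empirically. A minimal timing harness (hypothetical helper, not part of this commit; swap the lambda for a real call into `generate_speech` from `app.py`):

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single synthesis call."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0

# Stand-in function for illustration; replace with an engine call.
out, dt = time_call(lambda s: s.upper(), "hello world")
```

Run each engine on the same sentence a few times and compare the elapsed times; the first call is usually slower because models load lazily.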
SPACE_SETUP.md ADDED
@@ -0,0 +1,132 @@
# Hugging Face Space Setup Guide (Gradio + Zero GPU)

This guide walks you through deploying sub200 to a Hugging Face Space with **Zero GPU** (H200 dynamic allocation) using the Gradio SDK.

## Quick Start

1. **Create a new Space on Hugging Face**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Name: `sub200` (or your preferred name)
   - SDK: **Gradio** (not Docker!)
   - Hardware: **Zero GPU** (H200 dynamic allocation)
   - Visibility: Public or Private

2. **Push this repository to the Space**
   ```bash
   git remote add huggingface https://huggingface.co/spaces/YOUR_USERNAME/sub200
   git push huggingface main
   ```

   Or use the Hugging Face web interface to upload files.

## Required Files

The following files are already configured:
- ✅ `README.md` - Space metadata with Gradio SDK configuration
- ✅ `app.py` - Gradio application
- ✅ `requirements.txt` - Python dependencies
- ✅ `download_models.py` - Model download script
- ✅ `.gitignore` - Git exclusions

## Zero GPU Configuration

**Zero GPU** (H200 dynamic allocation) means:
- A GPU is allocated **only when needed** (e.g., when Coqui TTS is used)
- No GPU is needed for the other engines (Piper, Edge TTS, eSpeak, etc.)
- Resource usage is more efficient
- It **only works with the Gradio SDK**, not Docker

## GPU Usage

The GPU is used automatically:
- **Coqui TTS** - the GPU accelerates the neural TTS models
- The other engines (Piper, Edge TTS, eSpeak, gTTS, pyttsx3) run without a GPU

## Model Files

### Piper Models
- Downloaded automatically at runtime if not present
- Or include them in the repository (they're ~60 MB each)

### Coqui Models
- Downloaded automatically on first use
- Cached in the Space's storage
- The first download may take a few minutes

## Environment Variables

Hugging Face Spaces automatically sets:
- `PORT` - Server port (default: 7860)
- `SPACE_ID` - Your Space ID
- The GPU is allocated dynamically when needed
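A minimal sketch of reading these variables in Python (the defaults here are assumptions for local runs, not repo code):

```python
import os

# Fall back to Gradio's default port when PORT is unset (e.g. local runs).
port = int(os.environ.get("PORT", "7860"))

# SPACE_ID is only present inside a Space; it is None locally.
space_id = os.environ.get("SPACE_ID")
```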
## Customization

### Change Default Engine
Edit `app.py` and change the default value in `engine_select`:
```python
value=available_engines[0] if available_engines else "espeak",
```

### Add More Models
1. Add model files to the `models/` directory
2. Or modify `download_models.py` to download additional models

### Update Dependencies
Edit `requirements.txt` and rebuild the Space.

## Troubleshooting

### Build Fails
- Check the `requirements.txt` syntax
- Verify all dependencies are compatible
- Check the Space logs for specific errors

### GPU Not Working
- Confirm **Zero GPU** is enabled in the Space settings
- Check that Coqui TTS is selected
- Verify PyTorch CUDA availability in the logs

### Models Not Loading
- Ensure the models directory exists
- Check file permissions
- Verify model file paths
- Check the internet connection for model downloads

### Audio Not Playing
- Check the browser console for errors
- Verify the audio format is supported
- Try a different TTS engine

## Performance Tips

1. **Use Zero GPU** - Efficient resource usage with dynamic allocation
2. **Choose the Right Engine**:
   - **Piper** - Lowest latency, offline
   - **Edge TTS** - Best quality, requires internet
   - **Coqui** - High quality, GPU accelerated (uses the GPU dynamically)
   - **eSpeak** - Very fast, basic quality, offline
3. **Cache Models** - Models are cached after the first download

## Monitoring

- Check the Space logs in the Hugging Face interface
- Monitor GPU usage in the Space metrics (when a GPU is allocated)
- Check engine status in the UI accordion

## Differences from the Docker Version

- Uses the **Gradio SDK** instead of Docker
- Requires **Zero GPU** instead of a persistent GPU
- The GPU is allocated dynamically, only when needed
- Simpler deployment (no Dockerfile needed)
- Automatic port configuration (7860)

## Support

For issues or questions:
- Check the main README.md
- Review the Space logs
- Open an issue on GitHub (if applicable)
app.py ADDED
@@ -0,0 +1,377 @@
"""
sub200 - Ultra Low Latency TTS Hosting Server
Supports multiple open-source TTS engines.
Optimized for Hugging Face Spaces with Gradio and Zero GPU (H200 dynamic allocation).
"""

import asyncio
import concurrent.futures
import os
import subprocess
import tempfile

import gradio as gr
import numpy as np

# Import spaces for the GPU decorator
try:
    import spaces
except ImportError:
    # Fallback when `spaces` is not available (local development)
    class spaces:
        @staticmethod
        def GPU(func):
            return func


def check_engine_availability():
    """Check which TTS engines are available."""
    engines = {
        "piper": False,
        "coqui": False,
        "espeak": False,
        "gtts": False,
        "pyttsx3": False,
        "edge_tts": False,
    }

    # Check Piper (needs both the package and a downloaded .onnx model)
    try:
        import piper  # noqa: F401

        models_dir = os.path.join(os.path.dirname(__file__), "models")
        if os.path.exists(models_dir):
            for file in os.listdir(models_dir):
                if file.endswith(".onnx"):
                    engines["piper"] = True
                    break
    except ImportError:
        pass

    # Check Coqui
    try:
        import TTS  # noqa: F401

        engines["coqui"] = True
    except ImportError:
        pass

    # Check espeak (CLI binary)
    try:
        result = subprocess.run(
            ["espeak", "--version"], capture_output=True, timeout=2
        )
        engines["espeak"] = result.returncode == 0
    except (OSError, subprocess.SubprocessError):
        pass

    # Check gTTS
    try:
        from gtts import gTTS  # noqa: F401

        engines["gtts"] = True
    except ImportError:
        pass

    # Check pyttsx3
    try:
        import pyttsx3  # noqa: F401

        engines["pyttsx3"] = True
    except ImportError:
        pass

    # Check edge_tts
    try:
        import edge_tts  # noqa: F401

        engines["edge_tts"] = True
    except ImportError:
        pass

    return engines

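The try/except probes above can also be written without importing the heavy packages, using `importlib.util.find_spec` for modules and `shutil.which` for CLI binaries. A minimal sketch (`probe` is a hypothetical helper, not in this commit; note it only checks importability, not that a Piper model file exists):

```python
import importlib.util
import shutil

def probe(module=None, binary=None):
    """True when a Python module is importable or a CLI binary is on PATH."""
    if module is not None:
        return importlib.util.find_spec(module) is not None
    return shutil.which(binary) is not None

# Mirrors two of the checks in check_engine_availability()
status = {"gtts": probe(module="gtts"), "espeak": probe(binary="espeak")}
```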
def run_async_blocking(coro):
    """Run an async coroutine from a sync context."""
    try:
        loop = asyncio.get_event_loop()
        if loop.is_running():
            # A loop is already running: run the coroutine in a fresh
            # event loop on a worker thread.
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(asyncio.run, coro)
                return future.result()
        else:
            return loop.run_until_complete(coro)
    except RuntimeError:
        return asyncio.run(coro)

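The thread-pool branch exists because `asyncio.run()` raises `RuntimeError` when called from a thread whose event loop is already running; delegating to a fresh loop on a worker thread sidesteps that. A self-contained illustration with a toy coroutine (names here are assumptions, not repo code):

```python
import asyncio
import concurrent.futures

async def add(a, b):
    await asyncio.sleep(0)
    return a + b

def call_from_running_loop():
    # asyncio.run() would raise here because main()'s loop is running;
    # a worker thread gets its own, fresh event loop instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        return ex.submit(asyncio.run, add(2, 3)).result()

async def main():
    return call_from_running_loop()

result = asyncio.run(main())
```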
def generate_audio_piper(text: str, speed: float = 1.0):
    """Generate audio using Piper TTS."""
    try:
        import piper

        models_dir = os.path.join(os.path.dirname(__file__), "models")
        model_path = None

        if os.path.exists(models_dir):
            for file in os.listdir(models_dir):
                if file.endswith(".onnx"):
                    model_path = os.path.join(models_dir, file)
                    break

        if not model_path or not os.path.exists(model_path):
            raise FileNotFoundError("Piper model not found")

        piper_voice = piper.PiperVoice.load(model_path)
        audio_data_np = piper_voice.synthesize(text)

        # Return (sample_rate, ndarray) for Gradio's numpy audio format
        return (piper_voice.config.sample_rate, audio_data_np)

    except Exception as e:
        raise Exception(f"Piper TTS failed: {e}")


@spaces.GPU
def generate_audio_coqui(text: str, speed: float = 1.0):
    """Generate audio using Coqui TTS (GPU accelerated)."""
    try:
        from TTS.api import TTS

        models = [
            "tts_models/en/ljspeech/tacotron2-DDC",
            "tts_models/en/ljspeech/glow-tts",
            "tts_models/en/vctk/vits",
        ]

        tts = None
        for model in models:
            try:
                tts = TTS(model_name=model, progress_bar=False)
                break
            except Exception:
                continue

        if tts is None:
            raise Exception("No Coqui TTS model available")

        wav = tts.tts(text=text)
        sample_rate = 22050
        if hasattr(tts, "synthesizer") and hasattr(tts.synthesizer, "output_sample_rate"):
            sample_rate = tts.synthesizer.output_sample_rate

        return (sample_rate, wav)

    except Exception as e:
        raise Exception(f"Coqui TTS failed: {e}")


def generate_audio_espeak(text: str, speed: float = 1.0):
    """Generate audio using eSpeak."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as audio_file:
        audio_file_path = audio_file.name

    try:
        # eSpeak takes words per minute; 150 wpm is the 1.0x baseline
        cmd = ["espeak", "-s", str(int(150 * speed)), "-w", audio_file_path, text]
        subprocess.run(cmd, check=True, capture_output=True)

        import soundfile as sf

        audio_data, sample_rate = sf.read(audio_file_path)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"eSpeak TTS failed: {e}")
    finally:
        try:
            os.unlink(audio_file_path)
        except OSError:
            pass

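The `-s` flag maps the UI speed multiplier onto eSpeak's words-per-minute rate. Factored out as a helper (hypothetical name, mirroring the expression used above):

```python
def espeak_wpm(speed: float, base_wpm: int = 150) -> int:
    """Map the UI speed multiplier (0.5-2.0) to eSpeak's -s words-per-minute value."""
    return int(base_wpm * speed)
```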
def generate_audio_gtts(text: str, speed: float = 1.0):
    """Generate audio using Google TTS."""
    try:
        import io

        from gtts import gTTS
        from pydub import AudioSegment

        tts = gTTS(text=text, lang="en", slow=False)
        audio_buffer = io.BytesIO()
        tts.write_to_fp(audio_buffer)
        audio_buffer.seek(0)

        # Convert MP3 to WAV
        audio = AudioSegment.from_mp3(audio_buffer)
        wav_buffer = io.BytesIO()
        audio.export(wav_buffer, format="wav")
        wav_buffer.seek(0)

        import soundfile as sf

        audio_data, sample_rate = sf.read(wav_buffer)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"gTTS failed: {e}")


def generate_audio_pyttsx3(text: str, speed: float = 1.0):
    """Generate audio using pyttsx3."""
    try:
        import pyttsx3

        engine = pyttsx3.init()
        engine.setProperty("rate", int(150 * speed))

        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as audio_file:
            audio_file_path = audio_file.name

        engine.save_to_file(text, audio_file_path)
        engine.runAndWait()

        import soundfile as sf

        audio_data, sample_rate = sf.read(audio_file_path)

        os.unlink(audio_file_path)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"pyttsx3 failed: {e}")


def generate_audio_edge_tts(text: str, speed: float = 1.0):
    """Generate audio using Edge TTS."""
    try:
        import edge_tts

        async def generate():
            voices = await edge_tts.list_voices()
            voice_obj = next((v for v in voices if v["Locale"].startswith("en")), None)
            voice = voice_obj["ShortName"] if voice_obj else "en-US-AriaNeural"

            # edge-tts expects a signed percentage, e.g. "+50%" or "-50%"
            rate = f"{int((speed - 1) * 100):+d}%"
            communicate = edge_tts.Communicate(text, voice, rate=rate)
            audio_data = b""
            async for chunk in communicate.stream():
                if chunk["type"] == "audio":
                    audio_data += chunk["data"]
            return audio_data

        audio_data = run_async_blocking(generate())

        # Convert the MP3 bytes to a numpy array
        import io

        from pydub import AudioSegment

        audio = AudioSegment.from_mp3(io.BytesIO(audio_data))
        wav_buffer = io.BytesIO()
        audio.export(wav_buffer, format="wav")
        wav_buffer.seek(0)

        import soundfile as sf

        audio_array, sample_rate = sf.read(wav_buffer)
        return (sample_rate, audio_array)

    except Exception as e:
        raise Exception(f"Edge TTS failed: {e}")

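The rate string must carry an explicit sign (the naive `f"+{...}%"` form yields a malformed `"+-50%"` for speeds below 1.0). Isolated as a helper (hypothetical name) so the mapping is easy to check:

```python
def edge_rate(speed: float) -> str:
    """Map a speed multiplier to edge-tts's signed percent string."""
    return f"{int((speed - 1) * 100):+d}%"
```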
def generate_speech(text: str, engine: str, speed: float = 1.0):
    """Main entry point: generate speech from text."""
    if not text or not text.strip():
        return None, "Please enter some text"

    engines_status = check_engine_availability()

    if not engines_status.get(engine, False):
        available = [e for e, v in engines_status.items() if v]
        if not available:
            return None, "No TTS engines available"
        engine = available[0]  # Fall back to the first available engine

    try:
        if engine == "piper":
            sample_rate, audio_data = generate_audio_piper(text, speed)
        elif engine == "coqui":
            sample_rate, audio_data = generate_audio_coqui(text, speed)
        elif engine == "gtts":
            sample_rate, audio_data = generate_audio_gtts(text, speed)
        elif engine == "pyttsx3":
            sample_rate, audio_data = generate_audio_pyttsx3(text, speed)
        elif engine == "edge_tts":
            sample_rate, audio_data = generate_audio_edge_tts(text, speed)
        else:  # espeak
            sample_rate, audio_data = generate_audio_espeak(text, speed)

        return (sample_rate, audio_data), None

    except Exception as e:
        return None, f"Error: {e}"


# Determine available engines once at startup
engines_status = check_engine_availability()
available_engines = [e for e, v in engines_status.items() if v]

if not available_engines:
    available_engines = ["espeak"]  # Fallback

# Build the Gradio interface
with gr.Blocks(title="sub200 - Ultra Low Latency TTS", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎙️ sub200 - Ultra Low Latency Text-to-Speech

    Host different open-source TTS engines with ultra-low latency. Supports GPU acceleration for high-quality neural TTS.
    """)

    with gr.Row():
        with gr.Column(scale=2):
            text_input = gr.Textbox(
                label="Enter text to convert",
                placeholder="Type or paste your text here...",
                lines=5,
                value="",
            )
        with gr.Column(scale=1):
            engine_select = gr.Dropdown(
                label="TTS Engine",
                choices=available_engines,
                value=available_engines[0] if available_engines else "espeak",
                info="Select the TTS engine to use",
            )
            speed_slider = gr.Slider(
                label="Speed",
                minimum=0.5,
                maximum=2.0,
                value=1.0,
                step=0.1,
                info="Speech speed multiplier",
            )

    generate_btn = gr.Button("Generate Speech", variant="primary", size="lg")

    audio_output = gr.Audio(label="Generated Audio", type="numpy", autoplay=True)
    error_output = gr.Textbox(label="Status", visible=True)

    # Engine status
    with gr.Accordion("Engine Status", open=False):
        status_text = "\n".join(
            f"**{engine}**: {'✓ Available' if engines_status.get(engine, False) else '✗ Not Available'}"
            for engine in ["piper", "coqui", "espeak", "gtts", "pyttsx3", "edge_tts"]
        )
        gr.Markdown(status_text)

    # Wire the button to the synthesis function
    generate_btn.click(
        fn=generate_speech,
        inputs=[text_input, engine_select, speed_slider],
        outputs=[audio_output, error_output],
    )

    # Optional: auto-generate on text submit
    # text_input.submit(
    #     fn=generate_speech,
    #     inputs=[text_input, engine_select, speed_slider],
    #     outputs=[audio_output, error_output],
    # )

# Try to download Piper models if not present
try:
    import download_models

    download_models.download_piper_model()
except Exception:
    pass

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860, share=False)
download_models.py ADDED
@@ -0,0 +1,54 @@
#!/usr/bin/env python3
"""
Download Piper TTS models if not present.
This script can be run during the build or at startup.
"""

import os
import urllib.request


def download_piper_model():
    """Download the Piper model if it is not already present."""
    models_dir = "models"
    os.makedirs(models_dir, exist_ok=True)

    model_name = "en_US-lessac-low"
    model_file = f"{model_name}.onnx"
    config_file = f"{model_name}.onnx.json"

    model_path = os.path.join(models_dir, model_file)
    config_path = os.path.join(models_dir, config_file)

    # Skip the download when the model already exists
    if os.path.exists(model_path) and os.path.exists(config_path):
        print(f"Model {model_name} already exists, skipping download")
        return

    print(f"Downloading Piper model: {model_name}")

    # Hugging Face model repository
    base_url = "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US"

    try:
        # Download the model file
        if not os.path.exists(model_path):
            print(f"Downloading {model_file}...")
            urllib.request.urlretrieve(f"{base_url}/{model_file}", model_path)
            print(f"Downloaded {model_file}")

        # Download the config file
        if not os.path.exists(config_path):
            print(f"Downloading {config_file}...")
            urllib.request.urlretrieve(f"{base_url}/{config_file}", config_path)
            print(f"Downloaded {config_file}")

    except Exception as e:
        print(f"Error downloading model: {e}")
        print("Model download failed; the app will continue with other engines")


if __name__ == "__main__":
    download_piper_model()
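The script's check-then-fetch behavior is what makes restarts cheap: a second run sees the files on disk and returns immediately. The pattern in isolation (hypothetical `ensure_file` helper with a throwaway fetcher, not repo code):

```python
import os
import tempfile

def ensure_file(path, fetch):
    """Call fetch(path) only when the file is missing; report whether a fetch happened."""
    if os.path.exists(path):
        return False
    fetch(path)
    return True

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "en_US-lessac-low.onnx")
    first = ensure_file(target, lambda p: open(p, "wb").close())   # fetches
    second = ensure_file(target, lambda p: open(p, "wb").close())  # skips
```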
requirements.txt ADDED
@@ -0,0 +1,22 @@
gradio>=4.0.0
spaces>=0.30.0
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9
pydantic==2.9.2

# TTS engines
TTS==0.21.3        # Coqui TTS - high-quality neural TTS (GPU optimized)
edge-tts==7.2.3    # Edge TTS - Microsoft Edge TTS (free, online)
gTTS==2.5.4        # Google Text-to-Speech - online, requires internet
pyttsx3==2.99      # pyttsx3 - offline, uses system voices
piper-tts==1.3.0   # Piper TTS - ultra-low latency

# Audio processing
soundfile==0.13.1
numpy==1.26.4
pydub==0.25.1

# Additional dependencies for Coqui TTS
torch>=2.1.0
torchaudio>=2.1.0