Revrse committed
Commit 7875858 · verified · 1 parent: d88d0d6

Upload 5 files

Files changed (5):
  1. README.md +78 -5
  2. SPACE_SETUP.md +132 -0
  3. app.py +377 -0
  4. download_models.py +54 -0
  5. requirements.txt +22 -0
README.md CHANGED
@@ -1,12 +1,85 @@
 ---
-title: Sub200
-emoji: 🐠
 colorFrom: purple
-colorTo: green
 sdk: gradio
-sdk_version: 5.49.1
 app_file: app.py
 pinned: false
 ---

-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 ---
+title: sub200
+emoji: 🎙️
 colorFrom: purple
+colorTo: blue
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: mit
+hardware: zero-gpu-h200
 ---

+# sub200 - Ultra Low Latency TTS Hosting
+
+sub200 lets you host a range of open-source TTS (text-to-speech) engines with ultra-low latency.
+
+## Features
+
+- 🚀 **Ultra Low Latency** - Optimized for real-time speech synthesis
+- 🎯 **Multiple Engines** - Supports Piper, Coqui TTS, Edge TTS, eSpeak, gTTS, and pyttsx3
+- 🌐 **Web UI** - Simple, modern Gradio interface
+- ⚡ **Fast** - Built on Gradio for responsive interaction
+- 🎮 **GPU Support** - GPU acceleration for Coqui TTS (H200 dynamic allocation)
+
+## Available TTS Engines
+
+1. **Piper TTS** - Ultra-low latency, offline
+2. **Coqui TTS** - High-quality neural TTS (GPU accelerated)
+3. **Edge TTS** - Microsoft Edge TTS (free, online)
+4. **eSpeak** - Fast, lightweight, offline
+5. **Google TTS (gTTS)** - Online, requires internet
+6. **pyttsx3** - Offline, uses system voices
+
+## Usage
+
+1. Enter your text in the text box
+2. Select a TTS engine from the dropdown
+3. Adjust speed if needed (0.5x to 2.0x)
+4. Click "Generate Speech"
+5. Audio auto-plays when ready
+
+## GPU Support
+
+This Space is configured for **Zero GPU** (H200 dynamic allocation):
+- A GPU is allocated automatically when Coqui TTS is used
+- No GPU is needed for the other engines (Piper, Edge TTS, eSpeak, etc.)
+- Dynamic allocation keeps resource usage efficient
+
+## Model Files
+
+### Piper Models
+- Downloaded automatically at runtime if not present
+- Or include them in the repository (they're ~60 MB each)
+
+### Coqui Models
+- Downloaded automatically on first use
+- Cached in the Space's storage
+
+## Local Development
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run server
+python app.py
+```
+
+Then open http://localhost:7860
+
+## Performance Tips
+
+1. **Use GPU** - Coqui TTS benefits significantly from GPU acceleration
+2. **Choose the Right Engine**:
+   - **Piper** - Lowest latency, offline
+   - **Edge TTS** - Best quality, requires internet
+   - **Coqui** - High quality, GPU accelerated
+   - **eSpeak** - Very fast, basic quality, offline
+
+## Troubleshooting
+
+- **No audio generated**: Check engine status in the accordion
+- **GPU not working**: Ensure Zero GPU is enabled in the Space settings
+- **Model download fails**: Check the internet connection for online engines
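The engine ranking above is easy to verify empirically. A minimal timing harness (hypothetical helper, not part of this commit; swap the lambda for a real call into `generate_speech` from `app.py`):

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single synthesis call."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0

# Stand-in function for illustration; replace with an engine call.
out, dt = time_call(lambda s: s.upper(), "hello world")
```

Run each engine on the same sentence a few times and compare the elapsed times; the first call is usually slower because models load lazily.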
SPACE_SETUP.md ADDED
@@ -0,0 +1,132 @@
# Hugging Face Space Setup Guide (Gradio + Zero GPU)

This guide walks you through deploying sub200 to a Hugging Face Space with **Zero GPU** (H200 dynamic allocation) using the Gradio SDK.

## Quick Start

1. **Create a new Space on Hugging Face**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Name: `sub200` (or your preferred name)
   - SDK: **Gradio** (not Docker!)
   - Hardware: **Zero GPU** (H200 dynamic allocation)
   - Visibility: Public or Private

2. **Push this repository to the Space**
   ```bash
   git remote add huggingface https://huggingface.co/spaces/YOUR_USERNAME/sub200
   git push huggingface main
   ```

   Or use the Hugging Face web interface to upload files.

## Required Files

The following files are already configured:
- ✅ `README.md` - Space metadata with Gradio SDK configuration
- ✅ `app.py` - Gradio application
- ✅ `requirements.txt` - Python dependencies
- ✅ `download_models.py` - Model download script
- ✅ `.gitignore` - Git exclusions

## Zero GPU Configuration

**Zero GPU** (H200 dynamic allocation) means:
- A GPU is allocated **only when needed** (e.g., when Coqui TTS is used)
- No GPU is needed for the other engines (Piper, Edge TTS, eSpeak, etc.)
- Resource usage is more efficient
- It **only works with the Gradio SDK**, not Docker

## GPU Usage

The GPU is used automatically:
- **Coqui TTS** - the GPU accelerates the neural TTS models
- The other engines (Piper, Edge TTS, eSpeak, gTTS, pyttsx3) run without a GPU

## Model Files

### Piper Models
- Downloaded automatically at runtime if not present
- Or include them in the repository (they're ~60 MB each)

### Coqui Models
- Downloaded automatically on first use
- Cached in the Space's storage
- The first download may take a few minutes

## Environment Variables

Hugging Face Spaces automatically sets:
- `PORT` - Server port (default: 7860)
- `SPACE_ID` - Your Space ID
- The GPU is allocated dynamically when needed
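A minimal sketch of reading these variables in Python (the defaults here are assumptions for local runs, not repo code):

```python
import os

# Fall back to Gradio's default port when PORT is unset (e.g. local runs).
port = int(os.environ.get("PORT", "7860"))

# SPACE_ID is only present inside a Space; it is None locally.
space_id = os.environ.get("SPACE_ID")
```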
## Customization

### Change Default Engine
Edit `app.py` and change the default value in `engine_select`:
```python
value=available_engines[0] if available_engines else "espeak",
```

### Add More Models
1. Add model files to the `models/` directory
2. Or modify `download_models.py` to download additional models

### Update Dependencies
Edit `requirements.txt` and rebuild the Space.

## Troubleshooting

### Build Fails
- Check the `requirements.txt` syntax
- Verify all dependencies are compatible
- Check the Space logs for specific errors

### GPU Not Working
- Confirm **Zero GPU** is enabled in the Space settings
- Check that Coqui TTS is selected
- Verify PyTorch CUDA availability in the logs

### Models Not Loading
- Ensure the models directory exists
- Check file permissions
- Verify model file paths
- Check the internet connection for model downloads

### Audio Not Playing
- Check the browser console for errors
- Verify the audio format is supported
- Try a different TTS engine

## Performance Tips

1. **Use Zero GPU** - Efficient resource usage with dynamic allocation
2. **Choose the Right Engine**:
   - **Piper** - Lowest latency, offline
   - **Edge TTS** - Best quality, requires internet
   - **Coqui** - High quality, GPU accelerated (uses the GPU dynamically)
   - **eSpeak** - Very fast, basic quality, offline
3. **Cache Models** - Models are cached after the first download

## Monitoring

- Check the Space logs in the Hugging Face interface
- Monitor GPU usage in the Space metrics (when a GPU is allocated)
- Check engine status in the UI accordion

## Differences from the Docker Version

- Uses the **Gradio SDK** instead of Docker
- Requires **Zero GPU** instead of a persistent GPU
- The GPU is allocated dynamically, only when needed
- Simpler deployment (no Dockerfile needed)
- Automatic port configuration (7860)

## Support

For issues or questions:
- Check the main README.md
- Review the Space logs
- Open an issue on GitHub (if applicable)
app.py ADDED
@@ -0,0 +1,377 @@
"""
sub200 - Ultra Low Latency TTS Hosting Server
Supports multiple open-source TTS engines.
Optimized for Hugging Face Spaces with Gradio and Zero GPU (H200 dynamic allocation).
"""

import asyncio
import concurrent.futures
import os
import subprocess
import tempfile

import gradio as gr
import numpy as np

# Import spaces for the GPU decorator
try:
    import spaces
except ImportError:
    # Fallback when `spaces` is not available (local development)
    class spaces:
        @staticmethod
        def GPU(func):
            return func


def check_engine_availability():
    """Check which TTS engines are available."""
    engines = {
        "piper": False,
        "coqui": False,
        "espeak": False,
        "gtts": False,
        "pyttsx3": False,
        "edge_tts": False,
    }

    # Check Piper (needs both the package and a downloaded .onnx model)
    try:
        import piper  # noqa: F401

        models_dir = os.path.join(os.path.dirname(__file__), "models")
        if os.path.exists(models_dir):
            for file in os.listdir(models_dir):
                if file.endswith(".onnx"):
                    engines["piper"] = True
                    break
    except ImportError:
        pass

    # Check Coqui
    try:
        import TTS  # noqa: F401

        engines["coqui"] = True
    except ImportError:
        pass

    # Check espeak (CLI binary)
    try:
        result = subprocess.run(
            ["espeak", "--version"], capture_output=True, timeout=2
        )
        engines["espeak"] = result.returncode == 0
    except (OSError, subprocess.SubprocessError):
        pass

    # Check gTTS
    try:
        from gtts import gTTS  # noqa: F401

        engines["gtts"] = True
    except ImportError:
        pass

    # Check pyttsx3
    try:
        import pyttsx3  # noqa: F401

        engines["pyttsx3"] = True
    except ImportError:
        pass

    # Check edge_tts
    try:
        import edge_tts  # noqa: F401

        engines["edge_tts"] = True
    except ImportError:
        pass

    return engines

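The try/except probes above can also be written without importing the heavy packages, using `importlib.util.find_spec` for modules and `shutil.which` for CLI binaries. A minimal sketch (`probe` is a hypothetical helper, not in this commit; note it only checks importability, not that a Piper model file exists):

```python
import importlib.util
import shutil

def probe(module=None, binary=None):
    """True when a Python module is importable or a CLI binary is on PATH."""
    if module is not None:
        return importlib.util.find_spec(module) is not None
    return shutil.which(binary) is not None

# Mirrors two of the checks in check_engine_availability()
status = {"gtts": probe(module="gtts"), "espeak": probe(binary="espeak")}
```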
def run_async_blocking(coro):
    """Run an async coroutine from a sync context."""
    try:
        loop = asyncio.get_event_loop()
        if loop.is_running():
            # A loop is already running: run the coroutine in a fresh
            # event loop on a worker thread.
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(asyncio.run, coro)
                return future.result()
        else:
            return loop.run_until_complete(coro)
    except RuntimeError:
        return asyncio.run(coro)

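The thread-pool branch exists because `asyncio.run()` raises `RuntimeError` when called from a thread whose event loop is already running; delegating to a fresh loop on a worker thread sidesteps that. A self-contained illustration with a toy coroutine (names here are assumptions, not repo code):

```python
import asyncio
import concurrent.futures

async def add(a, b):
    await asyncio.sleep(0)
    return a + b

def call_from_running_loop():
    # asyncio.run() would raise here because main()'s loop is running;
    # a worker thread gets its own, fresh event loop instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        return ex.submit(asyncio.run, add(2, 3)).result()

async def main():
    return call_from_running_loop()

result = asyncio.run(main())
```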
def generate_audio_piper(text: str, speed: float = 1.0):
    """Generate audio using Piper TTS."""
    try:
        import piper

        models_dir = os.path.join(os.path.dirname(__file__), "models")
        model_path = None

        if os.path.exists(models_dir):
            for file in os.listdir(models_dir):
                if file.endswith(".onnx"):
                    model_path = os.path.join(models_dir, file)
                    break

        if not model_path or not os.path.exists(model_path):
            raise FileNotFoundError("Piper model not found")

        piper_voice = piper.PiperVoice.load(model_path)
        audio_data_np = piper_voice.synthesize(text)

        # Return (sample_rate, ndarray) for Gradio's numpy audio format
        return (piper_voice.config.sample_rate, audio_data_np)

    except Exception as e:
        raise Exception(f"Piper TTS failed: {e}")


@spaces.GPU
def generate_audio_coqui(text: str, speed: float = 1.0):
    """Generate audio using Coqui TTS (GPU accelerated)."""
    try:
        from TTS.api import TTS

        models = [
            "tts_models/en/ljspeech/tacotron2-DDC",
            "tts_models/en/ljspeech/glow-tts",
            "tts_models/en/vctk/vits",
        ]

        tts = None
        for model in models:
            try:
                tts = TTS(model_name=model, progress_bar=False)
                break
            except Exception:
                continue

        if tts is None:
            raise Exception("No Coqui TTS model available")

        wav = tts.tts(text=text)
        sample_rate = 22050
        if hasattr(tts, "synthesizer") and hasattr(tts.synthesizer, "output_sample_rate"):
            sample_rate = tts.synthesizer.output_sample_rate

        return (sample_rate, wav)

    except Exception as e:
        raise Exception(f"Coqui TTS failed: {e}")


def generate_audio_espeak(text: str, speed: float = 1.0):
    """Generate audio using eSpeak."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as audio_file:
        audio_file_path = audio_file.name

    try:
        # eSpeak takes words per minute; 150 wpm is the 1.0x baseline
        cmd = ["espeak", "-s", str(int(150 * speed)), "-w", audio_file_path, text]
        subprocess.run(cmd, check=True, capture_output=True)

        import soundfile as sf

        audio_data, sample_rate = sf.read(audio_file_path)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"eSpeak TTS failed: {e}")
    finally:
        try:
            os.unlink(audio_file_path)
        except OSError:
            pass

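The `-s` flag maps the UI speed multiplier onto eSpeak's words-per-minute rate. Factored out as a helper (hypothetical name, mirroring the expression used above):

```python
def espeak_wpm(speed: float, base_wpm: int = 150) -> int:
    """Map the UI speed multiplier (0.5-2.0) to eSpeak's -s words-per-minute value."""
    return int(base_wpm * speed)
```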
def generate_audio_gtts(text: str, speed: float = 1.0):
    """Generate audio using Google TTS."""
    try:
        import io

        from gtts import gTTS
        from pydub import AudioSegment

        tts = gTTS(text=text, lang="en", slow=False)
        audio_buffer = io.BytesIO()
        tts.write_to_fp(audio_buffer)
        audio_buffer.seek(0)

        # Convert MP3 to WAV
        audio = AudioSegment.from_mp3(audio_buffer)
        wav_buffer = io.BytesIO()
        audio.export(wav_buffer, format="wav")
        wav_buffer.seek(0)

        import soundfile as sf

        audio_data, sample_rate = sf.read(wav_buffer)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"gTTS failed: {e}")


def generate_audio_pyttsx3(text: str, speed: float = 1.0):
    """Generate audio using pyttsx3."""
    try:
        import pyttsx3

        engine = pyttsx3.init()
        engine.setProperty("rate", int(150 * speed))

        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as audio_file:
            audio_file_path = audio_file.name

        engine.save_to_file(text, audio_file_path)
        engine.runAndWait()

        import soundfile as sf

        audio_data, sample_rate = sf.read(audio_file_path)

        os.unlink(audio_file_path)
        return (sample_rate, audio_data)
    except Exception as e:
        raise Exception(f"pyttsx3 failed: {e}")


def generate_audio_edge_tts(text: str, speed: float = 1.0):
    """Generate audio using Edge TTS."""
    try:
        import edge_tts

        async def generate():
            voices = await edge_tts.list_voices()
            voice_obj = next((v for v in voices if v["Locale"].startswith("en")), None)
            voice = voice_obj["ShortName"] if voice_obj else "en-US-AriaNeural"

            # edge-tts expects a signed percentage, e.g. "+50%" or "-50%"
            rate = f"{int((speed - 1) * 100):+d}%"
            communicate = edge_tts.Communicate(text, voice, rate=rate)
            audio_data = b""
            async for chunk in communicate.stream():
                if chunk["type"] == "audio":
                    audio_data += chunk["data"]
            return audio_data

        audio_data = run_async_blocking(generate())

        # Convert the MP3 bytes to a numpy array
        import io

        from pydub import AudioSegment

        audio = AudioSegment.from_mp3(io.BytesIO(audio_data))
        wav_buffer = io.BytesIO()
        audio.export(wav_buffer, format="wav")
        wav_buffer.seek(0)

        import soundfile as sf

        audio_array, sample_rate = sf.read(wav_buffer)
        return (sample_rate, audio_array)

    except Exception as e:
        raise Exception(f"Edge TTS failed: {e}")

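The rate string must carry an explicit sign (the naive `f"+{...}%"` form yields a malformed `"+-50%"` for speeds below 1.0). Isolated as a helper (hypothetical name) so the mapping is easy to check:

```python
def edge_rate(speed: float) -> str:
    """Map a speed multiplier to edge-tts's signed percent string."""
    return f"{int((speed - 1) * 100):+d}%"
```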
def generate_speech(text: str, engine: str, speed: float = 1.0):
    """Main entry point: generate speech from text."""
    if not text or not text.strip():
        return None, "Please enter some text"

    engines_status = check_engine_availability()

    if not engines_status.get(engine, False):
        available = [e for e, v in engines_status.items() if v]
        if not available:
            return None, "No TTS engines available"
        engine = available[0]  # Fall back to the first available engine

    try:
        if engine == "piper":
            sample_rate, audio_data = generate_audio_piper(text, speed)
        elif engine == "coqui":
            sample_rate, audio_data = generate_audio_coqui(text, speed)
        elif engine == "gtts":
            sample_rate, audio_data = generate_audio_gtts(text, speed)
        elif engine == "pyttsx3":
            sample_rate, audio_data = generate_audio_pyttsx3(text, speed)
        elif engine == "edge_tts":
            sample_rate, audio_data = generate_audio_edge_tts(text, speed)
        else:  # espeak
            sample_rate, audio_data = generate_audio_espeak(text, speed)

        return (sample_rate, audio_data), None

    except Exception as e:
        return None, f"Error: {e}"


# Determine available engines once at startup
engines_status = check_engine_availability()
available_engines = [e for e, v in engines_status.items() if v]

if not available_engines:
    available_engines = ["espeak"]  # Fallback

# Build the Gradio interface
with gr.Blocks(title="sub200 - Ultra Low Latency TTS", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎙️ sub200 - Ultra Low Latency Text-to-Speech

    Host different open-source TTS engines with ultra-low latency. Supports GPU acceleration for high-quality neural TTS.
    """)

    with gr.Row():
        with gr.Column(scale=2):
            text_input = gr.Textbox(
                label="Enter text to convert",
                placeholder="Type or paste your text here...",
                lines=5,
                value="",
            )
        with gr.Column(scale=1):
            engine_select = gr.Dropdown(
                label="TTS Engine",
                choices=available_engines,
                value=available_engines[0] if available_engines else "espeak",
                info="Select the TTS engine to use",
            )
            speed_slider = gr.Slider(
                label="Speed",
                minimum=0.5,
                maximum=2.0,
                value=1.0,
                step=0.1,
                info="Speech speed multiplier",
            )

    generate_btn = gr.Button("Generate Speech", variant="primary", size="lg")

    audio_output = gr.Audio(label="Generated Audio", type="numpy", autoplay=True)
    error_output = gr.Textbox(label="Status", visible=True)

    # Engine status
    with gr.Accordion("Engine Status", open=False):
        status_text = "\n".join(
            f"**{engine}**: {'✓ Available' if engines_status.get(engine, False) else '✗ Not Available'}"
            for engine in ["piper", "coqui", "espeak", "gtts", "pyttsx3", "edge_tts"]
        )
        gr.Markdown(status_text)

    # Wire the button to the synthesis function
    generate_btn.click(
        fn=generate_speech,
        inputs=[text_input, engine_select, speed_slider],
        outputs=[audio_output, error_output],
    )

    # Optional: auto-generate on text submit
    # text_input.submit(
    #     fn=generate_speech,
    #     inputs=[text_input, engine_select, speed_slider],
    #     outputs=[audio_output, error_output],
    # )

# Try to download Piper models if not present
try:
    import download_models

    download_models.download_piper_model()
except Exception:
    pass

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860, share=False)
download_models.py ADDED
@@ -0,0 +1,54 @@
#!/usr/bin/env python3
"""
Download Piper TTS models if not present.
This script can be run during the build or at startup.
"""

import os
import urllib.request


def download_piper_model():
    """Download the Piper model if it is not already present."""
    models_dir = "models"
    os.makedirs(models_dir, exist_ok=True)

    model_name = "en_US-lessac-low"
    model_file = f"{model_name}.onnx"
    config_file = f"{model_name}.onnx.json"

    model_path = os.path.join(models_dir, model_file)
    config_path = os.path.join(models_dir, config_file)

    # Skip the download when the model already exists
    if os.path.exists(model_path) and os.path.exists(config_path):
        print(f"Model {model_name} already exists, skipping download")
        return

    print(f"Downloading Piper model: {model_name}")

    # Hugging Face model repository
    base_url = "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US"

    try:
        # Download the model file
        if not os.path.exists(model_path):
            print(f"Downloading {model_file}...")
            urllib.request.urlretrieve(f"{base_url}/{model_file}", model_path)
            print(f"Downloaded {model_file}")

        # Download the config file
        if not os.path.exists(config_path):
            print(f"Downloading {config_file}...")
            urllib.request.urlretrieve(f"{base_url}/{config_file}", config_path)
            print(f"Downloaded {config_file}")

    except Exception as e:
        print(f"Error downloading model: {e}")
        print("Model download failed; the app will continue with other engines")


if __name__ == "__main__":
    download_piper_model()
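The script's check-then-fetch behavior is what makes restarts cheap: a second run sees the files on disk and returns immediately. The pattern in isolation (hypothetical `ensure_file` helper with a throwaway fetcher, not repo code):

```python
import os
import tempfile

def ensure_file(path, fetch):
    """Call fetch(path) only when the file is missing; report whether a fetch happened."""
    if os.path.exists(path):
        return False
    fetch(path)
    return True

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "en_US-lessac-low.onnx")
    first = ensure_file(target, lambda p: open(p, "wb").close())   # fetches
    second = ensure_file(target, lambda p: open(p, "wb").close())  # skips
```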
requirements.txt ADDED
@@ -0,0 +1,22 @@
gradio>=4.0.0
spaces>=0.30.0
fastapi==0.109.2
uvicorn[standard]==0.27.1
python-multipart==0.0.9
pydantic==2.9.2

# TTS engines
TTS==0.21.3        # Coqui TTS - high-quality neural TTS (GPU optimized)
edge-tts==7.2.3    # Edge TTS - Microsoft Edge TTS (free, online)
gTTS==2.5.4        # Google Text-to-Speech - online, requires internet
pyttsx3==2.99      # pyttsx3 - offline, uses system voices
piper-tts==1.3.0   # Piper TTS - ultra-low latency

# Audio processing
soundfile==0.13.1
numpy==1.26.4
pydub==0.25.1

# Additional dependencies for Coqui TTS
torch>=2.1.0
torchaudio>=2.1.0