Good Audio Generation space, model, dataset
A collection of good audio generation Spaces, models, and datasets.
- Audio-to-Audio • Updated • 152k • 84
- KittenML/kitten-tts-nano-0.1
  Updated • 26.4k • 490
- FunAudioLLM/ThinkSound
  Video-to-Video • Updated • 48
- ThinkSound
  307 • Generate audio for a video using captions and descriptions
- Higgs Audio Demo
  392 • Higgs Audio Demo
- bosonai/higgs-audio-v2-generation-3B-base
  Text-to-Speech • 6B • Updated • 172k • 644
- Song Generation
  524 • Generate a custom song from lyrics and optional prompts
- Vui
  183 • NotebookLM conversational speech model
- Hibiki Samples
  49 • Translate speech in real time with high fidelity
- kyutai/moshiko-pytorch-bf16
  Updated • 329k • 190
- kyutai/mimi
  Feature Extraction • 96.2M • Updated • 410k • 270
- maya-research/Veena
  Text-to-Speech • 4B • Updated • 2.64k • 214
- MiniMax Speech Tech Report
  100 • Generate high-quality speech from text with voice cloning
- google/magenta-realtime
  Updated • 194 • 526
- PlayDiffusion
  118 • Generate modified audio from text and voice
- Qwen2.5 Omni 7B Demo
  363 • Generate text and speech from text, audio, images, and videos
- Open ASR Leaderboard
  1.16k • Display and request speech recognition model benchmarks
- Open NotebookLM
  142 • Generate a podcast to discuss the topic of your choice!
- Voila Demo
  43 • Chat with a voice-clone AI
- Voice Clone
  2.53k • Clone a voice to say custom text
- moonshotai/Kimi-Audio-7B-Instruct
  Text-to-Speech • 10B • Updated • 804 • 371
- moonshotai/Kimi-Audio-7B
  Text-to-Speech • 10B • Updated • 5.9k • 71
- Dia 1.6B
  1.72k • Generate realistic dialogue from a script, using Dia!
- nari-labs/Dia-1.6B
  Text-to-Speech • Updated • 166k • 2.81k
- ByteDance/MegaTTS3
  Text-to-Speech • Updated • 175 • 412
- Di♪♪Rhythm
  657 • Blazingly Fast and Embarrassingly Simple Song Generation
- Gemini Audio Video
  35 • Gemini understands audio and video!
- nvidia/diar_sortformer_4spk-v1
  Audio Classification • 0.1B • Updated • 3.63k • 115
- ACE Step
  602 • A Step Towards Music Generation Foundation Model
- ACE-Step/ACE-Step-v1-3.5B
  Text-to-Audio • Updated • 640
- stepfun-ai/Step-Audio-2-mini
  Any-to-Any • 8B • Updated • 1.32k • 239
- neuphonic/neutts-air
  Text-to-Speech • 0.7B • Updated • 23k • 795
- NeuTTS-Air
  282 • Generate speech using reference audio and text
- KaniTTS
  104 • Generate speech from text using selected models
- microsoft/UserLM-8b
  Text Generation • 8B • Updated • 2.07k • 353
- pipecat-ai/smart-turn-v3
  Voice Activity Detection • Updated • 75
- meituan-longcat/LongCat-Audio-Codec
  Updated • 39