EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11, 2025 • 20
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model Paper • 2506.08967 • Published Jun 10, 2025 • 2
Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • Feb 12, 2025 • 81
Cosmos Collection ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/cosmos-predict25 • 31 items • Updated 11 days ago • 299
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 4 items • Updated Jul 31, 2025 • 32