Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6, 2025 • 27
MOSPA: Human Motion Generation Driven by Spatial Audio Paper • 2507.11949 • Published Jul 16, 2025 • 24
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Paper • 2507.12956 • Published Jul 17, 2025 • 24
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9, 2025 • 27