FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 20 days ago • 23
End-to-End Video Character Replacement without Structural Guidance Paper • 2601.08587 • Published 23 days ago • 8
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion Paper • 2512.23709 • Published Dec 29, 2025 • 49
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 97
Openly licensed large image datasets Collection Openly licensed dataset with allowed commercial usage • 3 items • Updated Jul 1, 2024 • 1
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Jul 10, 2025 • 63
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 51
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 187
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes Paper • 2506.00227 • Published May 30, 2025 • 12