Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published 6 days ago • 37 • 3
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion Paper • 2512.23709 • Published 4 days ago • 36 • 3
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 4 days ago • 38 • 3
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published 7 days ago • 55 • 3
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 4 days ago • 61 • 3
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 4 days ago • 85 • 4
A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication Paper • 2512.21980 • Published 7 days ago • 2 • 3
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Paper • 2512.21625 • Published 8 days ago • 3 • 3
SVBench: Evaluation of Video Generation Models on Social Reasoning Paper • 2512.21507 • Published 8 days ago • 7 • 3
SlideTailor: Personalized Presentation Slide Generation for Scientific Papers Paper • 2512.20292 • Published 10 days ago • 8 • 3
SWE-RM: Execution-free Feedback For Software Engineering Agents Paper • 2512.21919 • Published 7 days ago • 8 • 2
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search Paper • 2512.18745 • Published 12 days ago • 10 • 3
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding Paper • 2512.21643 • Published 8 days ago • 10 • 3
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning Paper • 2512.22120 • Published 7 days ago • 12 • 3
ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published 7 days ago • 16 • 3
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published 10 days ago • 17 • 3
TimeBill: Time-Budgeted Inference for Large Language Models Paper • 2512.21859 • Published 7 days ago • 18 • 4
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture Paper • 2512.21675 • Published 8 days ago • 24 • 4
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents Paper • 2512.22047 • Published 7 days ago • 25 • 2
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding Paper • 2512.17220 • Published 14 days ago • 89 • 3