SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 3 days ago • 11
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published 3 days ago • 10
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 5 days ago • 168
WorldGen: From Text to Traversable and Interactive 3D Worlds Paper • 2511.16825 • Published 16 days ago • 21
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation Paper • 2511.12207 • Published 22 days ago • 7
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 25 days ago • 194
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30 • 81
Surfer 2: The Next Generation of Cross-Platform Computer Use Agents Paper • 2510.19949 • Published Oct 22 • 38
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30 • 33
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 109
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7 • 141
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22 • 28