ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published 5 days ago • 45
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published 7 days ago • 20
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published 7 days ago • 20 • 2
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28 • 18
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30 • 34
SPARK: Synergistic Policy And Reward Co-Evolving Framework Paper • 2509.22624 • Published Sep 26 • 17
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Paper • 2509.22647 • Published Sep 26 • 32
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 19
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Paper • 2503.21745 • Published Mar 27 • 1
Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity Paper • 2508.05609 • Published Aug 7 • 29
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience Paper • 2508.04700 • Published Aug 6 • 52
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published Jul 21 • 38