Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published 27 days ago • 137
LLaDA2.1: Speeding Up Text Diffusion via Token Editing Paper • 2602.08676 • Published 20 days ago • 68
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published 20 days ago • 154
Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory Paper • 2602.02393 • Published 27 days ago • 16
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture Paper • 2512.21675 • Published Dec 25, 2025 • 25
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 240
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published Oct 15, 2025 • 58
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs Paper • 2509.18056 • Published Sep 22, 2025 • 27
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3, 2025 • 14
GlimpsePrune Collection • A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models. https://github.com/HVision-NKU/GlimpsePrune • 6 items • Updated Aug 5, 2025 • 1
Gaussian Splatting with Discretized SDF for Relightable Assets Paper • 2507.15629 • Published Jul 21, 2025 • 23
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published Jun 27, 2025 • 36