Recursive Think-Answer Process for LLMs and VLMs Paper • 2603.02099 • Published 17 days ago • 6
Selective Training for Large Vision Language Models via Visual Information Gain Paper • 2602.17186 • Published 28 days ago • 3
Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback Paper • 2602.12612 • Published Feb 13 • 4
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration Paper • 2601.22674 • Published Jan 30 • 5
Toward Cognitive Supersensing in Multimodal Large Language Model Paper • 2602.01541 • Published Feb 2 • 16
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 30
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 98
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 122
Exploring MLLM-Diffusion Information Transfer with MetaCanvas Paper • 2512.11464 • Published Dec 12, 2025 • 15
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction Paper • 2511.23386 • Published Nov 28, 2025 • 16
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published Dec 11, 2025 • 47
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Paper • 2512.01949 • Published Dec 1, 2025 • 9
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization Paper • 2511.22586 • Published Nov 27, 2025 • 7
Monet: Reasoning in Latent Visual Space Beyond Images and Language Paper • 2511.21395 • Published Nov 26, 2025 • 18