LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model Paper • 2603.01068 • Published 9 days ago • 19
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 68
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 6 days ago • 80
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 25 days ago • 20
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 28 days ago • 50
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models Paper • 2602.13191 • Published 24 days ago • 30
lmms-lab-encoder/wd_temporal_grounding_frames_max_64_max_448x448_pixels_with_fps Updated 23 days ago • 157
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published 25 days ago • 57
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder Paper • 2510.18795 • Published Oct 21, 2025 • 11