Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2512.05145

about 3 hours ago

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 39
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 119

about 10 hours ago

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published 9 days ago • 48
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 140
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19 • 7
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 266

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published 9 days ago • 17

about 21 hours ago

Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published 9 days ago • 17

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165
Spotlight on Token Perception for Multimodal Reinforcement Learning

Paper • 2510.09285 • Published Oct 10 • 36
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Paper • 2510.17354 • Published Oct 20 • 33

Multimodal Alignment

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 36
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 41
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

about 3 hours ago

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 39
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 119

about 21 hours ago

Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published 9 days ago • 17

about 10 hours ago

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published 9 days ago • 48
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29 • 140
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19 • 7
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 266

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165
Spotlight on Token Perception for Multimodal Reinforcement Learning

Paper • 2510.09285 • Published Oct 10 • 36
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Paper • 2510.17354 • Published Oct 20 • 33

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published 9 days ago • 17

Multimodal Alignment

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 36
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Paper • 2411.18203 • Published Nov 27, 2024 • 41
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs