Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 55
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning Paper • 2509.06461 • Published Sep 8 • 19
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published Sep 3 • 30
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9 • 101
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 249
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 17
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 46
ReLearn: Unlearning via Learning for Large Language Models Paper • 2502.11190 • Published Feb 16 • 30
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24, 2024 • 17