SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 15
The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15 • 30
The Role of Computing Resources in Publishing Foundation Model Research Paper • 2510.13621 • Published Oct 15 • 16
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data Paper • 2510.09781 • Published Oct 10 • 26
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2 • 26
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2 • 26
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18 • 33
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation Paper • 2509.15194 • Published Sep 18 • 33
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Paper • 2507.06804 • Published Jul 7 • 16
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Paper • 2507.06804 • Published Jul 7 • 16 • 1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Paper • 2505.23754 • Published May 29 • 15
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 142
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning Paper • 2505.23754 • Published May 29 • 15
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation Paper • 2505.10962 • Published May 16 • 8
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation Paper • 2505.10962 • Published May 16 • 8