From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR Paper • 2508.07534 • Published Aug 11, 2025
Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models Paper • 2406.12397 • Published Jun 18, 2024
Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework Paper • 2509.05007 • Published Sep 5, 2025
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning Paper • 2508.02260 • Published Aug 4, 2025
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 30 items • Updated 3 days ago