ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization Paper • 2601.03723 • Published Jan 7
CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning Paper • 2509.25004 • Published Sep 29, 2025