Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 14 days ago • 96
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 4 days ago • 245
K-EXAONE Collection First journey to foundation models with frontier-level performance. • 4 items • Updated Jan 9 • 34
EXAONE-4.0 Collection EXAONE unified model series of 1.2B and 32B, integrating non-reasoning and reasoning modes. • 20 items • Updated Jul 29, 2025 • 55
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 23 days ago • 72
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 225