Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published 12 days ago • 23
RLPR Collection Extrapolating RLVR to General Domains without Verifiers • 6 items • Updated Feb 9 • 5
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23, 2025 • 32