ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts • Paper 2505.10010 • Published May 15