AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11, 2025 • 28