GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published 5 days ago • 38
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments Paper • 2602.06075 • Published 11 days ago • 13
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published 8 days ago • 30
Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention Paper • 2602.03338 • Published 11 days ago • 26
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions Paper • 2602.06035 • Published 9 days ago • 22
Reinforcement World Model Learning for LLM-based Agents Paper • 2602.05842 • Published 9 days ago • 25
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents Paper • 2602.02474 • Published 12 days ago • 53
VLS: Steering Pretrained Robot Policies via Vision-Language Models Paper • 2602.03973 • Published 11 days ago • 22
Likelihood-Based Reward Designs for General LLM Reasoning Paper • 2602.03979 • Published 11 days ago • 8
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models Paper • 2602.04515 • Published 10 days ago • 38
Self-Hinting Language Models Enhance Reinforcement Learning Paper • 2602.03143 • Published 11 days ago • 28
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published 23 days ago • 4
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 23 days ago • 14
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models Paper • 2601.15224 • Published 24 days ago • 12
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 28 days ago • 32