SPHINX: A Synthetic Environment for Visual Perception and Reasoning Paper • 2511.20814 • Published 12 days ago • 2
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published Oct 30 • 5
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence Paper • 2511.01144 • Published Nov 3 • 3