FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use Paper • 2603.08262 • Published 15 days ago
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following Paper • 2601.06431 • Published Jan 10
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published Jan 12
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025
MemMamba: Rethinking Memory Patterns in State Space Model Paper • 2510.03279 • Published Sep 28, 2025
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning Paper • 2509.23768 • Published Sep 28, 2025