VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction Paper • 2602.13294 • Published 17 days ago • 12
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 126
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 126
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 126
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 185
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 185