RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 4 days ago • 50
EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty Paper • 2510.00732 • Published Oct 1, 2025 • 6
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20, 2025 • 43
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey Paper • 2505.03418 • Published May 6, 2025 • 9
Article ✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use Jan 3, 2025 • 23