Article: Benchmark Smarter: Tailor Your Model Evaluation Suite with EvalScope • 20 days ago • 7
Qwen2.5-Coder (Collection): Code-specific model series based on Qwen2.5 • 40 items • Updated Dec 31, 2025 • 355
Llama-3.1-Nemotron-70B (Collection): SOTA models on Arena Hard and RewardBench as of 1 Oct 2024 • 6 items • Updated 6 days ago • 155