simonycl/game-eval-qwen-Qwen3-1.7B-vs-openai-gpt-4.1-mini-20250715-095147 • Updated Jul 15, 2025 • 30k • 86
simonycl/game-eval-qwen-Qwen3-1.7B-vs-openai-gpt-4.1-mini-20250715-094611 • Updated Jul 15, 2025 • 30k • 86
simonycl/game-eval-qwen-Qwen3-0.6B-vs-openai-gpt-4.1-mini-20250715-094549 • Updated Jul 15, 2025 • 35.5k • 89
simonycl/game-eval-qwen-Qwen3-14B-Base-vs-openai-gpt-4.1-mini-20250715-094156 • Updated Jul 15, 2025 • 34.2k • 85
simonycl/game-eval-qwen-Qwen3-8B-Base-vs-openai-gpt-4.1-mini-20250715-094130 • Updated Jul 15, 2025 • 35.5k • 86
simonycl/game-eval-qwen-Qwen3-4B-Base-vs-openai-gpt-4.1-mini-20250715-094045 • Updated Jul 15, 2025 • 20.1k • 82
simonycl/game-eval-qwen-Qwen3-1.7B-Base-vs-openai-gpt-4.1-mini-20250715-094027 • Updated Jul 15, 2025 • 25.5k • 81
simonycl/game-eval-qwen-Qwen3-0.6B-Base-vs-openai-gpt-4.1-mini-20250715-094001 • Updated Jul 15, 2025 • 15.1k • 89
simonycl/game-eval-qwen-Qwen3-14B-Base-vs-openai-gpt-4.1-mini-20250715-002156 • Updated Jul 15, 2025 • 512 • 85
simonycl/game-eval-qwen-Qwen3-8B-Base-vs-openai-gpt-4.1-mini-20250715-000425 • Updated Jul 15, 2025 • 512 • 81
simonycl/game-eval-qwen-Qwen3-4B-Base-vs-openai-gpt-4.1-mini-20250714-235053 • Updated Jul 15, 2025 • 512 • 83
simonycl/game-eval-qwen-Qwen3-1.7B-Base-vs-openai-gpt-4.1-mini-20250714-233606 • Updated Jul 15, 2025 • 512 • 84
simonycl/game-eval-qwen-Qwen3-0.6B-Base-vs-openai-gpt-4.1-mini-20250714-232425 • Updated Jul 15, 2025 • 512 • 84
simonycl/Meta-Llama-3-8B-Instruct_ultrafeedback-annotate-judge-mtbench_cot_truth • Updated Dec 1, 2024 • 6 • 4
simonycl/ultrafeedback_binarized_raw-annotate-judge-mtbench_cot_reason • Updated Nov 30, 2024 • 61.1k • 3
simonycl/ultrafeedback_binarized_raw-annotate-judge-mtbench_cot_safe • Updated Nov 29, 2024 • 61.1k • 4
simonycl/ultrafeedback_binarized_raw-annotate-judge-mtbench_cot_hon • Updated Nov 28, 2024 • 61.1k • 4
simonycl/ultrafeedback_binarized_raw-annotate-judge-mtbench_cot_truth • Updated Nov 27, 2024 • 61.1k • 5