A collection of benchmarks for evaluating large reasoning models
datasets-and-models
non-profit
AI & ML interests
None defined yet.
models (32)
guanning-ai/maze_sft_weights_1207
guanning-ai/Gai
guanning-ai/1027-math4b-bz1024-pposz128-rollout4-seed20
guanning-ai/1024-1.5b-knk23-debug1004
guanning-ai/1024-jspo-4b-lr1e-6-bz64-pposz32-rollout4-seed6
guanning-ai/significance-test-1016
guanning-ai/Gai0
guanning-ai/jspo-0921
guanning-ai/jspo-0909
guanning-ai/jspo-0910
datasets (130)
guanning-ai/gsm8k-metamath • 160k • 6
guanning-ai/gsm8k-mumath • 92k • 14
guanning-ai/gsm8k-mugglemath • 157k • 8
guanning-ai/openr1-93K • 93.7k • 24
guanning-ai/Polaris-53K • 53.3k • 12
guanning-ai/maze_11x11_1m • 7
guanning-ai/maze_13x13_1m • 5
guanning-ai/maze_15x15_1m • 12
guanning-ai/maze_19x19_1m • 7
guanning-ai/maze_21x21_1m • 9