chenggong1995/Qwen2.5-3B-Instruct-Distill-bs17k-batch32-epoch3-8192-grpo-E3 Text Generation • 3B • Updated Mar 17 • 5
chenggong1995/Qwen-2.5-Math-7B-Max-v6-accuracy-format-gen14 Text Generation • 8B • Updated Mar 15 • 3
chenggong1995/Qwen-2.5-Base-7B-Max-v1-accuracy-format-gen14 Text Generation • 8B • Updated Mar 16 • 3
chenggong1995/Qwen2.5-3B-Instruct-Distill-om220k-fem32768-batch32-epoch3-8192-grpo-E3 Text Generation • 3B • Updated Mar 18 • 3
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-medium-0_01-1024 Text Generation • 8B • Updated Mar 20 • 3
zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-new Text Generation • 8B • Updated Mar 21 • 4
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_1-1024 Text Generation • 8B • Updated Mar 21 • 4
zijianh/Qwen-2.5-7B-Simple-RL-length-penalty-low-high-0_3-1024 Text Generation • 8B • Updated Mar 22 • 5