LegendaryDawn/SDRL-freq-Qwen3-8B-Base-majority_n8_l4096-DAPO_n8_bs256_long12-yarn2-step200 8B • Updated 2 days ago • 36
LegendaryDawn/SDRL-freq-Qwen3-8B-Base-majority_n8_l4096-DAPO_n8_bs256_long12-yarn2-step125 8B • Updated 4 days ago • 7
LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long12-yarn2-step125 8B • Updated 4 days ago • 9
LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long12-yarn2-step200 8B • Updated 6 days ago • 36
Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning Paper • 2601.22297 • Published 21 days ago • 2
LegendaryDawn/SDRL-baseline-Qwen3-8B-Base-DAPO-n8-bs256-long12-yarn2-step200 8B • Updated 8 days ago • 35
LegendaryDawn/SDRL-freq-Qwen3-8B-Base-majority_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 14 days ago • 410
LegendaryDawn/SDRL-rand-Qwen3-8B-Base-random_n8_l4096-DAPO_n8_bs256_long8-step200 8B • Updated 15 days ago • 411
LegendaryDawn/SDRL-rand-Qwen3-4B-Base-icml-self-debate-random_n8_l2048-DAPO_n8_bs256_long8-step200 4B • Updated 23 days ago • 79