Parallel-R1/Parallel-R1-Unseen_Step_200


TongZheng1999 committed on Oct 6, 2025

Commit ee90a38 · verified · 1 Parent(s): 7169f8e

Create README.md

Files changed (1)

README.md +12 -0

README.md ADDED

	@@ -0,0 +1,12 @@

+---
+license: mit
+datasets:
+- Leo-Dai/dapo-math-17k_dedup
+---
+# 🧠 Parallel-R1-Unseen_Step_200
+> **Mid-Training Checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning**
+> Stage: **After 200 RL steps via alternating rewards**, showing adaptive parallel reasoning ability and serving as a structure-exploration stage.
+This checkpoint is intended to help you reproduce the experimental results in Section 4.5, "Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training."