---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B
tags:
- code
---
# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.04081-B31B1B)](https://arxiv.org/abs/2510.04081)
[![Conference](https://img.shields.io/badge/NeurIPS-2025-1E90FF)](https://neurips.cc/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework.
It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## 🚀 Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**.
**Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                                |
| ---------------------- | -------------------------------------------------------------------------- |
| **Model Type**         | Code LLM (Code-Aware Generator)                                            |
| **Base Model**         | Qwen2.5-Coder-7B                                                           |
| **Training Objective** | Next-token prediction on executable reasoning traces                       |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets         |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                        |
| **Verification**       | Code execution + output consistency filter                                 |

---

## 🧠 Methodology
<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.
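
For illustration, a normalized trace in this unified format might look like the sketch below; the problem, variable names, and final assert are hypothetical, not drawn from the Caco data:

```python
# Hypothetical example of a unified, executable Code CoT.
# Every reasoning step is an ordinary Python statement, so the whole
# trace can be executed and checked automatically.
def solution():
    # Problem: pens cost $3 each; Anna buys 4 and pays with a $20 bill.
    # How much change does she receive?
    price_per_pen = 3
    pens_bought = 4
    total_cost = price_per_pen * pens_bought  # step 1: total spent
    change = 20 - total_cost                  # step 2: change due
    return change

assert solution() == 8  # running the trace verifies the final answer
```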

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**: restructuring logic (e.g., decomposition, reformulation, alternative solution paths).
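
A minimal sketch of this expansion step follows. It assumes the generator is prompted in the ChatML format shown in the Usage section; the instruction wording, sampling settings, and `seed_cot` contents are illustrative assumptions, not the official Caco pipeline:

```python
# Illustrative sketch: asking the generator for a restructured variant
# of a seed Code CoT. Prompt wording and sampling values are assumptions.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

seed_cot = "def solution():\n    return 20 - 3 * 4\n"  # a stage-1 trace

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nRewrite the following solution program using a "
    "different solution path (e.g., decompose the problem into helper "
    "steps), keeping it executable:\n\n" + seed_cot + "<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512,
                         do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```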

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.
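
The execution half of such a verification is straightforward to sketch: run the candidate trace and keep it only if it reproduces the expected answer. The helper below is a hedged illustration of that filter (function name, timeout, and print-the-answer convention are assumptions); the actual Caco filter and the second, instruction-side check may differ:

```python
# Sketch of an execution-based consistency filter (names hypothetical).
import subprocess

def passes_execution_check(code_cot: str, expected: str,
                           timeout: float = 5.0) -> bool:
    """Run a candidate Code CoT and compare its printed output
    against the reference answer."""
    try:
        result = subprocess.run(["python", "-c", code_cot],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return False              # non-terminating traces are dropped
    if result.returncode != 0:
        return False              # the trace must run without errors
    return result.stdout.strip() == expected.strip()

print(passes_execution_check("print(20 - 3 * 4)", "8"))  # True
```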

---


## βš™οΈ Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# ChatML prompt in the Qwen2.5 format; the user question is a placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSolve step by step: what is 13 * 17?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
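
Since the model is built on Qwen2.5-Coder, its tokenizer likely ships the matching chat template, in which case the prompt can be built without hand-writing special tokens (this assumes the checkpoint retains the base template):

```python
# Equivalent prompt construction via the tokenizer's chat template
# (assumes the Qwen2.5 ChatML template is retained by this checkpoint).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve step by step: what is 13 * 17?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
```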

### Example use cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR; see the reward sketch after this list)
* Cross-domain reasoning transfer experiments
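
For the RLVR use case, code execution yields a binary, near-noiseless reward. The function below is a hedged sketch of such a reward (the harness, names, and answer convention are assumptions, not Caco training code):

```python
# Hypothetical RLVR reward: 1.0 if the generated program runs cleanly
# and prints the reference answer, else 0.0.
import subprocess

def execution_reward(program: str, expected: str,
                     timeout: float = 5.0) -> float:
    try:
        out = subprocess.run(["python", "-c", program],
                             capture_output=True, text=True,
                             timeout=timeout)
    except subprocess.TimeoutExpired:
        return 0.0
    return float(out.returncode == 0
                 and out.stdout.strip() == expected.strip())
```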

---

## 📈 Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | TheoremQA  |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models fine-tuned on Caco data show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## 🔬 Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## 📜 License

Apache 2.0: free for academic and commercial use, with attribution.

---

## 🌱 Related Resources

* [🧠 Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [🧩 Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## 💡 Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as low-noise reward signal