---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B
tags:
- code
---
# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.04081-B31B1B)](https://arxiv.org/abs/2510.04081)
[![Conference](https://img.shields.io/badge/NeurIPS-2025-1E90FF)](https://neurips.cc/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework.
It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## 🚀 Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**.
**Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                                |
| ---------------------- | -------------------------------------------------------------------------- |
| **Model Type**         | Code LLM (Code-Aware Generator)                                            |
| **Base Model**         | Qwen2.5-Coder-7B                                                           |
| **Training Objective** | Next-token prediction on executable reasoning traces                       |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets         |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                        |
| **Verification**       | Code execution + output consistency filter                                 |

---

## 🧠 Methodology
<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical + algorithmic) and normalize them into a unified executable format.

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**, which restructures the logic of a trace (e.g., decomposition, reformulation, alternative solution paths).

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.
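The verification step can be pictured with a minimal sketch. It assumes that a Code CoT prints its final answer to stdout and that verification means comparing this output with the answer stated in the back-translated solution. The helper names (`run_code_cot`, `answers_agree`, `passes_verification`), the timeout, and the tolerance are hypothetical and are not taken from the Caco codebase.

```python
import subprocess
import sys


def run_code_cot(code: str, timeout: float = 5.0):
    """Run a Code CoT in a fresh interpreter; return stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # runaway programs are treated as failures
    if result.returncode != 0:
        return None  # any uncaught exception is treated as a failure
    return result.stdout.strip()


def answers_agree(executed: str, stated: str, tol: float = 1e-6) -> bool:
    """Compare numerically when both values parse as floats, else as exact strings."""
    try:
        return abs(float(executed) - float(stated)) <= tol
    except ValueError:
        return executed.strip() == stated.strip()


def passes_verification(code_cot: str, stated_answer: str) -> bool:
    """Keep a sample only if the code runs and its output matches the stated answer."""
    executed = run_code_cot(code_cot)
    if not executed:  # None (failure) or empty output
        return False
    return answers_agree(executed, stated_answer)
```

For example, `passes_verification("print(1 / 4)", "0.25")` accepts the sample, while a program that raises or prints a different value is filtered out. A subprocess gives process isolation only, so untrusted generations would still call for a proper sandbox.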

---


## ⚙️ Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Chat-template prefix; generation continues from the open user turn.
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
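When many generations are sampled, a light post-processing pass can help before any execution step. The sketch below is an assumption, not part of the released pipeline. It takes code strings that have already been extracted from the decoded output, drops those that do not parse, and removes structural duplicates. `canonical_form` and `filter_candidates` are hypothetical helper names.

```python
import ast


def canonical_form(code: str):
    """Return an AST dump that ignores whitespace and comments, or None if parsing fails."""
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        return None
    # ast.dump omits line/column attributes by default, so formatting differences vanish.
    return ast.dump(tree)


def filter_candidates(candidates):
    """Keep the first occurrence of each syntactically valid, structurally distinct program."""
    seen = set()
    kept = []
    for code in candidates:
        key = canonical_form(code)
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append(code)
    return kept
```

Two programs that differ only in spacing or comments share one AST dump, so only the first is kept. Programs that differ in variable names or constants count as distinct.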

### Example use cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments

---

## 📈 Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## 🔬 Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## 📜 License

Apache 2.0: free for academic and commercial use, with attribution.

---

## 🌱 Related Resources

* [🧠 Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [🧩 Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## 💡 Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as low-noise reward signal
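As an illustration of the RLVR direction, the following sketch shows how an executed answer could serve as a binary reward. It assumes the policy model writes its final answer inside `\boxed{...}` and that the reference answer comes from running the corresponding Code CoT. Both conventions, and the names `extract_boxed` and `execution_reward`, are assumptions made for this example.

```python
def extract_boxed(text: str):
    """Return the content of the last \\boxed{...} in text (nested braces allowed), or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces


def execution_reward(response: str, executed_answer: str) -> float:
    """Binary reward: 1.0 if the boxed answer equals the executed answer, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == executed_answer.strip() else 0.0
```

Exact string matching keeps the sketch small. A practical reward would likely normalize equivalent forms (for example `0.5` and `\frac{1}{2}`) before comparing.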