Update README.md

11456a3 verified 4 months ago

4.7 kB

	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen2.5-Coder-7B
	tags:
	- code
	---
	# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

	[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.04081-B31B1B)](https://arxiv.org/abs/2510.04081)
	[![Conference](https://img.shields.io/badge/NeurIPS-2025-1E90FF)](https://neurips.cc/)
	[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

	Caco-CodeGen is a code-driven reasoning generation model trained under the Caco framework.
	It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

	---

	## 🚀 Overview

	Traditional Chain-of-Thought (CoT) data often lacks verifiability and diversity.
	Caco addresses this by grounding reasoning in executable programs, enabling automatic correctness checks and scalable reasoning synthesis.

	\| Property \| Description \|
	\| ---------------------- \| -------------------------------------------------------------------------- \|
	\| Model Type \| Code LLM (Code-Aware Generator) \|
	\| Base Model \| Qwen2.5-Coder-7B \|
	\| Training Objective \| Next-token prediction on executable reasoning traces \|
	\| Training Data \| Code CoTs extracted and unified from math and algorithmic datasets \|
	\| Output Type \| Python-like executable reasoning steps (`code_cot`) \|
	\| Verification \| Code execution + output consistency filter \|

	---

	## 🧠 Methodology
	<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

	Caco constructs reasoning data through three scalable stages:

	### 1. Unifying Code CoT

	Collect diverse seed reasoning traces (mathematical + algorithmic), normalize them into a unified executable format.

	### 2. Scaling Code CoT

	Train a Code Generator to expand reasoning traces via Pattern-level Augmentation — restructuring logic (e.g., decomposition, reformulation, alternative solution paths).

	### 3. Instruction Reversing

	Back-translate executable reasoning into natural language problems and solutions, and apply dual correctness verification.

	---


	## ⚙️ Usage

	### Example Inference

	```bash
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_name = "LHL3341/Caco-CodeGen"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

	prompt = "<\|im_start\|>system\nYou are a helpful assistant.<\|im_end\|>\n<\|im_start\|>user\n"
	inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

	outputs = model.generate(**inputs, max_new_tokens=1024)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))

	```

	### Example use cases

	* Fine-tuning reasoning LLMs (math, logic, or code tasks)
	* Verifiable reasoning data augmentation
	* Program-based RL reward modeling (RLVR)
	* Cross-domain reasoning transfer experiments

	---

	## 📈 Benchmarks (Caco Models)

	\| Model \| MATH \| Olympiad \| Theorem-QA \|
	\| -------------------- \| -------- \| -------- \| ---------- \|
	\| DeepSeekMath-7B-Caco \| 68.2 \| 29.5 \| 33.8 \|
	\| Qwen2.5-7B-Caco \| 82.4 \| 46.5 \| 46.0 \|
	\| Llama3-8B-Caco \| 70.6 \| 34.1 \| 31.0 \|

	Models trained on Caco show consistent improvements across multiple reasoning benchmarks and domains.

	---

	## 🔬 Citation

	If you use Caco in your research, please cite:

	```bibtex
	@article{caco,
	title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
	author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
	journal={arXiv preprint arXiv:2510.04081},
	year={2025}
	}
	```

	---

	## 📜 License

	Apache 2.0 — free for academic and commercial use, with attribution.

	---

	## 🌱 Related Resources

	* [🧠 Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
	* [🧩 Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

	---

	## 💡 Future Directions

	* Raising Difficulty: integrate harder datasets (AM-Thinking-distill, DAPO)
	* Expanding Diversity: add science, proofs, procedural planning
	* RL with Verifiable Rewards (RLVR): use code execution as low-noise reward signal