---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B
tags:
- code
---
# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[arXiv:2510.04081](https://arxiv.org/abs/2510.04081)
[NeurIPS](https://neurips.cc/)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework.
It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**.
**Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                                |
| ---------------------- | -------------------------------------------------------------------------- |
| **Model Type**         | Code LLM (Code-Aware Generator)                                            |
| **Base Model**         | Qwen2.5-Coder-7B                                                           |
| **Training Objective** | Next-token prediction on executable reasoning traces                       |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets         |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                        |
| **Verification**       | Code execution + output consistency filter                                 |
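
The verification row above can be pictured with a minimal sketch. It assumes each Code CoT is a standalone Python script that prints its final answer, and it reads "output consistency" as "two runs print the same non-empty output". Both are assumptions of this example rather than details confirmed by this card, and `run_code_cot` and `passes_filter` are hypothetical helper names.

```python
import subprocess
import sys
from typing import Optional


def run_code_cot(code: str, timeout: float = 10.0) -> Optional[str]:
    """Run a Code CoT in a fresh interpreter; return stripped stdout, or None on failure.

    Note: a subprocess is not a security sandbox; untrusted code needs real isolation.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # runaway programs are rejected
    if result.returncode != 0:
        return None  # exceptions and syntax errors are rejected
    return result.stdout.strip()


def passes_filter(code: str) -> bool:
    """Hypothetical filter: the program must run and print the same non-empty output twice."""
    first = run_code_cot(code)
    second = run_code_cot(code)
    return first is not None and first != "" and first == second


if __name__ == "__main__":
    sample = "total = sum(range(1, 11))\nprint(total)"
    print(run_code_cot(sample))           # 55
    print(passes_filter(sample))          # True
    print(passes_filter("print(1 / 0)"))  # False
```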

---

## Methodology
<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.
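
This card does not spell out the unified format. As a purely illustrative sketch, suppose a normalized trace is a self-contained Python script that takes no interactive input and prints its final answer. The sample trace and the `is_unified` check below are hypothetical and only make that assumption concrete.

```python
import ast

# Hypothetical example of a normalized trace: named intermediate steps, printed answer.
SEED_TRACE = '''
price_per_item = 4
items = 6
discount = 0.25
subtotal = price_per_item * items
answer = subtotal * (1 - discount)
print(answer)
'''


def is_unified(code: str) -> bool:
    """Check the assumed format: valid Python, no input() calls, at least one print() call."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    # Collect the names of all plainly named functions that the script calls.
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return "print" in called and "input" not in called


if __name__ == "__main__":
    print(is_unified(SEED_TRACE))               # True
    print(is_unified("x = input()\nprint(x)"))  # False
```

A static check like this is cheap, so it could run before any execution-based filtering.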

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**: restructuring logic (e.g., decomposition, reformulation, alternative solution paths).

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.

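
This card does not say what the two checks are. One plausible reading, assumed here, is an answer-consistency check: the answer stated in the back-translated solution must agree with the value the program printed. The `\boxed{...}` answer convention and the helper names are assumptions of this sketch.

```python
import re
from typing import Optional


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None if absent."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None


def answers_match(code_output: str, solution: str, tol: float = 1e-6) -> bool:
    """Compare the program's printed answer with the answer stated in the NL solution."""
    stated = extract_boxed(solution)
    if stated is None:
        return False
    try:
        # Numeric answers are compared with a tolerance.
        return abs(float(code_output) - float(stated)) <= tol
    except ValueError:
        # Non-numeric answers fall back to exact string comparison.
        return code_output.strip() == stated


if __name__ == "__main__":
    solution = "The subtotal is 24, so after the discount the cost is \\boxed{18}."
    print(answers_match("18.0", solution))  # True
    print(answers_match("20", solution))    # False
```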

---

## Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub (assumes a CUDA GPU).
model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# ChatML-style prompt: a system turn followed by the start of a user turn.
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate up to 1024 new tokens and decode the full sequence (prompt included).
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
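
The decoded text may need light post-processing before it can be executed. The output format is not documented in this card, so the sketch below simply assumes the generation either contains a Markdown-fenced code block or is plain code; `extract_code` is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built programmatically


def extract_code(generated: str) -> str:
    """Return the first fenced code block in the text, or the whole text if none is found."""
    pattern = re.escape(FENCE) + r"(?:python)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, generated, flags=re.DOTALL)
    return (match.group(1) if match else generated).strip()


if __name__ == "__main__":
    text = "Here is a trace:\n" + FENCE + "python\nx = 2 + 3\nprint(x)\n" + FENCE
    print(extract_code(text))
```

To drop the echoed prompt from the inference example, decode `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.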

### Example use cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments

---

## Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## License

Apache 2.0: free for academic and commercial use, with attribution.

---

## Related Resources

* [Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as a low-noise reward signal

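
For the RLVR direction, a reward can be as simple as running a candidate program and comparing its printed output to a known answer. The sketch below assumes an exact string match against a reference answer and a binary reward. `execution_reward` is a hypothetical name, and a real setup would need proper sandboxing rather than a bare subprocess.

```python
import subprocess
import sys


def execution_reward(program: str, reference: str, timeout: float = 5.0) -> float:
    """Binary reward: 1.0 if the program's printed output equals the reference, else 0.0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # timeouts earn no reward
    if result.returncode != 0:
        return 0.0  # crashes earn no reward
    return 1.0 if result.stdout.strip() == reference.strip() else 0.0


if __name__ == "__main__":
    print(execution_reward("print(sum(range(5)))", "10"))  # 1.0
    print(execution_reward("print(11)", "10"))             # 0.0
```

Because the signal comes from actually running the code, it does not depend on a learned judge, which is what makes it comparatively low-noise.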