Can Post-Training Transform LLMs into Causal Reasoners?
Paper: arXiv 2602.06337
The CauGym model is trained with GRPO (Group Relative Policy Optimization) using the VERL framework (https://github.com/verl-project/verl) and is specialized for causal inference.
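To make the "group relative" part of GRPO concrete, here is a minimal sketch of the advantage computation at its core: several answers are sampled for the same prompt, and each answer's reward is normalized against the mean and standard deviation of its group. The function name, the `eps` value, the binary reward, and the use of the population standard deviation are assumptions for illustration, not the training configuration used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and standard deviation.

    `eps` (an assumed value) avoids division by zero when every sampled
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one causal question,
# rewarded 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update favors answers that beat their own group's average without needing a separate value model.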
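We evaluated the model on the CALM and CauGym benchmarks, using accuracy as the metric. Columns are causal query types: average treatment effect (ATE), controlled direct effect (CDE), effect of treatment on the treated (ETT), natural direct effect (NDE), natural indirect effect (NIE), probability of necessity (PN), and probability of sufficiency (PS).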
| Benchmark | ATE | CDE | ETT | NDE | NIE | PN | PS |
|---|---|---|---|---|---|---|---|
| CALM | 0.990 | 0.994 | 0.900 | 0.940 | 0.930 | 0.928 | 0.866 |
| CauGym-rephrased | 0.948 | 0.982 | 0.856 | 0.890 | 0.888 | 0.778 | 0.816 |
| CauGym-omitted | 0.935 | 0.963 | 0.837 | 0.934 | 0.838 | 0.900 | 0.907 |
| CauGym-deconfounding | 0.976 | 0.986 | 0.854 | 0.572 | 0.872 | 0.952 | 0.848 |
| CauGym-redundant | 0.972 | 0.966 | 0.918 | 0.850 | 0.888 | 0.934 | 0.910 |
| CauGym-insufficient | 0.884 | 0.902 | 0.686 | 0.696 | 0.958 | 0.940 | 0.954 |
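As a convenience for reading the table, the sketch below copies the reported accuracies into a dictionary and finds, for each benchmark, the query type with the lowest accuracy together with an unweighted mean across the seven query types. The helper names are hypothetical, and the unweighted mean is a derived summary computed here, not a figure reported by the authors.

```python
# Accuracy values copied from the table above, in column order.
QUERY_TYPES = ["ATE", "CDE", "ETT", "NDE", "NIE", "PN", "PS"]
RESULTS = {
    "CALM": [0.990, 0.994, 0.900, 0.940, 0.930, 0.928, 0.866],
    "CauGym-rephrased": [0.948, 0.982, 0.856, 0.890, 0.888, 0.778, 0.816],
    "CauGym-omitted": [0.935, 0.963, 0.837, 0.934, 0.838, 0.900, 0.907],
    "CauGym-deconfounding": [0.976, 0.986, 0.854, 0.572, 0.872, 0.952, 0.848],
    "CauGym-redundant": [0.972, 0.966, 0.918, 0.850, 0.888, 0.934, 0.910],
    "CauGym-insufficient": [0.884, 0.902, 0.686, 0.696, 0.958, 0.940, 0.954],
}


def weakest_query(scores):
    """Return (query type, accuracy) for the lowest-scoring column."""
    idx = min(range(len(scores)), key=scores.__getitem__)
    return QUERY_TYPES[idx], scores[idx]


def unweighted_mean(scores):
    """Plain average over the seven query types (a derived summary)."""
    return sum(scores) / len(scores)


for name, scores in RESULTS.items():
    query, acc = weakest_query(scores)
    print(f"{name}: mean={unweighted_mean(scores):.3f}, weakest={query} ({acc:.3f})")
```

The unweighted mean treats every query type equally; if the query types have different numbers of test items, an item-weighted accuracy would differ.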
@misc{chen2026posttrainingtransformllmscausal,
title={Can Post-Training Transform LLMs into Causal Reasoners?},
author={Junqi Chen and Sirui Chen and Chaochao Lu},
year={2026},
eprint={2602.06337},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.06337},
}
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B