---
license: cc-by-nc-4.0
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- chart-to-code
- multimodal
- vision-language
- reinforcement-learning
- self-correction
- matplotlib
---

# MM-ReCoder

CVPR 2026  |  Project Page  |  arXiv  |  Code  |  SFT Cold-Start

**MM-ReCoder** is the 7B vision-language model from the CVPR 2026 paper [*MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction*](https://arxiv.org/abs/2604.01600). It converts a chart image into the matplotlib code that reproduces it. At inference time the model renders its own code with a sandboxed matplotlib tool, inspects the result, and self-corrects across multiple turns.

This is the **final** RL-trained checkpoint. It is fine-tuned from [`Qwen/Qwen2.5-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) in three stages:

1. **SFT cold-start**: released separately as [`cwbc/MM-ReCoder-SFT-Cold-Start`](https://huggingface.co/cwbc/MM-ReCoder-SFT-Cold-Start).
2. **Multi-turn RL (GRPO), stage 1**: shared-first-turn optimization.
3. **Multi-turn RL (GRPO), stage 2**: full-trajectory optimization, resumed from stage 1.

## Usage

The recommended way to use MM-ReCoder is through the inference scripts in the [official repository](https://github.com/ZitianTang/MM-ReCoder), which wrap the model with the self-correction agent loop (render → critique → revise):

```bash
git clone https://github.com/ZitianTang/MM-ReCoder.git
cd MM-ReCoder
# Follow the Installation section in the repo README.

# Download the MM-ReCoder checkpoint from Hugging Face.
hf download cwbc/MM-ReCoder

# Two-turn self-correction on ChartMimic.
bash examples/mmrecoder/inference/chartmimic_2turns.sh
```

### Direct single-turn use (no self-correction)

You can also load the model in a single-pass setting via `transformers`:

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import torch

model_id = "cwbc/MM-ReCoder"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("path/to/chart.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Generate the matplotlib code that reproduces this chart."},
    ],
}]

# Build the chat-formatted prompt, then tokenize it together with the image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding; drop the prompt tokens before decoding the completion.
out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```

This emits code in a single pass. The full self-correction behavior requires the agent loop in the repository.

## Training

- **Base model:** Qwen2.5-VL-7B-Instruct.
- **RL algorithm:** GRPO with chart-specific rule-based rewards (format, color, text, layout, type) plus an LLM-as-a-judge model reward.
- **RL data:** [Chart2Code-160k](https://huggingface.co/datasets/xxxllz/Chart2Code-160k) prompts.
- **Evaluation:** [ChartMimic](https://github.com/ChartMimic/ChartMimic) (direct-600), [Plot2Code](https://github.com/TencentARC/Plot2Code), and [ChartX](https://github.com/InternScience/ChartVLM).

See the [repository](https://github.com/ZitianTang/MM-ReCoder) for full training scripts and configs.

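
For readers unfamiliar with GRPO, the sketch below illustrates its core idea: each rollout's reward is standardized against the other rollouts sampled for the same prompt. It is not the repository's implementation. The function names, the equal weights, the weighted-sum form of the combined reward, and the use of the population standard deviation are all assumptions made for illustration. Only the reward component names (format, color, text, layout, type, plus a judge score) come from this card.

```python
from statistics import mean, pstdev

# Hypothetical equal weights; the actual weighting is not stated in this card.
RULE_WEIGHTS = {"format": 1.0, "color": 1.0, "text": 1.0, "layout": 1.0, "type": 1.0}
JUDGE_WEIGHT = 1.0  # hypothetical weight for the LLM-as-a-judge score


def total_reward(rule_scores, judge_score):
    """Combine rule-based scores and a judge score into one scalar.

    A weighted sum is assumed here; missing rule scores count as 0.
    """
    rule_part = sum(w * rule_scores.get(name, 0.0) for name, w in RULE_WEIGHTS.items())
    return rule_part + JUDGE_WEIGHT * judge_score


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three made-up rollouts for one chart prompt.
    group = [
        total_reward({"format": 1.0, "color": 0.2, "text": 0.5, "layout": 1.0, "type": 1.0}, 0.4),
        total_reward({"format": 1.0, "color": 0.9, "text": 0.8, "layout": 1.0, "type": 1.0}, 0.9),
        total_reward({"format": 0.0}, 0.0),  # e.g. code that failed to render
    ]
    print(group_relative_advantages(group))
```

Rollouts that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.
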
## Citation

```bibtex
@inproceedings{tang2026mmrecoder,
  title={MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction},
  author={Zitian Tang and Xu Zhang and Jianbo Yuan and Yang Zou and Varad Gunjal and Songyao Jiang and Davide Modolo},
  booktitle={CVPR},
  year={2026}
}
```

## License

Released under the Apache 2.0 License, inheriting from the base Qwen2.5-VL-7B-Instruct license.