	Multi-Modal Example Architecture
	=================================

	Last updated: 04/28/2025.

	Introduction
	------------

	verl now supports multi-modal training. You can launch a multi-modal RL
	task with FSDP as the training backend and vLLM or SGLang as the rollout
	engine. Megatron support is also on the way.

	Follow the steps below to quickly start a multi-modal RL task.

	Step 1: Prepare Dataset
	-----------------------

	.. code:: bash

	    # Preprocess Geo3K and save it to the $HOME/data/geo3k folder
	    python examples/data_preprocess/geo3k.py
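	GRPO needs a scalar reward for every sampled response. Assume that each
	preprocessed Geo3K record keeps a ground-truth answer and that the model is
	prompted to put its final answer in ``\boxed{}``. Under those assumptions, a
	rule-based reward could look like the sketch below. The helper names
	(``extract_boxed``, ``normalize``, ``rule_based_reward``) and the exact-match
	rule are hypothetical illustrations, not verl's reward function.

	.. code:: python

	    import re
	    from typing import Optional

	    # Matches \boxed{...} with no nested braces; this answer format is an assumption.
	    BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


	    def extract_boxed(response: str) -> Optional[str]:
	        """Return the content of the last \\boxed{...} in the response, or None."""
	        matches = BOXED_RE.findall(response)
	        return matches[-1].strip() if matches else None


	    def normalize(answer: str) -> str:
	        """Drop all whitespace and a trailing period, then lowercase."""
	        return re.sub(r"\s+", "", answer).rstrip(".").lower()


	    def rule_based_reward(response: str, ground_truth: str) -> float:
	        """Hypothetical reward: 1.0 on a normalized exact match, else 0.0."""
	        predicted = extract_boxed(response)
	        if predicted is None:
	            return 0.0  # no boxed answer found in the response
	        return 1.0 if normalize(predicted) == normalize(ground_truth) else 0.0


	    if __name__ == "__main__":
	        print(rule_based_reward(r"The angle is \boxed{30}.", "30"))  # prints 1.0
	        print(rule_based_reward("The angle is 30.", "30"))  # prints 0.0

	Exact string matching is deliberately simple. A real geometry reward would
	likely need to treat equivalent forms (fractions, units, symbolic
	expressions) as equal.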

	Step 2: Download Model
	----------------------

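	.. code:: bash

	    # Download Qwen2.5-VL-7B-Instruct from the Hugging Face Hub into the local cache
	    # (transformers.pipeline also loads the model into memory once)
	    python3 -c "import transformers; transformers.pipeline(model='Qwen/Qwen2.5-VL-7B-Instruct')"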

	Step 3: Perform GRPO Training with a Multi-Modal Model on Geo3K
	---------------------------------------------------------------

	.. code:: bash

	    # Run the GRPO training task
	    bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
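	GRPO (Group Relative Policy Optimization) samples several responses per
	prompt. Instead of training a critic, it scores each response relative to
	the other responses in its group. Below is a minimal sketch of that
	advantage computation. ``grpo_advantages`` is a hypothetical helper, and
	the population standard deviation and ``eps=1e-6`` are assumptions, since
	implementations differ in these details. This is not verl's implementation.

	.. code:: python

	    from collections import defaultdict
	    from statistics import mean, pstdev
	    from typing import Dict, List, Sequence, Tuple


	    def grpo_advantages(
	        group_ids: Sequence[str], rewards: Sequence[float], eps: float = 1e-6
	    ) -> List[float]:
	        """Return (reward - group mean) / (group std + eps) for each response.

	        Responses sharing a group id are assumed to come from the same prompt.
	        """
	        if len(group_ids) != len(rewards):
	            raise ValueError("group_ids and rewards must have the same length")

	        # Collect the rewards of all responses sampled for each prompt.
	        groups: Dict[str, List[float]] = defaultdict(list)
	        for gid, reward in zip(group_ids, rewards):
	            groups[gid].append(reward)

	        # Per-group mean and population std (the choice of std is an assumption).
	        stats: Dict[str, Tuple[float, float]] = {
	            gid: (mean(rs), pstdev(rs)) for gid, rs in groups.items()
	        }

	        # Normalize each reward against the statistics of its own group.
	        return [
	            (reward - stats[gid][0]) / (stats[gid][1] + eps)
	            for gid, reward in zip(group_ids, rewards)
	        ]


	    if __name__ == "__main__":
	        ids = ["p0", "p0", "p0", "p0", "p1", "p1"]
	        rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
	        print([round(a, 3) for a in grpo_advantages(ids, rewards)])
	        # prints [1.0, -1.0, -1.0, 1.0, 0.0, 0.0]

	In this sketch, a group in which every response received the same reward
	(``p1`` above) gets zero advantage. Such prompts therefore contribute no
	learning signal.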