Example to Finetune PLM on New Data
We provide a step-by-step walkthrough for finetuning PLM on a custom dataset based on the high-level instructions in training.md. For this example, we will finetune PLM-8B on a specific domain (Radiology images) and compare model performance before and after finetuning.
Setup
Install required packages:
pip install datasets tqdm
1. Download dataset and prepare for training
import json
import os
import tqdm
from datasets import load_dataset
def convert_to_training_jsonl(dataset, split):
out_dir = "apps/plm/dummy_datasets/Radiology_mini"
os.makedirs(f"{out_dir}/images", exist_ok=True)
parsed_data = []
for entry in tqdm.tqdm(dataset[split]):
# save image
image_path = f"{out_dir}/images/{entry["image_id"]}.png"
entry["image"].save(image_path)
# create training conversation template
conversations = [
{"from": "human", "value": "You are an expert radiographer. Describe accurately what you see in this image."},
{"from": "assistant", "value": entry["caption"]}
]
parsed_data.append({
"image": f"{entry["image_id"]}.png",
"conversations": conversations,
})
# Write jsonl for training / evaluation
with open(f"{out_dir}/{split}.jsonl", "w") as f:
for entry in parsed_data:
f.write(json.dumps(entry) + "\n")
dataset = load_dataset("unsloth/Radiology_mini")
convert_to_training_jsonl(dataset, "train")
convert_to_training_jsonl(dataset, "test")
After running this code, the training data will be ready for use with the codebase:
apps/plm/dummy_datasets/Radiology_mini
βββ train.jsonl
βββ test.jsonl
βββ images
β βββ ROCOv2_2023_test_000022.png
β βββ ROCOv2_2023_train_059888.png
β βββ ...
where each data jsonl will contain data in the required training format.
# train.jsonl
{"image": "ROCOv2_2023_train_054311.png", "conversations": [{"from": "human", "value": "You are an expert radiographer. Describe accurately what you see in this image."}, {"from": "assistant", "value": "Panoramic radiography shows an osteolytic lesion in the right posterior maxilla with resorption of the floor of the maxillary sinus (arrows)."}]}
{"image": "ROCOv2_2023_train_058916.png", "conversations": [{"from": "human", "value": "You are an expert radiographer. Describe accurately what you see in this image."}, {"from": "assistant", "value": "ERCP showing distal CBD compression. ERCP - endoscopic retrograde cholangiopancreatography; CBD - common bile duct"}]}
...
2. Add dataset config to configs/datasets.yaml
Point to the newly created data in configs/datasets.yaml by adding these lines at the bottom.
radiology_finetune:
annotation: apps/plm/dummy_datasets/Radiology_mini/train.jsonl
root_dir: apps/plm/dummy_datasets/Radiology_mini/images
3. Copy and modify the provided finetuning config
The stage # 3 configs can be used to further finetune PLM configs/stage_3.
cp apps/plm/configs/stage_3/plm_8b.yaml apps/plm/configs/finetune/plm_8b_custom.yaml
Copy the config and modify the fields below.
# Set the path to save checkpoints to
dump_dir: checkpoints/finetune_example/
# Total number of training iterations
steps: 500
# Pointer to previously created datamix. Ideally, you would incorporate the new data into a larger datamix
# but for now, we finetune only on this data
data:
datamix: radiology_finetune:1
# Pointer to the initial model weights
checkpoint:
init_ckpt_path: facebook/Perception-LM-8B
Various other parameters can be changed such as learning rate, batch_size, etc. See comments in configs/stage_3/plm_8b.yaml for details.
4. Finetune the model
Finetune a model on a single node. For multi-node training, refer to the main training.md doc.
torchrun --nproc-per-node 8 -m apps.plm.train \
config=apps/plm/configs/finetune/plm_8b_custom.yaml
This will start training and save checkpoints, logs and configs in the previously specified dump_dir.
checkpoints/finetune_example/
βββ checkpoints
β βββ 0000000500
β βββ __0_0.distcp
β βββ __1_0.distcp
β βββ ...
β βββ params.json
β βββ train_state_00000.json
β βββ train_state_00001.json
β βββ ...
βββ config.yaml
βββ metrics.jsonl
βββ train.log
5. Consolidate the checkpoint
Models trained with FSDP require their weights to be consolidated before inference to create consolidated.pth.
python apps/plm/consolidate.py --ckpt checkpoints/finetune_example/checkpoints/0000000500/
6. Test and compare model generation
Use the provided generate helper script to compare the base model (before finetuning) to the finetuned version on an unseen test image from the same dataset.
python apps/plm/generate.py \
--ckpt facebook/Perception-LM-8B \
--media_type image \
--media_path apps/plm/dummy_datasets/Radiology_mini/images/ROCOv2_2023_test_000022.png \
--question 'You are an expert radiographer. Describe accurately what you see in this image.'
# Generation:
# The image is a medical scan of a person's abdomen, likely an MRI or CT scan. The scan shows the internal organs of the abdomen, including the liver, stomach, and intestines. The liver is located on the left side of the image, and it appears to be slightly enlarged. The stomach is located in the center of the image, and it appears to be normal in size. The intestines are located on the right side of the image, and they appear to be normal in size and shape. There are no visible abnormalities or tumors in the image. The scan is in black and white, with the organs appearing in shades of gray. The background of the image is black, which helps to highlight the details of the organs. Overall, the image suggests that the person's abdominal organs are healthy and normal.
python apps/plm/generate.py \
--ckpt checkpoints/finetune_example/checkpoints/0000000500/ \
--media_type image \
--media_path apps/plm/dummy_datasets/Radiology_mini/images/ROCOv2_2023_test_000022.png \
--question 'You are an expert radiographer. Describe accurately what you see in this image.'
# Generation:
# CT scan of the abdomen demonstrating a large liver metastasis (yellow arrow) in segment VII.
Comparing the two, we see the finetuned model provide concise descriptions following the style of the training set. Note that we use the same prompt as training since the dataset is small and the model has likely overfit to it. For robust training, include the new data in a large data mix (e.g., our provided SFT blend).
Wrap up
From here, the model is trained and ready for evaluation. The generation script can be modified to directly evaluate the model on the radiology image captioning task (test set) using captioning metrics (e.g., CIDEr). Alternately, if trained with a larger SFT blend, it can be used for domain-specific QA (e.g., VQA-Radiology).