Eagle3 for Llama3 - Offline
Introduction
This document provides a step-by-step guide on how to train the EAGLE3 model for the Llama3.1-8B-Instruct model in an offline manner. In offline training, we generate the hidden states required by EAGLE3 draft model beforehand and store them to the disk. During training, we load them back to the GPU memory. As offline training requires a lot of disk space, we do not recommend running this on large datasets such as Perfect-Blend.
Training on ShareGPT dataset
Step 1. Prepare ShareGPT dataset
First of all, we should download the dataset.
python ./scripts/prepare_data.py --dataset sharegpt
Step 2. Prepare Hidden States
We need to prepare the hidden states for the training.
torchrun \
--standalone \
--nproc_per_node 8 \
scripts/prepare_hidden_states.py \
--target-model-path meta-llama/Llama-3.1-8B-Instruct \
--enable-aux-hidden-states \
--data-path ./cache/dataset/sharegpt_train.jsonl \
--output-path ./cache/hidden_states/sharegpt_train_Llama-3.1-8B-Instruct \
--chat-template llama3 \
--max-length 4096 \
--tp-size 1 \
--batch-size 32
The hidden states will be saved to the disk in the output-path directory.
Step 3. Start Training
torchrun \
--standalone \
--nproc_per_node 8 \
./scripts/train_eagle3.py \
--target-model-path meta-llama/Llama-3.1-8B-Instruct \
--draft-model-config ./configs/llama3-8B-eagle3.json \
--train-data-path ./cache/dataset/sharegpt_train.jsonl \
--train-hidden-states-path ./cache/hidden_states/sharegpt_train_Llama-3.1-8B-Instruct \
--output-dir ./outputs/llama3-8b-eagle3-sharegpt-offline \
--num-epochs 10 \
--batch-size 1 \
--tp-size 1 \
--learning-rate 1e-4 \
--max-length 4096 \
--chat-template llama3 \
--cache-dir ./cache