distill-m-6a3lnzvb-code / configs / replicate_zero4.toml

Commit History

add 9-config hparam sweep + new_layer_lr_mul param-groups support
3af7f4c

Delta-Vector committed on

add micro_batch_size config key + per-micro inner loop in train step (fixes OOM for fp32+seq2048)
be991b1

Delta-Vector committed on

fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key
eb5278f

Delta-Vector committed on

add grow_layers, sweep configs (replicate_zero4, grow40_winning, grow40_simple), sweep runner
3f04365

Delta-Vector committed on