add 9-config hparam sweep + new_layer_lr_mul param-groups support 3af7f4c verified Delta-Vector committed about 1 month ago
add micro_batch_size config key + per-micro inner loop in train step (fixes OOM for fp32+seq2048) be991b1 verified Delta-Vector committed on Apr 7
fix OOM: chunked KL with checkpointing + PYTORCH_CUDA_ALLOC_CONF expandable_segments; add kl_chunk_size config key eb5278f verified Delta-Vector committed on Apr 7
add grow_layers, sweep configs (replicate_zero4, grow40_winning, grow40_simple), sweep runner 3f04365 verified Delta-Vector committed on Apr 7
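
The log does not show how commit 3af7f4c implements its `new_layer_lr_mul` param-groups support. As a rough illustration only, the sketch below shows one way such grouping could work, assuming newly grown layers can be identified by a name prefix. The function name `build_param_groups`, the `new_layer_prefixes` argument, and the prefix-matching rule are hypothetical and are not taken from the repository.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    new_layer_prefixes: List[str],  # hypothetical: how new layers are identified
    base_lr: float,
    new_layer_lr_mul: float,
) -> List[Dict[str, Any]]:
    """Split parameters into a base group and a new-layer group.

    The new-layer group gets learning rate base_lr * new_layer_lr_mul.
    """
    base, new = [], []
    for name, param in named_params:
        # A parameter is treated as "new" if its name starts with any prefix.
        if any(name.startswith(prefix) for prefix in new_layer_prefixes):
            new.append(param)
        else:
            base.append(param)

    groups = [{"params": base, "lr": base_lr}]
    if new:
        # Only emit the second group when there is something to put in it.
        groups.append({"params": new, "lr": base_lr * new_layer_lr_mul})
    return groups
```

A list of dicts with `params` and `lr` keys is the per-parameter-group format that PyTorch optimizers accept, so with `model.named_parameters()` as input the result could be passed to an optimizer such as `torch.optim.AdamW`. Prefixes should end with a dot (for example `layers.40.`) so that `layers.4` does not also match `layers.40`.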