PAWN / scripts / train_all.py

Commit History

Consolidate all training logging through MetricsLogger
b103659

thomas-schweich committed on

Log patience counter, best val loss/step in val records
a050f72

thomas-schweich committed on

Per-model early stopping: freeze converged variants individually
190085d

thomas-schweich committed on

Revert batch size default to 256
2aee25d

thomas-schweich committed on

Push metrics to HF at eval intervals, add dashboard HF sync
86ec60c

thomas-schweich committed on

Add early stopping patience to multi-model training
07c93ac

thomas-schweich committed on

Remove .item() CUDA sync from hot path, batch size 512, run slugs
fc9d7f7

thomas-schweich committed on

Add post-training evals, /dev/shm checkpoints, async HF push, and _orig_mod fix
87b2fa6

thomas-schweich committed on

Safetensors migration, checkpoint integrity, and multi-model training. (#1)
230508d

thomas-schweich committed on