Audit fixes: remove duplicate torch import, add metadata field, fix stale strings, fix test assertions, update reward docs 36f4bdf Aswini-Kumar committed on Apr 26
Redesign reward for discrimination: efficiency multiplier, strict penalties, stretch bonus, start at level 1 46f0850 Aswini-Kumar committed on Apr 26
Fix demo mode crash: use max_steps param instead of unpicklable local class 3f7380e Aswini-Kumar committed on Apr 26
Optimize for fast iteration: 1.5B model, LoRA r=8, GRPO batch=2/gen=2, seq=512 3807e67 Aswini-Kumar committed on Apr 26
Switch experiment tracking from W&B to TensorBoard (no API key required) b80a8b2 Aswini-Kumar committed on Apr 26
Enable W&B experiment tracking in SFT+GRPO phases (required by hackathon) ffbb7d8 Aswini-Kumar committed on Apr 26
refactor: extract agent_utils.py (shared prompt/commands/server utils), simplify reward to env+format, add audit.py 51a79ee Aswini-Kumar committed on Apr 26
Data-Centric AI RL Environment — OpenEnv Hackathon Submission 71dc210 Aswini-Kumar committed on Apr 25