data-centric-env / train_data_centric.py

Commit History

Fix num_generations=2
5d27dfe
verified

Aswini-Kumar committed on

SFT 200->350 steps for better format compliance
def99af
verified

Aswini-Kumar committed on

Fix curriculum window=50 and threshold=0.80
c27aa07
verified

Aswini-Kumar committed on

GRPO speed fix: 50 steps, 1 generation
4c55113
verified

Aswini-Kumar committed on

Add WHY comments + GRPO/SFT speed caps
8843277
verified

Aswini-Kumar committed on

Speed fix: cap SFT at 200 steps
49d8e89
verified

Aswini-Kumar committed on

Update train_data_centric.py
b572d19
verified

Aswini-Kumar committed on

Audit fixes: remove duplicate torch import, add metadata field, fix stale strings, fix test assertions, update reward docs
36f4bdf

Aswini-Kumar committed on

Redesign reward for discrimination: efficiency multiplier, strict penalties, stretch bonus, start at level 1
46f0850

Aswini-Kumar committed on

Fix demo mode crash: use max_steps param instead of unpicklable local class
3f7380e

Aswini-Kumar committed on

Optimize for fast iteration: 1.5B model, LoRA r=8, GRPO batch=2/gen=2, seq=512
3807e67

Aswini-Kumar committed on

Switch experiment tracking from W&B to TensorBoard (no API key required)
b80a8b2

Aswini-Kumar committed on

Enable W&B experiment tracking in SFT+GRPO phases (required by hackathon)
ffbb7d8

Aswini-Kumar committed on

refactor: extract agent_utils.py (shared prompt/commands/server utils), simplify reward to env+format, add audit.py
51a79ee

Aswini-Kumar committed on

Data-Centric AI RL Environment — OpenEnv Hackathon Submission
71dc210

Aswini-Kumar committed on