Fix critical RL reward function exploits and training hyperparameters 803c93e nihalaninihal Claude Opus 4.6 committed 15 days ago
Align with Advanced Llama 3.2 GRPO LoRA reference notebook pattern c7d253a nihalaninihal Claude Opus 4.6 committed 15 days ago
Fix VALID_TARGETS_FOR_ATTACK and attacker heuristic/prompt inconsistencies 3ffb78a nihalaninihal Claude Opus 4.6 committed 15 days ago
Align train.py and Colab notebook with official Unsloth+OpenEnv GRPO patterns e09a415 nihalaninihal Claude Opus 4.6 committed 15 days ago
Add multi-agent GRPO training for all 3 agents (worker, attacker, oversight) 389e3bf nihalaninihal Claude Opus 4.6 committed 15 days ago
Remove hackathon_env template, rewrite train.py for SentinelOpsArena 0e5a0a6 nihalaninihal Claude Opus 4.6 committed 15 days ago
Initial project setup for OpenEnv Hackathon ccb5f4e nihalaninihal Claude Opus 4.6 committed 16 days ago