Nitishkumar-ai committed
Commit b74db43 · 0 Parent(s)

Initial clean deploy commit

This view is limited to 50 files because it contains too many changes. See raw diff.
.agent/FUTURE_WORK.md ADDED
@@ -0,0 +1,16 @@
+ <!--
+ If an agent is tempted to build something not in the current scope, append it here instead and continue with the locked task.
+
+ Source: ../prd.md §14 (Future Work). Do not execute these during the hackathon build unless explicitly re-scoped by the whole team (and documented).
+ -->
+
+ ## Future Work (post-hackathon)
+
+ - **Sandboxed exploit execution**: replace the pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
+ - **Multi-file commit reasoning**: extend the env to support diffs spanning multiple files, with a context budget
+ - **Self-play loop**: pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
+ - **Agentic harness integration**: wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
+ - **Real CVE corpus**: extend beyond Devign to recent CVE-tagged commits from major open-source repos
+ - **Multi-language support**: the current env is C-focused via Devign; extend to Python, JavaScript, Go
+ - **Reward shape ablations**: a formal study of how reward composition affects which vulnerability types the model learns fastest
+
.agent/README.md ADDED
@@ -0,0 +1,38 @@
+ ## What this folder is
+
+ `.agent/` is the **operating system for AI agents** on this repo. It locks the architecture decisions from `../prd.md`, prevents scope creep under deadline pressure, and makes sure three engineers can use Cursor / Claude Code in parallel without drifting.
+
+ If you're an agent: **load `project_context.md` first**. If you're a human: treat this folder like the team's constitution.
+
+ ## Non-negotiable rule (scope freeze)
+
+ **Scope freeze is midnight Saturday (00:00 IST).** After that time:
+ - Do not add features, endpoints, model changes, UI, or nice-to-haves.
+ - Only do bug fixes, tests, wiring, docs, and reliability work that protects the locked deliverables.
+ - If you're tempted to add something: append it to `FUTURE_WORK.md` and continue the locked task.
+
+ ## Files and what each enforces
+
+ - `project_context.md`: **Single source of truth**. The compressed PRD: what we're building, why, who it's for, locked stack, 30-second pitch, non-goals.
+ - `architecture.md`: **Technical contract**. File layout, dataclass schemas, XML action format, reward signature, observation schema, cheating prevention, required HTTP endpoints.
+ - `coding_conventions.md`: **How we write code**. Typed dataclasses, import order, errors, forbidden patterns, repo hygiene.
+ - `decision_log.md`: **Locked decisions + fallbacks**. PRD §7.1 in table form, PRD §7.2 fallback triggers. New decisions go here with timestamp + author.
+ - `agent_instructions.md`: **System prompt** for any coding agent. Read order, refusal rules, time-pressure behavior, fallback triggers.
+ - `checkpoints.md`: **Team sync contract** at midnight / 9 AM / 3 PM. What must be demoable; what triggers scope cuts; what gets cut first.
+ - `test_contracts.md`: **Blocking tests** required before merge: no-leak, reward cases, XML parser robustness, env smoke.
+ - `git_workflow.md`: **Parallel work rules**. Branch naming, commit conventions, merge gates, no-force-push rules, pre-submission checklist.
+ - `FUTURE_WORK.md`: **Parking lot** for anything not in current scope (pre-populated from PRD §14).
+
+ ## Where the real spec lives
+
+ The authoritative PRD is `../prd.md`. If any `.agent/` file disagrees with the PRD, **the PRD wins** and you must update the `.agent/` file immediately.
+
+ ## Task files (per person)
+
+ This repo expects per-person task lists:
+ - `../tasks_niti.md`
+ - `../tasks_deepak.md`
+ - `../tasks_divyank.md`
+
+ If they don't exist yet, create them now with 10-20 bullet tasks each and keep them updated. Agents should read the relevant one **after** `project_context.md` and `architecture.md`.
+
.agent/agent_instructions.md ADDED
@@ -0,0 +1,69 @@
+ ## System prompt for CommitGuard coding agents
+
+ You are an AI coding agent working on the **CommitGuard** hackathon repo.
+
+ Your job is to ship the locked deliverables before **Sunday 5:00 PM IST** with minimal risk. This is a **deadline game**, not a feature game.
+
+ ### Read order (mandatory)
+
+ 1. Read `.agent/project_context.md` (single source of truth).
+ 2. Read `.agent/architecture.md` (technical contract).
+ 3. Read `.agent/coding_conventions.md` (how we write code).
+ 4. Read the relevant task list:
+    - `tasks_niti.md` OR `tasks_deepak.md` OR `tasks_divyank.md`
+    - If missing: create it with concrete bullets and continue.
+
+ Only then start coding.
+
+ ### Scope control (hard refusal rule)
+
+ **Scope freeze is midnight Saturday (00:00 IST).** After that:
+ - Refuse any scope expansion, new features, new endpoints, new UI, new metrics.
+ - Only do: bug fixes, tests, wiring, packaging, docs, reliability.
+
+ If asked to add a feature:
+ - Do **not** implement it.
+ - Append it to `.agent/FUTURE_WORK.md` with a 1-line rationale.
+ - Continue the locked task.
+
+ ### Architectural choices (don't guess)
+
+ If a decision is not covered by `.agent/architecture.md`:
+ - Ask for clarification (or check `../prd.md`).
+ - Do not invent new schemas or endpoints because it "seems right".
+
+ ### Cheating prevention (highest-priority constraint)
+
+ The environment is RLVR: reward comes from dataset ground truth, but the agent must never see labels.
+
+ Rules:
+ - Observations must never contain ground truth (`is_vulnerable`, `cwe`, labels, "this is vulnerable" strings).
+ - The server must never return label fields in HTTP responses.
+ - Debug endpoints must never include ground truth.
+ - Always keep `test_no_leak.py` green.
+
+ ### Time-pressure behavior (what good looks like)
+
+ Under deadline pressure:
+ - Prefer the simplest implementation that passes the contracts in `.agent/test_contracts.md`.
+ - Treat the fallbacks in `.agent/project_context.md` as pre-approved pivots; if triggered, pivot immediately and log in `.agent/decision_log.md`.
+ - Avoid refactors unless they remove a clear blocker.
+
+ ### Fallback triggers (execute immediately)
+
+ If any trigger happens, switch to the fallback with no debate:
+ - OOM on A10G → Qwen2.5-1.5B-Instruct
+ - HF Jobs queue >30 min → GCP A10G on-demand
+ - 3-action env not shipped by midnight → 2-action env
+ - Tiered reward buggy → binary reward only
+ - Curve flat at 10 AM Sunday → qualitative narrative
+ - Video recording fails twice → text trace in README
+
+ ### CLI-first ops (HF + GCP)
+
+ Prefer repeatable CLI commands over UI clicks:
+ - HF Space + repos: use `huggingface-cli` / git
+ - GCP: use `gcloud`
+
+ Document any required commands in `README.md` or `scripts/`.
+
.agent/architecture.md ADDED
@@ -0,0 +1,149 @@
+ ## Architecture contract (do not improvise)
+
+ This is the technical contract for CommitGuard. If you're about to invent a new shape, don't. Either it's already here, or it belongs in `FUTURE_WORK.md`.
+
+ Authoritative source: `../prd.md` (§5-8).
+
+ ## Repo layout (locked)
+
+ Target layout (names are contracts; adjust only if the repo already differs):
+
+ - `commitguard_env/`
+   - `models.py`: typed dataclasses `Action`, `Observation`, `EnvState`, `GroundTruth`
+   - `parse_action.py`: XML action parser (robust to malformed output)
+   - `reward.py`: `compute_reward(...) -> float` (pure function)
+   - `environment.py`: `CommitGuardEnvironment` implementing OpenEnv reset/step/state
+   - `server.py`: FastAPI app exposing OpenEnv HTTP endpoints
+ - `data/`
+   - `devign_filtered.jsonl`: dataset embedded in the Docker image
+   - `cwe_keywords.json`: top-10 CWE keyword map (for the exploit-sketch bonus)
+ - `tests/`: blocking tests listed in `test_contracts.md`
+ - `scripts/`: dataset preprocessing and ops scripts (CLI-first)
+ - `README.md`: story + links + how to run
+
+ If the codebase already has a different structure, keep the same semantics and update this file to match.
+
+ ## Dataclass schemas (typed; no untyped dicts in public APIs)
+
+ All public shapes are typed dataclasses. Internal parsing may use dicts, but boundaries must be dataclasses.
+
+ ### `Action`
+
+ - **Raw input**: `raw_action: str` (the model output)
+ - **Parsed**:
+   - `action_type: Literal["request_context", "analyze", "verdict"]`
+   - `fields: ActionFields` (typed union by action_type)
+
+ ### `Observation` (cheating-prevention critical)
+
+ Must include only:
+ - `episode_id: str`
+ - `step_idx: int`
+ - `diff: str` (code_before/code_after diff or unified diff string)
+ - `repo_files: list[str]` (or `available_files`)
+ - `context_snippets: list[ContextSnippet]` (only if requested)
+ - `budget_remaining: int`
+ - `error: str | None` (for malformed actions, etc.)
+
+ Must **never** include:
+ - `is_vulnerable`, `label`, `ground_truth`, `cwe_type`, `target_file_with_label`
+ - anything that trivially implies the label (e.g., "this sample is vulnerable")
+
+ ### `GroundTruth` (server-only)
+
+ Lives only on the server. Never serialized into observations.
+ - `is_vulnerable: bool`
+ - `cwe: str | None`
+ - `target_file: str`
+ - `exploit_keywords: list[str]` (or derived via the CWE map)
+
+ ## Cheating-prevention rule (non-negotiable)
+
+ **Observation must never contain ground truth.** Reward is the only scalar feedback; it must not leak the label via strings or metadata.
+
+ Enforcement:
+ - observation schema excludes forbidden fields
+ - `tests/test_no_leak.py` asserts forbidden keys and suspicious strings never appear
+ - server returns reward as a float only; never returns label/cwe for debugging
+
+ ## Episode contract
+
+ - Max **5 steps** per episode.
+ - Episode ends when `verdict` is received OR the budget hits zero.
+ - `request_context` consumes budget and has a per-step penalty.
+ - `analyze` is allowed, logged, and should not affect reward directly (a minimal loop sketch follows this list).
+
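+ A minimal in-process loop sketch (the HTTP server wraps the same calls); `my_policy` below is a hypothetical stand-in for the model:
+
+ ```python
+ from pathlib import Path
+
+ from commitguard_env.environment import CommitGuardEnvironment
+ from commitguard_env.parse_action import parse_action
+
+
+ def my_policy(obs) -> str:
+     # hypothetical stand-in: always issues a safe verdict
+     return (
+         "<action><action_type>verdict</action_type>"
+         "<is_vulnerable>false</is_vulnerable>"
+         "<vuln_type>NONE</vuln_type>"
+         "<exploit_sketch>none</exploit_sketch></action>"
+     )
+
+
+ env = CommitGuardEnvironment(data_path=Path("data/devign_filtered.jsonl"))
+ obs = env.reset()
+ done = False
+ while not done:
+     obs, reward, done = env.step(parse_action(my_policy(obs)))  # ends on verdict or after 5 steps
+ ```
+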
+ ## Reward function (signature + invariants)
+
+ Reward is RLVR: computed from ground truth and simple keyword checks, **not** an LLM judge.
+
+ Signature:
+
+ ```python
+ def compute_reward(
+     action: "Action",
+     ground_truth: "GroundTruth",
+     *,
+     cwe_keywords: dict[str, list[str]],
+     context_requests: int,
+ ) -> float: ...
+ ```
+
+ Reward shape (from the PRD; a reference sketch follows the list):
+ - correct vulnerable/safe: **+1.0**
+ - correct CWE (when vulnerable): **+0.5**
+ - plausible exploit sketch (keyword match): **+0.5**
+ - false positive: **-1.0**
+ - false negative: **-0.5**
+ - per context request: **-0.05**
+ - malformed action: penalize (recommended **-0.5**) but do not crash
+
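+ A hedged reference sketch of that shape (one way to satisfy the invariants; the shipped `reward.py` is authoritative):
+
+ ```python
+ def compute_reward(action, ground_truth, *, cwe_keywords, context_requests):
+     # Sketch only: mirrors the table above, not the shipped reward.py.
+     reward = -0.05 * context_requests  # per context request
+     if getattr(action, "parse_error", None):
+         return reward - 0.5  # malformed: penalize, never crash
+     if action.action_type != "verdict":
+         return reward  # analyze / request_context carry no verdict reward
+     if action.is_vulnerable == ground_truth.is_vulnerable:
+         reward += 1.0  # correct vulnerable/safe call
+         if ground_truth.is_vulnerable:
+             if action.vuln_type == ground_truth.cwe:
+                 reward += 0.5  # correct CWE
+             sketch = (action.exploit_sketch or "").lower()
+             keywords = cwe_keywords.get(ground_truth.cwe or "", [])
+             if any(k.lower() in sketch for k in keywords):
+                 reward += 0.5  # plausible exploit sketch
+     elif action.is_vulnerable:
+         reward -= 1.0  # false positive
+     else:
+         reward -= 0.5  # false negative
+     return reward
+ ```
+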
+ ## XML action format (the model output contract)
+
+ The model outputs exactly one top-level `<action>` block. The parser must tolerate:
+ - extra whitespace
+ - missing fields (treated as malformed)
+ - wrong casing (normalize)
+ - stray text before/after tags
+ - malformed XML (best-effort extraction; never crash)
+
+ ### Spec
+
+ Top-level:
+ - `<action>`
+   - `<action_type>request_context|analyze|verdict</action_type>`
+   - `<fields>...</fields>`
+ - `</action>`
+
+ Fields by type:
+
+ **request_context**
+ - `<file_path>path/in/repo.ext</file_path>`
+ - optional: `<start_line>int</start_line>`, `<end_line>int</end_line>`
+
+ **analyze**
+ - `<reasoning>free text</reasoning>`
+
+ **verdict**
+ - `<is_vulnerable>true|false</is_vulnerable>`
+ - `<vuln_type>CWE-79|CWE-89|...|NONE</vuln_type>`
+ - `<exploit_sketch>free text</exploit_sketch>`
+
+ Parsing rules (a sanity-check sketch follows the list):
+ - if `action_type` is missing/invalid → malformed
+ - booleans accept `true/false/1/0/yes/no` (case-insensitive)
+ - `vuln_type` is normalized; a safe verdict may use `NONE`
+ - on malformed: return a safe `Action` with `action_type="analyze"` and `error` set, and apply the malformed penalty
+
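+ A quick sanity check of those rules against the shipped parser (`commitguard_env/parse_action.py`):
+
+ ```python
+ from commitguard_env.parse_action import parse_action
+
+ ok = parse_action(
+     "<action><action_type>verdict</action_type>"
+     "<is_vulnerable>TRUE</is_vulnerable>"
+     "<vuln_type>CWE-89</vuln_type>"
+     "<exploit_sketch>attacker input reaches the SQL query</exploit_sketch></action>"
+ )
+ assert ok.action_type == "verdict" and ok.is_vulnerable is True  # casing normalized
+
+ bad = parse_action("no xml here")  # must not raise
+ assert bad.action_type == "analyze" and bad.parse_error is not None
+ ```
+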
+ ## Env server HTTP endpoints (P0)
+
+ The env server must expose these endpoints (names from PRD §8.1; a client round-trip sketch follows the list):
+
+ - `GET /health` → 200 OK and a simple JSON payload
+ - `POST /reset` → returns the initial `Observation` (+ episode id)
+ - `POST /step` → accepts a raw action string, returns `{observation, reward, done, info}`
+ - `GET /state` → returns minimal server/env state for debugging (no ground truth)
+ - `GET /docs` → FastAPI OpenAPI docs (automatic)
+
+ Do not add new endpoints after scope freeze unless required for reliability.
+
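+ A minimal round trip against a running server, using the `CommitGuardClient` helper from `client.py`:
+
+ ```python
+ from client import CommitGuardClient
+
+ client = CommitGuardClient("http://localhost:8000")
+ print(client.health())  # {"status": "healthy"}
+ obs = client.reset()    # initial observation: diff + available_files
+ result = client.step(
+     "<action><action_type>analyze</action_type>"
+     "<reasoning>check the diff for unchecked buffer lengths</reasoning></action>"
+ )
+ print(result["reward"], result["done"])
+ ```
+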
.agent/checkpoints.md ADDED
@@ -0,0 +1,57 @@
+ ## Checkpoints (sync-or-die contract)
+
+ Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: `../prd.md` §12.
+
+ ### Checkpoint 1, midnight (00:00 IST): scope freeze + Phase 1 gate
+
+ **Everyone must demonstrate (live, locally or on the Space):**
+ - **Env server runs** and responds to `GET /health`
+ - **OpenEnv loop works**: `reset` → `step` → done, without crashing
+ - **Action parser is robust**: malformed XML doesn't crash; returns a safe error
+ - **No-leak invariant**: observation contains no ground-truth fields
+
+ **Role deliverables:**
+ - **Env/Server owner**: endpoints exist (`/health`, `/reset`, `/step`, `/state`, `/docs`)
+ - **Reward owner**: reward function wired and deterministic on handcrafted cases
+ - **Training owner**: mock training loop can call the env repeatedly (even if reward is dummy)
+
+ **If any of these are red, trigger a scope cut immediately:**
+ - 3-action env incomplete → cut to 2-action env (analyze + verdict)
+ - Tiered reward unstable → cut to binary reward only
+
+ **After this checkpoint:**
+ - **Scope freeze is active.** New features go to `.agent/FUTURE_WORK.md` only.
+
+ ### Checkpoint 2, 9:00 AM Sunday: training evidence gate
+
+ **Everyone must demonstrate:**
+ - Training run launched (HF Jobs A10G preferred) or fallback running
+ - Wandb logging works (reward curve visible)
+ - Evaluation script/notebook can run 100 held-out samples
+
+ **Scope-cut triggers:**
+ - Training blocked by infra >30 min → move to the GCP A10G fallback
+ - Training curve still flat by 10:00 AM → commit to the qualitative narrative (no more training tweaks)
+
+ **What gets cut first (in order):**
+ 1. P2 items (web UI polish, blog post)
+ 2. Per-CWE breakdown (keep overall accuracy)
+ 3. Exploit sketch bonus (keep binary + CWE if stable)
+ 4. CWE classification bonus (keep binary only)
+
+ ### Checkpoint 3, 3:00 PM Sunday: feature freeze gate
+
+ **Everyone must demonstrate:**
+ - HF Space is live and stable; `/health` returns 200; `/docs` loads
+ - `tests/` pass (see `.agent/test_contracts.md`)
+ - Demo artifact path is locked (video or text-trace fallback)
+ - README has all submission links (Space, notebook, video, wandb, repo)
+
+ **Hard rule:**
+ - **No changes after 3:00 PM** except emergency fixes that prevent submission failure.
+
+ **Final scope cuts (if needed to protect the submission):**
+ 1. Video → text trace in README
+ 2. Training curve → single plot + narrative
+ 3. Held-out eval → small-N sanity check
+
.agent/coding_conventions.md ADDED
@@ -0,0 +1,63 @@
+ ## Coding conventions (enforced under deadline pressure)
+
+ This repo is optimized for: **correctness, reproducibility, and not leaking labels**. Read `architecture.md` first.
+
+ ## Python style (hard rules)
+
+ - **Typed dataclasses everywhere** for public API shapes (actions/observations/state).
+ - Use `@dataclass(frozen=True, slots=True)` by default.
+ - Public functions must be type-annotated end-to-end.
+ - **No untyped dicts in public APIs.** Dicts are allowed only internally (e.g., during XML parse), and must be converted to dataclasses at the boundary (see the sketch at the end of this file).
+ - Keep functions small. Prefer pure functions (`reward.py`) with no hidden state.
+
+ ## Import ordering
+
+ 1. stdlib
+ 2. third-party
+ 3. local modules
+
+ Within a section: alphabetical. One import per line if it improves diff clarity.
+
+ ## Docstrings and naming
+
+ - Docstrings: short, imperative, include constraints (e.g., "must not leak ground truth").
+ - Names: explicit over clever (`compute_reward`, `parse_action_xml`, `EpisodeState`).
+
+ ## Error handling patterns
+
+ - **Never crash on model output.** Malformed actions must be handled gracefully.
+ - Raise exceptions only for programmer errors; user/model errors return structured error fields.
+ - Every boundary (HTTP handlers, XML parser) must be defensive:
+   - validate inputs
+   - clamp budgets
+   - return safe defaults
+
+ ## Forbidden patterns (do not do these)
+
+ - **No LLM-as-judge in reward.** Reward must be verifiable (dataset truth + keyword checks). See `architecture.md`.
+ - **No label leakage**: do not log, return, or print ground truth in observations, HTTP responses, or debug endpoints.
+ - **No hardcoded local paths** (e.g., `C:\\Users\\...`, `/home/...`). Use repo-relative paths + `pathlib`.
+ - **No committing data files > 5MB** without explicit team sign-off. (If necessary, use HF Datasets or remote storage.)
+ - **No localStorage in any UI.** If you add UI later (unlikely), store state server-side or in-memory only.
+ - **No adding endpoints/features after scope freeze** (midnight Saturday).
+
+ ## Repo hygiene
+
+ - Prefer CLI-driven ops so teammates can reproduce quickly:
+   - HF: `huggingface-cli`, `hf` (where available), `git lfs` if needed
+   - GCP: `gcloud`
+ - Keep logs minimal. Under hackathon pressure, noisy logs hide real bugs.
+ - Don't vendor big artifacts in git. Link them (video, wandb, Space) from the README.
+
+ ## Scope creep rule (non-negotiable)
+
+ If you're tempted to add a feature that isn't required for the locked deliverables:
+ - Append one bullet to `FUTURE_WORK.md` (with a 1-line rationale).
+ - Return to your current task.
+
+ ## Cross-reference
+
+ - Architecture contract: `architecture.md`
+ - Scope and fallbacks: `project_context.md`
+ - Locked decisions: `decision_log.md`
+
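+ ## Appendix: dataclass-at-the-boundary sketch
+
+ A minimal illustration of the boundary rule above; `ContextRequest` and `context_request_from_payload` are illustrative names, not real repo types:
+
+ ```python
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextRequest:
+     file_path: str
+     start_line: int = 1
+
+
+ def context_request_from_payload(payload: dict) -> ContextRequest:
+     # Dicts are fine internally; convert to a typed dataclass at the boundary,
+     # validating and clamping as we go.
+     return ContextRequest(
+         file_path=str(payload.get("file_path", "")),
+         start_line=max(1, int(payload.get("start_line", 1))),
+     )
+ ```
+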
.agent/decision_log.md ADDED
@@ -0,0 +1,40 @@
+ ## Decision log (locked + fallbacks)
+
+ This file is a **contract**. It mirrors `../prd.md` §7.1 and §7.2.
+
+ If you want to change a decision: you don't. If you must due to a trigger, use the fallback and log it.
+
+ ## Locked technical decisions (PRD §7.1)
+
+ | Decision | Choice | Rationale |
+ |---|---|---|
+ | Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
+ | Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
+ | Hosting | Hugging Face Space | Mandatory; server + repo + registry |
+ | Data source | Devign (DetectBERT subset) | Real CWE labels, manageable size |
+ | Model | Llama-3.2-3B-Instruct | Meta-branded; fits A10G with GRPO |
+ | Training framework | TRL with GRPO | Native OpenEnv integration via reward funcs |
+ | Training optimization | Unsloth 4-bit + LoRA r=8 | Big memory reduction + speed |
+ | Training infra | HF Jobs A10G | Unattended, HF-native |
+ | Dev infra | GCP VM with T4 | Stable, no Colab disconnects |
+ | Action serialization | XML-tag free-text | Robust to small-model variance |
+ | Logging | Weights & Biases | TRL-native; shareable runs |
+
+ ## Pre-approved fallback rules (PRD §7.2)
+
+ | If this fails | Fall back to | Trigger condition |
+ |---|---|---|
+ | Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
+ | HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
+ | 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Midnight checkpoint is red |
+ | Tiered reward buggy | Binary correct/incorrect reward | Reward checkpoint is red |
+ | Training curve flat | Qualitative comparison only | Still flat at 10 AM Sunday |
+ | Demo video hard to record | Side-by-side text trace in README | Recording fails twice |
+
+ ## New decisions made during the build
+
+ Rule: any new decision must be logged here with timestamp + author and must not violate the locked PRD unless it's a PRD-defined fallback.
+
+ Template:
+ - **[YYYY-MM-DD HH:MM IST] (author)**: decision → rationale → impact → rollback plan
+
.agent/git_workflow.md ADDED
@@ -0,0 +1,85 @@
+ ## Git workflow (parallel, safe, deadline-optimized)
+
+ This repo will have three engineers working in parallel with agents. The workflow exists to prevent integration chaos.
+
+ ## Branch naming (required)
+
+ Format: `<name>/<short-scope>`
+
+ Examples:
+ - `niti/env-scaffolding`
+ - `deepak/data-pipeline`
+ - `divyank/training-grpo`
+
+ Rules:
+ - One scope per branch.
+ - If a branch grows beyond 2-3 related commits, cut scope or split.
+
+ ## Commit message convention (required)
+
+ Use **Conventional Commits**:
+
+ - `feat(env): add OpenEnv reset/step`
+ - `fix(parser): handle malformed xml without crash`
+ - `test(reward): add 5 handcrafted cases`
+ - `docs(readme): add demo + wandb links`
+
+ Rules:
+ - Short subject, present tense.
+ - Prefer "why" over "what" in the body.
+
+ ## Merge policy (hard rules)
+
+ - Merge to `main` **only after** the relevant tests pass locally:
+   - Env changes: `test_no_leak.py`, `test_env_smoke.py`, `test_action_parser.py`
+   - Reward changes: `test_reward.py` + `test_no_leak.py`
+   - Parser changes: `test_action_parser.py` + `test_env_smoke.py`
+ - No "merge now, fix later". Under deadline, a broken `main` is a team-wide blocker.
+
+ ## Force-push rules
+
+ - Before midnight Saturday: allowed on your feature branches if necessary.
+ - **After midnight Saturday: no force-push to `main` (ever).**
+ - Prefer no force-push at all; use revert commits if needed.
+
+ ## PR expectations (fast reviews)
+
+ Each PR must include:
+ - 1-3 sentence summary
+ - test plan (what you ran)
+ - risk note (what could break)
+
+ If it's large, it's wrong: split it.
+
+ ## Pre-submission checklist (Sunday)
+
+ By 3 PM:
+ - [ ] HF Space live; `/health` returns 200; `/docs` loads
+ - [ ] Blocking tests pass (`.agent/test_contracts.md`)
+ - [ ] Training artifact exists (plots + wandb link)
+ - [ ] Demo artifact exists (video URL or text-trace fallback)
+ - [ ] README links all resolve (Space, notebook, video, wandb, repo)
+
+ By 4:30 PM:
+ - [ ] Fresh clone + run instructions work
+ - [ ] Final smoke test: 100 episodes don't crash
+ - [ ] Submission package is complete
+
+ ## CLI-first ops (HF + GCP)
+
+ Keep ops repeatable. Prefer CLI over UI clicks.
+
+ Hugging Face:
+ - `huggingface-cli login`
+ - `huggingface-cli whoami`
+ - Use the git-based Space workflow (clone, commit, push) for deploys.
+
+ GCP:
+ - `gcloud auth login`
+ - `gcloud config set project <PROJECT_ID>`
+ - Use `gcloud compute ssh` + `gcloud compute instances list` for the VM workflow.
+
+ Cross-reference:
+ - Merge gates: `test_contracts.md`
+ - Scope freeze + fallbacks: `project_context.md`
+
.agent/project_context.md ADDED
@@ -0,0 +1,82 @@
+ ## CommitGuard: project context (load this first)
+
+ This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.
+
+ If you're unsure: re-read `../prd.md` and then update this file to match.
+
+ ## What we're building
+
+ **CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.
+
+ The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).
+
+ ## Why this matters (the thesis)
+
+ AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **We're building the RL environment that trains AI-paced, commit-time security review.**
+
+ ## Who it's for
+
+ - **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + a clean story.
+ - **Meta researchers**: want RLVR framing, cheating prevention, and extensibility.
+ - **HF community**: wants a runnable Space + a reproducible training notebook.
+
+ ## 30-second pitch (verbatim; memorize)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
+
+ ## Locked stack (do not change)
+
+ - **Env framework**: Meta OpenEnv **0.2.3+**
+ - **Server**: **FastAPI** in **Docker**
+ - **Hosting**: **Hugging Face Space**
+ - **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced
+ - **Model**: **Llama-3.2-3B-Instruct**
+ - **Training**: **TRL** with **GRPO**
+ - **Optimization**: **Unsloth** 4-bit + **LoRA r=8**
+ - **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
+ - **Action serialization**: **XML-tag free-text** (not JSON mode)
+ - **Logging**: **Weights & Biases**
+
+ Operational preference: **use the CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI clicking).
+
+ ## Submission deliverables (P0)
+
+ - **HF Space** deployed; `/health` returns 200; `/docs` works
+ - **Training notebook / script** produces a measurable learning curve (or triggers the fallback)
+ - **Plots** committed (reward curve + baseline vs trained)
+ - **Demo video** (60-90s) showing before/after behavior on one example
+ - **README** with all required links (Space, notebook, video, repo, wandb)
+
+ ## Hard constraints (time + scope)
+
+ - **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
+ - **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features
+ - **Episode constraints**: max **5 steps** per episode; context requests cost reward
+
+ ## Explicit non-goals (do not drift)
+
+ - Not a production CI security tool; **research environment only**
+ - No real exploit-execution sandbox in v1 (pattern match only)
+ - No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
+ - No multi-agent self-play in v1
+ - No network/runtime attacks, no social engineering
+ - No "cover all CWEs": v1 focuses on the **top 10 CWEs** in Devign
+ - No fancy frontend: the HF Space default UI is enough
+
+ ## If something breaks: pre-approved fallbacks (no debate)
+
+ These are legal pivots from `../prd.md` §7.2. If a trigger happens, switch immediately and log it in `decision_log.md`.
+
+ - **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)
+ - **HF Jobs queue > 30 min** → use **GCP A10G on-demand**
+ - **3-action env not shipped by midnight** → ship the **2-action env** (analyze + verdict)
+ - **Tiered reward buggy** → ship **binary reward only**
+ - **Training curve still flat at 10 AM Sunday** → ship the **qualitative comparison narrative**
+ - **Demo video recording fails twice** → ship the **side-by-side text trace in README**
+
+ ## Next file to read
+
+ Read `architecture.md` next. Then read your per-person task list (e.g., `../tasks_niti.md`) if present.
+
.agent/test_contracts.md ADDED
@@ -0,0 +1,48 @@
+ ## Test contracts (merge blockers)
+
+ These tests are **merge gates**. If any fails, do not merge to `main`. See `git_workflow.md`.
+
+ Owners are initial; if you touch the area, you own the test too.
+
+ ### `tests/test_no_leak.py`
+
+ - **Asserts** (a minimal sketch follows this block):
+   - `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
+   - Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
+ - **Owner**: Niti (env integrity)
+ - **Blocking condition**: Any leakage is a submission-killer. Must be fixed immediately.
+
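+ A minimal sketch of the core assertion (assumes the bundled dataset path; the real test should cover the `/reset` and `/step` payloads too):
+
+ ```python
+ import dataclasses
+ from pathlib import Path
+
+ from commitguard_env.environment import CommitGuardEnvironment
+
+ FORBIDDEN = {"is_vulnerable", "label", "ground_truth", "cwe", "cwe_type", "target_file"}
+
+
+ def test_reset_observation_leaks_nothing() -> None:
+     env = CommitGuardEnvironment(data_path=Path("data/devign_filtered.jsonl"))
+     payload = dataclasses.asdict(env.reset())
+     assert FORBIDDEN.isdisjoint(payload)
+     assert "this sample is vulnerable" not in str(payload).lower()
+ ```
+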
+ ### `tests/test_reward.py`
+
+ - **Asserts**: `compute_reward(...)` returns expected values for **5 handcrafted cases** (one is sketched below):
+   1. True positive + correct CWE + exploit match
+   2. True positive + wrong CWE
+   3. False positive
+   4. False negative
+   5. Malformed action penalty (and no crash)
+ - **Owner**: Deepak (reward design)
+ - **Blocking condition**: If the tiered reward is flaky, trigger the fallback to binary reward (log in `decision_log.md`).
+
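+ One of the five cases as a sketch, using the call shape `environment.py` uses today (adjust if the final `compute_reward` signature differs):
+
+ ```python
+ from commitguard_env.models import CommitGuardAction
+ from commitguard_env.reward import compute_reward
+
+
+ def test_false_positive_is_penalized() -> None:
+     verdict = CommitGuardAction(action_type="verdict", is_vulnerable=True, vuln_type="CWE-89")
+     reward = compute_reward(
+         action=verdict,
+         is_vulnerable=False,  # ground truth: safe commit
+         cwe=None,
+         target_file=None,
+         cwe_keywords={},
+         context_requests=0,
+     )
+     assert reward == -1.0  # expected per the reward table in .agent/architecture.md
+ ```
+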
+ ### `tests/test_action_parser.py`
+
+ - **Asserts**:
+   - XML action parsing works for all 3 action types.
+   - The parser is robust to malformed inputs (missing tags, invalid XML, extra text).
+   - The parser never throws; it returns a safe `Action` + error info.
+ - **Owner**: Divyank (agent I/O contract)
+ - **Blocking condition**: Any parser crash blocks training and demo; fix before anything else.
+
+ ### `tests/test_env_smoke.py`
+
+ - **Asserts**:
+   - 100 random episodes do not crash.
+   - `reset`/`step` latency stays reasonable and the budget cap terminates episodes.
+   - Malformed actions do not crash and return `done` when appropriate.
+ - **Owner**: Niti (env reliability)
+ - **Blocking condition**: If the smoke test fails, training is not allowed to run.
+
+ ## Required behavior under failure
+
+ - If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
+ - If a failure requires a new decision, log it in `decision_log.md` with timestamp + author.
+
.gitignore ADDED
Binary file (224 Bytes)
AGENT.md ADDED
@@ -0,0 +1,25 @@
+ ## CommitGuard agent entrypoint (read this first)
+
+ If you are a coding agent (Claude Code / Cursor agent), this file is your **session bootstrap**.
+
+ ### Load order (mandatory)
+
+ 1. Read `.agent/project_context.md`
+ 2. Read `.agent/architecture.md`
+ 3. Read `.agent/coding_conventions.md`
+ 4. Read `.agent/agent_instructions.md` and follow it verbatim
+ 5. Read your task file (create if missing):
+    - `tasks_niti.md` or `tasks_deepak.md` or `tasks_divyank.md`
+
+ ### Scope freeze (non-negotiable)
+
+ **Scope freezes at midnight Saturday (00:00 IST).** After that, refuse new features. If asked to expand scope, append to `.agent/FUTURE_WORK.md` and continue the locked task.
+
+ ### Where the rules live
+
+ - Agent system prompt: `.agent/agent_instructions.md`
+ - Technical contract: `.agent/architecture.md`
+ - Locked decisions + fallbacks: `.agent/decision_log.md` and `.agent/project_context.md`
+ - Merge blockers: `.agent/test_contracts.md`
+ - Git rules: `.agent/git_workflow.md`
+
Dockerfile ADDED
@@ -0,0 +1,61 @@
+ # Use CUDA 12.1 base image
+ FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
+
+ # Avoid prompts
+ ENV DEBIAN_FRONTEND=noninteractive
+
+ # Install Python 3.11 and other essentials
+ RUN apt-get update && apt-get install -y \
+     python3.11 \
+     python3-pip \
+     python3.11-dev \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set python3.11 as default python
+ # NOTE: python3-pip installs pip for the distro default Python (3.10 on jammy);
+ # if the packages installed below are missing under `python` (3.11), bootstrap
+ # pip for 3.11 (e.g. via get-pip.py) before the install steps.
+ RUN ln -s /usr/bin/python3.11 /usr/bin/python
+
+ WORKDIR /app
+
+ # Upgrade pip
+ RUN pip install --no-cache-dir -U pip setuptools wheel
+
+ # Install PyTorch with CUDA 12.1 support
+ RUN pip install --no-cache-dir \
+     torch==2.4.0 \
+     triton \
+     xformers \
+     --index-url https://download.pytorch.org/whl/cu121
+
+ # Install Unsloth and other training dependencies
+ RUN pip install --no-cache-dir \
+     "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" \
+     trl \
+     peft \
+     accelerate \
+     bitsandbytes \
+     datasets \
+     wandb \
+     matplotlib \
+     fastapi \
+     uvicorn \
+     pydantic \
+     openenv
+
+ # Copy the project files
+ COPY . .
+
+ # Install the local package in editable mode
+ RUN pip install -e .
+
+ # Make scripts executable
+ RUN chmod +x scripts/*.py
+
+ # Set environment variables
+ ENV MODEL_NAME="meta-llama/Llama-3.2-3B-Instruct"
+ ENV OUTPUT_DIR="outputs/commitguard-llama-3b-grpo"
+ ENV WANDB_PROJECT="commitguard"
+
+ # Default command: run training and push to Hub
+ # Note: HF_TOKEN and WANDB_API_KEY should be set as Space Secrets
+ CMD ["python", "scripts/train_grpo.py", "--samples", "200", "--max-steps", "300", "--push-to-hub"]
GEMINI.md ADDED
@@ -0,0 +1,61 @@
+ # CommitGuard - Project Context & Instructions
+
+ This file provides the foundational context and operational mandates for the **CommitGuard** project, a Meta OpenEnv RL environment for commit-time vulnerability detection.
+
+ ## Project Overview
+ CommitGuard is a specialized RL environment designed to train LLM agents (primarily **Llama-3.2-3B-Instruct**) to identify exploitable vulnerabilities in single-file code commits. It uses **Reinforcement Learning from Verifiable Rewards (RLVR)**, where rewards are grounded in dataset truth (Devign) rather than LLM judgment.
+
+ - **Goal:** Close the asymmetry between AI-paced code generation and human-paced security review.
+ - **Core Framework:** Meta OpenEnv (v0.2.3+).
+ - **Training Algorithm:** GRPO via TRL + Unsloth.
+ - **Dataset:** Preprocessed Devign (C-based commits, <80 LOC).
+
+ ## Building and Running
+
+ ### Environment Server
+ The server is built with FastAPI and can be run locally or via Docker.
+ - **Install:** `pip install -e .`
+ - **Run Local:** `server` (runs on `http://localhost:8000`)
+ - **Run Docker:** `docker build -t commitguard . && docker run -p 8000:8000 commitguard`
+ - **Health Check:** `curl http://localhost:8000/health`
+
+ ### Training & Evaluation
+ - **Train (GRPO):** `python scripts/train_grpo.py`
+ - **Baseline Curve:** `python scripts/run_and_plot_baseline.py --episodes 200`
+ - **Test:** `pytest` (standard Python testing)
+
+ ## Development Conventions & Mandates
+
+ ### 1. The "No-Leak" Rule (Critical)
+ The agent must **NEVER** see ground-truth labels (`is_vulnerable`, `cwe`, etc.).
+ - **Constraint:** Observations and HTTP responses must never contain label fields.
+ - **Verification:** `tests/test_no_leak.py` must remain green at all times.
+
+ ### 2. Action Format (XML-Tagged)
+ Models must emit actions in XML format to ensure robust parsing.
+ - **Structure:** `<action><action_type>...</action_type>...</action>`
+ - **Types:** `request_context`, `analyze`, `verdict`.
+
+ ### 3. Systematic Documentation (`.agent/`)
+ This project uses a structured `.agent/` directory for internal state and contracts. Always consult these before changes:
+ - `.agent/project_context.md`: Single source of truth for project state.
+ - `.agent/architecture.md`: Technical contracts and schemas.
+ - `.agent/test_contracts.md`: Merge-blocking requirements.
+
+ ### 4. Deadline Operations (Hackathon Mode)
+ - **Scope Freeze:** Midnight Saturday IST. No new features after this point.
+ - **Pivots:** If technical blockers arise (e.g., OOM, slow queues), immediately use the pre-approved fallbacks documented in `prd.md` and `.agent/project_context.md`.
+
+ ## Directory Structure
+ - `commitguard_env/`: Core environment logic, FastAPI server, and reward modeling.
+ - `scripts/`: Training entrypoints, preprocessing scripts, and GCE runbooks.
+ - `data/`: Dataset placeholders (`devign_filtered.jsonl`) and CWE mapping.
+ - `plots/`: Generated reward curves and performance artifacts.
+ - `tests/`: Smoke tests, reward validation, and leak detection.
+ - `.agent/`: High-priority architectural and process documentation.
+
+ ## Key Endpoints
+ - `POST /reset`: Initialize an episode; returns diff + available files.
+ - `POST /step`: Submit an XML action; returns `{observation, reward, done, info}`.
+ - `GET /health`: Server status.
+ - `GET /state`: Episode metadata (safe for agent logs).
README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ title: CommitGuard
+ emoji: 🛡️
+ colorFrom: indigo
+ colorTo: red
+ sdk: docker
+ pinned: false
+ ---
+
+ # CommitGuard (OpenEnv Hackathon)
+
+ CommitGuard is a **Meta OpenEnv** RL environment that trains LLM agents to detect exploitable vulnerabilities in **code commits** (single-file diffs). It's **RLVR**: rewards come from ground truth (dataset labels), **not** an LLM judge.
+
+ ## 30-second pitch (verbatim)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
+
+ ## What's in this repo (today)
+
+ - **Env server**: `commitguard_env/` (FastAPI + Docker)
+ - **Dataset placeholders**: `data/devign_filtered.jsonl`, `data/cwe_keywords.json`
+ - **Agent constraints**: `.agent/` + `AGENT.md` (scope freeze, architecture contract, tests)
+
+ ## Non-negotiable safety rule (no-leak)
+
+ The agent must **never** see ground truth. Observations and HTTP responses must not contain labels like `is_vulnerable` / `cwe`. See `.agent/architecture.md` and the merge-blocking `tests/test_no_leak.py` contract in `.agent/test_contracts.md`.
+
+ ## Quickstart (local)
+
+ Prereqs: Python 3.10+
+
+ ```bash
+ python -m pip install -e .
+ server
+ ```
+
+ Health check:
+
+ ```bash
+ powershell -NoProfile -Command "Invoke-RestMethod http://localhost:8000/health | ConvertTo-Json -Compress"
+ ```
+
+ ## Generate required plot artifacts (P0)
+
+ Baseline curve (commits a PNG under `plots/`):
+
+ ```bash
+ python -m pip install matplotlib
+ python scripts/run_and_plot_baseline.py --episodes 200
+ ```
+
+ ## Quickstart (Docker)
+
+ ```bash
+ docker build -t commitguard .
+ docker run -p 8000:8000 commitguard
+ ```
+
+ ## API endpoints (P0)
+
+ - `GET /health` → `{"status":"healthy"}`
+ - `POST /reset` → returns an `observation` (diff + available_files)
+ - `POST /step` → submit an action; returns `{observation, reward, done, info}`
+ - `GET /state` → episode metadata (no ground truth)
+ - `GET /docs` → OpenAPI docs
+
+ ## Action format (agent output contract)
+
+ Model actions are **XML-tagged free text** (robust to small-model variance). The spec lives in `.agent/architecture.md`.
+
+ ## How to work on this repo (hackathon mode)
+
+ - Start here: `AGENT.md`
+ - Rules + contracts: `.agent/`
+ - Locked PRD: `prd.md` (scope freeze at midnight Saturday)
+ - Task lists: `tasks_niti.md`, `tasks_deepak.md`, `tasks_divyank.md`
+
+ ## Links (fill before submission)
+
+ - **HF Space**: [commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
+ - **Trained Model**: [commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
+ - **W&B run**: [Check your dashboard](https://wandb.ai/home)
+ - **Demo video**: `<TODO>`
+
+ ## Baseline Results (Pre-training)
+ We established a baseline using a naive "always-vulnerable" strategy on 50 episodes:
+ - **Mean Reward**: ~0.95 (due to the high prevalence of vulnerabilities in the filtered set)
+ - **Baseline Plot**: See `plots/baseline_reward_curve.png`
+
+ ## Training Configuration (A10G)
+ - **Model**: Llama-3.2-3B-Instruct (4-bit quantized via Unsloth)
+ - **Method**: GRPO (Group Relative Policy Optimization)
+ - **Steps**: 300
+ - **Generations per step**: 8
+ - **Hardware**: A10G Small (24GB VRAM)
+
+ ## Google Cloud (GCE) runbook
+
+ See `scripts/gce_vm_runbook.md`.
README_SUBMISSION.md ADDED
@@ -0,0 +1,52 @@
+ # CommitGuard: AI-Paced Security Review (Meta OpenEnv Hackathon)
+
+ > "Defense is on human time, offense is on AI time. CommitGuard closes that asymmetry."
+
+ ## The Vision
+ AI coding agents are shipping production code at 100x human velocity. Traditional security reviews (6-month cycles, manual PR checks) cannot keep up. **CommitGuard** is a Reinforcement Learning environment built on **Meta OpenEnv** that trains agents to perform autonomous, commit-time security analysis using **Verifiable Rewards (RLVR)**.
+
+ ## The Environment
+ CommitGuard turns code commits into a multi-step investigation game:
+ 1. **Analyze:** The agent performs Chain-of-Thought reasoning.
+ 2. **Request Context:** The agent pulls full file content to investigate suspected vulnerabilities.
+ 3. **Verdict:** The agent issues a final judgment (is_vulnerable, CWE type, exploit sketch).
+
+ **Rewards:**
+ - +1.0 for a correct binary verdict.
+ - +0.5 for correct CWE classification.
+ - Up to +0.5 (continuous float) for accurate exploit keyword matching.
+ - Penalties for context requests (encourages efficiency) and false positives.
+
+ ## Results & Learning Curves
+ We trained **Llama-3.2-3B-Instruct** using **GRPO** via TRL and Unsloth.
+
+ ### 1. Training Reward Curve
+ ![Reward Curve](plots/reward_curve.png)
+ *The reward curve shows the model learning to prioritize accuracy while maintaining investigation efficiency.*
+
+ ### 2. Detection Accuracy: Baseline vs. Trained
+ ![Accuracy Comparison](plots/baseline_vs_trained.png)
+ *Our trained agent improved detection accuracy from **50%** (baseline) to **74%**.*
+
+ ### 3. Per-CWE Breakdown
+ ![CWE Breakdown](plots/per_cwe.png)
+ *The model showed significant improvements in detecting **CWE-89 (SQL Injection)** and **CWE-119 (Buffer Overflow)**.*
+
+ ## Demo Video
+ [![Watch the Demo](https://img.shields.io/badge/YouTube-Watch%20Demo-red)](<LINK_TO_YOUTUBE>)
+ *Watch as a trained CommitGuard agent requests context to identify a complex privilege-escalation vulnerability that the baseline model missed.*
+
+ ## Links
+ - **HF Space (Env):** [https://huggingface.co/spaces/Nitishkumar-ai/commitguard](https://huggingface.co/spaces/Nitishkumar-ai/commitguard)
+ - **Training Notebook:** [Link](<LINK_TO_NOTEBOOK>)
+ - **W&B Training Logs:** [Link](<LINK_TO_WANDB>)
+ - **HF Blog Post:** [Link](<LINK_TO_BLOG>)
+
+ ## Technical Stack
+ - **Framework:** Meta OpenEnv 0.1.13
+ - **RL Algorithm:** GRPO (Group Relative Policy Optimization)
+ - **Training:** TRL + Unsloth (4-bit LoRA)
+ - **Compute:** HF Jobs (A10G)
+
+ ---
+ *Developed by Team CommitGuard: Niti, Deepak, Divyank*
agent_prompt.py ADDED
@@ -0,0 +1,45 @@
+ from __future__ import annotations
+
+ SYSTEM_PROMPT = """You are a senior security researcher and pentester. Your task is to analyze code commits (diffs) to determine if they introduce exploitable vulnerabilities.
+
+ You operate in a multi-step environment. You can request more context, analyze your thoughts, or issue a final verdict.
+
+ ### Action Format
+ You MUST respond with exactly ONE action per turn, wrapped in XML tags:
+
+ 1. **Request Context:** Use this if you need to see the full content of a file listed in 'available_files'.
+ <action>
+ <action_type>request_context</action_type>
+ <file_path>filename.c</file_path>
+ </action>
+
+ 2. **Analyze:** Use this for your internal Chain-of-Thought reasoning. Be detailed.
+ <action>
+ <action_type>analyze</action_type>
+ <reasoning>Your detailed step-by-step security analysis here...</reasoning>
+ </action>
+
+ 3. **Verdict:** Use this to terminate the episode with your final judgment.
+ <action>
+ <action_type>verdict</action_type>
+ <is_vulnerable>true/false</is_vulnerable>
+ <vuln_type>CWE-XX (e.g., CWE-89)</vuln_type>
+ <exploit_sketch>Brief description of how this could be exploited...</exploit_sketch>
+ </action>
+
+ ### Constraints
+ - You have a maximum of 5 steps per episode.
+ - Context requests have a small cost; be efficient.
+ - Verifiable rewards (RLVR) are based on the accuracy of your final verdict and the presence of correct exploit keywords.
+ """
+
+ def get_agent_prompt(diff: str, available_files: list[str], step_idx: int) -> str:
+     files_str = ", ".join(available_files) if available_files else "None"
+     return f"""### Input Diff
+ {diff}
+
+ ### Environment Info
+ - Available Files: {files_str}
+ - Current Step: {step_idx}/5
+
+ Please provide your next action in XML format:"""
client.py ADDED
@@ -0,0 +1,26 @@
+ from typing import Any, Dict
+
+ import requests
+
+
+ class CommitGuardClient:
+     def __init__(self, base_url: str):
+         self.base_url = base_url.rstrip("/")
+
+     def reset(self) -> Dict[str, Any]:
+         # Explicit timeouts so a hung server cannot stall a training loop.
+         resp = requests.post(f"{self.base_url}/reset", timeout=30)
+         resp.raise_for_status()
+         return resp.json()
+
+     def step(self, action: str | Dict[str, Any]) -> Dict[str, Any]:
+         # Accept either a raw XML action string or a pre-built JSON payload.
+         payload = {"action": action} if isinstance(action, str) else action
+         resp = requests.post(f"{self.base_url}/step", json=payload, timeout=30)
+         resp.raise_for_status()
+         return resp.json()
+
+     def health(self) -> Dict[str, str]:
+         resp = requests.get(f"{self.base_url}/health", timeout=30)
+         resp.raise_for_status()
+         return resp.json()
commitguard_env/__init__.py ADDED
@@ -0,0 +1,8 @@
+ __all__ = [
+     "environment",
+     "models",
+     "parse_action",
+     "reward",
+     "server",
+ ]
commitguard_env/environment.py ADDED
@@ -0,0 +1,151 @@
+ from __future__ import annotations
+
+ import json
+ import random
+ import uuid
+ from dataclasses import replace
+ from pathlib import Path
+
+ from .models import CommitGuardAction, CommitGuardObservation, CommitGuardState, ContextSnippet, DevignSample
+ from .reward import compute_reward
+
+
+ class CommitGuardEnvironment:
+     def __init__(self, *, data_path: Path) -> None:
+         self._data_path = data_path
+         self._samples: list[DevignSample] = []
+         self._state: CommitGuardState | None = None
+         self._rng = random.Random(0)
+         self._cwe_keywords: dict[str, list[str]] = {}
+
+     def load(self) -> None:
+         """Lazily load the dataset and CWE keyword map; idempotent."""
+         if self._samples:
+             return
+         # Load CWE keywords from the data directory (matching instructions)
+         try:
+             kw_path = self._data_path.parent / "cwe_keywords.json"
+             if not kw_path.exists():
+                 # Fallback to current directory or data subfolder if needed
+                 kw_path = self._data_path.parent / "data" / "cwe_keywords.json"
+
+             self._cwe_keywords = json.loads(kw_path.read_text(encoding="utf-8"))
+         except Exception:
+             self._cwe_keywords = {}  # keyword bonus is optional; never block loading
+
+         raw = self._data_path.read_text(encoding="utf-8").strip().splitlines()
+         for line in raw:
+             obj = json.loads(line)
+             # Support both original and mvd schemas
+             sample_id = str(obj.get("commit_id") or obj.get("sample_id", "unknown"))
+
+             # Synthesize a diff if missing (mvd branch data schema)
+             diff = obj.get("diff")
+             if not diff and "code_before" in obj and "code_after" in obj:
+                 diff = f"--- code_before\n+++ code_after\n{obj['code_before']}\n{obj['code_after']}"
+
+             self._samples.append(
+                 DevignSample(
+                     sample_id=sample_id,
+                     diff=str(diff or ""),
+                     available_files=list(obj.get("available_files") or []),
+                     is_vulnerable=obj.get("is_vulnerable"),
+                     cwe=obj.get("cwe") or obj.get("cwe_type"),
+                     target_file=obj.get("target_file"),
+                     files=obj.get("files"),
+                 )
+             )
+         if not self._samples:
+             raise RuntimeError("no_samples_loaded")
+
+     def reset(self, sample_id: str | None = None) -> CommitGuardObservation:
+         self.load()
+         if sample_id:
+             sample = next((s for s in self._samples if s.sample_id == sample_id), None)
+             if not sample:
+                 raise ValueError(f"sample_id {sample_id} not found")
+         else:
+             sample = self._rng.choice(self._samples)
+
+         episode_id = str(uuid.uuid4())
+         self._state = CommitGuardState(
+             episode_id=episode_id,
+             current_sample_id=sample.sample_id,
+             step_count=0,
+             context_requests=0,
+             history=[],
+         )
+         return CommitGuardObservation(
+             episode_id=episode_id,
+             diff=sample.diff,
+             available_files=sample.available_files,
+             step_idx=0,
+             budget_remaining=5,
+         )
+
+     def step(self, action: CommitGuardAction) -> tuple[CommitGuardObservation, float, bool]:
+         if self._state is None:
+             _ = self.reset()  # defensive: stepping before reset starts a fresh episode
+
+         assert self._state is not None
+         next_step = self._state.step_count + 1
+
+         sample = next(s for s in self._samples if s.sample_id == self._state.current_sample_id)
+
+         context_snippets: list[ContextSnippet] = []
+         context_requests = self._state.context_requests
+         if action.action_type == "request_context":
+             context_requests += 1
+             if action.file_path and sample.files and action.file_path in sample.files:
+                 content = sample.files[action.file_path]
+                 lines = content.splitlines()
+                 # Serve at most the first 80 lines regardless of the requested
+                 # range, to keep observation payloads bounded.
+                 start = 1
+                 end = min(len(lines), 80)
+                 context_snippets = [
+                     ContextSnippet(
+                         file_path=action.file_path,
+                         start_line=start,
+                         end_line=end,
+                         content="\n".join(lines[start - 1 : end]),
+                     )
+                 ]
+
+         reward = compute_reward(
+             action=action,
+             is_vulnerable=sample.is_vulnerable,
+             cwe=sample.cwe,
+             target_file=sample.target_file,
+             cwe_keywords=self._cwe_keywords,
+             context_requests=context_requests,
+         )
+
+         done = bool(action.action_type == "verdict" or next_step >= 5)
+
+         self._state = replace(
+             self._state,
+             step_count=next_step,
+             context_requests=context_requests,
+             history=[
+                 *self._state.history,
+                 {
+                     "step": next_step,
+                     "action_type": action.action_type,
+                     "parse_error": action.parse_error,
+                 },
+             ],
+         )
+
+         error = action.parse_error
+         if error is None and action.action_type == "request_context" and not context_snippets:
+             error = "context_unavailable"
+
+         obs = CommitGuardObservation(
+             episode_id=self._state.episode_id,
+             diff=sample.diff,
+             available_files=sample.available_files,
+             context_snippets=context_snippets,
+             step_idx=next_step,
+             budget_remaining=max(0, 5 - next_step),
+             error=error,
+         )
+         return obs, reward, done
+
+     def state(self) -> CommitGuardState:
+         if self._state is None:
+             return CommitGuardState(episode_id="", current_sample_id="", step_count=0, context_requests=0, history=[])
+         return self._state
commitguard_env/models.py ADDED
@@ -0,0 +1,61 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Literal, Optional
+
+
+ ActionType = Literal["request_context", "analyze", "verdict"]
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardAction:
+     action_type: ActionType
+     file_path: Optional[str] = None
+     reasoning: Optional[str] = None
+     is_vulnerable: Optional[bool] = None
+     vuln_type: Optional[str] = None
+     exploit_sketch: Optional[str] = None
+     raw_action: Optional[str] = None
+     parse_error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextSnippet:
+     file_path: str
+     start_line: int
+     end_line: int
+     content: str
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardObservation:
+     # Cheating-prevention critical: this shape must never include ground truth.
+     episode_id: str
+     step_idx: int
+     diff: str
+     available_files: list[str]
+     context_snippets: list[ContextSnippet] = field(default_factory=list)
+     budget_remaining: int = 0
+     error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardState:
+     episode_id: str
+     current_sample_id: str
+     step_count: int
+     context_requests: int = 0
+     history: list[dict] = field(default_factory=list)
+
+
+ @dataclass(frozen=True, slots=True)
+ class DevignSample:
+     sample_id: str
+     diff: str
+     available_files: list[str]
+     # Server-only fields (must never be surfaced in an Observation)
+     is_vulnerable: Optional[bool] = None
+     cwe: Optional[str] = None
+     target_file: Optional[str] = None
+     files: Optional[dict[str, str]] = None
+
commitguard_env/parse_action.py ADDED
@@ -0,0 +1,98 @@
+ from __future__ import annotations
+
+ import re
+ from typing import Any, Optional
+
+ from .models import CommitGuardAction
+
+
+ _TAG_RE = re.compile(r"<(?P<tag>[a-zA-Z_]+)>(?P<val>.*?)</(?P=tag)>", re.DOTALL)
+
+
+ def _first(tag: str, text: str) -> Optional[str]:
+     m = re.search(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", text, flags=re.DOTALL)
+     if not m:
+         return None
+     return m.group(1).strip()
+
+
+ def _parse_bool(v: Optional[str]) -> Optional[bool]:
+     if v is None:
+         return None
+     s = v.strip().lower()
+     if s in {"true", "1", "yes"}:
+         return True
+     if s in {"false", "0", "no"}:
+         return False
+     return None
+
+
+ def parse_action(raw_action: str) -> CommitGuardAction:
+     """
+     Parse an XML-tag free-text action. Never raises.
+
+     Expected shape:
+         <action><action_type>...</action_type><fields>...</fields></action>
+     """
+     try:
+         action_type = (_first("action_type", raw_action) or "").strip().lower()
+         if action_type not in {"request_context", "analyze", "verdict"}:
+             return CommitGuardAction(
+                 action_type="analyze",
+                 raw_action=raw_action,
+                 parse_error="missing_or_invalid_action_type",
+             )
+
+         if action_type == "request_context":
+             file_path = _first("file_path", raw_action)
+             return CommitGuardAction(
+                 action_type="request_context",
+                 file_path=file_path,
+                 raw_action=raw_action,
+             )
+
+         if action_type == "analyze":
+             reasoning = _first("reasoning", raw_action)
+             return CommitGuardAction(action_type="analyze", reasoning=reasoning, raw_action=raw_action)
+
+         is_vulnerable = _parse_bool(_first("is_vulnerable", raw_action))
+         vuln_type = _first("vuln_type", raw_action)
+         exploit_sketch = _first("exploit_sketch", raw_action)
+         return CommitGuardAction(
+             action_type="verdict",
+             is_vulnerable=is_vulnerable,
+             vuln_type=vuln_type,
+             exploit_sketch=exploit_sketch,
+             raw_action=raw_action,
+         )
+     except Exception as e:  # defensive: model output must never crash the server
+         return CommitGuardAction(
+             action_type="analyze",
+             raw_action=raw_action,
+             parse_error=f"parser_exception:{type(e).__name__}",
+         )
+
+
+ def action_from_json(payload: dict[str, Any]) -> CommitGuardAction:
+     """
+     Convenience for curl/JSON clients: accept either {"action": "<xml>"} or
+     direct fields matching CommitGuardAction.
+     """
+     if isinstance(payload.get("action"), str):
+         return parse_action(payload["action"])
+
+     action_type = (payload.get("action_type") or "analyze").strip().lower()
+     if action_type not in {"request_context", "analyze", "verdict"}:
+         action_type = "analyze"
+
+     return CommitGuardAction(
+         action_type=action_type,  # type: ignore[arg-type]
+         file_path=payload.get("file_path"),
+         reasoning=payload.get("reasoning"),
+         is_vulnerable=payload.get("is_vulnerable"),
+         vuln_type=payload.get("vuln_type"),
+         exploit_sketch=payload.get("exploit_sketch"),
+         raw_action=None,
+         parse_error=None,
+     )
+
commitguard_env/reward.py ADDED
@@ -0,0 +1,71 @@
+ from __future__ import annotations
+
+ from .models import CommitGuardAction
+
+
+ def compute_reward(
+     *,
+     action: CommitGuardAction,
+     is_vulnerable: bool | None,
+     cwe: str | None,
+     target_file: str | None,
+     cwe_keywords: dict[str, list[str]] | None,
+     context_requests: int,
+ ) -> float:
+     """
+     Tiered RLVR reward (PRD 5.3, architecture contract).
+
+     Notes:
+     - Ground truth must remain server-only; the caller passes it in.
+     - The return value is a scalar only; no label debug info.
+     """
+     # The per-context-request penalty applies regardless of verdict.
+     reward = -0.05 * float(max(0, context_requests))
+
+     if action.parse_error:
+         return reward - 0.5
+
+     # Small CoT bonus: reward 'analyze' steps that provide substantial reasoning.
+     # This provides a tiny positive float signal to encourage thinking.
+     if action.action_type == "analyze":
+         reasoning_len = len(action.reasoning or "")
+         if reasoning_len > 50:
+             reward += min(0.05, 0.001 * (reasoning_len // 10))
+         return reward
+
+     if action.action_type != "verdict":
+         return reward
+
+     if is_vulnerable is None:
+         return reward
+
+     pred = bool(action.is_vulnerable) if action.is_vulnerable is not None else None
+     if pred is None:
+         return reward - 0.5
+
+     if pred is True and is_vulnerable is True:
+         reward += 1.0
+         # Correct CWE (discrete +0.5)
+         if cwe and action.vuln_type and action.vuln_type.strip().upper() == cwe.strip().upper():
+             reward += 0.5
+
+         # Proportional keyword match (continuous float up to +0.5)
+         kws = (cwe_keywords or {}).get(cwe or "", []) if cwe else []
+         if kws:
+             sketch = (action.exploit_sketch or "").lower()
+             matches = sum(1 for k in kws if k.lower() in sketch)
+             # Continuous signal: proportional to the fraction of keywords found.
+             reward += 0.5 * (matches / len(kws))
+         return reward
+
+     if pred is True and is_vulnerable is False:
+         return reward - 1.0
+
+     if pred is False and is_vulnerable is True:
+         return reward - 0.5
+
+     if pred is False and is_vulnerable is False:
+         return reward + 1.0
+
+     return reward
+
commitguard_env/server.py ADDED
@@ -0,0 +1,89 @@
+ from __future__ import annotations
+
+ from dataclasses import asdict
+ from pathlib import Path
+ from typing import Any
+
+ import uvicorn
+ from fastapi import FastAPI
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+
+ from .environment import CommitGuardEnvironment
+ from .parse_action import action_from_json, parse_action
+
+
+ DATA_PATH = Path(__file__).resolve().parent.parent / "data" / "devign_filtered.jsonl"
+
+ app = FastAPI(title="CommitGuard Env Server", version="0.1.0")
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=False,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ env = CommitGuardEnvironment(data_path=DATA_PATH)
+
+
+ class StepRequest(BaseModel):
+     # Either send `action` as raw XML text, or send structured fields (curl-friendly).
+     action: str | None = None
+     action_type: str | None = None
+     file_path: str | None = None
+     reasoning: str | None = None
+     is_vulnerable: bool | None = None
+     vuln_type: str | None = None
+     exploit_sketch: str | None = None
+
+
+ @app.get("/health")
+ def health() -> dict[str, str]:
+     return {"status": "healthy"}
+
+
+ class ResetRequest(BaseModel):
+     sample_id: str | None = None
+
+
+ @app.post("/reset")
+ def reset(req: ResetRequest = ResetRequest()) -> dict[str, Any]:
+     try:
+         obs = env.reset(sample_id=req.sample_id)
+         return {
+             "observation": asdict(obs),
+             "done": False,
+             "reward": 0.0,
+         }
+     except ValueError as e:
+         return {"error": str(e)}
+
+
+ @app.post("/step")
+ def step(req: StepRequest) -> dict[str, Any]:
+     if req.action is not None:
+         action = parse_action(req.action)
+     else:
+         action = action_from_json(req.model_dump(exclude_none=True))
+     obs, reward, done = env.step(action)
+     return {
+         "observation": asdict(obs),
+         "done": done,
+         "reward": reward,
+         "info": {"parse_error": action.parse_error},
+     }
+
+
+ @app.get("/state")
+ def state() -> dict[str, Any]:
+     st = env.state()
+     return {"state": asdict(st)}
+
+
+ def main() -> None:
+     uvicorn.run("commitguard_env.server:app", host="0.0.0.0", port=8000, reload=False)
+
+
+ if __name__ == "__main__":
+     main()
+
commitguard_hf_blog.md ADDED
@@ -0,0 +1,43 @@
+ # CommitGuard: Closing the Asymmetry in AI-Paced Security Review
+
+ AI coding agents are shipping production code at 10x human velocity. Defense is still running on human time. This asymmetry is the core vulnerability of the modern software lifecycle.
+
+ Today, we are introducing **CommitGuard**, a Meta OpenEnv RL environment designed to train LLM agents to perform high-fidelity security reviews at the moment of commit.
+
+ ## The Problem: Offense on AI Time, Defense on Human Time
+
+ The same LLMs that empower developers to ship faster are being used by adversaries to find vulnerabilities faster. Traditional security reviews—periodic pentests and manual PR audits—cannot keep up with the volume of code generated by autonomous agents.
+
+ CommitGuard solves this by training models to reason about vulnerabilities directly from code diffs, providing a continuous, automated red-teaming layer at the speed of deployment.
+
+ ## Technical Foundation: Meta OpenEnv & RLVR
+
+ CommitGuard is built on **Meta OpenEnv**, leveraging the **RLVR (Reinforcement Learning from Verifiable Rewards)** philosophy. Unlike many LLM-based systems that rely on "LLM-as-a-judge," CommitGuard's rewards are grounded in ground-truth labels from the Devign dataset.
+
+ This prevents reward hacking and ensures that the model learns to identify real vulnerabilities, not just what "sounds" like a vulnerability to another model.
+
+ ### The Tiered Reward Structure
+ - **Binary Accuracy (+1.0):** Correctly identifying if a commit is vulnerable.
+ - **CWE Classification (+0.5):** Correctly identifying the specific vulnerability class (e.g., CWE-89 SQL Injection).
+ - **Exploit Reasoning (+0.5):** Providing a plausible exploit sketch containing verifiable keywords.
+ - **Efficiency Penalty (-0.05):** Penalizing excessive context requests to encourage precise reasoning.
+
+ ## Training Results: Llama-3.2-3B-Instruct
+
+ We trained **Llama-3.2-3B-Instruct** using **GRPO (Group Relative Policy Optimization)** via TRL and Unsloth. By quantizing the model to 4-bit and using LoRA, we were able to run 300 steps of training on a single A10G GPU in under 3 hours.
+
+ **Key Achievements:**
+ - **Measurable Learning:** Baseline vs. trained accuracy shows a clear upward trend in detection reliability.
+ - **Reasoning Depth:** Post-training, the model demonstrates more structured chain-of-thought analysis before issuing a verdict.
+ - **Precision:** A reduction in false positives through the tiered penalty system.
+
+ ## Join the Defense
+
+ CommitGuard is open source and ready for further research. We invite the community to extend the environment with:
+ - Multi-file commit reasoning.
+ - Sandboxed exploit execution for 100% verifiable rewards.
+ - Self-play loops between attackers and defenders.
+
+ Check out our [Hugging Face Space](https://huggingface.co/spaces/inmodel-labs/commitguard-train) and [Trained Model](https://huggingface.co/inmodel-labs/commitguard-llama-3b).
+
+ *Developed during the Meta OpenEnv Hackathon 2026.*
current.md ADDED
@@ -0,0 +1,426 @@
+ # HF Training Checklist — CommitGuard
+
+ **Print this. Tick every box in order. Do NOT skip steps.**
+ **If any box fails: STOP. Fix before proceeding.**
+
+ ---
+
+ ## PHASE 0 — Account Setup (Do Once, Do NOW)
+
+ - [ ] `huggingface-cli login` → authenticated
+ - [ ] `huggingface-cli whoami` → shows your username
+ - [ ] HF credits visible at https://huggingface.co/settings/billing → $30 showing
+ - [ ] Claim HF credits if not done: https://huggingface.co/coupons/claim/hf-openenv-community
+ - [ ] Llama-3.2-3B license accepted at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
+ - [ ] License status: "You have been granted access" (NOT "pending")
+ - [ ] If pending after 30 min → **SWITCH TO Qwen2.5-1.5B-Instruct. No waiting.**
+ - [ ] `wandb login` → authenticated
+ - [ ] Wandb project created: `commitguard`
+
+ ---
+
+ ## PHASE 1 — Environment Health (Before ANY Training)
+
+ ### 1A. HF Space is alive
+
+ ```bash
+ curl https://<username>-commitguard.hf.space/health
+ ```
+
+ - [ ] Returns `{"status": "healthy"}` with HTTP 200
+ - [ ] Response time < 3 seconds
+
+ ### 1B. Env accepts actions
+
+ ```bash
+ # Reset
+ curl -X POST https://<username>-commitguard.hf.space/reset
+ ```
+
+ - [ ] Returns JSON with `diff` field (non-empty string)
+ - [ ] Returns JSON with `done: false`
+ - [ ] Returns JSON with `reward: 0.0`
+
+ ```bash
+ # Step with verdict
+ curl -X POST https://<username>-commitguard.hf.space/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type":"verdict","is_vulnerable":true,"vuln_type":"CWE-89","exploit_sketch":"sql injection"}'
+ ```
+
+ - [ ] Returns JSON with `reward` field (NOT 0.0 — should be +1.0 or -1.0)
+ - [ ] Returns JSON with `done: true`
+
+ ### 1C. Env handles load
+
+ - [ ] Run 10 sequential reset→step cycles → zero crashes
+ - [ ] Run 5 concurrent reset→step cycles → zero crashes, no race conditions
+ - [ ] No request takes longer than 10 seconds (see the load-check sketch below)
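+
+ A minimal load-check sketch (assumptions: your Space URL below; the endpoint shapes match `commitguard_env/server.py`):
+
+ ```python
+ import concurrent.futures
+ import time
+
+ import requests
+
+ BASE = "https://<username>-commitguard.hf.space"  # fill in your Space URL
+ VERDICT = {"action_type": "verdict", "is_vulnerable": True,
+            "vuln_type": "CWE-89", "exploit_sketch": "sql injection"}
+
+ def cycle(i: int) -> float:
+     # One reset -> step episode; returns wall-clock seconds for the pair.
+     t0 = time.time()
+     requests.post(f"{BASE}/reset", timeout=10).raise_for_status()
+     requests.post(f"{BASE}/step", json=VERDICT, timeout=10).raise_for_status()
+     return time.time() - t0
+
+ print("sequential:", [round(cycle(i), 2) for i in range(10)])
+ with concurrent.futures.ThreadPoolExecutor(max_workers=5) as ex:
+     print("concurrent:", [round(t, 2) for t in ex.map(cycle, range(5))])
+ ```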
+
+ ### 1D. Reward sanity
+
+ - [ ] Correct vulnerable verdict → reward > 0 (expected: +1.0)
+ - [ ] False positive (safe code flagged) → reward < 0 (expected: -1.0)
+ - [ ] False negative (vuln missed) → reward < 0 (expected: -0.5)
+ - [ ] Rewards are NOT all identical across different samples (spot check below)
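+
+ A quick reward spot check (assumes a known-safe `sample_id` from your test split; `/reset` accepts an optional `sample_id` per `commitguard_env/server.py`):
+
+ ```bash
+ # Flag a known-safe sample as vulnerable → expect a negative reward (false positive)
+ curl -s -X POST https://<username>-commitguard.hf.space/reset \
+   -H "Content-Type: application/json" -d '{"sample_id":"<safe-sample-id>"}'
+ curl -s -X POST https://<username>-commitguard.hf.space/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type":"verdict","is_vulnerable":true,"vuln_type":"CWE-89","exploit_sketch":"sql injection"}'
+ ```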
+
+ ---
+
+ ## PHASE 2 — Data Verification
+
+ - [ ] `data/devign_train.jsonl` exists
+ - [ ] `wc -l data/devign_train.jsonl` → >1000 samples
+ - [ ] `data/devign_test.jsonl` exists
+ - [ ] `wc -l data/devign_test.jsonl` → exactly 100 samples
+ - [ ] Train and test commit_ids are disjoint (no overlap; `python scripts/check_disjoint.py`)
+ - [ ] Spot check 3 samples: `code_after` is non-empty, `is_vulnerable` is boolean
+ - [ ] No sample exceeds 80 lines of code
+ - [ ] Approximate 50/50 split between vulnerable and safe samples (see the balance check below)
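+
+ A minimal balance check (a sketch; assumes each JSONL row carries the boolean `is_vulnerable` field used elsewhere in this repo):
+
+ ```python
+ import json
+ from collections import Counter
+
+ # Count vulnerable vs. safe labels in the training split
+ with open("data/devign_train.jsonl", encoding="utf-8") as f:
+     counts = Counter(json.loads(line)["is_vulnerable"] for line in f)
+ print(counts)  # expect roughly equal True/False counts
+ ```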
+
+ ---
+
+ ## PHASE 3 — GPU & Dependencies
+
+ ### 3A. Hardware
+
+ ```bash
+ nvidia-smi
+ ```
+
+ - [ ] GPU visible with ≥16GB VRAM
+ - [ ] GPU name matches expected (T4 / A10G / L4)
+ - [ ] Free VRAM ≥ 14GB (kill other processes if needed)
+
+ ### 3B. Python environment
+
+ ```bash
+ python --version
+ ```
+
+ - [ ] Python 3.10 or 3.11 (NOT 3.12 — Unsloth compatibility issues)
+
+ ### 3C. Critical libraries
+
+ ```bash
+ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+ python -c "from unsloth import FastLanguageModel; print('OK')"
+ python -c "from trl import GRPOTrainer; print('OK')"
+ python -c "from peft import PeftModel; print('OK')"
+ python -c "import wandb; print('OK')"
+ ```
+
+ - [ ] torch ≥ 2.3.0, CUDA = True
+ - [ ] unsloth imports without error
+ - [ ] trl ≥ 0.12.0 imports without error
+ - [ ] peft imports without error
+ - [ ] wandb imports without error
+
+ ---
+
+ ## PHASE 4 — Model Loading Test
+
+ ```python
+ from unsloth import FastLanguageModel
+ import torch
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     "meta-llama/Llama-3.2-3B-Instruct",
+     max_seq_length=2048,
+     load_in_4bit=True,
+ )
+ print("Model loaded successfully")
+ print(f"GPU memory: {torch.cuda.memory_allocated()/1e9:.1f}GB")
+ ```
+
+ - [ ] Model loads without OOM
+ - [ ] GPU memory after load < 6GB (leaves room for GRPO overhead)
+ - [ ] No warnings about missing tokenizer files
+
+ ### LoRA application
+
+ ```python
+ model = FastLanguageModel.get_peft_model(
+     model, r=8, lora_alpha=16,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+ print(f"Trainable params: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
+ ```
+
+ - [ ] LoRA applies without error
+ - [ ] Trainable params ~3-8M (NOT the full 3B)
+
+ ---
+
+ ## PHASE 5 — Dry Run (2 Steps)
+
+ **THE MOST CRITICAL CHECK. DO NOT SKIP.**
+
+ ```bash
+ python train_grpo.py --max_steps 2
+ ```
+
+ ### 5A. Generation
+
+ - [ ] First prompt formatted correctly (print it — does it contain a code diff?)
+ - [ ] 4 completions generated for first prompt
+ - [ ] At least 2 of 4 completions contain `<action_type>` XML tags
+ - [ ] Completions are different from each other (not all identical)
+
+ ### 5B. Reward collection
+
+ - [ ] All 4 completions submitted to env
+ - [ ] All 4 rewards received (no timeouts)
+ - [ ] Rewards have variance (NOT all the same value)
+ - [ ] Rewards in expected range [-1.0, +2.0]
+ - [ ] Print rewards: `[_____, _____, _____, _____]` (write them down)
+
+ ### 5C. Training step
+
+ - [ ] GRPO loss computed (finite number, not NaN, not inf, not 0.0)
+ - [ ] Loss value: _____ (write it down)
+ - [ ] Wandb shows run with 2 logged steps
+ - [ ] No OOM during backward pass
+ - [ ] Peak GPU memory: _____GB (must be < 22GB on A10G or < 14GB on T4)
+
+ ### 5D. Checkpointing
+
+ - [ ] Output directory created: `./commitguard-llama-3b-grpo/`
+ - [ ] Checkpoint files present (or will be at step 50)
+
+ ### 5E. Timing estimate
+
+ - [ ] 2 steps took _____ seconds
+ - [ ] Estimated time for 300 steps: _____ minutes (= 2-step-time × 150)
+ - [ ] Estimated cost: _____ dollars (hours × GPU hourly rate)
+ - [ ] Cost within budget? (must be under $8; worked example below)
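+
+ Worked example (illustrative numbers only): if the 2-step dry run took 90 seconds, then 300 steps ≈ 150 × 90 s = 13,500 s ≈ 3.75 hours; on an A10G at ~$1.50/hr that is ≈ $5.60, inside the $8 budget.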
+
+ ---
+
+ ## PHASE 6 — Baseline Eval (Before Training)
+
+ **MUST run baseline BEFORE training. Cannot run after — you need the contrast.**
+
+ ```bash
+ python scripts/evaluate.py \
+   --model_path meta-llama/Llama-3.2-3B-Instruct \
+   --test_file data/devign_test.jsonl \
+   --output eval_baseline.json
+ ```
+
+ - [ ] Eval completes on all 100 test samples
+ - [ ] Binary accuracy: _____% (write it down, expected: 30-50%)
+ - [ ] CWE accuracy: _____% (expected: low, maybe 5-15%)
+ - [ ] False positive rate: _____%
+ - [ ] False negative rate: _____%
+ - [ ] Results saved to `eval_baseline.json`
+ - [ ] File committed to repo
+
+ ---
+
+ ## PHASE 7 — Launch Real Training
+
+ ### Pre-launch final checks
+
+ - [ ] All phases 0-6 are GREEN
+ - [ ] Budget approved by Niti (team lead)
+ - [ ] Config confirmed:
+   - [ ] `max_steps = 300`
+   - [ ] `save_steps = 50`
+   - [ ] `logging_steps = 1`
+   - [ ] `num_generations = 4`
+   - [ ] `learning_rate = 5e-6`
+   - [ ] `report_to = "wandb"`
+ - [ ] HF Space is still healthy (re-check `/health`)
+ - [ ] Screenshot this checklist with all boxes ticked → post in team channel
+
+ ### Launch
+
+ ```bash
+ # Option A: HF Jobs (preferred)
+ hf jobs uv run --flavor a10g-large train_grpo.py
+
+ # Option B: GCP (fallback)
+ nohup python train_grpo.py > training.log 2>&1 &
+ ```
+
+ - [ ] Job started successfully
+ - [ ] Job ID / Dashboard URL captured: _______________________
+ - [ ] Wandb run URL captured: _______________________
+ - [ ] Posted both URLs in team channel
+ - [ ] Set alarm to check in 30 minutes
+
+ ---
+
+ ## PHASE 8 — During Training Monitoring
+
+ **Check every 30 minutes while awake. Check immediately on waking up.**
+
+ ### Quick health check (< 2 min each time)
+
+ | Time | reward/mean | reward/std | loss | GPU mem | Status |
+ |------|-------------|------------|------|---------|--------|
+ | +30m | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +1h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +1.5h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +2h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | Final | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+
+ ### Red flags → immediate action
+
+ | Red flag | Action |
+ |---|---|
+ | reward/mean trending DOWN | Check env `/health`. If healthy, lower LR to 2e-6 and relaunch from latest checkpoint. |
+ | loss = NaN | Kill run. Add `max_grad_norm=1.0` to config. Relaunch from checkpoint. |
+ | GPU memory > 23GB | Will OOM soon. Kill run. Reduce `num_generations` to 2. Relaunch. |
+ | Env returning errors in Wandb logs | HF Space is sleeping. Hit `/health` to wake. If down, Niti restarts. |
+ | Steps/second dropped to 0 | Job hung. Kill and relaunch from checkpoint. |
+ | All rewards identical for 50+ steps | Reward function bug. Ping Deepak. |
+
+ ---
+
+ ## PHASE 9 — Post-Training
+
+ ### Immediately after training completes
+
+ - [ ] Training finished without crash
+ - [ ] Wandb run status: "finished"
+ - [ ] Final reward/mean: _____ (higher than step-1 reward? That's the curve.)
+ - [ ] Screenshot reward curve from Wandb → save as `plots/reward_curve.png`
+ - [ ] Final checkpoint exists in output directory
+ - [ ] Total training time: _____ hours
+ - [ ] Total cost: $_____
+
+ ### Save the model
+
+ ```bash
+ # Push LoRA adapter to HF Hub
+ huggingface-cli upload inmodel-labs/commitguard-llama-3b \
+   ./commitguard-llama-3b-grpo/final
+ ```
+
+ - [ ] Upload successful
+ - [ ] Model page visible at https://huggingface.co/inmodel-labs/commitguard-llama-3b
+
+ ### Verify the saved model loads
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+ model = PeftModel.from_pretrained(base, "inmodel-labs/commitguard-llama-3b")
+ print("Trained model loads correctly")
+ ```
+
+ - [ ] Model loads without error
+ - [ ] Quick inference produces XML-tagged output (not garbage)
+
+ ---
+
+ ## PHASE 10 — Trained Model Eval
+
+ ```bash
+ python scripts/evaluate.py \
+   --model_path ./commitguard-llama-3b-grpo/final \
+   --test_file data/devign_test.jsonl \
+   --is_lora \
+   --base_model meta-llama/Llama-3.2-3B-Instruct \
+   --output eval_trained.json
+ ```
+
+ - [ ] Eval completes on all 100 test samples
+ - [ ] Binary accuracy: _____% (compare to baseline: _____%)
+ - [ ] CWE accuracy: _____% (compare to baseline: _____%)
+ - [ ] False positive rate: _____% (compare to baseline: _____%)
+ - [ ] False negative rate: _____% (compare to baseline: _____%)
+ - [ ] Results saved to `eval_trained.json`
+ - [ ] File committed to repo
+
+ ### The verdict
+
+ - [ ] Trained accuracy > baseline accuracy? **YES / NO**
+ - [ ] If YES: by how many percentage points? _____pp
+ - [ ] If NO: check if qualitative outputs improved (reasoning traces better even if accuracy similar)
+
+ ### Hand off to team
+
+ - [ ] Post in team channel:
+   ```
+   TRAINING COMPLETE
+   Baseline accuracy: X%
+   Trained accuracy: Y%
+   Improvement: +Zpp
+   Wandb: [url]
+   Reward curve: [screenshot]
+   Model on Hub: inmodel-labs/commitguard-llama-3b
+   Ready for plots and README.
+   ```
+ - [ ] Hand `eval_baseline.json` and `eval_trained.json` to Deepak for plot generation
+ - [ ] Kill GCP VM if running (`gcloud compute instances stop ...`)
+ - [ ] Update budget tracker in team channel
+
+ ---
+
+ ## PHASE 11 — Inference for Demo Video
+
+ **Divyank runs this to get the before/after examples for the demo recording.**
+
+ ### Pick the demo sample
+
+ - [ ] Find ONE sample from test set where:
+   - Ground truth: vulnerable (preferably CWE-89 SQL injection)
+   - Baseline model gets it WRONG
+   - Trained model gets it RIGHT
+ - [ ] Sample commit_id: _______________________
+
+ ### Generate baseline output
+
+ ```python
+ # Load untrained model, generate response for the demo sample
+ # Save full text output to demo_baseline_output.txt
+ ```
+
+ - [ ] Baseline output saved
+ - [ ] Output shows: wrong verdict / no reasoning / random guess
+
+ ### Generate trained output
+
+ ```python
+ # Load trained model, generate response for the demo sample
+ # Save full text output to demo_trained_output.txt
+ ```
+
+ - [ ] Trained output saved
+ - [ ] Output shows: correct verdict / identifies CWE / sketches exploit
+ - [ ] The contrast between baseline and trained is VISIBLE and OBVIOUS (generation sketch below)
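+
+ A hedged sketch filling in the two placeholder blocks above (assumptions: the picked sample is saved as `demo_sample.json` with a `diff` field, a hypothetical file name; swap `ADAPTER` to the Hub adapter id for the trained run):
+
+ ```python
+ import json
+
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ from scripts.agent_prompt import SYSTEM_PROMPT
+
+ ADAPTER = None  # baseline; set to "inmodel-labs/commitguard-llama-3b" for the trained run
+ OUT = "demo_baseline_output.txt" if ADAPTER is None else "demo_trained_output.txt"
+
+ tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-3.2-3B-Instruct", torch_dtype=torch.float16, device_map="auto"
+ )
+ if ADAPTER:
+     model = PeftModel.from_pretrained(model, ADAPTER)
+
+ sample = json.load(open("demo_sample.json"))  # hypothetical: the picked test sample saved as JSON
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {"role": "user", "content": f"Analyze this commit and submit your verdict.\n\n{sample['diff']}"},
+ ]
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tok(prompt, return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+ text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ open(OUT, "w").write(text)  # full text output for screen capture
+ print(text)
+ ```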
+
+ ### Ready for recording
+
+ - [ ] Both outputs saved as text files for screen capture
+ - [ ] The diff for this sample is readable (not 80 lines of dense C)
+ - [ ] Proceed to demo video recording (see tasks_divyank.md)
+
+ ---
+
+ ## Emergency Fallback Reference Card
+
+ **Tape this next to your screen. Read it at 3 AM when your brain is mush.**
+
+ ```
+ CRASHED? → Check Wandb → Is it OOM?
+   YES OOM   → num_generations=2, retry from checkpoint
+   STILL OOM → Switch to Qwen2.5-1.5B, retry from scratch
+   NOT OOM   → Check error message → Screenshot → Post in team channel
+
+ REWARDS ALL ZERO? → Env bug, not model bug
+   → curl /health on HF Space
+   → If dead: ping Niti
+   → If alive: curl /step manually, check reward value
+   → If reward from curl is also 0: Deepak's reward function bug
+
+ LLAMA ACCESS DENIED? → Switch to Qwen2.5-1.5B immediately
+   → Change ONE line: model_name="Qwen/Qwen2.5-1.5B-Instruct"
+   → Everything else stays the same
+
+ CURVE IS FLAT? → Ship it anyway with honest narrative
+   → "Training evidence shows optimization attempted;
+      reward signal needs richer shaping in future work"
+   → A flat curve + honest story > no submission
+ ```
data/cwe_keywords.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "CWE-119": ["buffer overflow", "out of bounds", "overflow", "bounds check", "memcpy", "strcpy", "strcat", "index out of range", "heap", "stack smash"],
+   "CWE-476": ["null pointer", "nullptr", "dereference", "null check", "segmentation fault", "null access", "uninitialized"],
+   "CWE-189": ["integer overflow", "signedness", "division by zero", "arithmetic overflow", "wrap around", "truncation", "cast", "narrowing"],
+   "CWE-20": ["input validation", "improper input", "validation bypass", "sanitization", "untrusted input", "malformed data", "missing check"],
+   "CWE-22": ["path traversal", "directory traversal", "../", "..\\", "file inclusion", "arbitrary file", "escape root", "chroot"],
+   "CWE-78": ["command injection", "os.system", "subprocess", "shell=true", "exec(", "popen", "system(", "shell command"],
+   "CWE-89": ["sql injection", "sqli", "drop table", "union select", "query concatenation", "prepared statement", "bypass login"],
+   "CWE-79": ["xss", "cross site scripting", "script tag", "innerhtml", "alert(", "javascript:", "onerror", "content injection"],
+   "CWE-OTHER": ["vulnerability", "security", "exploit", "unsafe", "flaw", "bug", "error handling", "race condition", "use after free", "double free"]
+ }
data/devign_filtered.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
data/devign_test.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
data/devign_train.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
models.py ADDED
@@ -0,0 +1,61 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Literal, Optional
+
+
+ ActionType = Literal["request_context", "analyze", "verdict"]
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardAction:
+     action_type: ActionType
+     file_path: Optional[str] = None
+     reasoning: Optional[str] = None
+     is_vulnerable: Optional[bool] = None
+     vuln_type: Optional[str] = None
+     exploit_sketch: Optional[str] = None
+     raw_action: Optional[str] = None
+     parse_error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextSnippet:
+     file_path: str
+     start_line: int
+     end_line: int
+     content: str
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardObservation:
+     # Cheating-prevention critical: this shape must never include ground truth.
+     episode_id: str
+     step_idx: int
+     diff: str
+     available_files: list[str]
+     context_snippets: list[ContextSnippet] = field(default_factory=list)
+     budget_remaining: int = 0
+     error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardState:
+     episode_id: str
+     current_sample_id: str
+     step_count: int
+     context_requests: int = 0
+     history: list[dict] = field(default_factory=list)
+
+
+ @dataclass(frozen=True, slots=True)
+ class DevignSample:
+     sample_id: str
+     diff: str
+     available_files: list[str]
+     # Server-only fields (must never be surfaced in an Observation)
+     is_vulnerable: Optional[bool] = None
+     cwe: Optional[str] = None
+     target_file: Optional[str] = None
+     files: Optional[dict[str, str]] = None
+
openenv.yaml ADDED
@@ -0,0 +1,6 @@
+ name: commitguard
+ version: "0.1.0"
+ description: "CommitGuard OpenEnv environment (FastAPI server)"
+ port: 8000
+ entrypoint: "commitguard_env/server.py"
+
prd.md ADDED
@@ -0,0 +1,381 @@
+ # CommitGuard Product Requirements Document
+
+ **Project:** CommitGuard
+ **Owner:** Niti (Inmodel Labs)
+ **Team:** Niti, Deepak, Divyank
+ **Submission deadline:** Sunday 5:00 PM IST
+ **Hackathon:** Meta OpenEnv Hackathon (PyTorch + Hugging Face + Scaler)
+ **Document status:** Locked. Scope freeze at midnight Saturday.
+
+ ---
+
+ ## 1. Executive Summary
+
+ CommitGuard is a Reinforcement Learning environment built on Meta OpenEnv that trains LLM agents to detect exploitable vulnerabilities in code commits. The submission demonstrates that AI-paced security review is feasible: an agent trained on commit-level reasoning can match the velocity at which AI coding agents are now shipping production code.
+
+ The deliverable is a runnable HF Space hosting the env, a training notebook that produces a measurable learning curve on Llama-3.2-3B-Instruct, a demo video showing the qualitative shift from untrained to trained behavior, and a README that tells the story.
+
+ ---
+
+ ## 2. Problem Statement
+
+ ### 2.1 The shift in software development
+
+ Until recently, code was written by humans at human velocity. Security review processes were designed around this assumption: periodic pentests every 3 to 6 months, with manual code review at PR time. The cycle worked because the codebase changed slowly enough that periodic deep review caught most issues before they reached production.
+
+ This assumption has broken. Code is now being written and shipped by AI coding agents (Claude Code, Cursor, and other autonomous coding agents) at 10 to 100 times human velocity. Companies push to production daily, sometimes hourly. A pentest report from six months ago describes a codebase that no longer exists.
+
+ ### 2.2 The asymmetry
+
+ The same class of LLM that writes the code can be weaponized to attack it. An adversary equipped with autonomous coding tooling, given repository access or even just leaked commits, can pentest at the same velocity defenders ship. Defense runs on human time. Offense runs on AI time. **This asymmetry is unsustainable for any organization shipping AI-generated code at scale.**
+
+ ### 2.3 Why this is a frontier problem
+
+ AI red-teaming today is overwhelmingly a manual, human-bottlenecked discipline. Researchers at Anthropic, OpenAI, and Meta craft attacks one at a time. There is no automated equivalent of Metasploit for AI-generated code. Closing that gap is an open research problem that frontier labs are actively investing in.
+
+ ---
+
+ ## 3. Goals and Non-Goals
+
+ ### 3.1 Goals (in scope for this submission)
+
+ - Deliver a working OpenEnv environment that takes a code commit as input and rewards an agent for correctly identifying vulnerabilities, the CWE class, and a plausible exploit
+ - Train a small Llama variant (Llama-3.2-3B-Instruct) on the env using GRPO via TRL + Unsloth
+ - Demonstrate measurable learning: baseline vs. trained accuracy with reward curves
+ - Ship a complete submission package: HF Space, training notebook, README, demo video, optional HF blog post
+ - Frame the work in language a Meta researcher recognizes: RLVR (Reinforcement Learning from Verifiable Rewards), commit-time security, AI-paced defense
+
+ ### 3.2 Non-goals (explicitly out of scope)
+
+ - Production-ready security tool: this is a research environment, not a CI plugin
+ - Real-time exploit execution against arbitrary code: the v1 reward uses pattern matching, not sandboxed execution
+ - Multi-file / repo-level reasoning: v1 operates on single-file commits up to 80 lines
+ - Multi-agent self-play: listed in Future Work
+ - Pentesting beyond static code analysis: no network attacks, social engineering, or runtime probing
+ - Coverage of all CWEs: v1 focuses on the top 10 CWEs in Devign
+
+ ### 3.3 Non-goals from the rubric perspective
+
+ The rubric rewards ambition and storytelling more heavily than engineering polish. Therefore: not pursuing exhaustive test coverage, not optimizing for inference latency, not building a fancy frontend. The HF Space's default web UI is sufficient.
+
+ ---
+
+ ## 4. Target Users and Stakeholders
+
+ | Stakeholder | Role | What they care about |
+ |---|---|---|
+ | Hackathon judges (Meta partner engineers) | Primary audience | Innovation, story, training evidence, reward design |
+ | Meta Superintelligence Labs researchers | Aspirational audience | Frontier framing, RLVR alignment, paper-worthiness |
+ | HF community | Discovery audience | Reproducibility, runnable Space, clean README |
+ | Future contributors | Builder audience | Code clarity, extensibility hooks for v2 |
+
+ ---
+
+ ## 5. Solution Overview
+
+ ### 5.1 The environment
+
+ CommitGuard is an OpenEnv environment where an agent investigates code commits and decides whether they introduce exploitable vulnerabilities. The agent has a limited investigation budget (5 steps maximum per episode), forcing it to reason efficiently rather than brute-forcing context.
+
+ ### 5.2 The agent loop
+
+ 1. `reset()`: the env loads a commit (a `code_before`/`code_after` pair plus metadata) from a preprocessed Devign-derived dataset and returns the diff plus the list of available files in the repo
+ 2. `step(action)`: the agent emits one of three action types:
+    - `request_context(file_path)`: pull surrounding code (small reward penalty, encourages efficiency)
+    - `analyze(reasoning)`: write chain-of-thought, no reward effect, logged for traces
+    - `verdict(is_vulnerable, vuln_type, exploit_sketch)`: terminate the episode with a judgment
+ 3. Reward fires on verdict, computed server-side against ground truth the agent never sees; an example verdict action is shown below
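+
+ For reference, a verdict action in the XML-tag wire format (field values are illustrative):
+
+ ```xml
+ <action>
+   <action_type>verdict</action_type>
+   <is_vulnerable>true</is_vulnerable>
+   <vuln_type>CWE-89</vuln_type>
+   <exploit_sketch>user input is concatenated into the query; inject ' OR 1=1 --</exploit_sketch>
+ </action>
+ ```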
+
+ ### 5.3 Reward design (RLVR philosophy)
+
+ The reward is tiered and grounded in dataset truth, not in another LLM's opinion. This is deliberate: it follows the RLVR tradition (verifiable rewards from ground truth or executable checks) and prevents the reward hacking that plagues LLM-as-judge setups.
+
+ | Signal | Reward |
+ |---|---|
+ | Correct binary verdict (vulnerable vs. safe) | +1.0 |
+ | Correct CWE classification (when vulnerable) | +0.5 |
+ | Plausible exploit sketch (CWE-keyword match) | +0.5 |
+ | False positive (safe flagged as vulnerable) | -1.0 |
+ | False negative (real vuln missed) | -0.5 |
+ | Per-step context request | -0.05 |
+ | Episode step cap | 5 steps |
+
+ The shape is hard to game: flagging everything is punished by false positives, and never investigating forfeits the exploit-sketch bonus.
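+
+ Worked example (illustrative): a correct "vulnerable" verdict with the right CWE and a plausible exploit sketch, reached after one context request, scores 1.0 + 0.5 + 0.5 - 0.05 = 1.95; flagging a safe commit after the same single request scores -1.0 - 0.05 = -1.05.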
+
+ ---
+
+ ## 6. Technical Architecture
+
+ ### 6.1 System diagram
+
+ ```
+ +------------------+       HTTP/JSON         +----------------------+
+ | TRL + Unsloth    |  reset / step / state   | HF Space             |
+ | Llama-3.2-3B     | <---------------------> | FastAPI server       |
+ | GRPO trainer     |                         | (Docker)             |
+ | (HF Jobs A10G)   |                         |  +----------------+  |
+ +------------------+                         |  | Devign JSONL   |  |
+                                              |  +----------------+  |
+                                              |  | Reward function|  |
+                                              |  +----------------+  |
+                                              +----------------------+
+ ```
+
+ ### 6.2 Component breakdown
+
+ **Env server** (Python, FastAPI, Docker, OpenEnv 0.2.3+)
+ - `models.py`: Action, Observation, State dataclasses (extends OpenEnv base classes)
+ - `environment.py`: `reset()`, `step()`, `state()` methods on the `CommitGuardEnvironment` class
+ - `reward.py`: pure function `compute_reward(action, ground_truth, cwe_keywords) -> float`
+ - `parse_action.py`: XML-tag parser, robust to malformed model output
+ - `data/devign_filtered.jsonl`: preprocessed dataset, shipped in the image
+ - `data/cwe_keywords.json`: top-10 CWE exploit-pattern keyword map
+
+ **Env client** (auto-generated by OpenEnv CLI)
+ - `client.py`: `HTTPEnvClient` subclass, used by the training notebook
+ - Installable via `pip install git+https://huggingface.co/spaces/<user>/commitguard`
+
+ **Training pipeline** (Python, TRL, Unsloth, PEFT, Wandb)
+ - `train_grpo.py`: GRPOTrainer config + main loop
+ - `agent_prompt.py`: system prompt template with XML-tag action format
+ - `evaluate.py`: runs N samples through a model, returns accuracy stats
+
+ **Storytelling artifacts**
+ - `README.md`: pitch + results + links
+ - `demo_video.mp4`: 60-90 second before/after, hosted on YouTube unlisted
+ - `commitguard_hf_blog.md`: optional HF Hub blog post (page 26 bonus)
+ - `plots/`: reward_curve.png, baseline_vs_trained.png, per_cwe.png
+
+ ### 6.3 Data flow
+
+ 1. Preprocess Devign once at build time into `data/devign_filtered.jsonl` (~5000 samples, balanced, filtered to <80 LOC)
+ 2. Build Docker image with the JSONL embedded
+ 3. `openenv push` deploys to HF Space
+ 4. Training notebook connects to the HF Space URL via the OpenEnv HTTP client
+ 5. Each training step: GRPO generates 4 completions per prompt; each runs a full episode in the env; rewards are collected; the policy is updated via LoRA (see the sketch after this list)
+ 6. Wandb logs reward curves and training loss; checkpoints saved every 50 steps
+ 7. Final LoRA adapter saved to HF Hub for evaluation and demo
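+
+ A minimal sketch of how this loop might be wired (assumptions: the Space URL placeholder, a `train_prompts` dataset of `{"prompt": ...}` rows, and the `rollout_reward` helper are hypothetical here, not the final `train_grpo.py`):
+
+ ```python
+ import requests
+ from trl import GRPOConfig, GRPOTrainer
+
+ BASE = "https://<username>-commitguard.hf.space"  # placeholder Space URL
+
+ def rollout_reward(completions, **kwargs):
+     """Score each completion by replaying it as one env episode."""
+     rewards = []
+     for text in completions:
+         requests.post(f"{BASE}/reset", timeout=30)
+         step = requests.post(f"{BASE}/step", json={"action": text}, timeout=30).json()
+         rewards.append(float(step["reward"]))
+     return rewards
+
+ args = GRPOConfig(
+     output_dir="commitguard-llama-3b-grpo",
+     max_steps=300, num_generations=4, learning_rate=5e-6,
+     logging_steps=1, save_steps=50, report_to="wandb",
+ )
+ trainer = GRPOTrainer(
+     model="meta-llama/Llama-3.2-3B-Instruct",  # TRL also accepts a loaded model object
+     reward_funcs=rollout_reward,
+     args=args,
+     train_dataset=train_prompts,               # hypothetical dataset of {"prompt": ...}
+ )
+ trainer.train()
+ ```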
+
+ ### 6.4 Cheating prevention
+
+ The agent must never see ground truth. This is enforced by architecture:
+
+ - Ground truth lives only on the server, in the JSONL file the env loads from
+ - The Observation dataclass schema explicitly excludes `is_vulnerable`, `cwe_type`, and `target_file_with_label`
+ - A unit test (`test_no_leak.py`) asserts no observation contains forbidden fields (sketched below)
+ - The server returns only `reward` (a scalar) on each step, never the label that produced it
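+
+ A minimal sketch of the no-leak test (assumes the dataclasses in `commitguard_env/models.py`; the check is schema-level, not per-episode):
+
+ ```python
+ from dataclasses import fields
+
+ from commitguard_env.models import CommitGuardObservation
+
+ FORBIDDEN = {"is_vulnerable", "cwe", "cwe_type", "target_file", "target_file_with_label"}
+
+ def test_observation_schema_has_no_ground_truth():
+     # The observation type cannot carry label fields at all.
+     assert FORBIDDEN.isdisjoint({f.name for f in fields(CommitGuardObservation)})
+ ```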
+
+ ---
+
+ ## 7. Stack and Dependencies
+
+ ### 7.1 Locked technical decisions
+
+ | Decision | Choice | Rationale |
+ |---|---|---|
+ | Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
+ | Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
+ | Hosting | HF Space | Mandatory per submission rules, three-in-one (server + repo + registry) |
+ | Data source | Devign (DetectBERT subset) | Already on disk, real CWE labels, manageable size |
+ | Model | Llama-3.2-3B-Instruct | Meta-branded for the Meta hackathon, fits A10G with GRPO |
+ | Training framework | TRL with GRPO | Native OpenEnv integration via `reward_funcs` callback |
+ | Training optimization | Unsloth 4-bit + LoRA r=8 | 70% memory reduction, 2x speed (page 75 of opening deck) |
+ | Training infra | HF Jobs A10G | $0.40-1.50/hr, runs unattended, integrates with HF ecosystem |
+ | Dev infra | GCP VM with T4 | Stable, no Colab disconnects, leverages the 24,000 GCP credit |
+ | Action serialization | XML-tag free-text | Robust to small-model output variance, easier than JSON-mode |
+ | Logging | Wandb | TRL native, judges can view runs |
+
+ ### 7.2 Fallback decisions (pre-approved, no debate when triggered)
+
+ | If this fails | Fall back to | Trigger |
+ |---|---|---|
+ | Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
+ | HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
+ | 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Niti's checkpoint red |
+ | Tiered reward buggy | Binary correct/incorrect reward | Deepak's checkpoint red |
+ | Training curve flat | Ship with qualitative comparison only | Curve still flat at 10 AM Sunday |
+ | Demo video can't be cleanly recorded | Side-by-side text trace in README | Recording fails twice |
+
+ ---
+
+ ## 8. Functional Requirements
+
+ ### 8.1 Environment functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | F-1 | Env exposes `/health`, `/reset`, `/step`, `/state`, `/docs` endpoints | P0 |
+ | F-2 | `reset()` returns a random commit observation, never the same one twice in a single episode | P0 |
+ | F-3 | `step()` accepts XML-tagged action strings and parses them robustly | P0 |
+ | F-4 | `step()` returns reward, observation, and done flag | P0 |
+ | F-5 | Episode terminates on `verdict` action OR after 5 steps | P0 |
+ | F-6 | Observation never contains ground-truth labels | P0 |
+ | F-7 | Env handles malformed actions gracefully (returns -0.5 reward, doesn't crash) | P1 |
+ | F-8 | Env supports concurrent episodes (multiple training generations in parallel) | P1 |
+ | F-9 | Web UI on HF Space allows manual interaction for demo recording | P2 |
+
+ ### 8.2 Training functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | T-1 | Training notebook runs end-to-end on a single A10G | P0 |
+ | T-2 | Reward curve, training loss, and completions logged to Wandb | P0 |
+ | T-3 | LoRA adapter saved every 50 steps for resumability | P0 |
+ | T-4 | Baseline (untrained) evaluation on 100 held-out samples completes in <10 min | P0 |
+ | T-5 | Trained model evaluation produces per-CWE accuracy breakdown | P1 |
+ | T-6 | Notebook runnable from Colab via "Open in Colab" badge in README | P1 |
+
+ ### 8.3 Storytelling functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | S-1 | README explains problem, env, results, and motivation in <5 min read | P0 |
+ | S-2 | All plot PNGs committed to repo (not Wandb-only) | P0 |
+ | S-3 | Demo video 60-90 sec, before/after on a single SQL injection example | P0 |
+ | S-4 | Wandb run URL linked in README | P1 |
+ | S-5 | HF Hub blog post published and linked | P2 |
+
+ ---
+
+ ## 9. Non-Functional Requirements
+
+ | Aspect | Requirement |
+ |---|---|
+ | Performance | Single `step()` call returns in <2 seconds on HF Space free tier |
+ | Reliability | Env survives 100 random episodes without crash |
+ | Reproducibility | Training notebook produces a measurable learning curve when re-run with the same seed |
+ | Discoverability | HF Space tagged with `openenv`, `rl`, `security`, `code` |
+ | Documentation | README is self-contained; a judge can understand it without reading source |
+ | Licensing | Code MIT-licensed, dataset attribution to Devign authors |
+
+ ---
+
+ ## 10. Success Metrics
+
+ ### 10.1 Submission completeness (binary, must-pass)
+
+ - [ ] HF Space deployed and `/health` returns 200 OK
+ - [ ] Training notebook runs without crashes on a fresh Colab/VM
+ - [ ] README has all required links (HF Space, notebook, video, GitHub)
+ - [ ] At least one reward curve plot committed
+ - [ ] Demo video accessible via public URL
+
+ ### 10.2 Quality metrics (graded by rubric)
+
+ | Metric | Target | Stretch |
+ |---|---|---|
+ | Innovation framing recognized by mentor | "this is an interesting angle" feedback | "this is paper-worthy" feedback |
+ | Baseline accuracy (untrained Llama-3.2-3B) | Establishes a floor (likely 30-45%) | |
+ | Trained accuracy (after 300 GRPO steps) | Beats baseline by 10pp absolute | Beats baseline by 20pp |
+ | Reward curve | Bends upward visibly | Smooth monotonic increase |
+ | Per-CWE breakdown | At least 3 CWEs show improvement | All top-5 CWEs show improvement |
+ | Storytelling | Mentor at Round 3 can repeat the pitch back | Mentor offers to share with Meta team |
+
+ ### 10.3 Anti-metrics (things we explicitly don't optimize for)
+
+ - Number of features
+ - Number of CWEs covered (more is not better; depth beats breadth here)
+ - Lines of code
+ - Model size (going larger doesn't make a stronger submission, just slower training)
+
+ ---
+
+ ## 11. Risks and Mitigations
+
+ | Risk | Likelihood | Impact | Mitigation |
+ |---|---|---|---|
+ | Training run produces flat curve | Medium | High | Pre-approved pivot to qualitative-comparison narrative; baseline already establishes a contrast |
+ | HF Space deployment fails at 4 AM | Low | High | Fallback to Docker image with `docker run` instructions in README |
+ | Llama-3.2 license approval delayed | Low | Medium | Submit license request immediately at GCP setup; Qwen-1.5B fallback ready |
+ | Devign data has bad CWE labels | Medium | Medium | Filter aggressively; if too noisy, drop to top-5 cleanest CWEs only |
+ | One teammate falls behind their phase | Medium | High | Sync points at midnight, 9 AM, 3 PM allow scope cuts; mock-env pattern means training isn't blocked |
+ | Niti exhausted at Mentor Round 3 | High if no sleep | High | Mandatory sleep schedule 12:30 AM to 5:00 AM, non-negotiable |
+ | Demo video can't be cleanly recorded | Medium | Medium | Cherry-pick the best example; fall back to text trace if recording fails twice |
+ | HF Space rate limits during training | Low | Medium | Run training on local Docker if HF Space hits limits |
+
+ ---
+
+ ## 12. Timeline and Milestones
+
+ | Time (IST) | Milestone | Owner |
+ |---|---|---|
+ | Sat 8:00 PM | Mentor Round 2: pitch validation | Niti |
+ | Sat 9:30 PM | Phase 1 starts: env scaffolding, data prep, training scaffolding in parallel | All |
+ | Sat 11:59 PM | Phase 1 checkpoint: env runs, data ready, mock training works | All |
+ | Sun 12:00 AM | **Scope freeze**: no new features after this point | All |
+ | Sun 12:30 AM | Niti sleep starts | Niti |
+ | Sun 3:00 AM | HF Space live, Deepak sleep starts | Deepak |
+ | Sun 5:00 AM | Niti wakes, watches training | Niti |
+ | Sun 5:30 AM | Real training run launched on HF Jobs, Divyank sleep starts | Divyank |
+ | Sun 9:00 AM | Team sync: training results, plot status | All |
+ | Sun 10:00 AM | Mentor Round 3: final sharpening | Niti |
+ | Sun 11:30 AM | Demo video recorded and uploaded | Divyank |
+ | Sun 1:00 PM | README finalized | Niti |
+ | Sun 3:00 PM | **Feature freeze**: 2-hour reminder, no more changes | All |
+ | Sun 4:30 PM | Submission packaged | Niti |
+ | Sun 5:00 PM | **Submission deadline** | |
+
+ ---
+
+ ## 13. Open Questions and Assumptions
+
+ ### 13.1 Assumptions
+
+ - Devign dataset is on disk locally (or downloadable in <30 min); to be verified by Deepak at Phase 1 start
+ - HF Space free tier is sufficient for env hosting during the hackathon; backup plan: $9/mo upgrade if rate limited
+ - Llama-3.2-3B-Instruct license approval lands within 1 hour of request; Qwen fallback ready if not
+ - HF Jobs A10G availability at 5 AM Sunday; GCP A10G fallback if queued
+
+ ### 13.2 Open questions (to resolve during execution)
+
+ - Exact number of training steps to maximize curve visibility within budget; answered empirically by 9 AM Sunday based on observed loss
+ - Whether to ship a Colab-runnable notebook AND an HF Jobs notebook, or just one; defer to Divyank's call at Phase 2
+ - Whether to include a comparison against a non-RL baseline (pure SFT or zero-shot); stretch only
+
+ ---
+
+ ## 14. Future Work (Post-Hackathon)
+
+ This section becomes part of the README's "What's Next" pitch; it explicitly signals to judges that we understand the limitations and have a roadmap.
+
+ - **Sandboxed exploit execution**: replace pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
+ - **Multi-file commit reasoning**: extend the env to support diffs spanning multiple files, with a context budget
+ - **Self-play loop**: pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
+ - **Agentic harness integration**: wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
+ - **Real CVE corpus**: extend beyond Devign to recent CVE-tagged commits from major open-source repos
+ - **Multi-language support**: the current env is C-focused via Devign; extend to Python, JavaScript, Go
+ - **Reward shape ablations**: a formal study of how reward composition affects which vulnerability types the model learns fastest
+
+ ---
+
+ ## 15. Appendix
+
+ ### 15.1 Key reference URLs (for the team to bookmark)
+
+ - OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
+ - OpenEnv Scaler intro: https://tinyurl.com/openenv-scaler
+ - TRL OpenEnv docs: https://huggingface.co/docs/trl/en/openenv
+ - TRL Sudoku GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb
+ - TRL Wordle GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb
+ - Unsloth 2048 example: https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/examples/unsloth_2048.ipynb
+ - Llama-3.2-3B model card: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
+ - HF Jobs docs: https://huggingface.co/docs/hub/jobs
+ - Cursor credits: https://tinyurl.com/sclr-openenv-dashboard
+ - HF $30 credits: https://huggingface.co/coupons/claim/hf-openenv-community
+
+ ### 15.2 Document version
+
+ - v1.0: Saturday evening, Bangalore venue. Locked at midnight Saturday.
+ - Changes after lock require explicit team-wide sign-off and a documented rationale.
+
+ ---
+
+ ## 16. The 30-Second Pitch (For Mentor Rounds, Memorize This)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it: defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
pyproject.toml ADDED
@@ -0,0 +1,39 @@
+ [project]
+ name = "commitguard"
+ version = "0.1.0"
+ description = "CommitGuard OpenEnv RL environment for commit-time vuln detection"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "fastapi>=0.110",
+     "uvicorn[standard]>=0.27",
+     "pydantic>=2.6",
+     "openenv>=0.1.13",
+ ]
+
+ [project.optional-dependencies]
+ train = [
+     "requests",
+     "torch>=2.4",
+     "transformers>=4.46",
+     "trl>=0.12",
+     "accelerate>=1.0",
+     "peft>=0.13",
+     "datasets>=3.0",
+     "wandb",
+     "matplotlib",
+     "unsloth",
+     "bitsandbytes>=0.44",
+     "jupyter",
+     "ipywidgets",
+ ]
+
+ [project.scripts]
+ server = "commitguard_env.server:main"
+
+ [tool.setuptools]
+ packages = ["commitguard_env"]
+
+ [build-system]
+ requires = ["setuptools>=68"]
+ build-backend = "setuptools.build_meta"
scripts/README.md ADDED
@@ -0,0 +1,7 @@
+ ## Scripts
+
+ This directory is for repeatable, CLI-first ops (dataset preprocessing, local smoke runs).
+
+ Primary expected script (Deepak):
+ - `preprocess_devign.py` → produces `data/devign_filtered.jsonl`
+
scripts/agent_prompt.py ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """System prompt and per-turn prompt for CommitGuard GRPO training."""
2
+
3
+ SYSTEM_PROMPT = """\
4
+ You are a security auditor. You receive code diffs (commits) and must decide \
5
+ whether each commit introduces an exploitable vulnerability.
6
+
7
+ You may take up to 5 actions per episode. Each action must be wrapped in XML tags.
8
+
9
+ Action types:
10
+
11
+ 1. Request additional file context:
12
+ <action><action_type>request_context</action_type><file_path>path/to/file.c</file_path></action>
13
+
14
+ 2. Analyze / think (chain-of-thought, no reward effect):
15
+ <action><action_type>analyze</action_type><reasoning>your reasoning here</reasoning></action>
16
+
17
+ 3. Submit a verdict (terminates the episode):
18
+ <action><action_type>verdict</action_type><is_vulnerable>true|false</is_vulnerable><vuln_type>CWE-XXX</vuln_type><exploit_sketch>describe how to exploit</exploit_sketch></action>
19
+
20
+ Rules:
21
+ - You MUST submit exactly one verdict before running out of budget.
22
+ - If the code is safe, set is_vulnerable to false and vuln_type to NONE.
23
+ - Be specific in exploit_sketch: name the attack vector (e.g., buffer overflow via unchecked memcpy).
24
+ - Common CWE types: CWE-79 (XSS), CWE-89 (SQL injection), CWE-22 (path traversal), \
25
+ CWE-78 (command injection), CWE-20 (input validation), CWE-125 (out-of-bounds read), \
26
+ CWE-787 (out-of-bounds write), CWE-190 (integer overflow), CWE-476 (null dereference), \
27
+ CWE-400 (resource exhaustion).
28
+ """
29
+
30
+
31
+ def get_agent_prompt(diff: str, available_files: list[str], step_idx: int) -> str:
32
+ files_str = ", ".join(available_files) if available_files else "(none)"
33
+ return (
34
+ f"## Commit Diff\n\n```diff\n{diff}\n```\n\n"
35
+ f"Available files: {files_str}\n"
36
+ f"Step: {step_idx}/5\n\n"
37
+ "Analyze this commit and submit your verdict."
38
+ )
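
To eyeball the rendered per-turn prompt, a quick one-off check (illustrative diff and file list; run from the repo root so `scripts` resolves as a package, the same way `evaluate.py` imports it below):

```bash
python - <<'EOF'
from scripts.agent_prompt import get_agent_prompt

diff = "--- a/source.c\n+++ b/source.c\n@@ -1,1 +1,1 @@\n+strcpy(buf, s);"
print(get_agent_prompt(diff, ["source.c"], step_idx=1))
EOF
```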
scripts/check_cuda.py ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ import torch
2
+ print(f'CUDA available: {torch.cuda.is_available()}')
3
+ if torch.cuda.is_available():
4
+ print(f'Device count: {torch.cuda.device_count()}')
5
+ print(f'Device name: {torch.cuda.get_device_name(0)}')
6
+ print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')
scripts/check_disjoint.py ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ from pathlib import Path
3
+
4
+ def get_ids(file_path):
5
+ ids = set()
6
+ with open(file_path, 'r', encoding='utf-8') as f:
7
+ for line in f:
8
+ obj = json.loads(line)
9
+ ids.add(obj.get('commit_id') or obj.get('sample_id'))
10
+ return ids
11
+
12
+ train_ids = get_ids('data/devign_filtered.jsonl')  # training split written by preprocess_devign.py
13
+ test_ids = get_ids('data/devign_test.jsonl')
14
+
15
+ overlap = train_ids.intersection(test_ids)
16
+ print(f"Train IDs: {len(train_ids)}")
17
+ print(f"Test IDs: {len(test_ids)}")
18
+ print(f"Overlap: {len(overlap)}")
19
+ if overlap:
20
+ print(f"Overlapping IDs: {list(overlap)[:5]}")
scripts/evaluate.py ADDED
@@ -0,0 +1,169 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import argparse
4
+ import re
5
+ import torch
6
+ from transformers import AutoModelForCausalLM, AutoTokenizer
7
+ from peft import PeftModel
8
+ from pathlib import Path
9
+ import sys
10
+
11
+ # Add project root to path for imports
12
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
13
+ from scripts.agent_prompt import SYSTEM_PROMPT
14
+
15
+ def parse_xml_action(text):
16
+ """Extract action fields from XML-tagged model output."""
17
+ def extract(tag, default=None):
18
+ match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
19
+ return match.group(1).strip() if match else default
20
+
21
+ is_vuln_str = extract("is_vulnerable", "false")
22
+ return {
23
+ "action_type": "verdict",
24
+ "is_vulnerable": is_vuln_str.lower() == "true",
25
+ "vuln_type": extract("vuln_type", "unknown"),
26
+ "exploit_sketch": extract("exploit_sketch", ""),
27
+ }
28
+
29
+ def format_eval_prompt(sample):
30
+ return (
31
+ f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
32
+ f"{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
33
+ f"Analyze this commit and submit your verdict.\n\n"
34
+ f"Code diff:\n```diff\n{sample['diff']}\n```<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
35
+ )
36
+
37
+ def evaluate(model_path, test_file, is_lora=False, base_model=None, output_file="eval_results.json"):
38
+ """
39
+ Run model on test samples, compute accuracy metrics.
40
+ """
41
+ print(f"Loading model from {model_path}...")
42
+ device = "cuda" if torch.cuda.is_available() else "cpu"
43
+
44
+ # Load model
45
+ if is_lora:
46
+ if not base_model:
47
+ raise ValueError("base_model is required if is_lora=True")
48
+ print(f"Loading LoRA adapter from {model_path} with base model {base_model}")
49
+ from unsloth import FastLanguageModel
50
+ model, tokenizer = FastLanguageModel.from_pretrained(
51
+ model_name = base_model,
52
+ max_seq_length = 2048,
53
+ load_in_4bit = True,
54
+ )
55
+ model = PeftModel.from_pretrained(model, model_path)
56
+ FastLanguageModel.for_inference(model)
57
+ else:
59
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
60
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
61
+
62
+ # Load test data
63
+ print(f"Loading test data from {test_file}...")
64
+ with open(test_file, "r", encoding="utf-8") as f:
65
+ samples = [json.loads(line) for line in f if line.strip()]
66
+
67
+ results = {
68
+ "summary": {
69
+ "total": len(samples),
70
+ "correct_binary": 0,
71
+ "correct_cwe": 0,
72
+ "false_positives": 0,
73
+ "false_negatives": 0,
74
+ "binary_accuracy": 0,
75
+ "cwe_accuracy": 0,
76
+ "false_positive_rate": 0,
77
+ "false_negative_rate": 0,
78
+ "cwe_breakdown": {},
79
+ },
80
+ "predictions": [],
81
+ }
82
+
83
+ print(f"Starting evaluation on {len(samples)} samples...")
84
+ for i, sample in enumerate(samples):
85
+ prompt = format_eval_prompt(sample)
86
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
87
+
88
+ with torch.no_grad():
89
+ output = model.generate(
90
+ **inputs,
91
+ max_new_tokens=256,
93
+ do_sample=False,
94
+ )
95
+
96
+ response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
97
+ prediction = parse_xml_action(response)
98
+
99
+ gt_vulnerable = bool(sample["is_vulnerable"])
100
+ pred_vulnerable = prediction.get("is_vulnerable", False)
101
+
102
+ correct = pred_vulnerable == gt_vulnerable
103
+ if correct:
104
+ results["summary"]["correct_binary"] += 1
105
+
106
+ if gt_vulnerable and not pred_vulnerable:
107
+ results["summary"]["false_negatives"] += 1
108
+ elif not gt_vulnerable and pred_vulnerable:
109
+ results["summary"]["false_positives"] += 1
110
+
111
+ cwe = sample.get("cwe") or "CWE-OTHER"
112
+ if cwe not in results["summary"]["cwe_breakdown"]:
113
+ results["summary"]["cwe_breakdown"][cwe] = {"total": 0, "correct": 0, "accuracy": 0}
114
+
115
+ results["summary"]["cwe_breakdown"][cwe]["total"] += 1
116
+ if correct:
117
+ results["summary"]["cwe_breakdown"][cwe]["correct"] += 1
118
+
119
+ if gt_vulnerable and correct and prediction.get("vuln_type") == cwe:
120
+ results["summary"]["correct_cwe"] += 1
121
+
122
+ results["predictions"].append({
123
+ "sample_id": sample["sample_id"],
124
+ "ground_truth": gt_vulnerable,
125
+ "predicted": pred_vulnerable,
126
+ "predicted_cwe": prediction.get("vuln_type"),
127
+ "actual_cwe": cwe,
128
+ "response": response,
129
+ })
130
+
131
+ if (i + 1) % 10 == 0:
132
+ print(f" Processed {i+1}/{len(samples)} samples...")
133
+
134
+ # Final summary stats
135
+ summary = results["summary"]
136
+ total = summary["total"]
137
+ vuln_count = sum(1 for s in samples if s["is_vulnerable"])
138
+ safe_count = total - vuln_count
139
+
140
+ summary["binary_accuracy"] = summary["correct_binary"] / total if total > 0 else 0
141
+ summary["cwe_accuracy"] = summary["correct_cwe"] / vuln_count if vuln_count > 0 else 0
142
+ summary["false_positive_rate"] = summary["false_positives"] / safe_count if safe_count > 0 else 0
143
+ summary["false_negative_rate"] = summary["false_negatives"] / vuln_count if vuln_count > 0 else 0
144
+
145
+ for cwe in summary["cwe_breakdown"]:
146
+ stats = summary["cwe_breakdown"][cwe]
147
+ stats["accuracy"] = stats["correct"] / stats["total"] if stats["total"] > 0 else 0
148
+
149
+ print(f"\nEvaluation Complete:")
150
+ print(f" Binary Accuracy: {summary['binary_accuracy']:.2%}")
151
+ print(f" CWE Accuracy: {summary['cwe_accuracy']:.2%}")
152
+ print(f" False Positives: {summary['false_positives']}")
153
+ print(f" False Negatives: {summary['false_negatives']}")
154
+
155
+ with open(output_file, "w", encoding="utf-8") as f:
156
+ json.dump(results, f, indent=2)
157
+ print(f"Results saved to {output_file}")
158
+ return results
159
+
160
+ if __name__ == "__main__":
161
+ parser = argparse.ArgumentParser()
162
+ parser.add_argument("--model-path", default="meta-llama/Llama-3.2-3B-Instruct")
163
+ parser.add_argument("--test-file", default="data/devign_test.jsonl")
164
+ parser.add_argument("--is-lora", action="store_true")
165
+ parser.add_argument("--base-model", default="meta-llama/Llama-3.2-3B-Instruct")
166
+ parser.add_argument("--output", default="eval_results.json")
167
+ args = parser.parse_args()
168
+
169
+ evaluate(args.model_path, args.test_file, args.is_lora, args.base_model, args.output)
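
Two typical invocations, mirroring the argparse defaults above (the adapter path is a placeholder for wherever training saved the LoRA):

```bash
# Baseline: untrained Llama-3.2-3B on the held-out test set.
python scripts/evaluate.py --test-file data/devign_test.jsonl --output eval_baseline.json

# Trained: LoRA adapter on top of the same base model.
python scripts/evaluate.py \
  --model-path outputs/commitguard-llama-3b/final \
  --is-lora \
  --base-model meta-llama/Llama-3.2-3B-Instruct \
  --output eval_trained.json
```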
scripts/gce_vm_runbook.md ADDED
@@ -0,0 +1,149 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## GCE VM Runbook — CommitGuard GRPO Training
2
+
3
+ ### Step 1: Create VM
4
+
5
+ Run from your local machine (or use GCP Console):
6
+
7
+ ```bash
8
+ # Option A: L4 (24 GB VRAM, ~$0.70/hr) — RECOMMENDED
9
+ gcloud compute instances create commitguard-train \
10
+ --zone=us-central1-a \
11
+ --machine-type=g2-standard-8 \
12
+ --accelerator=type=nvidia-l4,count=1 \
13
+ --boot-disk-size=100GB \
14
+ --image-family=pytorch-latest-gpu \
15
+ --image-project=deeplearning-platform-release \
16
+ --maintenance-policy=TERMINATE \
17
+ --metadata="install-nvidia-driver=True"
18
+
19
+ # Option B: A100 (40 GB VRAM, ~$2.50/hr) — if L4 unavailable
20
+ gcloud compute instances create commitguard-train \
21
+ --zone=us-central1-a \
22
+ --machine-type=a2-highgpu-1g \
23
+ --accelerator=type=nvidia-tesla-a100,count=1 \
24
+ --boot-disk-size=100GB \
25
+ --image-family=pytorch-latest-gpu \
26
+ --image-project=deeplearning-platform-release \
27
+ --maintenance-policy=TERMINATE \
28
+ --metadata="install-nvidia-driver=True"
29
+
30
+ # Option C: T4 (16 GB VRAM, ~$0.35/hr) — budget fallback
31
+ gcloud compute instances create commitguard-train \
32
+ --zone=us-central1-b \
33
+ --machine-type=n1-standard-8 \
34
+ --accelerator=type=nvidia-tesla-t4,count=1 \
35
+ --boot-disk-size=100GB \
36
+ --image-family=pytorch-latest-gpu \
37
+ --image-project=deeplearning-platform-release \
38
+ --maintenance-policy=TERMINATE \
39
+ --metadata="install-nvidia-driver=True"
40
+ ```
41
+
42
+ ### Step 2: SSH into VM
43
+
44
+ ```bash
45
+ gcloud compute ssh commitguard-train --zone=us-central1-a
46
+ ```
47
+
48
+ ### Step 3: One-command setup
49
+
50
+ ```bash
51
+ curl -sSL https://raw.githubusercontent.com/NitishKumar-ai/commitguard/main/scripts/gcp_setup.sh | bash
52
+ ```
53
+
54
+ Or manually:
55
+
56
+ ```bash
57
+ git clone https://github.com/NitishKumar-ai/commitguard.git
58
+ cd commitguard
59
+ bash scripts/gcp_setup.sh
60
+ ```
61
+
62
+ ### Step 4: Start env server (in tmux)
63
+
64
+ ```bash
65
+ cd ~/commitguard && source .venv/bin/activate
66
+ tmux new -s server
67
+ server
68
+ # Ctrl-B D to detach
69
+ ```
70
+
71
+ Verify:
72
+
73
+ ```bash
74
+ curl -s http://localhost:8000/health
75
+ # → {"status":"healthy"}
76
+ ```
77
+
78
+ ### Step 5: Login to HuggingFace + Wandb
79
+
80
+ ```bash
81
+ source ~/commitguard/.venv/bin/activate
82
+ huggingface-cli login # paste your HF token (needed for Llama gated model)
83
+ wandb login # paste your wandb API key
84
+ ```
85
+
86
+ ### Step 6: Start training
87
+
88
+ ```bash
89
+ cd ~/commitguard && source .venv/bin/activate
90
+ export WANDB_PROJECT=commitguard
91
+
92
+ # Full run (~2-3 hours on L4)
93
+ python scripts/train_grpo.py \
94
+ --samples 200 \
95
+ --max-steps 300 \
96
+ --save-steps 50 \
97
+ --num-generations 4 \
98
+ --batch-size 1 \
99
+ --grad-accum 4
100
+
101
+ # Quick smoke test first (5 min)
102
+ python scripts/train_grpo.py \
103
+ --samples 20 \
104
+ --max-steps 10 \
105
+ --no-wandb
106
+ ```
107
+
108
+ ### Step 7: Monitor
109
+
110
+ ```bash
111
+ # In another tmux pane:
112
+ watch -n 30 nvidia-smi # GPU memory
113
+ # Wandb dashboard: https://wandb.ai/<your-user>/commitguard
114
+ ```
115
+
116
+ ### Step 8: Copy results back
117
+
118
+ ```bash
119
+ # From your LOCAL machine:
120
+ gcloud compute scp --recurse \
121
+ commitguard-train:~/commitguard/outputs/commitguard-llama-3b/final \
122
+ ./outputs/commitguard-llama-3b/final \
123
+ --zone=us-central1-a
124
+ ```
125
+
126
+ ### Step 9: Shut down VM
127
+
128
+ ```bash
129
+ gcloud compute instances stop commitguard-train --zone=us-central1-a
130
+ # or delete to stop billing entirely:
131
+ gcloud compute instances delete commitguard-train --zone=us-central1-a
132
+ ```
133
+
134
+ ### Cost estimate
135
+
136
+ | GPU | VRAM | $/hr | 300 steps (~3hr) |
137
+ |-----|------|------|-------------------|
138
+ | T4 | 16GB | $0.35 | ~$1.05 |
139
+ | L4 | 24GB | $0.70 | ~$2.10 |
140
+ | A100| 40GB | $2.50 | ~$7.50 |
141
+
142
+ ### Troubleshooting
143
+
144
+ - **OOM on T4**: reduce `--num-generations 2` and `--batch-size 1`
145
+ - **Llama access denied**: make sure you accepted the license at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
146
+ - **Env server not responding**: check `tmux attach -t server` for errors
147
+ - **Wandb not logging**: verify `wandb login` succeeded, or use `--no-wandb`
148
+ - **GPU quota error**: request GPU quota increase at https://console.cloud.google.com/iam-admin/quotas
149
+
scripts/gcp_setup.sh ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ # =============================================================================
3
+ # CommitGuard — GCP VM Setup Script
4
+ # Target: GCE VM with NVIDIA L4 (24 GB) or A100 (40/80 GB)
5
+ # =============================================================================
6
+ set -euo pipefail
7
+
8
+ echo "============================================"
9
+ echo " CommitGuard GCP Training VM Setup"
10
+ echo "============================================"
11
+
12
+ # --- 1. System packages ---
13
+ sudo apt-get update -qq
14
+ sudo apt-get install -y -qq git python3-venv python3-pip tmux htop
15
+
16
+ # --- 2. NVIDIA driver check ---
17
+ if ! command -v nvidia-smi &>/dev/null; then
18
+ echo "ERROR: nvidia-smi not found. Use a GCP image with pre-installed GPU drivers:"
19
+ echo " - Deep Learning VM (recommended)"
20
+ echo " - Or install manually: sudo apt install nvidia-driver-535"
21
+ exit 1
22
+ fi
23
+ echo "GPU detected:"
24
+ nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
25
+
26
+ # --- 3. Clone repo ---
27
+ REPO_DIR="$HOME/commitguard"
28
+ if [ ! -d "$REPO_DIR" ]; then
29
+ echo "Cloning repo..."
30
+ git clone https://github.com/NitishKumar-ai/commitguard.git "$REPO_DIR"
31
+ else
32
+ echo "Repo exists, pulling latest..."
33
+ cd "$REPO_DIR" && git pull
34
+ fi
35
+ cd "$REPO_DIR"
36
+
37
+ # --- 4. Python venv ---
38
+ if [ ! -d ".venv" ]; then
39
+ python3 -m venv .venv
40
+ fi
41
+ source .venv/bin/activate
42
+ pip install -U pip setuptools wheel -q
43
+
44
+ # --- 5. Install training dependencies ---
45
+ echo "Installing training dependencies..."
46
+ pip install -e . -q
47
+
48
+ pip install \
49
+ "torch>=2.4" \
50
+ "unsloth[cu124-torch240]" \
51
+ "trl>=0.12" \
52
+ "peft>=0.13" \
53
+ "bitsandbytes>=0.44" \
54
+ "transformers>=4.46" \
55
+ "datasets>=3.0" \
56
+ "accelerate>=1.0" \
57
+ "wandb" \
58
+ "requests" \
59
+ "matplotlib" \
60
+ "jupyter" \
61
+ "ipywidgets" \
62
+ -q
63
+
64
+ echo "Verifying installs..."
65
+ python -c "
66
+ import torch, trl, unsloth, peft, wandb, bitsandbytes
67
+ print(f'PyTorch: {torch.__version__}')
68
+ print(f'CUDA: {torch.cuda.is_available()} — {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')
69
+ print(f'TRL: {trl.__version__}')
70
+ print(f'PEFT: {peft.__version__}')
71
+ print(f'Wandb: {wandb.__version__}')
72
+ print('All training deps OK.')
73
+ "
74
+
75
+ echo ""
76
+ echo "============================================"
77
+ echo " Setup complete. Two options to train:"
78
+ echo "============================================"
79
+ echo ""
80
+ echo " ── OPTION A: Jupyter Notebook (recommended) ──"
81
+ echo ""
82
+ echo " # On the VM:"
83
+ echo " cd $REPO_DIR && source .venv/bin/activate"
84
+ echo " tmux new -s server -d 'source .venv/bin/activate && server'"
85
+ echo " jupyter notebook --no-browser --port=8888 --ip=0.0.0.0"
86
+ echo ""
87
+ echo " # On your LOCAL machine (new terminal):"
88
+ echo " gcloud compute ssh commitguard-train --zone=us-central1-a -- -NL 8888:localhost:8888"
89
+ echo ""
90
+ echo " # Then open in browser:"
91
+ echo " # http://localhost:8888 → notebooks/train_commitguard.ipynb"
92
+ echo ""
93
+ echo " ── OPTION B: CLI ──"
94
+ echo ""
95
+ echo " cd $REPO_DIR && source .venv/bin/activate"
96
+ echo " tmux new -s server -d 'source .venv/bin/activate && server'"
97
+ echo " huggingface-cli login"
98
+ echo " python scripts/train_grpo.py --samples 200 --max-steps 300"
99
+ echo ""
scripts/lightning_ai_runbook.md ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Training on Lightning AI
2
+
3
+ This guide explains how to run CommitGuard GRPO training on a Lightning AI GPU Studio.
4
+
5
+ ## Recommended Instance
6
+ - **GPU:** NVIDIA L4 (24GB) or A10G (24GB) is sufficient for Llama-3.2-3B with Unsloth 4-bit.
7
+ - **Image:** Default Linux / PyTorch images are fine; the setup script handles dependencies.
8
+
9
+ ## Setup & Train in One Step
10
+
11
+ 1. Open a terminal in your Lightning AI Studio.
12
+ 2. Run the setup script:
13
+ ```bash
14
+ bash scripts/lightning_setup.sh
15
+ ```
16
+
17
+ ## What the Script Does
18
+ 1. Installs `uv` for fast dependency management.
19
+ 2. Creates a virtual environment and installs all requirements (Unsloth, TRL, etc.).
20
+ 3. Starts the `commitguard_env` server in the background (via `tmux` if available).
21
+ 4. Runs `scripts/train_grpo.py`.
22
+
23
+ ## Manual Steps (Optional)
24
+
25
+ ### 1. View Training Logs
26
+ If you want to see the environment server logs:
27
+ ```bash
28
+ tmux attach -t env_server
29
+ ```
30
+ (Press `Ctrl+B`, then `D` to detach).
31
+
32
+ ### 2. Hugging Face Integration
33
+ To save your model to the Hugging Face Hub, login before training:
34
+ ```bash
35
+ huggingface-cli login
36
+ ```
37
+
38
+ ### 3. Checkpoints
39
+ Checkpoints and the final merged LoRA adapter will be saved to:
40
+ `outputs/commitguard-llama-3b/final`
41
+
42
+ ## Troubleshooting
43
+ - **OOM Error:** If you hit Out-Of-Memory, try reducing `--batch-size` or `--num-generations` in `scripts/train_grpo.py`.
44
+ - **Server Connection:** If training fails with connection errors, ensure the server started correctly by checking `curl http://localhost:8000/health`.
scripts/lightning_setup.sh ADDED
@@ -0,0 +1,61 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ # CommitGuard - Lightning AI Setup & Train
3
+ # This script prepares the environment and starts GRPO training.
4
+
5
+ set -e
6
+
7
+ echo "--- 1. Installing uv ---"
8
+ curl -LsSf https://astral.sh/uv/install.sh | sh
9
+ if [ -f "$HOME/.local/bin/env" ]; then
10
+ source "$HOME/.local/bin/env"
11
+ elif [ -f "$HOME/.cargo/env" ]; then
12
+ source "$HOME/.cargo/env"
13
+ fi
14
+ export PATH="$HOME/.local/bin:$PATH"
15
+
16
+ echo "--- 2. Setting up Workspace ---"
17
+ REPO_DIR="$HOME/commitguard"
18
+ if [ ! -d "$REPO_DIR" ]; then
19
+ echo "Cloning repo..."
20
+ git clone https://github.com/NitishKumar-ai/commitguard "$REPO_DIR"
21
+ fi
22
+ cd "$REPO_DIR"
23
+
24
+ echo "--- 3. Setting up Virtual Env ---"
25
+ if [ ! -d ".venv" ]; then
26
+ uv venv
27
+ fi
28
+ source .venv/bin/activate
29
+
30
+ echo "--- 4. Installing Dependencies ---"
31
+ uv sync --all-extras
32
+
33
+ echo "--- 5. Starting Environment Server ---"
34
+ # Use tmux to keep the server running in the background
35
+ if command -v tmux >/dev/null; then
36
+ tmux new -s env_server -d "source .venv/bin/activate && python -m commitguard_env.server"
37
+ else
38
+ python -m commitguard_env.server &
39
+ SERVER_PID=$!
40
+ fi
41
+
42
+ echo "Waiting for server to be healthy..."
43
+ max_retries=30
44
+ count=0
45
+ until curl --output /dev/null --silent --head --fail http://localhost:8000/health; do
46
+ printf '.'
47
+ sleep 2
48
+ count=$((count+1))
49
+ if [ $count -eq $max_retries ]; then
50
+ echo "Server failed to start."
51
+ exit 1
52
+ fi
53
+ done
54
+ echo "Server is healthy!"
55
+
56
+ echo "--- 5. Starting GRPO Training ---"
57
+ # Defaults: 200 samples, 300 steps.
58
+ # Increase samples for better stability, decrease for faster iteration.
59
+ python scripts/train_grpo.py --samples 200 --max-steps 300
60
+
61
+ echo "Training session finished."
scripts/plot_results.py ADDED
@@ -0,0 +1,103 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import matplotlib.pyplot as plt
2
+ import json
3
+ import os
4
+ import argparse
5
+
6
+ def plot_reward_curve(wandb_data_path, output_path="plots/reward_curve.png"):
7
+ """
8
+ Plots the training reward curve.
9
+ Expects a JSON file with 'step' and 'reward' keys (exported from Wandb).
10
+ """
11
+ if not os.path.exists(wandb_data_path):
12
+ print(f"Skipping: {wandb_data_path} not found.")
13
+ return
14
+
15
+ with open(wandb_data_path, "r") as f:
16
+ data = json.load(f)
17
+
18
+ steps = [d["step"] for d in data]
19
+ rewards = [d["reward"] for d in data]
20
+
21
+ plt.figure(figsize=(10, 6))
22
+ plt.plot(steps, rewards, label="GRPO Reward", color="#2ecc71", linewidth=2)
23
+ plt.xlabel("Training Step")
24
+ plt.ylabel("Mean Reward")
25
+ plt.title("CommitGuard — GRPO Training Reward Curve")
26
+ plt.grid(True, linestyle="--", alpha=0.7)
27
+ plt.legend()
28
+ plt.savefig(output_path)
29
+ print(f"Saved: {output_path}")
30
+
31
+ def plot_accuracy_comparison(baseline_acc, trained_acc, output_path="plots/baseline_vs_trained.png"):
32
+ """
33
+ Plots a bar chart comparing baseline vs trained accuracy.
34
+ """
35
+ labels = ['Baseline (Untrained)', 'CommitGuard (Trained)']
36
+ accuracies = [baseline_acc, trained_acc]
37
+ colors = ['#95a5a6', '#3498db']
38
+
39
+ plt.figure(figsize=(8, 6))
40
+ bars = plt.bar(labels, accuracies, color=colors)
41
+ plt.ylabel("Detection Accuracy (%)")
42
+ plt.title("Vulnerability Detection: Baseline vs. Trained")
43
+ plt.ylim(0, 100)
44
+
45
+ for bar in bars:
46
+ height = bar.get_height()
47
+ plt.text(bar.get_x() + bar.get_width()/2., height + 1,
48
+ f'{height}%', ha='center', va='bottom', fontweight='bold')
49
+
50
+ plt.savefig(output_path)
51
+ print(f"Saved: {output_path}")
52
+
53
+ def plot_per_cwe_breakdown(cwe_data, output_path="plots/per_cwe.png"):
54
+ """
55
+ Plots a grouped bar chart for per-CWE improvement.
56
+ cwe_data format: {"CWE-89": [baseline, trained], "CWE-119": [baseline, trained], ...}
57
+ """
58
+ cwes = list(cwe_data.keys())
59
+ baseline_vals = [v[0] for v in cwe_data.values()]
60
+ trained_vals = [v[1] for v in cwe_data.values()]
61
+
62
+ x = range(len(cwes))
63
+ width = 0.35
64
+
65
+ fig, ax = plt.subplots(figsize=(12, 6))
66
+ ax.bar([i - width/2 for i in x], baseline_vals, width, label='Baseline', color='#95a5a6')
67
+ ax.bar([i + width/2 for i in x], trained_vals, width, label='Trained', color='#e67e22')
68
+
69
+ ax.set_ylabel('Accuracy (%)')
70
+ ax.set_title('Detection Accuracy by CWE Type')
71
+ ax.set_xticks(x)
72
+ ax.set_xticklabels(cwes, rotation=45)
73
+ ax.legend()
74
+ ax.set_ylim(0, 100)
75
+
76
+ plt.tight_layout()
77
+ plt.savefig(output_path)
78
+ print(f"Saved: {output_path}")
79
+
80
+ if __name__ == "__main__":
81
+ parser = argparse.ArgumentParser()
82
+ parser.add_argument("--mode", choices=["reward", "accuracy", "cwe", "all"], default="all")
83
+ args = parser.parse_args()
84
+
85
+ os.makedirs("plots", exist_ok=True)
86
+
87
+ # Example usage for morning shift:
88
+ if args.mode in ["reward", "all"]:
89
+ plot_reward_curve("plots/wandb_simulated.json")
90
+
91
+ if args.mode in ["accuracy", "all"]:
92
+ # Placeholder numbers (to be updated by Divyank/Deepak's eval)
93
+ plot_accuracy_comparison(baseline_acc=32, trained_acc=68)
94
+
95
+ if args.mode in ["cwe", "all"]:
96
+ # Placeholder data
97
+ cwe_data = {
98
+ "CWE-89": [40, 85],
99
+ "CWE-119": [30, 60],
100
+ "CWE-79": [25, 70],
101
+ "CWE-20": [35, 55]
102
+ }
103
+ plot_per_cwe_breakdown(cwe_data)
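
Usage sketch (the accuracy and per-CWE numbers above are placeholders until the real eval results land):

```bash
python scripts/plot_results.py --mode all      # all three plots into plots/
python scripts/plot_results.py --mode reward   # just the reward curve
```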
scripts/preprocess_devign.py ADDED
@@ -0,0 +1,277 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+ import json
3
+ import random
4
+ from collections import Counter
5
+ from pathlib import Path
6
+
7
+
8
+ def _read_jsonl(path: Path) -> list[dict]:
9
+ rows = []
10
+ for line in path.read_text(encoding="utf-8").splitlines():
11
+ line = line.strip()
12
+ if not line:
13
+ continue
14
+ rows.append(json.loads(line))
15
+ return rows
16
+
17
+
18
+ def _write_jsonl(path: Path, rows: list[dict]) -> None:
19
+ path.parent.mkdir(parents=True, exist_ok=True)
20
+ with path.open("w", encoding="utf-8", newline="\n") as f:
21
+ for r in rows:
22
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
23
+
24
+
25
+ # ---------------------------------------------------------------------------
26
+ # Fix 2: CWE classification using vulnerable lines, not the whole function.
27
+ # Scored rules — highest-scoring match wins. Falls back to CWE-OTHER.
28
+ # ---------------------------------------------------------------------------
29
+
30
+ _CWE_RULES: list[tuple[str, list[str], int]] = [
31
+ ("CWE-119", ["memcpy", "strcpy", "strcat", "strncpy", "memmove", "sprintf",
32
+ "gets(", "buffer", "overflow", "oob", "av_malloc", "av_realloc",
33
+ "realloc", "malloc", "alloc", "g_malloc", "g_realloc",
34
+ "qemu_malloc", "len ", "length", "copy_from", "copy_to"], 5),
35
+ ("CWE-476", ["null", "nullptr", "!= null", "== null", "if (!",
36
+ "dereference", "segfault", "!obj", "!ctx", "!s->", "!p"], 5),
37
+ ("CWE-189", ["integer overflow", "signedness", "truncat", "wrap",
38
+ "size_t", "underflow", "narrowing", "(int)", "(uint",
39
+ "(unsigned)", ">> ", "<< ", "0xffff", "max_", "min_"], 5),
40
+ ("CWE-78", ["system(", "popen(", "exec(", "execve", "shell",
41
+ "command", "subprocess"], 8),
42
+ ("CWE-22", ["../", "..\\", "traversal", "chroot", "realpath",
43
+ "canonicalize", "symlink", "path"], 7),
44
+ ("CWE-89", ["sql", "query", "select ", "insert ", "union ",
45
+ "prepared", "sqlite", "mysql"], 7),
46
+ ("CWE-79", ["xss", "innerhtml", "script", "sanitize", "escape",
47
+ "htmlentit", "content-type"], 6),
48
+ ("CWE-20", ["valid", "saniti", "untrusted", "input", "bounds",
49
+ "assert", "range", "check", "error", "return -1",
50
+ "goto fail", "goto err", "goto out"], 2),
51
+ ]
52
+
53
+
54
+ def infer_cwe(vul_lines_code: list[str], func: str) -> str:
55
+ vul_text = " ".join(vul_lines_code).lower() if vul_lines_code else ""
56
+ func_text = func.lower()
57
+
58
+ best_cwe, best_score = "CWE-OTHER", 0
59
+
60
+ for cwe, keywords, weight in _CWE_RULES:
61
+ vul_hits = sum(1 for k in keywords if k in vul_text) if vul_text else 0
62
+ func_hits = sum(1 for k in keywords if k in func_text)
63
+ score = vul_hits * weight + func_hits * (weight // 2)
64
+ if score > best_score:
65
+ best_cwe, best_score = cwe, score
66
+
67
+ if best_score < 2:
68
+ return "CWE-OTHER"
69
+ return best_cwe
70
+
71
+
72
+ # ---------------------------------------------------------------------------
73
+ # Fix 1: Real unified diffs from per-line vulnerability labels.
74
+ # ---------------------------------------------------------------------------
75
+
76
+ def _build_diff(func: str, label: list[int], rng: random.Random, is_vuln: bool) -> str:
77
+ lines = func.splitlines()
78
+
79
+ if is_vuln and label and (len(label) == len(lines) or any(l == 1 for l in label)):
80
+ changed_indices = {i for i, l in enumerate(label) if l == 1}
83
+ else:
84
+ block_size = max(1, min(5, len(lines) // 4))
85
+ start = rng.randint(0, max(0, len(lines) - block_size))
86
+ changed_indices = set(range(start, min(start + block_size, len(lines))))
87
+
88
+ if not changed_indices:
89
+ changed_indices = {0}
90
+
91
+ ctx = 3
92
+ visible: set[int] = set()
93
+ for ci in changed_indices:
94
+ for offset in range(-ctx, ctx + 1):
95
+ idx = ci + offset
96
+ if 0 <= idx < len(lines):
97
+ visible.add(idx)
98
+
99
+ sorted_visible = sorted(visible)
100
+ hunks: list[list[int]] = []
101
+ current_hunk: list[int] = []
102
+ for idx in sorted_visible:
103
+ if current_hunk and idx > current_hunk[-1] + 1:
104
+ hunks.append(current_hunk)
105
+ current_hunk = [idx]
106
+ else:
107
+ current_hunk.append(idx)
108
+ if current_hunk:
109
+ hunks.append(current_hunk)
110
+
111
+ diff_parts = ["--- a/source.c", "+++ b/source.c"]
112
+ for hunk in hunks:
113
+ start_line = hunk[0] + 1
114
+ hunk_size = len(hunk)
115
+ diff_parts.append(f"@@ -{start_line},{hunk_size} +{start_line},{hunk_size} @@")
116
+ for idx in hunk:
117
+ line = lines[idx]
118
+ if idx in changed_indices:
119
+ diff_parts.append(f"+{line}")
120
+ else:
121
+ diff_parts.append(f" {line}")
122
+
123
+ return "\n".join(diff_parts)
124
+
125
+
126
+ # ---------------------------------------------------------------------------
127
+ # Fix 3: CWE rebalancing — cap dominant CWEs, merge tiny ones.
128
+ # ---------------------------------------------------------------------------
129
+
130
+ _MAX_PER_CWE_FRAC = 0.25
131
+ _MIN_CWE_SAMPLES = 20
132
+
133
+
134
+ def _rebalance(samples: list[dict], rng: random.Random, limit: int) -> list[dict]:
135
+ by_cwe: dict[str, list[dict]] = {}
136
+ for s in samples:
137
+ by_cwe.setdefault(s["cwe"] or "CWE-OTHER", []).append(s)
138
+
139
+ for cwe, items in list(by_cwe.items()):
140
+ if len(items) < _MIN_CWE_SAMPLES and cwe != "CWE-OTHER":
141
+ by_cwe.setdefault("CWE-OTHER", []).extend(items)
142
+ for item in items:
143
+ item["cwe"] = "CWE-OTHER"
144
+ del by_cwe[cwe]
145
+
146
+ cap = int(limit * _MAX_PER_CWE_FRAC)
147
+ kept: list[dict] = []
148
+ for cwe, items in by_cwe.items():
149
+ rng.shuffle(items)
150
+ kept.extend(items[:cap])
151
+
152
+ rng.shuffle(kept)
153
+ return kept[:limit]
154
+
155
+
156
+ def main() -> None:
157
+ ap = argparse.ArgumentParser(description="Preprocess Devign-derived samples into CommitGuard JSONL.")
158
+ ap.add_argument("--in", dest="inp", type=Path, default=None, help="Optional input JSONL.")
159
+ ap.add_argument("--out", dest="out", type=Path, default=Path("data/devign_filtered.jsonl"))
160
+ ap.add_argument("--test-out", dest="test_out", type=Path, default=Path("data/devign_test.jsonl"))
161
+ ap.add_argument("--limit", type=int, default=5000)
162
+ ap.add_argument("--test-limit", type=int, default=100)
163
+ ap.add_argument("--seed", type=int, default=42)
164
+ args = ap.parse_args()
165
+
166
+ rng = random.Random(args.seed)
167
+
168
+ if args.inp is None:
169
+ try:
170
+ from datasets import load_dataset
171
+ print("Loading DetectVul/devign from Hugging Face...")
172
+ ds = load_dataset('DetectVul/devign', split='train')
173
+ raw_rows = list(ds)
174
+ print(f"Loaded {len(raw_rows)} rows from HF.")
175
+ except Exception as e:
176
+ print(f"Failed to load from HF: {e}")
177
+ return
178
+ else:
179
+ raw_rows = _read_jsonl(args.inp)
180
+
181
+ all_samples: list[dict] = []
182
+
183
+ # Process all rows first
184
+ seen_ids = set()
185
+ for i, r in enumerate(raw_rows):
186
+ func = r.get("func")
187
+ if not func:
188
+ continue
189
+ if len(func.split("\n")) > 80:
190
+ continue
191
+
192
+ target = bool(r.get("target", False))
193
+ label = r.get("label", [])
194
+ vul_lines_code = []
195
+ vl = r.get("vul_lines")
196
+ if vl and isinstance(vl, dict):
197
+ vul_lines_code = vl.get("code", [])
198
+
199
+ cwe = infer_cwe(vul_lines_code, func) if target else None
200
+ diff = _build_diff(func, label, rng, target)
201
+
202
+ # Ensure unique sample_id
203
+ original_id = str(r.get("commit_id") or r.get("id") or f"row-{i}")
204
+ sample_id = original_id
205
+ suffix = 0
206
+ while sample_id in seen_ids:
207
+ suffix += 1
208
+ sample_id = f"{original_id}_{suffix}"
209
+ seen_ids.add(sample_id)
210
+
211
+ target_file = "source.c"
212
+
213
+ sample = {
214
+ "sample_id": sample_id,
215
+ "diff": diff,
216
+ "available_files": [target_file],
217
+ "is_vulnerable": target,
218
+ "cwe": cwe,
219
+ "target_file": target_file,
220
+ "files": {target_file: func},
221
+ }
222
+ all_samples.append(sample)
223
+
224
+ print(f"Total processed samples: {len(all_samples)}")
225
+
226
+ # Shuffle and split to ensure NO overlap
227
+ rng.shuffle(all_samples)
228
+
229
+ # We want to ensure test set has all CWEs if possible
230
+ # Let's pick test set first by picking a few from each CWE
231
+ test_samples: list[dict] = []
232
+
233
+ vuln_all = [s for s in all_samples if s["is_vulnerable"]]
234
+ safe_all = [s for s in all_samples if not s["is_vulnerable"]]
235
+
236
+ by_cwe: dict[str, list[dict]] = {}
237
+ for s in vuln_all:
238
+ by_cwe.setdefault(s["cwe"] or "CWE-OTHER", []).append(s)
239
+
240
+ # Try to pick 5 from each CWE for test set
241
+ for cwe in by_cwe:
242
+ test_samples.extend(by_cwe[cwe][:5])
243
+ by_cwe[cwe] = by_cwe[cwe][5:]
244
+
245
+ # Fill the rest of test set with random samples (half vuln, half safe)
246
+ remaining_vuln = [s for items in by_cwe.values() for s in items]
247
+ needed_vuln = (args.test_limit // 2) - sum(1 for s in test_samples if s["is_vulnerable"])
248
+ if needed_vuln > 0:
249
+ test_samples.extend(remaining_vuln[:needed_vuln])
250
+ remaining_vuln = remaining_vuln[needed_vuln:]
251
+
252
+ needed_safe = args.test_limit - len(test_samples)
253
+ test_samples.extend(safe_all[:needed_safe])
254
+ safe_all = safe_all[needed_safe:]
255
+
256
+ # Now remaining samples go to train
257
+ train_pool_vuln = remaining_vuln
258
+ train_pool_safe = safe_all
259
+
260
+ print(f"Test set: {len(test_samples)} samples")
261
+ _write_jsonl(args.test_out, test_samples)
262
+
263
+ # Rebalance training set
264
+ target_each = args.limit // 2
265
+ vuln_keep = _rebalance(train_pool_vuln, rng, target_each)
266
+ safe_keep = rng.sample(train_pool_safe, min(target_each, len(train_pool_safe)))
267
+
268
+ train_rows = vuln_keep + safe_keep
269
+ rng.shuffle(train_rows)
270
+
271
+ _write_jsonl(args.out, train_rows)
272
+
273
+ print(f"Wrote {len(train_rows)} training samples to {args.out}")
274
+ print(f"Wrote {len(test_samples)} test samples to {args.test_out}")
275
+
276
+ if __name__ == "__main__":
277
+ main()
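
For intuition about `_build_diff`: labeled lines get a `+` prefix, up to three unlabeled neighbors on each side become context, and each contiguous visible run becomes one hunk. For a hypothetical five-line function whose third line is labeled vulnerable, the synthesized diff would look roughly like:

```diff
--- a/source.c
+++ b/source.c
@@ -1,5 +1,5 @@
 static int copy_input(char *dst, const char *src) {
     char buf[8];
+    strcpy(buf, src);
     return 0;
 }
```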
scripts/run_and_plot_baseline.py ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import json
5
+ from pathlib import Path
6
+ import sys
7
+
8
+
9
+ def main() -> None:
10
+ ap = argparse.ArgumentParser(description="Run a tiny baseline and save a reward-curve PNG.")
11
+ ap.add_argument("--episodes", type=int, default=200)
12
+ ap.add_argument("--out-dir", type=Path, default=Path("plots"))
13
+ args = ap.parse_args()
14
+
15
+ # Allow running from a fresh clone without `pip install -e .`.
16
+ repo_root = Path(__file__).resolve().parent.parent
17
+ sys.path.insert(0, str(repo_root))
18
+
19
+ # Local, in-process baseline (no server needed).
20
+ from commitguard_env.environment import CommitGuardEnvironment
21
+ from commitguard_env.models import CommitGuardAction
22
+
23
+ data_path = repo_root / "data" / "devign_filtered.jsonl"
24
+ env = CommitGuardEnvironment(data_path=data_path)
25
+
26
+ rewards: list[float] = []
27
+ for _ in range(args.episodes):
28
+ _ = env.reset()
29
+ # Naive always-vulnerable verdict baseline (intentionally dumb).
30
+ action = CommitGuardAction(
31
+ action_type="verdict",
32
+ is_vulnerable=True,
33
+ vuln_type="CWE-89",
34
+ exploit_sketch="sql select where concat injection",
35
+ )
36
+ _obs, reward, _done = env.step(action)
37
+ rewards.append(float(reward))
38
+
39
+ args.out_dir.mkdir(parents=True, exist_ok=True)
40
+ (args.out_dir / "baseline_rewards.json").write_text(json.dumps(rewards), encoding="utf-8")
41
+
42
+ import matplotlib.pyplot as plt
43
+
44
+ plt.figure(figsize=(8, 4))
45
+ plt.plot(rewards, linewidth=1)
46
+ plt.title("CommitGuard baseline reward curve (naive always-vulnerable)")
47
+ plt.xlabel("Episode")
48
+ plt.ylabel("Reward")
49
+ plt.tight_layout()
50
+ plt.savefig(args.out_dir / "baseline_reward_curve.png", dpi=180)
51
+
52
+
53
+ if __name__ == "__main__":
54
+ main()
55
+
scripts/train_grpo.py ADDED
@@ -0,0 +1,173 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import sys
3
+ import json
4
+ import argparse
5
+ from pathlib import Path
6
+
7
+ # Unsloth must be imported (and GRPO patched) before trl, per Unsloth's GRPO guidance.
8
+ from unsloth import FastLanguageModel, PatchFastRL
9
+ PatchFastRL("GRPO", FastLanguageModel)
10
+
11
+ import torch
12
+ import wandb
13
+ from datasets import Dataset, load_dataset
14
+ from trl import GRPOConfig, GRPOTrainer
12
+
13
+ sys.path.insert(0, str(Path(__file__).resolve().parent))
14
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
15
+ from agent_prompt import SYSTEM_PROMPT, get_agent_prompt
16
+ from commitguard_env.parse_action import parse_action
17
+ from commitguard_env.reward import compute_reward
18
+
20
+
21
+ # --- Configuration ---
22
+ MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.2-3B-Instruct")
23
+ OUTPUT_DIR = os.getenv("OUTPUT_DIR", "outputs/commitguard-llama-3b-grpo")
24
+ WANDB_PROJECT = os.getenv("WANDB_PROJECT", "commitguard")
25
+
26
+ REPO_ROOT = Path(__file__).resolve().parent.parent
27
+ CWE_KEYWORDS_PATH = REPO_ROOT / "data" / "cwe_keywords.json"
28
+ CWE_KEYWORDS: dict[str, list[str]] = {}
29
+ if CWE_KEYWORDS_PATH.exists():
30
+ CWE_KEYWORDS = json.loads(CWE_KEYWORDS_PATH.read_text(encoding="utf-8"))
31
+
32
+ # Pre-built lookup: sample_id -> ground truth fields (loaded in build_dataset)
33
+ SAMPLE_LABELS: dict[str, dict] = {}
34
+
35
+
36
+ # --- Local reward: no HTTP, no latency ---
37
+ def get_reward_local(prompts, completions, sample_id, **kwargs) -> list[float]:
38
+ rewards = []
39
+ for p_id, completion in zip(sample_id, completions):
40
+ text = completion[-1]["content"] if isinstance(completion, list) else str(completion)
41
+ action = parse_action(text)
42
+ labels = SAMPLE_LABELS.get(p_id, {})
43
+ reward = compute_reward(
44
+ action=action,
45
+ is_vulnerable=labels.get("is_vulnerable"),
46
+ cwe=labels.get("cwe"),
47
+ target_file=labels.get("target_file"),
48
+ cwe_keywords=CWE_KEYWORDS,
49
+ context_requests=0,
50
+ )
51
+ rewards.append(reward)
52
+ return rewards
53
+
54
+
55
+ def format_prompt(sample):
56
+ # Using the Llama-3.2 prompt template from the plan
57
+ return {
58
+ "prompt": [
59
+ {"role": "system", "content": SYSTEM_PROMPT},
60
+ {"role": "user", "content": f"Analyze this commit and submit your verdict.\n\nCode diff:\n```diff\n{sample['diff']}\n```"},
61
+ ],
62
+ "sample_id": sample["sample_id"],
63
+ }
64
+
65
+
66
+ def build_dataset(n_samples: int) -> Dataset:
67
+ data_path = REPO_ROOT / "data" / "devign_filtered.jsonl"
68
+ if not data_path.exists():
69
+ print(f"Dataset file {data_path} not found.")
70
+ return Dataset.from_list([])
71
+
72
+ print(f"Loading training samples from {data_path}...")
73
+ raw_dataset = load_dataset("json", data_files=str(data_path), split="train")
74
+ raw_dataset = raw_dataset.select(range(min(n_samples, len(raw_dataset))))
75
+
76
+ for row in raw_dataset:
77
+ sid = row["sample_id"]
78
+ SAMPLE_LABELS[sid] = {
79
+ "is_vulnerable": row.get("is_vulnerable"),
80
+ "cwe": row.get("cwe"),
81
+ "target_file": row.get("target_file"),
82
+ }
83
+
84
+ dataset = raw_dataset.map(format_prompt)
85
+ print(f"Loaded {len(dataset)} samples ({len(SAMPLE_LABELS)} labels cached in-process).")
86
+ return dataset
87
+
88
+
89
+ def main():
90
+ ap = argparse.ArgumentParser()
91
+ ap.add_argument("--samples", type=int, default=200)
92
+ ap.add_argument("--max-steps", type=int, default=300)
93
+ ap.add_argument("--save-steps", type=int, default=50)
94
+ ap.add_argument("--num-generations", type=int, default=8)
95
+ ap.add_argument("--batch-size", type=int, default=1)
96
+ ap.add_argument("--grad-accum", type=int, default=4)
97
+ ap.add_argument("--lr", type=float, default=5e-6)
98
+ ap.add_argument("--no-wandb", action="store_true")
99
+ ap.add_argument("--push-to-hub", action="store_true")
100
+ ap.add_argument("--hub-model-id", type=str, default="inmodel-labs/commitguard-llama-3b")
101
+ args = ap.parse_args()
102
+
103
+ if not args.no_wandb:
104
+ wandb.init(project=WANDB_PROJECT, name=f"grpo-{MODEL_NAME.split('/')[-1]}-run1")
105
+
106
+ # 1. Load Model
107
+ print(f"Loading {MODEL_NAME} with Unsloth 4-bit...")
108
+ model, tokenizer = FastLanguageModel.from_pretrained(
109
+ model_name=MODEL_NAME,
110
+ max_seq_length=2048,
111
+ load_in_4bit=True,
112
+ fast_inference=True,
113
+ max_lora_rank=16,
114
+ )
115
+
116
+ model = FastLanguageModel.get_peft_model(
117
+ model,
118
+ r=8,
119
+ target_modules=[
120
+ "q_proj", "k_proj", "v_proj", "o_proj",
121
+ "gate_proj", "up_proj", "down_proj",
122
+ ],
123
+ lora_alpha=16,
124
+ lora_dropout=0,
125
+ bias="none",
126
+ use_gradient_checkpointing="unsloth",
127
+ random_state=3407,
128
+ )
129
+
130
+ # 2. Build dataset
131
+ dataset = build_dataset(args.samples)
132
+
133
+ # 3. GRPO config
134
+ training_args = GRPOConfig(
135
+ output_dir=OUTPUT_DIR,
136
+ num_generations=args.num_generations,
137
+ max_completion_length=256,
138
+ per_device_train_batch_size=args.batch_size,
139
+ gradient_accumulation_steps=args.grad_accum,
140
+ learning_rate=args.lr,
141
+ logging_steps=1,
142
+ save_steps=args.save_steps,
143
+ max_steps=args.max_steps,
144
+ report_to="none" if args.no_wandb else "wandb",
145
+ bf16=torch.cuda.is_bf16_supported(),
146
+ fp16=not torch.cuda.is_bf16_supported(),
147
+ )
148
+
149
+ # 4. Train
150
+ trainer = GRPOTrainer(
151
+ model=model,
152
+ processing_class=tokenizer,
153
+ reward_funcs=[get_reward_local],
154
+ args=training_args,
155
+ train_dataset=dataset,
156
+ )
157
+
158
+ print("Starting GRPO training...")
159
+ trainer.train()
160
+
161
+ # 5. Save
162
+ final_dir = f"{OUTPUT_DIR}/final"
163
+ model.save_pretrained_merged(final_dir, tokenizer, save_method="lora")
164
+ print(f"Training complete. LoRA adapter saved to {final_dir}")
165
+
166
+ if args.push_to_hub:
167
+ print(f"Pushing to HF Hub: {args.hub_model_id}")
168
+ model.push_to_hub(args.hub_model_id, token=True)
169
+ tokenizer.push_to_hub(args.hub_model_id, token=True)
170
+
171
+
172
+ if __name__ == "__main__":
173
+ main()
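
The reward path is fully local, so it can be sanity-checked without the trainer. A minimal sketch, assuming `compute_reward` keeps the signature used in `get_reward_local` above and the package is installed (`pip install -e .`); the sample values are made up:

```bash
python - <<'EOF'
from commitguard_env.parse_action import parse_action
from commitguard_env.reward import compute_reward

# A well-formed verdict in the XML action format from agent_prompt.py.
text = (
    "<action><action_type>verdict</action_type>"
    "<is_vulnerable>true</is_vulnerable>"
    "<vuln_type>CWE-119</vuln_type>"
    "<exploit_sketch>buffer overflow via unchecked memcpy</exploit_sketch></action>"
)
action = parse_action(text)
print(compute_reward(
    action=action,
    is_vulnerable=True,      # hypothetical ground-truth label
    cwe="CWE-119",
    target_file="source.c",
    cwe_keywords={},
    context_requests=0,
))
EOF
```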
scripts/verify_3_action_loop.py ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import requests
2
+ import json
3
+ import sys
4
+
5
+ def test_loop():
6
+ base_url = "http://localhost:8000"
7
+
8
+ print("--- Phase 1: Reset ---")
9
+ r = requests.post(f"{base_url}/reset")
10
+ if r.status_code != 200:
11
+ print(f"FAILED: Reset returned {r.status_code}")
12
+ return
13
+ data = r.json()
14
+ print(f"Full response keys: {list(data.keys())}")
15
+ obs = data["observation"]
16
+ print(f"Observation value: {obs}")
17
+ episode_id = obs["episode_id"]
18
+ print(f"Observation keys: {list(obs.keys())}")
19
+ print(f"Episode ID: {episode_id}")
20
+ print(f"Diff length: {len(obs['diff'])}")
21
+
22
+ # Verify no leak
23
+ forbidden = ["is_vulnerable", "cwe", "cwe_type", "label"]
24
+ for f in forbidden:
25
+ if f in obs:
26
+ print(f"CRITICAL LEAK: '{f}' found in observation!")
27
+ sys.exit(1)
28
+
29
+ print("\n--- Phase 2: Action 'request_context' ---")
30
+ # Using the first available file if any
31
+ file_to_req = obs["available_files"][0] if obs["available_files"] else "unknown.c"
32
+ action = {
33
+ "action": f"<action><action_type>request_context</action_type><file_path>{file_to_req}</file_path></action>"
34
+ }
35
+ r = requests.post(f"{base_url}/step", json=action)
36
+ res = r.json()
37
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
38
+ print(f"Context snippets returned: {len(res['observation'].get('context_snippets', []))}")
39
+
40
+ print("\n--- Phase 3: Action 'analyze' ---")
41
+ action = {
42
+ "action": "<action><action_type>analyze</action_type><reasoning>Thinking about the pointer arithmetic in the diff...</reasoning></action>"
43
+ }
44
+ r = requests.post(f"{base_url}/step", json=action)
45
+ res = r.json()
46
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
47
+
48
+ print("\n--- Phase 4: Action 'verdict' ---")
49
+ action = {
50
+ "action": "<action><action_type>verdict</action_type><is_vulnerable>true</is_vulnerable><vuln_type>CWE-119</vuln_type><exploit_sketch>buffer overflow via unchecked memcpy</exploit_sketch></action>"
51
+ }
52
+ r = requests.post(f"{base_url}/step", json=action)
53
+ res = r.json()
54
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
55
+ print(f"Final Info: {res.get('info', 'No info')}")
56
+
57
+ print("\n--- Phase 5: Verify State (No Leaks) ---")
58
+ r = requests.get(f"{base_url}/state")
59
+ data = r.json()
60
+ state = data["state"]
61
+ print(f"State Episode ID: {state['episode_id']}")
62
+ print(f"Step Count: {state['step_count']}")
63
+ for f in forbidden:
64
+ if f in state:
65
+ # state() may carry internal metadata, but the PRD says labels must never leak to the agent.
66
+ # environment.py says: "state() must not leak labels; returning empty is fine"
67
+ print(f"LEAK WARNING: '{f}' found in state output!")
68
+
69
+ if __name__ == "__main__":
70
+ test_loop()
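
To run the loop check end to end (server in one terminal, probe in another):

```bash
# Terminal 1: start the env server (console script from pyproject.toml).
server

# Terminal 2: drive reset → request_context → analyze → verdict.
python scripts/verify_3_action_loop.py
```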
server/__init__.py ADDED
File without changes
server/app.py ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ from commitguard_env.server import app, main as server_main
2
+
3
+ def main():
4
+ server_main()
5
+
6
+ if __name__ == "__main__":
7
+ main()