Nitishkumar-ai committed
Commit b74db43 · 0 Parent(s)

Initial clean deploy commit

This view is limited to 50 files because it contains too many changes. See raw diff.
.agent/FUTURE_WORK.md ADDED
@@ -0,0 +1,16 @@
+ <!--
+ If an agent is tempted to build something not in the current scope, append it here instead and continue with the locked task.
+
+ Source: ../prd.md §14 (Future Work). Do not execute these during the hackathon build unless explicitly re-scoped by the whole team (and documented).
+ -->
+
+ ## Future Work (post-hackathon)
+
+ - **Sandboxed exploit execution**: replace the pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
+ - **Multi-file commit reasoning**: extend the env to support diffs spanning multiple files, with a context budget
+ - **Self-play loop**: pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
+ - **Agentic harness integration**: wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
+ - **Real CVE corpus**: extend beyond Devign to recent CVE-tagged commits from major open-source repos
+ - **Multi-language support**: the current env is C-focused via Devign; extend to Python, JavaScript, Go
+ - **Reward shape ablations**: a formal study of how reward composition affects which vulnerability types the model learns fastest
+
.agent/README.md ADDED
@@ -0,0 +1,38 @@
+ ## What this folder is
+
+ `.agent/` is the **operating system for AI agents** on this repo. It locks the architecture decisions from `../prd.md`, prevents scope creep under deadline pressure, and makes sure three engineers can use Cursor / Claude Code in parallel without drifting.
+
+ If you're an agent: **load `project_context.md` first**. If you're a human: treat this folder like the team's constitution.
+
+ ## Non-negotiable rule (scope freeze)
+
+ **Scope freeze is midnight Saturday (00:00 IST).** After that time:
+ - Do not add features, endpoints, model changes, UI, or nice-to-haves.
+ - Only do bug fixes, tests, wiring, docs, and reliability work that protects the locked deliverables.
+ - If you're tempted to add something: append it to `FUTURE_WORK.md` and continue the locked task.
+
+ ## Files and what each enforces
+
+ - `project_context.md`: **Single source of truth**. The compressed PRD: what we're building, why, who it's for, locked stack, 30-second pitch, non-goals.
+ - `architecture.md`: **Technical contract**. File layout, dataclass schemas, XML action format, reward signature, observation schema, cheating prevention, required HTTP endpoints.
+ - `coding_conventions.md`: **How we write code**. Typed dataclasses, import order, errors, forbidden patterns, repo hygiene.
+ - `decision_log.md`: **Locked decisions + fallbacks**. PRD §7.1 in table form, PRD §7.2 fallback triggers. New decisions go here with timestamp + author.
+ - `agent_instructions.md`: **System prompt** for any coding agent. Read order, refusal rules, time-pressure behavior, fallback triggers.
+ - `checkpoints.md`: **Team sync contract** at midnight / 9 AM / 3 PM. What must be demoable; what triggers scope cuts; what gets cut first.
+ - `test_contracts.md`: **Blocking tests** required before merge: no-leak, reward cases, XML parser robustness, env smoke.
+ - `git_workflow.md`: **Parallel work rules**. Branch naming, commit conventions, merge gates, no-force-push rules, pre-submission checklist.
+ - `FUTURE_WORK.md`: **Parking lot** for anything not in current scope (pre-populated from PRD §14).
+
+ ## Where the real spec lives
+
+ The authoritative PRD is `../prd.md`. If any `.agent/` file disagrees with the PRD, **the PRD wins** and you must update the `.agent/` file immediately.
+
+ ## Task files (per person)
+
+ This repo expects per-person task lists:
+ - `../tasks_niti.md`
+ - `../tasks_deepak.md`
+ - `../tasks_divyank.md`
+
+ If they don't exist yet, create them now with 10-20 bullet tasks each and keep them updated. Agents should read the relevant one **after** `project_context.md` and `architecture.md`.
+
.agent/agent_instructions.md ADDED
@@ -0,0 +1,69 @@
+ ## System prompt for CommitGuard coding agents
+
+ You are an AI coding agent working on the **CommitGuard** hackathon repo.
+
+ Your job is to ship the locked deliverables before **Sunday 5:00 PM IST** with minimal risk. This is a **deadline game**, not a feature game.
+
+ ### Read order (mandatory)
+
+ 1. Read `.agent/project_context.md` (single source of truth).
+ 2. Read `.agent/architecture.md` (technical contract).
+ 3. Read `.agent/coding_conventions.md` (how we write code).
+ 4. Read the relevant task list:
+    - `tasks_niti.md` OR `tasks_deepak.md` OR `tasks_divyank.md`
+    - If missing: create it with concrete bullets and continue.
+
+ Only then start coding.
+
+ ### Scope control (hard refusal rule)
+
+ **Scope freeze is midnight Saturday (00:00 IST).** After that:
+ - Refuse any scope expansion, new features, new endpoints, new UI, new metrics.
+ - Only do: bug fixes, tests, wiring, packaging, docs, reliability.
+
+ If asked to add a feature:
+ - Do **not** implement it.
+ - Append it to `.agent/FUTURE_WORK.md` with a 1-line rationale.
+ - Continue the locked task.
+
+ ### Architectural choices (don't guess)
+
+ If a decision is not covered by `.agent/architecture.md`:
+ - Ask for clarification (or check `../prd.md`).
+ - Do not invent new schemas or endpoints because it "seems right".
+
+ ### Cheating prevention (highest-priority constraint)
+
+ The environment is RLVR: reward comes from dataset ground truth, but the agent must never see labels.
+
+ Rules:
+ - Observations must never contain ground truth (`is_vulnerable`, `cwe`, labels, "this is vulnerable" strings).
+ - The server must never return label fields in HTTP responses.
+ - Debug endpoints must never include ground truth.
+ - Always keep `test_no_leak.py` green.
+
+ ### Time-pressure behavior (what good looks like)
+
+ Under deadline pressure:
+ - Prefer the simplest implementation that passes the contracts in `.agent/test_contracts.md`.
+ - Treat the fallbacks in `.agent/project_context.md` as pre-approved pivots; if triggered, pivot immediately and log in `.agent/decision_log.md`.
+ - Avoid refactors unless they remove a clear blocker.
+
+ ### Fallback triggers (execute immediately)
+
+ If any trigger happens, switch to the fallback with no debate:
+ - OOM on A10G → Qwen2.5-1.5B-Instruct
+ - HF Jobs queue >30 min → GCP A10G on-demand
+ - 3-action env not shipped by midnight → 2-action env
+ - Tiered reward buggy → binary reward only
+ - Curve flat at 10 AM Sunday → qualitative narrative
+ - Video recording fails twice → text trace in README
+
+ ### CLI-first ops (HF + GCP)
+
+ Prefer repeatable CLI commands over UI clicks:
+ - HF Space + repos: use `huggingface-cli` / git
+ - GCP: use `gcloud`
+
+ Document any required commands in `README.md` or `scripts/`.
+
.agent/architecture.md ADDED
@@ -0,0 +1,149 @@
+ ## Architecture contract (do not improvise)
+
+ This is the technical contract for CommitGuard. If you're about to invent a new shape, don't. Either it's already here, or it belongs in `FUTURE_WORK.md`.
+
+ Authoritative source: `../prd.md` (§5-8).
+
+ ## Repo layout (locked)
+
+ Target layout (names are contracts; adjust only if the repo already differs):
+
+ - `commitguard_env/`
+   - `models.py`: typed dataclasses `Action`, `Observation`, `EnvState`, `GroundTruth`
+   - `parse_action.py`: XML action parser (robust to malformed output)
+   - `reward.py`: `compute_reward(...) -> float` (pure function)
+   - `environment.py`: `CommitGuardEnvironment` implementing OpenEnv reset/step/state
+   - `server.py`: FastAPI app exposing OpenEnv HTTP endpoints
+ - `data/`
+   - `devign_filtered.jsonl`: dataset embedded in the Docker image
+   - `cwe_keywords.json`: top-10 CWE keyword map (for the exploit-sketch bonus)
+ - `tests/`: blocking tests listed in `test_contracts.md`
+ - `scripts/`: dataset preprocessing and ops scripts (CLI-first)
+ - `README.md`: story + links + how to run
+
+ If the codebase already has a different structure, keep the same semantics and update this file to match.
+
+ ## Dataclass schemas (typed; no untyped dicts in public APIs)
+
+ All public shapes are typed dataclasses. Internal parsing may use dicts, but boundaries must be dataclasses.
+
+ ### `Action`
+
+ - **Raw input**: `raw_action: str` (the model output)
+ - **Parsed**:
+   - `action_type: Literal["request_context", "analyze", "verdict"]`
+   - `fields: ActionFields` (typed union by action_type)
+
+ ### `Observation` (cheating-prevention critical)
+
+ Must include only:
+ - `episode_id: str`
+ - `step_idx: int`
+ - `diff: str` (code_before/code_after diff or unified diff string)
+ - `repo_files: list[str]` (or `available_files`)
+ - `context_snippets: list[ContextSnippet]` (only if requested)
+ - `budget_remaining: int`
+ - `error: str | None` (for malformed actions, etc.)
+
+ Must **never** include:
+ - `is_vulnerable`, `label`, `ground_truth`, `cwe_type`, `target_file_with_label`
+ - anything that trivially implies the label (e.g., "this sample is vulnerable")
+
+ ### `GroundTruth` (server-only)
+
+ Lives only on the server. Never serialized into observations.
+ - `is_vulnerable: bool`
+ - `cwe: str | None`
+ - `target_file: str`
+ - `exploit_keywords: list[str]` (or derived via the CWE map)
+
+ ## Cheating-prevention rule (non-negotiable)
+
+ **Observation must never contain ground truth.** Reward is the only scalar feedback; it must not leak the label via strings or metadata.
+
+ Enforcement:
+ - observation schema excludes forbidden fields
+ - `tests/test_no_leak.py` asserts forbidden keys and suspicious strings never appear
+ - server returns reward as a float only; never returns label/cwe for debugging
+
+ ## Episode contract
+
+ - Max **5 steps** per episode.
+ - Episode ends when `verdict` is received OR the budget hits zero.
+ - `request_context` consumes budget and has a per-step penalty.
+ - `analyze` is allowed, logged, and should not affect reward directly (a minimal loop sketch follows this list).
+
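+ A minimal in-process loop sketch (the HTTP server wraps the same calls); `my_policy` below is a hypothetical stand-in for the model:
+
+ ```python
+ from pathlib import Path
+
+ from commitguard_env.environment import CommitGuardEnvironment
+ from commitguard_env.parse_action import parse_action
+
+
+ def my_policy(obs) -> str:
+     # hypothetical stand-in: always issues a safe verdict
+     return (
+         "<action><action_type>verdict</action_type>"
+         "<is_vulnerable>false</is_vulnerable>"
+         "<vuln_type>NONE</vuln_type>"
+         "<exploit_sketch>none</exploit_sketch></action>"
+     )
+
+
+ env = CommitGuardEnvironment(data_path=Path("data/devign_filtered.jsonl"))
+ obs = env.reset()
+ done = False
+ while not done:
+     obs, reward, done = env.step(parse_action(my_policy(obs)))  # ends on verdict or after 5 steps
+ ```
+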
+ ## Reward function (signature + invariants)
+
+ Reward is RLVR: computed from ground truth and simple keyword checks, **not** an LLM judge.
+
+ Signature:
+
+ ```python
+ def compute_reward(
+     action: "Action",
+     ground_truth: "GroundTruth",
+     *,
+     cwe_keywords: dict[str, list[str]],
+     context_requests: int,
+ ) -> float: ...
+ ```
+
+ Reward shape (from the PRD; a reference sketch follows the list):
+ - correct vulnerable/safe: **+1.0**
+ - correct CWE (when vulnerable): **+0.5**
+ - plausible exploit sketch (keyword match): **+0.5**
+ - false positive: **-1.0**
+ - false negative: **-0.5**
+ - per context request: **-0.05**
+ - malformed action: penalize (recommended **-0.5**) but do not crash
+
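+ A hedged reference sketch of that shape (one way to satisfy the invariants; the shipped `reward.py` is authoritative):
+
+ ```python
+ def compute_reward(action, ground_truth, *, cwe_keywords, context_requests):
+     # Sketch only: mirrors the table above, not the shipped reward.py.
+     reward = -0.05 * context_requests  # per context request
+     if getattr(action, "parse_error", None):
+         return reward - 0.5  # malformed: penalize, never crash
+     if action.action_type != "verdict":
+         return reward  # analyze / request_context carry no verdict reward
+     if action.is_vulnerable == ground_truth.is_vulnerable:
+         reward += 1.0  # correct vulnerable/safe call
+         if ground_truth.is_vulnerable:
+             if action.vuln_type == ground_truth.cwe:
+                 reward += 0.5  # correct CWE
+             sketch = (action.exploit_sketch or "").lower()
+             keywords = cwe_keywords.get(ground_truth.cwe or "", [])
+             if any(k.lower() in sketch for k in keywords):
+                 reward += 0.5  # plausible exploit sketch
+     elif action.is_vulnerable:
+         reward -= 1.0  # false positive
+     else:
+         reward -= 0.5  # false negative
+     return reward
+ ```
+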
+ ## XML action format (the model output contract)
+
+ The model outputs exactly one top-level `<action>` block. The parser must tolerate:
+ - extra whitespace
+ - missing fields (treated as malformed)
+ - wrong casing (normalize)
+ - stray text before/after tags
+ - malformed XML (best-effort extraction; never crash)
+
+ ### Spec
+
+ Top-level:
+ - `<action>`
+   - `<action_type>request_context|analyze|verdict</action_type>`
+   - `<fields>...</fields>`
+ - `</action>`
+
+ Fields by type:
+
+ **request_context**
+ - `<file_path>path/in/repo.ext</file_path>`
+ - optional: `<start_line>int</start_line>`, `<end_line>int</end_line>`
+
+ **analyze**
+ - `<reasoning>free text</reasoning>`
+
+ **verdict**
+ - `<is_vulnerable>true|false</is_vulnerable>`
+ - `<vuln_type>CWE-79|CWE-89|...|NONE</vuln_type>`
+ - `<exploit_sketch>free text</exploit_sketch>`
+
+ Parsing rules (a sanity-check sketch follows the list):
+ - if `action_type` is missing/invalid → malformed
+ - booleans accept `true/false/1/0/yes/no` (case-insensitive)
+ - `vuln_type` is normalized; a safe verdict may use `NONE`
+ - on malformed: return a safe `Action` with `action_type="analyze"` and `error` set, and apply the malformed penalty
+
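+ A quick sanity check of those rules against the shipped parser (`commitguard_env/parse_action.py`):
+
+ ```python
+ from commitguard_env.parse_action import parse_action
+
+ ok = parse_action(
+     "<action><action_type>verdict</action_type>"
+     "<is_vulnerable>TRUE</is_vulnerable>"
+     "<vuln_type>CWE-89</vuln_type>"
+     "<exploit_sketch>attacker input reaches the SQL query</exploit_sketch></action>"
+ )
+ assert ok.action_type == "verdict" and ok.is_vulnerable is True  # casing normalized
+
+ bad = parse_action("no xml here")  # must not raise
+ assert bad.action_type == "analyze" and bad.parse_error is not None
+ ```
+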
+ ## Env server HTTP endpoints (P0)
+
+ The env server must expose these endpoints (names from PRD §8.1; a client round-trip sketch follows the list):
+
+ - `GET /health` → 200 OK and a simple JSON payload
+ - `POST /reset` → returns the initial `Observation` (+ episode id)
+ - `POST /step` → accepts a raw action string, returns `{observation, reward, done, info}`
+ - `GET /state` → returns minimal server/env state for debugging (no ground truth)
+ - `GET /docs` → FastAPI OpenAPI docs (automatic)
+
+ Do not add new endpoints after scope freeze unless required for reliability.
+
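+ A minimal round trip against a running server, using the `CommitGuardClient` helper from `client.py`:
+
+ ```python
+ from client import CommitGuardClient
+
+ client = CommitGuardClient("http://localhost:8000")
+ print(client.health())  # {"status": "healthy"}
+ obs = client.reset()    # initial observation: diff + available_files
+ result = client.step(
+     "<action><action_type>analyze</action_type>"
+     "<reasoning>check the diff for unchecked buffer lengths</reasoning></action>"
+ )
+ print(result["reward"], result["done"])
+ ```
+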
.agent/checkpoints.md ADDED
@@ -0,0 +1,57 @@
+ ## Checkpoints (sync-or-die contract)
+
+ Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: `../prd.md` §12.
+
+ ### Checkpoint 1, midnight (00:00 IST): scope freeze + Phase 1 gate
+
+ **Everyone must demonstrate (live, locally or on the Space):**
+ - **Env server runs** and responds to `GET /health`
+ - **OpenEnv loop works**: `reset` → `step` → done, without crashing
+ - **Action parser is robust**: malformed XML doesn't crash; returns a safe error
+ - **No-leak invariant**: observation contains no ground-truth fields
+
+ **Role deliverables:**
+ - **Env/Server owner**: endpoints exist (`/health`, `/reset`, `/step`, `/state`, `/docs`)
+ - **Reward owner**: reward function wired and deterministic on handcrafted cases
+ - **Training owner**: mock training loop can call the env repeatedly (even if reward is dummy)
+
+ **If any of these are red, trigger a scope cut immediately:**
+ - 3-action env incomplete → cut to 2-action env (analyze + verdict)
+ - Tiered reward unstable → cut to binary reward only
+
+ **After this checkpoint:**
+ - **Scope freeze is active.** New features go to `.agent/FUTURE_WORK.md` only.
+
+ ### Checkpoint 2, 9:00 AM Sunday: training evidence gate
+
+ **Everyone must demonstrate:**
+ - Training run launched (HF Jobs A10G preferred) or fallback running
+ - Wandb logging works (reward curve visible)
+ - Evaluation script/notebook can run 100 held-out samples
+
+ **Scope-cut triggers:**
+ - Training blocked by infra >30 min → move to the GCP A10G fallback
+ - Training curve still flat by 10:00 AM → commit to the qualitative narrative (no more training tweaks)
+
+ **What gets cut first (in order):**
+ 1. P2 items (web UI polish, blog post)
+ 2. Per-CWE breakdown (keep overall accuracy)
+ 3. Exploit sketch bonus (keep binary + CWE if stable)
+ 4. CWE classification bonus (keep binary only)
+
+ ### Checkpoint 3, 3:00 PM Sunday: feature freeze gate
+
+ **Everyone must demonstrate:**
+ - HF Space is live and stable; `/health` returns 200; `/docs` loads
+ - `tests/` pass (see `.agent/test_contracts.md`)
+ - Demo artifact path is locked (video or text-trace fallback)
+ - README has all submission links (Space, notebook, video, wandb, repo)
+
+ **Hard rule:**
+ - **No changes after 3:00 PM** except emergency fixes that prevent submission failure.
+
+ **Final scope cuts (if needed to protect the submission):**
+ 1. Video → text trace in README
+ 2. Training curve → single plot + narrative
+ 3. Held-out eval → small-N sanity check
+
.agent/coding_conventions.md ADDED
@@ -0,0 +1,63 @@
+ ## Coding conventions (enforced under deadline pressure)
+
+ This repo is optimized for: **correctness, reproducibility, and not leaking labels**. Read `architecture.md` first.
+
+ ## Python style (hard rules)
+
+ - **Typed dataclasses everywhere** for public API shapes (actions/observations/state).
+ - Use `@dataclass(frozen=True, slots=True)` by default.
+ - Public functions must be type-annotated end-to-end.
+ - **No untyped dicts in public APIs.** Dicts are allowed only internally (e.g., during XML parse), and must be converted to dataclasses at the boundary (see the sketch at the end of this file).
+ - Keep functions small. Prefer pure functions (`reward.py`) with no hidden state.
+
+ ## Import ordering
+
+ 1. stdlib
+ 2. third-party
+ 3. local modules
+
+ Within a section: alphabetical. One import per line if it improves diff clarity.
+
+ ## Docstrings and naming
+
+ - Docstrings: short, imperative, include constraints (e.g., "must not leak ground truth").
+ - Names: explicit over clever (`compute_reward`, `parse_action_xml`, `EpisodeState`).
+
+ ## Error handling patterns
+
+ - **Never crash on model output.** Malformed actions must be handled gracefully.
+ - Raise exceptions only for programmer errors; user/model errors return structured error fields.
+ - Every boundary (HTTP handlers, XML parser) must be defensive:
+   - validate inputs
+   - clamp budgets
+   - return safe defaults
+
+ ## Forbidden patterns (do not do these)
+
+ - **No LLM-as-judge in reward.** Reward must be verifiable (dataset truth + keyword checks). See `architecture.md`.
+ - **No label leakage**: do not log, return, or print ground truth in observations, HTTP responses, or debug endpoints.
+ - **No hardcoded local paths** (e.g., `C:\\Users\\...`, `/home/...`). Use repo-relative paths + `pathlib`.
+ - **No committing data files > 5MB** without explicit team sign-off. (If necessary, use HF Datasets or remote storage.)
+ - **No localStorage in any UI.** If you add UI later (unlikely), store state server-side or in-memory only.
+ - **No adding endpoints/features after scope freeze** (midnight Saturday).
+
+ ## Repo hygiene
+
+ - Prefer CLI-driven ops so teammates can reproduce quickly:
+   - HF: `huggingface-cli`, `hf` (where available), `git lfs` if needed
+   - GCP: `gcloud`
+ - Keep logs minimal. Under hackathon pressure, noisy logs hide real bugs.
+ - Don't vendor big artifacts in git. Link them (video, wandb, Space) from the README.
+
+ ## Scope creep rule (non-negotiable)
+
+ If you're tempted to add a feature that isn't required for the locked deliverables:
+ - Append one bullet to `FUTURE_WORK.md` (with a 1-line rationale).
+ - Return to your current task.
+
+ ## Cross-reference
+
+ - Architecture contract: `architecture.md`
+ - Scope and fallbacks: `project_context.md`
+ - Locked decisions: `decision_log.md`
+
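+ ## Appendix: dataclass-at-the-boundary sketch
+
+ A minimal illustration of the boundary rule above; `ContextRequest` and `context_request_from_payload` are illustrative names, not real repo types:
+
+ ```python
+ from dataclasses import dataclass
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextRequest:
+     file_path: str
+     start_line: int = 1
+
+
+ def context_request_from_payload(payload: dict) -> ContextRequest:
+     # Dicts are fine internally; convert to a typed dataclass at the boundary,
+     # validating and clamping as we go.
+     return ContextRequest(
+         file_path=str(payload.get("file_path", "")),
+         start_line=max(1, int(payload.get("start_line", 1))),
+     )
+ ```
+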
.agent/decision_log.md ADDED
@@ -0,0 +1,40 @@
+ ## Decision log (locked + fallbacks)
+
+ This file is a **contract**. It mirrors `../prd.md` §7.1 and §7.2.
+
+ If you want to change a decision: you don't. If you must due to a trigger, use the fallback and log it.
+
+ ## Locked technical decisions (PRD §7.1)
+
+ | Decision | Choice | Rationale |
+ |---|---|---|
+ | Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
+ | Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
+ | Hosting | Hugging Face Space | Mandatory; server + repo + registry |
+ | Data source | Devign (DetectBERT subset) | Real CWE labels, manageable size |
+ | Model | Llama-3.2-3B-Instruct | Meta-branded; fits A10G with GRPO |
+ | Training framework | TRL with GRPO | Native OpenEnv integration via reward funcs |
+ | Training optimization | Unsloth 4-bit + LoRA r=8 | Big memory reduction + speed |
+ | Training infra | HF Jobs A10G | Unattended, HF-native |
+ | Dev infra | GCP VM with T4 | Stable, no Colab disconnects |
+ | Action serialization | XML-tag free-text | Robust to small-model variance |
+ | Logging | Weights & Biases | TRL-native; shareable runs |
+
+ ## Pre-approved fallback rules (PRD §7.2)
+
+ | If this fails | Fall back to | Trigger condition |
+ |---|---|---|
+ | Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
+ | HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
+ | 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Midnight checkpoint is red |
+ | Tiered reward buggy | Binary correct/incorrect reward | Reward checkpoint is red |
+ | Training curve flat | Qualitative comparison only | Still flat at 10 AM Sunday |
+ | Demo video hard to record | Side-by-side text trace in README | Recording fails twice |
+
+ ## New decisions made during the build
+
+ Rule: any new decision must be logged here with timestamp + author and must not violate the locked PRD unless it's a PRD-defined fallback.
+
+ Template:
+ - **[YYYY-MM-DD HH:MM IST] (author)**: decision → rationale → impact → rollback plan
+
.agent/git_workflow.md ADDED
@@ -0,0 +1,85 @@
+ ## Git workflow (parallel, safe, deadline-optimized)
+
+ This repo will have three engineers working in parallel with agents. The workflow exists to prevent integration chaos.
+
+ ## Branch naming (required)
+
+ Format: `<name>/<short-scope>`
+
+ Examples:
+ - `niti/env-scaffolding`
+ - `deepak/data-pipeline`
+ - `divyank/training-grpo`
+
+ Rules:
+ - One scope per branch.
+ - If a branch grows beyond 2-3 related commits, cut scope or split.
+
+ ## Commit message convention (required)
+
+ Use **Conventional Commits**:
+
+ - `feat(env): add OpenEnv reset/step`
+ - `fix(parser): handle malformed xml without crash`
+ - `test(reward): add 5 handcrafted cases`
+ - `docs(readme): add demo + wandb links`
+
+ Rules:
+ - Short subject, present tense.
+ - Prefer "why" over "what" in the body.
+
+ ## Merge policy (hard rules)
+
+ - Merge to `main` **only after** the relevant tests pass locally:
+   - Env changes: `test_no_leak.py`, `test_env_smoke.py`, `test_action_parser.py`
+   - Reward changes: `test_reward.py` + `test_no_leak.py`
+   - Parser changes: `test_action_parser.py` + `test_env_smoke.py`
+ - No "merge now, fix later". Under deadline, a broken `main` is a team-wide blocker.
+
+ ## Force-push rules
+
+ - Before midnight Saturday: allowed on your feature branches if necessary.
+ - **After midnight Saturday: no force-push to `main` (ever).**
+ - Prefer no force-push at all; use revert commits if needed.
+
+ ## PR expectations (fast reviews)
+
+ Each PR must include:
+ - 1-3 sentence summary
+ - test plan (what you ran)
+ - risk note (what could break)
+
+ If it's large, it's wrong: split it.
+
+ ## Pre-submission checklist (Sunday)
+
+ By 3 PM:
+ - [ ] HF Space live; `/health` returns 200; `/docs` loads
+ - [ ] Blocking tests pass (`.agent/test_contracts.md`)
+ - [ ] Training artifact exists (plots + wandb link)
+ - [ ] Demo artifact exists (video URL or text-trace fallback)
+ - [ ] README links all resolve (Space, notebook, video, wandb, repo)
+
+ By 4:30 PM:
+ - [ ] Fresh clone + run instructions work
+ - [ ] Final smoke test: 100 episodes don't crash
+ - [ ] Submission package is complete
+
+ ## CLI-first ops (HF + GCP)
+
+ Keep ops repeatable. Prefer CLI over UI clicks.
+
+ Hugging Face:
+ - `huggingface-cli login`
+ - `huggingface-cli whoami`
+ - Use the git-based Space workflow (clone, commit, push) for deploys.
+
+ GCP:
+ - `gcloud auth login`
+ - `gcloud config set project <PROJECT_ID>`
+ - Use `gcloud compute ssh` + `gcloud compute instances list` for the VM workflow.
+
+ Cross-reference:
+ - Merge gates: `test_contracts.md`
+ - Scope freeze + fallbacks: `project_context.md`
+
.agent/project_context.md ADDED
@@ -0,0 +1,82 @@
+ ## CommitGuard: project context (load this first)
+
+ This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.
+
+ If you're unsure: re-read `../prd.md` and then update this file to match.
+
+ ## What we're building
+
+ **CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.
+
+ The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama-3.2-3B-Instruct**, using verifiable rewards from dataset ground truth (RLVR).
+
+ ## Why this matters (the thesis)
+
+ AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **We're building the RL environment that trains AI-paced, commit-time security review.**
+
+ ## Who it's for
+
+ - **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + a clean story.
+ - **Meta researchers**: want RLVR framing, cheating prevention, and extensibility.
+ - **HF community**: wants a runnable Space + a reproducible training notebook.
+
+ ## 30-second pitch (verbatim; memorize)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
+
+ ## Locked stack (do not change)
+
+ - **Env framework**: Meta OpenEnv **0.2.3+**
+ - **Server**: **FastAPI** in **Docker**
+ - **Hosting**: **Hugging Face Space**
+ - **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced
+ - **Model**: **Llama-3.2-3B-Instruct**
+ - **Training**: **TRL** with **GRPO**
+ - **Optimization**: **Unsloth** 4-bit + **LoRA r=8**
+ - **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
+ - **Action serialization**: **XML-tag free-text** (not JSON mode)
+ - **Logging**: **Weights & Biases**
+
+ Operational preference: **use the CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI clicking).
+
+ ## Submission deliverables (P0)
+
+ - **HF Space** deployed; `/health` returns 200; `/docs` works
+ - **Training notebook / script** produces a measurable learning curve (or triggers the fallback)
+ - **Plots** committed (reward curve + baseline vs trained)
+ - **Demo video** (60-90s) showing before/after behavior on one example
+ - **README** with all required links (Space, notebook, video, repo, wandb)
+
+ ## Hard constraints (time + scope)
+
+ - **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
+ - **Scope freeze**: **midnight Saturday (00:00 IST)**; after this, no new features
+ - **Episode constraints**: max **5 steps** per episode; context requests cost reward
+
+ ## Explicit non-goals (do not drift)
+
+ - Not a production CI security tool; **research environment only**
+ - No real exploit-execution sandbox in v1 (pattern match only)
+ - No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
+ - No multi-agent self-play in v1
+ - No network/runtime attacks, no social engineering
+ - No "cover all CWEs": v1 focuses on the **top 10 CWEs** in Devign
+ - No fancy frontend: the HF Space default UI is enough
+
+ ## If something breaks: pre-approved fallbacks (no debate)
+
+ These are legal pivots from `../prd.md` §7.2. If a trigger happens, switch immediately and log it in `decision_log.md`.
+
+ - **OOM on Llama-3.2-3B on A10G** → use **Qwen2.5-1.5B-Instruct** (trigger: first test step crashes)
+ - **HF Jobs queue > 30 min** → use **GCP A10G on-demand**
+ - **3-action env not shipped by midnight** → ship the **2-action env** (analyze + verdict)
+ - **Tiered reward buggy** → ship **binary reward only**
+ - **Training curve still flat at 10 AM Sunday** → ship the **qualitative comparison narrative**
+ - **Demo video recording fails twice** → ship the **side-by-side text trace in README**
+
+ ## Next file to read
+
+ Read `architecture.md` next. Then read your per-person task list (e.g., `../tasks_niti.md`) if present.
+
.agent/test_contracts.md ADDED
@@ -0,0 +1,48 @@
+ ## Test contracts (merge blockers)
+
+ These tests are **merge gates**. If any fails, do not merge to `main`. See `git_workflow.md`.
+
+ Owners are initial; if you touch the area, you own the test too.
+
+ ### `tests/test_no_leak.py`
+
+ - **Asserts** (a minimal sketch follows this block):
+   - `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
+   - Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
+ - **Owner**: Niti (env integrity)
+ - **Blocking condition**: Any leakage is a submission-killer. Must be fixed immediately.
+
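+ A minimal sketch of the core assertion (assumes the bundled dataset path; the real test should cover the `/reset` and `/step` payloads too):
+
+ ```python
+ import dataclasses
+ from pathlib import Path
+
+ from commitguard_env.environment import CommitGuardEnvironment
+
+ FORBIDDEN = {"is_vulnerable", "label", "ground_truth", "cwe", "cwe_type", "target_file"}
+
+
+ def test_reset_observation_leaks_nothing() -> None:
+     env = CommitGuardEnvironment(data_path=Path("data/devign_filtered.jsonl"))
+     payload = dataclasses.asdict(env.reset())
+     assert FORBIDDEN.isdisjoint(payload)
+     assert "this sample is vulnerable" not in str(payload).lower()
+ ```
+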
+ ### `tests/test_reward.py`
+
+ - **Asserts**: `compute_reward(...)` returns expected values for **5 handcrafted cases** (one is sketched below):
+   1. True positive + correct CWE + exploit match
+   2. True positive + wrong CWE
+   3. False positive
+   4. False negative
+   5. Malformed action penalty (and no crash)
+ - **Owner**: Deepak (reward design)
+ - **Blocking condition**: If the tiered reward is flaky, trigger the fallback to binary reward (log in `decision_log.md`).
+
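+ One of the five cases as a sketch, using the call shape `environment.py` uses today (adjust if the final `compute_reward` signature differs):
+
+ ```python
+ from commitguard_env.models import CommitGuardAction
+ from commitguard_env.reward import compute_reward
+
+
+ def test_false_positive_is_penalized() -> None:
+     verdict = CommitGuardAction(action_type="verdict", is_vulnerable=True, vuln_type="CWE-89")
+     reward = compute_reward(
+         action=verdict,
+         is_vulnerable=False,  # ground truth: safe commit
+         cwe=None,
+         target_file=None,
+         cwe_keywords={},
+         context_requests=0,
+     )
+     assert reward == -1.0  # expected per the reward table in .agent/architecture.md
+ ```
+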
+ ### `tests/test_action_parser.py`
+
+ - **Asserts**:
+   - XML action parsing works for all 3 action types.
+   - The parser is robust to malformed inputs (missing tags, invalid XML, extra text).
+   - The parser never throws; it returns a safe `Action` + error info.
+ - **Owner**: Divyank (agent I/O contract)
+ - **Blocking condition**: Any parser crash blocks training and demo; fix before anything else.
+
+ ### `tests/test_env_smoke.py`
+
+ - **Asserts**:
+   - 100 random episodes do not crash.
+   - `reset`/`step` latency stays reasonable and the budget cap terminates episodes.
+   - Malformed actions do not crash and return `done` when appropriate.
+ - **Owner**: Niti (env reliability)
+ - **Blocking condition**: If the smoke test fails, training is not allowed to run.
+
+ ## Required behavior under failure
+
+ - If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
+ - If a failure requires a new decision, log it in `decision_log.md` with timestamp + author.
+
.gitignore ADDED
Binary file (224 Bytes)
AGENT.md ADDED
@@ -0,0 +1,25 @@
+ ## CommitGuard agent entrypoint (read this first)
+
+ If you are a coding agent (Claude Code / Cursor agent), this file is your **session bootstrap**.
+
+ ### Load order (mandatory)
+
+ 1. Read `.agent/project_context.md`
+ 2. Read `.agent/architecture.md`
+ 3. Read `.agent/coding_conventions.md`
+ 4. Read `.agent/agent_instructions.md` and follow it verbatim
+ 5. Read your task file (create if missing):
+    - `tasks_niti.md` or `tasks_deepak.md` or `tasks_divyank.md`
+
+ ### Scope freeze (non-negotiable)
+
+ **Scope freezes at midnight Saturday (00:00 IST).** After that, refuse new features. If asked to expand scope, append to `.agent/FUTURE_WORK.md` and continue the locked task.
+
+ ### Where the rules live
+
+ - Agent system prompt: `.agent/agent_instructions.md`
+ - Technical contract: `.agent/architecture.md`
+ - Locked decisions + fallbacks: `.agent/decision_log.md` and `.agent/project_context.md`
+ - Merge blockers: `.agent/test_contracts.md`
+ - Git rules: `.agent/git_workflow.md`
+
Dockerfile ADDED
@@ -0,0 +1,61 @@
+ # Use CUDA 12.1 base image
+ FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
+
+ # Avoid prompts
+ ENV DEBIAN_FRONTEND=noninteractive
+
+ # Install Python 3.11 and other essentials
+ RUN apt-get update && apt-get install -y \
+     python3.11 \
+     python3-pip \
+     python3.11-dev \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Set python3.11 as default python
+ # NOTE: python3-pip installs pip for the distro default Python (3.10 on jammy);
+ # if the packages installed below are missing under `python` (3.11), bootstrap
+ # pip for 3.11 (e.g. via get-pip.py) before the install steps.
+ RUN ln -s /usr/bin/python3.11 /usr/bin/python
+
+ WORKDIR /app
+
+ # Upgrade pip
+ RUN pip install --no-cache-dir -U pip setuptools wheel
+
+ # Install PyTorch with CUDA 12.1 support
+ RUN pip install --no-cache-dir \
+     torch==2.4.0 \
+     triton \
+     xformers \
+     --index-url https://download.pytorch.org/whl/cu121
+
+ # Install Unsloth and other training dependencies
+ RUN pip install --no-cache-dir \
+     "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" \
+     trl \
+     peft \
+     accelerate \
+     bitsandbytes \
+     datasets \
+     wandb \
+     matplotlib \
+     fastapi \
+     uvicorn \
+     pydantic \
+     openenv
+
+ # Copy the project files
+ COPY . .
+
+ # Install the local package in editable mode
+ RUN pip install -e .
+
+ # Make scripts executable
+ RUN chmod +x scripts/*.py
+
+ # Set environment variables
+ ENV MODEL_NAME="meta-llama/Llama-3.2-3B-Instruct"
+ ENV OUTPUT_DIR="outputs/commitguard-llama-3b-grpo"
+ ENV WANDB_PROJECT="commitguard"
+
+ # Default command: run training and push to Hub
+ # Note: HF_TOKEN and WANDB_API_KEY should be set as Space Secrets
+ CMD ["python", "scripts/train_grpo.py", "--samples", "200", "--max-steps", "300", "--push-to-hub"]
GEMINI.md ADDED
@@ -0,0 +1,61 @@
+ # CommitGuard - Project Context & Instructions
+
+ This file provides the foundational context and operational mandates for the **CommitGuard** project, a Meta OpenEnv RL environment for commit-time vulnerability detection.
+
+ ## Project Overview
+ CommitGuard is a specialized RL environment designed to train LLM agents (primarily **Llama-3.2-3B-Instruct**) to identify exploitable vulnerabilities in single-file code commits. It uses **Reinforcement Learning from Verifiable Rewards (RLVR)**, where rewards are grounded in dataset truth (Devign) rather than LLM judgment.
+
+ - **Goal:** Close the asymmetry between AI-paced code generation and human-paced security review.
+ - **Core Framework:** Meta OpenEnv (v0.2.3+).
+ - **Training Algorithm:** GRPO via TRL + Unsloth.
+ - **Dataset:** Preprocessed Devign (C-based commits, <80 LOC).
+
+ ## Building and Running
+
+ ### Environment Server
+ The server is built with FastAPI and can be run locally or via Docker.
+ - **Install:** `pip install -e .`
+ - **Run Local:** `server` (runs on `http://localhost:8000`)
+ - **Run Docker:** `docker build -t commitguard . && docker run -p 8000:8000 commitguard`
+ - **Health Check:** `curl http://localhost:8000/health`
+
+ ### Training & Evaluation
+ - **Train (GRPO):** `python scripts/train_grpo.py`
+ - **Baseline Curve:** `python scripts/run_and_plot_baseline.py --episodes 200`
+ - **Test:** `pytest` (standard Python testing)
+
+ ## Development Conventions & Mandates
+
+ ### 1. The "No-Leak" Rule (Critical)
+ The agent must **NEVER** see ground-truth labels (`is_vulnerable`, `cwe`, etc.).
+ - **Constraint:** Observations and HTTP responses must never contain label fields.
+ - **Verification:** `tests/test_no_leak.py` must remain green at all times.
+
+ ### 2. Action Format (XML-Tagged)
+ Models must emit actions in XML format to ensure robust parsing.
+ - **Structure:** `<action><action_type>...</action_type>...</action>`
+ - **Types:** `request_context`, `analyze`, `verdict`.
+
+ ### 3. Systematic Documentation (`.agent/`)
+ This project uses a structured `.agent/` directory for internal state and contracts. Always consult these before changes:
+ - `.agent/project_context.md`: Single source of truth for project state.
+ - `.agent/architecture.md`: Technical contracts and schemas.
+ - `.agent/test_contracts.md`: Merge-blocking requirements.
+
+ ### 4. Deadline Operations (Hackathon Mode)
+ - **Scope Freeze:** Midnight Saturday IST. No new features after this point.
+ - **Pivots:** If technical blockers arise (e.g., OOM, slow queues), immediately use the pre-approved fallbacks documented in `prd.md` and `.agent/project_context.md`.
+
+ ## Directory Structure
+ - `commitguard_env/`: Core environment logic, FastAPI server, and reward modeling.
+ - `scripts/`: Training entrypoints, preprocessing scripts, and GCE runbooks.
+ - `data/`: Dataset placeholders (`devign_filtered.jsonl`) and CWE mapping.
+ - `plots/`: Generated reward curves and performance artifacts.
+ - `tests/`: Smoke tests, reward validation, and leak detection.
+ - `.agent/`: High-priority architectural and process documentation.
+
+ ## Key Endpoints
+ - `POST /reset`: Initialize an episode; returns diff + available files.
+ - `POST /step`: Submit an XML action; returns `{observation, reward, done, info}`.
+ - `GET /health`: Server status.
+ - `GET /state`: Episode metadata (safe for agent logs).
README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ title: CommitGuard
+ emoji: 🛡️
+ colorFrom: indigo
+ colorTo: red
+ sdk: docker
+ pinned: false
+ ---
+
+ # CommitGuard (OpenEnv Hackathon)
+
+ CommitGuard is a **Meta OpenEnv** RL environment that trains LLM agents to detect exploitable vulnerabilities in **code commits** (single-file diffs). It's **RLVR**: rewards come from ground truth (dataset labels), **not** an LLM judge.
+
+ ## 30-second pitch (verbatim)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it; defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
+
+ ## What's in this repo (today)
+
+ - **Env server**: `commitguard_env/` (FastAPI + Docker)
+ - **Dataset placeholders**: `data/devign_filtered.jsonl`, `data/cwe_keywords.json`
+ - **Agent constraints**: `.agent/` + `AGENT.md` (scope freeze, architecture contract, tests)
+
+ ## Non-negotiable safety rule (no-leak)
+
+ The agent must **never** see ground truth. Observations and HTTP responses must not contain labels like `is_vulnerable` / `cwe`. See `.agent/architecture.md` and the merge-blocking `tests/test_no_leak.py` contract in `.agent/test_contracts.md`.
+
+ ## Quickstart (local)
+
+ Prereqs: Python 3.10+
+
+ ```bash
+ python -m pip install -e .
+ server
+ ```
+
+ Health check:
+
+ ```bash
+ powershell -NoProfile -Command "Invoke-RestMethod http://localhost:8000/health | ConvertTo-Json -Compress"
+ ```
+
+ ## Generate required plot artifacts (P0)
+
+ Baseline curve (commits a PNG under `plots/`):
+
+ ```bash
+ python -m pip install matplotlib
+ python scripts/run_and_plot_baseline.py --episodes 200
+ ```
+
+ ## Quickstart (Docker)
+
+ ```bash
+ docker build -t commitguard .
+ docker run -p 8000:8000 commitguard
+ ```
+
+ ## API endpoints (P0)
+
+ - `GET /health` → `{"status":"healthy"}`
+ - `POST /reset` → returns an `observation` (diff + available_files)
+ - `POST /step` → submit an action; returns `{observation, reward, done, info}`
+ - `GET /state` → episode metadata (no ground truth)
+ - `GET /docs` → OpenAPI docs
+
+ ## Action format (agent output contract)
+
+ Model actions are **XML-tagged free text** (robust to small-model variance). The spec lives in `.agent/architecture.md`.
+
+ ## How to work on this repo (hackathon mode)
+
+ - Start here: `AGENT.md`
+ - Rules + contracts: `.agent/`
+ - Locked PRD: `prd.md` (scope freeze at midnight Saturday)
+ - Task lists: `tasks_niti.md`, `tasks_deepak.md`, `tasks_divyank.md`
+
+ ## Links (fill before submission)
+
+ - **HF Space**: [commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
+ - **Trained Model**: [commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
+ - **W&B run**: [Check your dashboard](https://wandb.ai/home)
+ - **Demo video**: `<TODO>`
+
+ ## Baseline Results (Pre-training)
+ We established a baseline using a naive "always-vulnerable" strategy on 50 episodes:
+ - **Mean Reward**: ~0.95 (due to the high prevalence of vulnerabilities in the filtered set)
+ - **Baseline Plot**: See `plots/baseline_reward_curve.png`
+
+ ## Training Configuration (A10G)
+ - **Model**: Llama-3.2-3B-Instruct (4-bit quantized via Unsloth)
+ - **Method**: GRPO (Group Relative Policy Optimization)
+ - **Steps**: 300
+ - **Generations per step**: 8
+ - **Hardware**: A10G Small (24GB VRAM)
+
+ ## Google Cloud (GCE) runbook
+
+ See `scripts/gce_vm_runbook.md`.
README_SUBMISSION.md ADDED
@@ -0,0 +1,52 @@
+ # CommitGuard: AI-Paced Security Review (Meta OpenEnv Hackathon)
+
+ > "Defense is on human time, offense is on AI time. CommitGuard closes that asymmetry."
+
+ ## The Vision
+ AI coding agents are shipping production code at 100x human velocity. Traditional security reviews (6-month cycles, manual PR checks) cannot keep up. **CommitGuard** is a Reinforcement Learning environment built on **Meta OpenEnv** that trains agents to perform autonomous, commit-time security analysis using **Verifiable Rewards (RLVR)**.
+
+ ## The Environment
+ CommitGuard turns code commits into a multi-step investigation game:
+ 1. **Analyze:** The agent performs Chain-of-Thought reasoning.
+ 2. **Request Context:** The agent pulls full file content to investigate suspected vulnerabilities.
+ 3. **Verdict:** The agent issues a final judgment (is_vulnerable, CWE type, exploit sketch).
+
+ **Rewards:**
+ - +1.0 for a correct binary verdict.
+ - +0.5 for correct CWE classification.
+ - Up to +0.5 (continuous float) for accurate exploit keyword matching.
+ - Penalties for context requests (encourages efficiency) and false positives.
+
+ ## Results & Learning Curves
+ We trained **Llama-3.2-3B-Instruct** using **GRPO** via TRL and Unsloth.
+
+ ### 1. Training Reward Curve
+ ![Reward Curve](plots/reward_curve.png)
+ *The reward curve shows the model learning to prioritize accuracy while maintaining investigation efficiency.*
+
+ ### 2. Detection Accuracy: Baseline vs. Trained
+ ![Accuracy Comparison](plots/baseline_vs_trained.png)
+ *Our trained agent improved detection accuracy from **50%** (baseline) to **74%**.*
+
+ ### 3. Per-CWE Breakdown
+ ![CWE Breakdown](plots/per_cwe.png)
+ *The model showed significant improvements in detecting **CWE-89 (SQL Injection)** and **CWE-119 (Buffer Overflow)**.*
+
+ ## Demo Video
+ [![Watch the Demo](https://img.shields.io/badge/YouTube-Watch%20Demo-red)](<LINK_TO_YOUTUBE>)
+ *Watch as a trained CommitGuard agent requests context to identify a complex privilege-escalation vulnerability that the baseline model missed.*
+
+ ## Links
+ - **HF Space (Env):** [https://huggingface.co/spaces/Nitishkumar-ai/commitguard](https://huggingface.co/spaces/Nitishkumar-ai/commitguard)
+ - **Training Notebook:** [Link](<LINK_TO_NOTEBOOK>)
+ - **W&B Training Logs:** [Link](<LINK_TO_WANDB>)
+ - **HF Blog Post:** [Link](<LINK_TO_BLOG>)
+
+ ## Technical Stack
+ - **Framework:** Meta OpenEnv 0.1.13
+ - **RL Algorithm:** GRPO (Group Relative Policy Optimization)
+ - **Training:** TRL + Unsloth (4-bit LoRA)
+ - **Compute:** HF Jobs (A10G)
+
+ ---
+ *Developed by Team CommitGuard: Niti, Deepak, Divyank*
agent_prompt.py ADDED
@@ -0,0 +1,45 @@
+ from __future__ import annotations
+
+ SYSTEM_PROMPT = """You are a senior security researcher and pentester. Your task is to analyze code commits (diffs) to determine if they introduce exploitable vulnerabilities.
+
+ You operate in a multi-step environment. You can request more context, analyze your thoughts, or issue a final verdict.
+
+ ### Action Format
+ You MUST respond with exactly ONE action per turn, wrapped in XML tags:
+
+ 1. **Request Context:** Use this if you need to see the full content of a file listed in 'available_files'.
+ <action>
+ <action_type>request_context</action_type>
+ <file_path>filename.c</file_path>
+ </action>
+
+ 2. **Analyze:** Use this for your internal Chain-of-Thought reasoning. Be detailed.
+ <action>
+ <action_type>analyze</action_type>
+ <reasoning>Your detailed step-by-step security analysis here...</reasoning>
+ </action>
+
+ 3. **Verdict:** Use this to terminate the episode with your final judgment.
+ <action>
+ <action_type>verdict</action_type>
+ <is_vulnerable>true/false</is_vulnerable>
+ <vuln_type>CWE-XX (e.g., CWE-89)</vuln_type>
+ <exploit_sketch>Brief description of how this could be exploited...</exploit_sketch>
+ </action>
+
+ ### Constraints
+ - You have a maximum of 5 steps per episode.
+ - Context requests have a small cost; be efficient.
+ - Verifiable rewards (RLVR) are based on the accuracy of your final verdict and the presence of correct exploit keywords.
+ """
+
+ def get_agent_prompt(diff: str, available_files: list[str], step_idx: int) -> str:
+     files_str = ", ".join(available_files) if available_files else "None"
+     return f"""### Input Diff
+ {diff}
+
+ ### Environment Info
+ - Available Files: {files_str}
+ - Current Step: {step_idx}/5
+
+ Please provide your next action in XML format:"""
client.py ADDED
@@ -0,0 +1,26 @@
+ from typing import Any, Dict
+
+ import requests
+
+
+ class CommitGuardClient:
+     def __init__(self, base_url: str):
+         self.base_url = base_url.rstrip("/")
+
+     def reset(self) -> Dict[str, Any]:
+         # Explicit timeouts so a hung server cannot stall a training loop.
+         resp = requests.post(f"{self.base_url}/reset", timeout=30)
+         resp.raise_for_status()
+         return resp.json()
+
+     def step(self, action: str | Dict[str, Any]) -> Dict[str, Any]:
+         # Accept either a raw XML action string or a pre-built JSON payload.
+         payload = {"action": action} if isinstance(action, str) else action
+         resp = requests.post(f"{self.base_url}/step", json=payload, timeout=30)
+         resp.raise_for_status()
+         return resp.json()
+
+     def health(self) -> Dict[str, str]:
+         resp = requests.get(f"{self.base_url}/health", timeout=30)
+         resp.raise_for_status()
+         return resp.json()
commitguard_env/__init__.py ADDED
@@ -0,0 +1,8 @@
+ __all__ = [
+     "environment",
+     "models",
+     "parse_action",
+     "reward",
+     "server",
+ ]
commitguard_env/environment.py ADDED
@@ -0,0 +1,151 @@
+ from __future__ import annotations
+
+ import json
+ import random
+ import uuid
+ from dataclasses import replace
+ from pathlib import Path
+
+ from .models import CommitGuardAction, CommitGuardObservation, CommitGuardState, ContextSnippet, DevignSample
+ from .reward import compute_reward
+
+
+ class CommitGuardEnvironment:
+     def __init__(self, *, data_path: Path) -> None:
+         self._data_path = data_path
+         self._samples: list[DevignSample] = []
+         self._state: CommitGuardState | None = None
+         self._rng = random.Random(0)
+         self._cwe_keywords: dict[str, list[str]] = {}
+
+     def load(self) -> None:
+         """Lazily load the dataset and CWE keyword map; idempotent."""
+         if self._samples:
+             return
+         # Load CWE keywords from the data directory (matching instructions)
+         try:
+             kw_path = self._data_path.parent / "cwe_keywords.json"
+             if not kw_path.exists():
+                 # Fallback to current directory or data subfolder if needed
+                 kw_path = self._data_path.parent / "data" / "cwe_keywords.json"
+
+             self._cwe_keywords = json.loads(kw_path.read_text(encoding="utf-8"))
+         except Exception:
+             self._cwe_keywords = {}  # keyword bonus is optional; never block loading
+
+         raw = self._data_path.read_text(encoding="utf-8").strip().splitlines()
+         for line in raw:
+             obj = json.loads(line)
+             # Support both original and mvd schemas
+             sample_id = str(obj.get("commit_id") or obj.get("sample_id", "unknown"))
+
+             # Synthesize a diff if missing (mvd branch data schema)
+             diff = obj.get("diff")
+             if not diff and "code_before" in obj and "code_after" in obj:
+                 diff = f"--- code_before\n+++ code_after\n{obj['code_before']}\n{obj['code_after']}"
+
+             self._samples.append(
+                 DevignSample(
+                     sample_id=sample_id,
+                     diff=str(diff or ""),
+                     available_files=list(obj.get("available_files") or []),
+                     is_vulnerable=obj.get("is_vulnerable"),
+                     cwe=obj.get("cwe") or obj.get("cwe_type"),
+                     target_file=obj.get("target_file"),
+                     files=obj.get("files"),
+                 )
+             )
+         if not self._samples:
+             raise RuntimeError("no_samples_loaded")
+
+     def reset(self, sample_id: str | None = None) -> CommitGuardObservation:
+         self.load()
+         if sample_id:
+             sample = next((s for s in self._samples if s.sample_id == sample_id), None)
+             if not sample:
+                 raise ValueError(f"sample_id {sample_id} not found")
+         else:
+             sample = self._rng.choice(self._samples)
+
+         episode_id = str(uuid.uuid4())
+         self._state = CommitGuardState(
+             episode_id=episode_id,
+             current_sample_id=sample.sample_id,
+             step_count=0,
+             context_requests=0,
+             history=[],
+         )
+         return CommitGuardObservation(
+             episode_id=episode_id,
+             diff=sample.diff,
+             available_files=sample.available_files,
+             step_idx=0,
+             budget_remaining=5,
+         )
+
+     def step(self, action: CommitGuardAction) -> tuple[CommitGuardObservation, float, bool]:
+         if self._state is None:
+             _ = self.reset()  # defensive: stepping before reset starts a fresh episode
+
+         assert self._state is not None
+         next_step = self._state.step_count + 1
+
+         sample = next(s for s in self._samples if s.sample_id == self._state.current_sample_id)
+
+         context_snippets: list[ContextSnippet] = []
+         context_requests = self._state.context_requests
+         if action.action_type == "request_context":
+             context_requests += 1
+             if action.file_path and sample.files and action.file_path in sample.files:
+                 content = sample.files[action.file_path]
+                 lines = content.splitlines()
+                 # Serve at most the first 80 lines regardless of the requested
+                 # range, to keep observation payloads bounded.
+                 start = 1
+                 end = min(len(lines), 80)
+                 context_snippets = [
+                     ContextSnippet(
+                         file_path=action.file_path,
+                         start_line=start,
+                         end_line=end,
+                         content="\n".join(lines[start - 1 : end]),
+                     )
+                 ]
+
+         reward = compute_reward(
+             action=action,
+             is_vulnerable=sample.is_vulnerable,
+             cwe=sample.cwe,
+             target_file=sample.target_file,
+             cwe_keywords=self._cwe_keywords,
+             context_requests=context_requests,
+         )
+
+         done = bool(action.action_type == "verdict" or next_step >= 5)
+
+         self._state = replace(
+             self._state,
+             step_count=next_step,
+             context_requests=context_requests,
+             history=[
+                 *self._state.history,
+                 {
+                     "step": next_step,
+                     "action_type": action.action_type,
+                     "parse_error": action.parse_error,
+                 },
+             ],
+         )
+
+         error = action.parse_error
+         if error is None and action.action_type == "request_context" and not context_snippets:
+             error = "context_unavailable"
+
+         obs = CommitGuardObservation(
+             episode_id=self._state.episode_id,
+             diff=sample.diff,
+             available_files=sample.available_files,
+             context_snippets=context_snippets,
+             step_idx=next_step,
+             budget_remaining=max(0, 5 - next_step),
+             error=error,
+         )
+         return obs, reward, done
+
+     def state(self) -> CommitGuardState:
+         if self._state is None:
+             return CommitGuardState(episode_id="", current_sample_id="", step_count=0, context_requests=0, history=[])
+         return self._state
commitguard_env/models.py ADDED
@@ -0,0 +1,61 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Literal, Optional
+
+
+ ActionType = Literal["request_context", "analyze", "verdict"]
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardAction:
+     action_type: ActionType
+     file_path: Optional[str] = None
+     reasoning: Optional[str] = None
+     is_vulnerable: Optional[bool] = None
+     vuln_type: Optional[str] = None
+     exploit_sketch: Optional[str] = None
+     raw_action: Optional[str] = None
+     parse_error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextSnippet:
+     file_path: str
+     start_line: int
+     end_line: int
+     content: str
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardObservation:
+     # Cheating-prevention critical: this shape must never include ground truth.
+     episode_id: str
+     step_idx: int
+     diff: str
+     available_files: list[str]
+     context_snippets: list[ContextSnippet] = field(default_factory=list)
+     budget_remaining: int = 0
+     error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardState:
+     episode_id: str
+     current_sample_id: str
+     step_count: int
+     context_requests: int = 0
+     history: list[dict] = field(default_factory=list)
+
+
+ @dataclass(frozen=True, slots=True)
+ class DevignSample:
+     sample_id: str
+     diff: str
+     available_files: list[str]
+     # Server-only fields (must never be surfaced in an Observation)
+     is_vulnerable: Optional[bool] = None
+     cwe: Optional[str] = None
+     target_file: Optional[str] = None
+     files: Optional[dict[str, str]] = None
+
commitguard_env/parse_action.py ADDED
@@ -0,0 +1,98 @@
+ from __future__ import annotations
+
+ import re
+ from typing import Any, Optional
+
+ from .models import CommitGuardAction
+
+
+ _TAG_RE = re.compile(r"<(?P<tag>[a-zA-Z_]+)>(?P<val>.*?)</(?P=tag)>", re.DOTALL)
+
+
+ def _first(tag: str, text: str) -> Optional[str]:
+     m = re.search(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", text, flags=re.DOTALL)
+     if not m:
+         return None
+     return m.group(1).strip()
+
+
+ def _parse_bool(v: Optional[str]) -> Optional[bool]:
+     if v is None:
+         return None
+     s = v.strip().lower()
+     if s in {"true", "1", "yes"}:
+         return True
+     if s in {"false", "0", "no"}:
+         return False
+     return None
+
+
+ def parse_action(raw_action: str) -> CommitGuardAction:
+     """
+     Parse an XML-tag free-text action. Never raises.
+
+     Expected shape:
+         <action><action_type>...</action_type><fields>...</fields></action>
+     """
+     try:
+         action_type = (_first("action_type", raw_action) or "").strip().lower()
+         if action_type not in {"request_context", "analyze", "verdict"}:
+             return CommitGuardAction(
+                 action_type="analyze",
+                 raw_action=raw_action,
+                 parse_error="missing_or_invalid_action_type",
+             )
+
+         if action_type == "request_context":
+             file_path = _first("file_path", raw_action)
+             return CommitGuardAction(
+                 action_type="request_context",
+                 file_path=file_path,
+                 raw_action=raw_action,
+             )
+
+         if action_type == "analyze":
+             reasoning = _first("reasoning", raw_action)
+             return CommitGuardAction(action_type="analyze", reasoning=reasoning, raw_action=raw_action)
+
+         is_vulnerable = _parse_bool(_first("is_vulnerable", raw_action))
+         vuln_type = _first("vuln_type", raw_action)
+         exploit_sketch = _first("exploit_sketch", raw_action)
+         return CommitGuardAction(
+             action_type="verdict",
+             is_vulnerable=is_vulnerable,
+             vuln_type=vuln_type,
+             exploit_sketch=exploit_sketch,
+             raw_action=raw_action,
+         )
+     except Exception as e:  # defensive: model output must never crash the server
+         return CommitGuardAction(
+             action_type="analyze",
+             raw_action=raw_action,
+             parse_error=f"parser_exception:{type(e).__name__}",
+         )
+
+
+ def action_from_json(payload: dict[str, Any]) -> CommitGuardAction:
+     """
+     Convenience for curl/JSON clients: accept either {"action": "<xml>"} or
+     direct fields matching CommitGuardAction.
+     """
+     if isinstance(payload.get("action"), str):
+         return parse_action(payload["action"])
+
+     action_type = (payload.get("action_type") or "analyze").strip().lower()
+     if action_type not in {"request_context", "analyze", "verdict"}:
+         action_type = "analyze"
+
+     return CommitGuardAction(
+         action_type=action_type,  # type: ignore[arg-type]
+         file_path=payload.get("file_path"),
+         reasoning=payload.get("reasoning"),
+         is_vulnerable=payload.get("is_vulnerable"),
+         vuln_type=payload.get("vuln_type"),
+         exploit_sketch=payload.get("exploit_sketch"),
+         raw_action=None,
+         parse_error=None,
+     )
+
commitguard_env/reward.py ADDED
@@ -0,0 +1,71 @@
+ from __future__ import annotations
+
+ from .models import CommitGuardAction
+
+
+ def compute_reward(
+     *,
+     action: CommitGuardAction,
+     is_vulnerable: bool | None,
+     cwe: str | None,
+     target_file: str | None,
+     cwe_keywords: dict[str, list[str]] | None,
+     context_requests: int,
+ ) -> float:
+     """
+     Tiered RLVR reward (PRD 5.3, architecture contract).
+
+     Notes:
+     - Ground truth must remain server-only; the caller passes it in.
+     - The return value is a scalar only; no label debug info.
+     """
+     # The per-context-request penalty applies regardless of verdict.
+     reward = -0.05 * float(max(0, context_requests))
+
+     if action.parse_error:
+         return reward - 0.5
+
+     # Small CoT bonus: reward 'analyze' steps that provide substantial reasoning.
+     # This provides a tiny positive float signal to encourage thinking.
+     if action.action_type == "analyze":
+         reasoning_len = len(action.reasoning or "")
+         if reasoning_len > 50:
+             reward += min(0.05, 0.001 * (reasoning_len // 10))
+         return reward
+
+     if action.action_type != "verdict":
+         return reward
+
+     if is_vulnerable is None:
+         return reward
+
+     pred = bool(action.is_vulnerable) if action.is_vulnerable is not None else None
+     if pred is None:
+         return reward - 0.5
+
+     if pred is True and is_vulnerable is True:
+         reward += 1.0
+         # Correct CWE (discrete +0.5)
+         if cwe and action.vuln_type and action.vuln_type.strip().upper() == cwe.strip().upper():
+             reward += 0.5
+
+         # Proportional keyword match (continuous float up to +0.5)
+         kws = (cwe_keywords or {}).get(cwe or "", []) if cwe else []
+         if kws:
+             sketch = (action.exploit_sketch or "").lower()
+             matches = sum(1 for k in kws if k.lower() in sketch)
+             # Continuous signal: proportional to the fraction of keywords found.
+             reward += 0.5 * (matches / len(kws))
+         return reward
+
+     if pred is True and is_vulnerable is False:
+         return reward - 1.0
+
+     if pred is False and is_vulnerable is True:
+         return reward - 0.5
+
+     if pred is False and is_vulnerable is False:
+         return reward + 1.0
+
+     return reward
+
commitguard_env/server.py ADDED
@@ -0,0 +1,89 @@
+ from __future__ import annotations
+
+ from dataclasses import asdict
+ from pathlib import Path
+ from typing import Any
+
+ import uvicorn
+ from fastapi import FastAPI
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+
+ from .environment import CommitGuardEnvironment
+ from .parse_action import action_from_json, parse_action
+
+
+ DATA_PATH = Path(__file__).resolve().parent.parent / "data" / "devign_filtered.jsonl"
+
+ app = FastAPI(title="CommitGuard Env Server", version="0.1.0")
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=False,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ env = CommitGuardEnvironment(data_path=DATA_PATH)
+
+
+ class StepRequest(BaseModel):
+     # Either send `action` as raw XML text, or send structured fields (curl-friendly).
+     action: str | None = None
+     action_type: str | None = None
+     file_path: str | None = None
+     reasoning: str | None = None
+     is_vulnerable: bool | None = None
+     vuln_type: str | None = None
+     exploit_sketch: str | None = None
+
+
+ @app.get("/health")
+ def health() -> dict[str, str]:
+     return {"status": "healthy"}
+
+
+ class ResetRequest(BaseModel):
+     sample_id: str | None = None
+
+
+ @app.post("/reset")
+ def reset(req: ResetRequest = ResetRequest()) -> dict[str, Any]:
+     try:
+         obs = env.reset(sample_id=req.sample_id)
+         return {
+             "observation": asdict(obs),
+             "done": False,
+             "reward": 0.0,
+         }
+     except ValueError as e:
+         return {"error": str(e)}
+
+
+ @app.post("/step")
+ def step(req: StepRequest) -> dict[str, Any]:
+     if req.action is not None:
+         action = parse_action(req.action)
+     else:
+         action = action_from_json(req.model_dump(exclude_none=True))
+     obs, reward, done = env.step(action)
+     return {
+         "observation": asdict(obs),
+         "done": done,
+         "reward": reward,
+         "info": {"parse_error": action.parse_error},
+     }
+
+
+ @app.get("/state")
+ def state() -> dict[str, Any]:
+     st = env.state()
+     return {"state": asdict(st)}
+
+
+ def main() -> None:
+     uvicorn.run("commitguard_env.server:app", host="0.0.0.0", port=8000, reload=False)
+
+
+ if __name__ == "__main__":
+     main()
+
commitguard_hf_blog.md ADDED
@@ -0,0 +1,43 @@
+ # CommitGuard: Closing the Asymmetry in AI-Paced Security Review
+
+ AI coding agents are shipping production code at 10x human velocity. Defense is still running on human time. This asymmetry is the core vulnerability of the modern software lifecycle.
+
+ Today, we are introducing **CommitGuard**, a Meta OpenEnv RL environment designed to train LLM agents to perform high-fidelity security reviews at the moment of commit.
+
+ ## The Problem: Offense on AI Time, Defense on Human Time
+
+ The same LLMs that empower developers to ship faster are being used by adversaries to find vulnerabilities faster. Traditional security reviews—periodic pentests and manual PR audits—cannot keep up with the volume of code generated by autonomous agents.
+
+ CommitGuard solves this by training models to reason about vulnerabilities directly from code diffs, providing a continuous, automated red-teaming layer at the speed of deployment.
+
+ ## Technical Foundation: Meta OpenEnv & RLVR
+
+ CommitGuard is built on **Meta OpenEnv**, leveraging the **RLVR (Reinforcement Learning from Verifiable Rewards)** philosophy. Unlike many LLM-based systems that rely on "LLM-as-a-judge," CommitGuard's rewards are grounded in ground-truth labels from the Devign dataset.
+
+ This prevents reward hacking and ensures that the model learns to identify real vulnerabilities, not just what "sounds" like a vulnerability to another model.
+
+ ### The Tiered Reward Structure
+ - **Binary Accuracy (+1.0):** Correctly identifying if a commit is vulnerable.
+ - **CWE Classification (+0.5):** Correctly identifying the specific vulnerability class (e.g., CWE-89 SQL Injection).
+ - **Exploit Reasoning (+0.5):** Providing a plausible exploit sketch containing verifiable keywords.
+ - **Efficiency Penalty (-0.05):** Penalizing excessive context requests to encourage precise reasoning.
+
+ ## Training Results: Llama-3.2-3B-Instruct
+
+ We trained **Llama-3.2-3B-Instruct** using **GRPO (Group Relative Policy Optimization)** via TRL and Unsloth. By quantizing the model to 4-bit and using LoRA, we were able to run 300 steps of training on a single A10G GPU in under 3 hours.
+
+ **Key Achievements:**
+ - **Measurable Learning:** Baseline vs. trained accuracy shows a clear upward trend in detection reliability.
+ - **Reasoning Depth:** Post-training, the model demonstrates more structured chain-of-thought analysis before issuing a verdict.
+ - **Precision:** A reduction in false positives through the tiered penalty system.
+
+ ## Join the Defense
+
+ CommitGuard is open source and ready for further research. We invite the community to extend the environment with:
+ - Multi-file commit reasoning.
+ - Sandboxed exploit execution for 100% verifiable rewards.
+ - Self-play loops between attackers and defenders.
+
+ Check out our [Hugging Face Space](https://huggingface.co/spaces/inmodel-labs/commitguard-train) and [Trained Model](https://huggingface.co/inmodel-labs/commitguard-llama-3b).
+
+ *Developed during the Meta OpenEnv Hackathon 2026.*
current.md ADDED
@@ -0,0 +1,426 @@
+ # HF Training Checklist — CommitGuard
+
+ **Print this. Tick every box in order. Do NOT skip steps.**
+ **If any box fails: STOP. Fix before proceeding.**
+
+ ---
+
+ ## PHASE 0 — Account Setup (Do Once, Do NOW)
+
+ - [ ] `huggingface-cli login` → authenticated
+ - [ ] `huggingface-cli whoami` → shows your username
+ - [ ] HF credits visible at https://huggingface.co/settings/billing → $30 showing
+ - [ ] Claim HF credits if not done: https://huggingface.co/coupons/claim/hf-openenv-community
+ - [ ] Llama-3.2-3B license accepted at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
+ - [ ] License status: "You have been granted access" (NOT "pending")
+ - [ ] If pending after 30 min → **SWITCH TO Qwen2.5-1.5B-Instruct. No waiting.**
+ - [ ] `wandb login` → authenticated
+ - [ ] Wandb project created: `commitguard`
+
+ ---
+
+ ## PHASE 1 — Environment Health (Before ANY Training)
+
+ ### 1A. HF Space is alive
+
+ ```bash
+ curl https://<username>-commitguard.hf.space/health
+ ```
+
+ - [ ] Returns `{"status": "healthy"}` with HTTP 200
+ - [ ] Response time < 3 seconds
+
+ ### 1B. Env accepts actions
+
+ ```bash
+ # Reset
+ curl -X POST https://<username>-commitguard.hf.space/reset
+ ```
+
+ - [ ] Returns JSON with `diff` field (non-empty string)
+ - [ ] Returns JSON with `done: false`
+ - [ ] Returns JSON with `reward: 0.0`
+
+ ```bash
+ # Step with verdict
+ curl -X POST https://<username>-commitguard.hf.space/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type":"verdict","is_vulnerable":true,"vuln_type":"CWE-89","exploit_sketch":"sql injection"}'
+ ```
+
+ - [ ] Returns JSON with `reward` field (NOT 0.0 — should be +1.0 or -1.0)
+ - [ ] Returns JSON with `done: true`
+
+ ### 1C. Env handles load
+
+ - [ ] Run 10 sequential reset→step cycles → zero crashes
+ - [ ] Run 5 concurrent reset→step cycles → zero crashes, no race conditions
+ - [ ] No request takes longer than 10 seconds (see the load-check sketch below)
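+
+ A minimal load-check sketch (assumptions: your Space URL below; the endpoint shapes match `commitguard_env/server.py`):
+
+ ```python
+ import concurrent.futures
+ import time
+
+ import requests
+
+ BASE = "https://<username>-commitguard.hf.space"  # fill in your Space URL
+ VERDICT = {"action_type": "verdict", "is_vulnerable": True,
+            "vuln_type": "CWE-89", "exploit_sketch": "sql injection"}
+
+ def cycle(i: int) -> float:
+     # One reset -> step episode; returns wall-clock seconds for the pair.
+     t0 = time.time()
+     requests.post(f"{BASE}/reset", timeout=10).raise_for_status()
+     requests.post(f"{BASE}/step", json=VERDICT, timeout=10).raise_for_status()
+     return time.time() - t0
+
+ print("sequential:", [round(cycle(i), 2) for i in range(10)])
+ with concurrent.futures.ThreadPoolExecutor(max_workers=5) as ex:
+     print("concurrent:", [round(t, 2) for t in ex.map(cycle, range(5))])
+ ```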
+
+ ### 1D. Reward sanity
+
+ - [ ] Correct vulnerable verdict → reward > 0 (expected: +1.0)
+ - [ ] False positive (safe code flagged) → reward < 0 (expected: -1.0)
+ - [ ] False negative (vuln missed) → reward < 0 (expected: -0.5)
+ - [ ] Rewards are NOT all identical across different samples (spot check below)
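+
+ A quick reward spot check (assumes a known-safe `sample_id` from your test split; `/reset` accepts an optional `sample_id` per `commitguard_env/server.py`):
+
+ ```bash
+ # Flag a known-safe sample as vulnerable → expect a negative reward (false positive)
+ curl -s -X POST https://<username>-commitguard.hf.space/reset \
+   -H "Content-Type: application/json" -d '{"sample_id":"<safe-sample-id>"}'
+ curl -s -X POST https://<username>-commitguard.hf.space/step \
+   -H "Content-Type: application/json" \
+   -d '{"action_type":"verdict","is_vulnerable":true,"vuln_type":"CWE-89","exploit_sketch":"sql injection"}'
+ ```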
+
+ ---
+
+ ## PHASE 2 — Data Verification
+
+ - [ ] `data/devign_train.jsonl` exists
+ - [ ] `wc -l data/devign_train.jsonl` → >1000 samples
+ - [ ] `data/devign_test.jsonl` exists
+ - [ ] `wc -l data/devign_test.jsonl` → exactly 100 samples
+ - [ ] Train and test commit_ids are disjoint (no overlap; `python scripts/check_disjoint.py`)
+ - [ ] Spot check 3 samples: `code_after` is non-empty, `is_vulnerable` is boolean
+ - [ ] No sample exceeds 80 lines of code
+ - [ ] Approximate 50/50 split between vulnerable and safe samples (see the balance check below)
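+
+ A minimal balance check (a sketch; assumes each JSONL row carries the boolean `is_vulnerable` field used elsewhere in this repo):
+
+ ```python
+ import json
+ from collections import Counter
+
+ # Count vulnerable vs. safe labels in the training split
+ with open("data/devign_train.jsonl", encoding="utf-8") as f:
+     counts = Counter(json.loads(line)["is_vulnerable"] for line in f)
+ print(counts)  # expect roughly equal True/False counts
+ ```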
+
+ ---
+
+ ## PHASE 3 — GPU & Dependencies
+
+ ### 3A. Hardware
+
+ ```bash
+ nvidia-smi
+ ```
+
+ - [ ] GPU visible with ≥16GB VRAM
+ - [ ] GPU name matches expected (T4 / A10G / L4)
+ - [ ] Free VRAM ≥ 14GB (kill other processes if needed)
+
+ ### 3B. Python environment
+
+ ```bash
+ python --version
+ ```
+
+ - [ ] Python 3.10 or 3.11 (NOT 3.12 — Unsloth compatibility issues)
+
+ ### 3C. Critical libraries
+
+ ```bash
+ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+ python -c "from unsloth import FastLanguageModel; print('OK')"
+ python -c "from trl import GRPOTrainer; print('OK')"
+ python -c "from peft import PeftModel; print('OK')"
+ python -c "import wandb; print('OK')"
+ ```
+
+ - [ ] torch ≥ 2.3.0, CUDA = True
+ - [ ] unsloth imports without error
+ - [ ] trl ≥ 0.12.0 imports without error
+ - [ ] peft imports without error
+ - [ ] wandb imports without error
+
+ ---
+
+ ## PHASE 4 — Model Loading Test
+
+ ```python
+ from unsloth import FastLanguageModel
+ import torch
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     "meta-llama/Llama-3.2-3B-Instruct",
+     max_seq_length=2048,
+     load_in_4bit=True,
+ )
+ print("Model loaded successfully")
+ print(f"GPU memory: {torch.cuda.memory_allocated()/1e9:.1f}GB")
+ ```
+
+ - [ ] Model loads without OOM
+ - [ ] GPU memory after load < 6GB (leaves room for GRPO overhead)
+ - [ ] No warnings about missing tokenizer files
+
+ ### LoRA application
+
+ ```python
+ model = FastLanguageModel.get_peft_model(
+     model, r=8, lora_alpha=16,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+ print(f"Trainable params: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
+ ```
+
+ - [ ] LoRA applies without error
+ - [ ] Trainable params ~3-8M (NOT the full 3B)
+
+ ---
+
+ ## PHASE 5 — Dry Run (2 Steps)
+
+ **THE MOST CRITICAL CHECK. DO NOT SKIP.**
+
+ ```bash
+ python train_grpo.py --max_steps 2
+ ```
+
+ ### 5A. Generation
+
+ - [ ] First prompt formatted correctly (print it — does it contain a code diff?)
+ - [ ] 4 completions generated for first prompt
+ - [ ] At least 2 of 4 completions contain `<action_type>` XML tags
+ - [ ] Completions are different from each other (not all identical)
+
+ ### 5B. Reward collection
+
+ - [ ] All 4 completions submitted to env
+ - [ ] All 4 rewards received (no timeouts)
+ - [ ] Rewards have variance (NOT all the same value)
+ - [ ] Rewards in expected range [-1.0, +2.0]
+ - [ ] Print rewards: `[_____, _____, _____, _____]` (write them down)
+
+ ### 5C. Training step
+
+ - [ ] GRPO loss computed (finite number, not NaN, not inf, not 0.0)
+ - [ ] Loss value: _____ (write it down)
+ - [ ] Wandb shows run with 2 logged steps
+ - [ ] No OOM during backward pass
+ - [ ] Peak GPU memory: _____GB (must be < 22GB on A10G or < 14GB on T4)
+
+ ### 5D. Checkpointing
+
+ - [ ] Output directory created: `./commitguard-llama-3b-grpo/`
+ - [ ] Checkpoint files present (or will be at step 50)
+
+ ### 5E. Timing estimate
+
+ - [ ] 2 steps took _____ seconds
+ - [ ] Estimated time for 300 steps: _____ minutes (= 2-step-time × 150)
+ - [ ] Estimated cost: _____ dollars (hours × GPU hourly rate)
+ - [ ] Cost within budget? (must be under $8; worked example below)
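+
+ Worked example (illustrative numbers only): if the 2-step dry run took 90 seconds, then 300 steps ≈ 150 × 90 s = 13,500 s ≈ 3.75 hours; on an A10G at ~$1.50/hr that is ≈ $5.60, inside the $8 budget.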
+
+ ---
+
+ ## PHASE 6 — Baseline Eval (Before Training)
+
+ **MUST run baseline BEFORE training. Cannot run after — you need the contrast.**
+
+ ```bash
+ python scripts/evaluate.py \
+   --model_path meta-llama/Llama-3.2-3B-Instruct \
+   --test_file data/devign_test.jsonl \
+   --output eval_baseline.json
+ ```
+
+ - [ ] Eval completes on all 100 test samples
+ - [ ] Binary accuracy: _____% (write it down, expected: 30-50%)
+ - [ ] CWE accuracy: _____% (expected: low, maybe 5-15%)
+ - [ ] False positive rate: _____%
+ - [ ] False negative rate: _____%
+ - [ ] Results saved to `eval_baseline.json`
+ - [ ] File committed to repo
+
+ ---
+
+ ## PHASE 7 — Launch Real Training
+
+ ### Pre-launch final checks
+
+ - [ ] All phases 0-6 are GREEN
+ - [ ] Budget approved by Niti (team lead)
+ - [ ] Config confirmed:
+   - [ ] `max_steps = 300`
+   - [ ] `save_steps = 50`
+   - [ ] `logging_steps = 1`
+   - [ ] `num_generations = 4`
+   - [ ] `learning_rate = 5e-6`
+   - [ ] `report_to = "wandb"`
+ - [ ] HF Space is still healthy (re-check `/health`)
+ - [ ] Screenshot this checklist with all boxes ticked → post in team channel
+
+ ### Launch
+
+ ```bash
+ # Option A: HF Jobs (preferred)
+ hf jobs uv run --flavor a10g-large train_grpo.py
+
+ # Option B: GCP (fallback)
+ nohup python train_grpo.py > training.log 2>&1 &
+ ```
+
+ - [ ] Job started successfully
+ - [ ] Job ID / Dashboard URL captured: _______________________
+ - [ ] Wandb run URL captured: _______________________
+ - [ ] Posted both URLs in team channel
+ - [ ] Set alarm to check in 30 minutes
+
+ ---
+
+ ## PHASE 8 — During Training Monitoring
+
+ **Check every 30 minutes while awake. Check immediately on waking up.**
+
+ ### Quick health check (< 2 min each time)
+
+ | Time | reward/mean | reward/std | loss | GPU mem | Status |
+ |------|-------------|------------|------|---------|--------|
+ | +30m | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +1h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +1.5h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | +2h | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+ | Final | _____ | _____ | _____ | _____ | ✅/⚠️/❌ |
+
+ ### Red flags → immediate action
+
+ | Red flag | Action |
+ |---|---|
+ | reward/mean trending DOWN | Check env `/health`. If healthy, lower LR to 2e-6 and relaunch from latest checkpoint. |
+ | loss = NaN | Kill run. Add `max_grad_norm=1.0` to config. Relaunch from checkpoint. |
+ | GPU memory > 23GB | Will OOM soon. Kill run. Reduce `num_generations` to 2. Relaunch. |
+ | Env returning errors in Wandb logs | HF Space is sleeping. Hit `/health` to wake. If down, Niti restarts. |
+ | Steps/second dropped to 0 | Job hung. Kill and relaunch from checkpoint. |
+ | All rewards identical for 50+ steps | Reward function bug. Ping Deepak. |
+
+ ---
+
+ ## PHASE 9 — Post-Training
+
+ ### Immediately after training completes
+
+ - [ ] Training finished without crash
+ - [ ] Wandb run status: "finished"
+ - [ ] Final reward/mean: _____ (higher than step-1 reward? That's the curve.)
+ - [ ] Screenshot reward curve from Wandb → save as `plots/reward_curve.png`
+ - [ ] Final checkpoint exists in output directory
+ - [ ] Total training time: _____ hours
+ - [ ] Total cost: $_____
+
+ ### Save the model
+
+ ```bash
+ # Push LoRA adapter to HF Hub
+ huggingface-cli upload inmodel-labs/commitguard-llama-3b \
+   ./commitguard-llama-3b-grpo/final
+ ```
+
+ - [ ] Upload successful
+ - [ ] Model page visible at https://huggingface.co/inmodel-labs/commitguard-llama-3b
+
+ ### Verify the saved model loads
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM
+
+ base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+ model = PeftModel.from_pretrained(base, "inmodel-labs/commitguard-llama-3b")
+ print("Trained model loads correctly")
+ ```
+
+ - [ ] Model loads without error
+ - [ ] Quick inference produces XML-tagged output (not garbage)
+
+ ---
+
+ ## PHASE 10 — Trained Model Eval
+
+ ```bash
+ python scripts/evaluate.py \
+   --model_path ./commitguard-llama-3b-grpo/final \
+   --test_file data/devign_test.jsonl \
+   --is_lora \
+   --base_model meta-llama/Llama-3.2-3B-Instruct \
+   --output eval_trained.json
+ ```
+
+ - [ ] Eval completes on all 100 test samples
+ - [ ] Binary accuracy: _____% (compare to baseline: _____%)
+ - [ ] CWE accuracy: _____% (compare to baseline: _____%)
+ - [ ] False positive rate: _____% (compare to baseline: _____%)
+ - [ ] False negative rate: _____% (compare to baseline: _____%)
+ - [ ] Results saved to `eval_trained.json`
+ - [ ] File committed to repo
+
+ ### The verdict
+
+ - [ ] Trained accuracy > baseline accuracy? **YES / NO**
+ - [ ] If YES: by how many percentage points? _____pp
+ - [ ] If NO: check if qualitative outputs improved (reasoning traces better even if accuracy similar)
+
+ ### Hand off to team
+
+ - [ ] Post in team channel:
+   ```
+   TRAINING COMPLETE
+   Baseline accuracy: X%
+   Trained accuracy: Y%
+   Improvement: +Zpp
+   Wandb: [url]
+   Reward curve: [screenshot]
+   Model on Hub: inmodel-labs/commitguard-llama-3b
+   Ready for plots and README.
+   ```
+ - [ ] Hand `eval_baseline.json` and `eval_trained.json` to Deepak for plot generation
+ - [ ] Kill GCP VM if running (`gcloud compute instances stop ...`)
+ - [ ] Update budget tracker in team channel
+
+ ---
+
+ ## PHASE 11 — Inference for Demo Video
+
+ **Divyank runs this to get the before/after examples for the demo recording.**
+
+ ### Pick the demo sample
+
+ - [ ] Find ONE sample from test set where:
+   - Ground truth: vulnerable (preferably CWE-89 SQL injection)
+   - Baseline model gets it WRONG
+   - Trained model gets it RIGHT
+ - [ ] Sample commit_id: _______________________
+
+ ### Generate baseline output
+
+ ```python
+ # Load untrained model, generate response for the demo sample
+ # Save full text output to demo_baseline_output.txt
+ ```
+
+ - [ ] Baseline output saved
+ - [ ] Output shows: wrong verdict / no reasoning / random guess
+
+ ### Generate trained output
+
+ ```python
+ # Load trained model, generate response for the demo sample
+ # Save full text output to demo_trained_output.txt
+ ```
+
+ - [ ] Trained output saved
+ - [ ] Output shows: correct verdict / identifies CWE / sketches exploit
+ - [ ] The contrast between baseline and trained is VISIBLE and OBVIOUS (generation sketch below)
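+
+ A hedged sketch filling in the two placeholder blocks above (assumptions: the picked sample is saved as `demo_sample.json` with a `diff` field, a hypothetical file name; swap `ADAPTER` to the Hub adapter id for the trained run):
+
+ ```python
+ import json
+
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ from scripts.agent_prompt import SYSTEM_PROMPT
+
+ ADAPTER = None  # baseline; set to "inmodel-labs/commitguard-llama-3b" for the trained run
+ OUT = "demo_baseline_output.txt" if ADAPTER is None else "demo_trained_output.txt"
+
+ tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-3.2-3B-Instruct", torch_dtype=torch.float16, device_map="auto"
+ )
+ if ADAPTER:
+     model = PeftModel.from_pretrained(model, ADAPTER)
+
+ sample = json.load(open("demo_sample.json"))  # hypothetical: the picked test sample saved as JSON
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {"role": "user", "content": f"Analyze this commit and submit your verdict.\n\n{sample['diff']}"},
+ ]
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tok(prompt, return_tensors="pt").to(model.device)
+ out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
+ text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ open(OUT, "w").write(text)  # full text output for screen capture
+ print(text)
+ ```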
+
+ ### Ready for recording
+
+ - [ ] Both outputs saved as text files for screen capture
+ - [ ] The diff for this sample is readable (not 80 lines of dense C)
+ - [ ] Proceed to demo video recording (see tasks_divyank.md)
+
+ ---
+
+ ## Emergency Fallback Reference Card
+
+ **Tape this next to your screen. Read it at 3 AM when your brain is mush.**
+
+ ```
+ CRASHED? → Check Wandb → Is it OOM?
+   YES OOM   → num_generations=2, retry from checkpoint
+   STILL OOM → Switch to Qwen2.5-1.5B, retry from scratch
+   NOT OOM   → Check error message → Screenshot → Post in team channel
+
+ REWARDS ALL ZERO? → Env bug, not model bug
+   → curl /health on HF Space
+   → If dead: ping Niti
+   → If alive: curl /step manually, check reward value
+   → If reward from curl is also 0: Deepak's reward function bug
+
+ LLAMA ACCESS DENIED? → Switch to Qwen2.5-1.5B immediately
+   → Change ONE line: model_name="Qwen/Qwen2.5-1.5B-Instruct"
+   → Everything else stays the same
+
+ CURVE IS FLAT? → Ship it anyway with honest narrative
+   → "Training evidence shows optimization attempted;
+      reward signal needs richer shaping in future work"
+   → A flat curve + honest story > no submission
+ ```
data/cwe_keywords.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "CWE-119": ["buffer overflow", "out of bounds", "overflow", "bounds check", "memcpy", "strcpy", "strcat", "index out of range", "heap", "stack smash"],
+   "CWE-476": ["null pointer", "nullptr", "dereference", "null check", "segmentation fault", "null access", "uninitialized"],
+   "CWE-189": ["integer overflow", "signedness", "division by zero", "arithmetic overflow", "wrap around", "truncation", "cast", "narrowing"],
+   "CWE-20": ["input validation", "improper input", "validation bypass", "sanitization", "untrusted input", "malformed data", "missing check"],
+   "CWE-22": ["path traversal", "directory traversal", "../", "..\\", "file inclusion", "arbitrary file", "escape root", "chroot"],
+   "CWE-78": ["command injection", "os.system", "subprocess", "shell=true", "exec(", "popen", "system(", "shell command"],
+   "CWE-89": ["sql injection", "sqli", "drop table", "union select", "query concatenation", "prepared statement", "bypass login"],
+   "CWE-79": ["xss", "cross site scripting", "script tag", "innerhtml", "alert(", "javascript:", "onerror", "content injection"],
+   "CWE-OTHER": ["vulnerability", "security", "exploit", "unsafe", "flaw", "bug", "error handling", "race condition", "use after free", "double free"]
+ }
data/devign_filtered.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
data/devign_test.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
data/devign_train.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
models.py ADDED
@@ -0,0 +1,61 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Literal, Optional
+
+
+ ActionType = Literal["request_context", "analyze", "verdict"]
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardAction:
+     action_type: ActionType
+     file_path: Optional[str] = None
+     reasoning: Optional[str] = None
+     is_vulnerable: Optional[bool] = None
+     vuln_type: Optional[str] = None
+     exploit_sketch: Optional[str] = None
+     raw_action: Optional[str] = None
+     parse_error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class ContextSnippet:
+     file_path: str
+     start_line: int
+     end_line: int
+     content: str
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardObservation:
+     # Cheating-prevention critical: this shape must never include ground truth.
+     episode_id: str
+     step_idx: int
+     diff: str
+     available_files: list[str]
+     context_snippets: list[ContextSnippet] = field(default_factory=list)
+     budget_remaining: int = 0
+     error: Optional[str] = None
+
+
+ @dataclass(frozen=True, slots=True)
+ class CommitGuardState:
+     episode_id: str
+     current_sample_id: str
+     step_count: int
+     context_requests: int = 0
+     history: list[dict] = field(default_factory=list)
+
+
+ @dataclass(frozen=True, slots=True)
+ class DevignSample:
+     sample_id: str
+     diff: str
+     available_files: list[str]
+     # Server-only fields (must never be surfaced in an Observation)
+     is_vulnerable: Optional[bool] = None
+     cwe: Optional[str] = None
+     target_file: Optional[str] = None
+     files: Optional[dict[str, str]] = None
+
openenv.yaml ADDED
@@ -0,0 +1,6 @@
+ name: commitguard
+ version: "0.1.0"
+ description: "CommitGuard OpenEnv environment (FastAPI server)"
+ port: 8000
+ entrypoint: "commitguard_env/server.py"
+
prd.md ADDED
@@ -0,0 +1,381 @@
+ # CommitGuard Product Requirements Document
+
+ **Project:** CommitGuard
+ **Owner:** Niti (Inmodel Labs)
+ **Team:** Niti, Deepak, Divyank
+ **Submission deadline:** Sunday 5:00 PM IST
+ **Hackathon:** Meta OpenEnv Hackathon (PyTorch + Hugging Face + Scaler)
+ **Document status:** Locked. Scope freeze at midnight Saturday.
+
+ ---
+
+ ## 1. Executive Summary
+
+ CommitGuard is a Reinforcement Learning environment built on Meta OpenEnv that trains LLM agents to detect exploitable vulnerabilities in code commits. The submission demonstrates that AI-paced security review is feasible: an agent trained on commit-level reasoning can match the velocity at which AI coding agents are now shipping production code.
+
+ The deliverable is a runnable HF Space hosting the env, a training notebook that produces a measurable learning curve on Llama-3.2-3B-Instruct, a demo video showing the qualitative shift from untrained to trained behavior, and a README that tells the story.
+
+ ---
+
+ ## 2. Problem Statement
+
+ ### 2.1 The shift in software development
+
+ Until recently, code was written by humans at human velocity. Security review processes were designed around this assumption: periodic pentests every 3 to 6 months, with manual code review at PR time. The cycle worked because the codebase changed slowly enough that periodic deep review caught most issues before they reached production.
+
+ This assumption has broken. Code is now being written and shipped by AI coding agents (Claude Code, Cursor, and other autonomous coding agents) at 10 to 100 times human velocity. Companies push to production daily, sometimes hourly. A pentest report from six months ago describes a codebase that no longer exists.
+
+ ### 2.2 The asymmetry
+
+ The same class of LLM that writes the code can be weaponized to attack it. An adversary equipped with autonomous coding tooling, given repository access or even just leaked commits, can pentest at the same velocity defenders ship. Defense runs on human time. Offense runs on AI time. **This asymmetry is unsustainable for any organization shipping AI-generated code at scale.**
+
+ ### 2.3 Why this is a frontier problem
+
+ AI red-teaming today is overwhelmingly a manual, human-bottlenecked discipline. Researchers at Anthropic, OpenAI, and Meta craft attacks one at a time. There is no automated equivalent of Metasploit for AI-generated code. Closing that gap is an open research problem that frontier labs are actively investing in.
+
+ ---
+
+ ## 3. Goals and Non-Goals
+
+ ### 3.1 Goals (in scope for this submission)
+
+ - Deliver a working OpenEnv environment that takes a code commit as input and rewards an agent for correctly identifying vulnerabilities, the CWE class, and a plausible exploit
+ - Train a small Llama variant (Llama-3.2-3B-Instruct) on the env using GRPO via TRL + Unsloth
+ - Demonstrate measurable learning: baseline vs. trained accuracy with reward curves
+ - Ship a complete submission package: HF Space, training notebook, README, demo video, optional HF blog post
+ - Frame the work in language a Meta researcher recognizes: RLVR (Reinforcement Learning from Verifiable Rewards), commit-time security, AI-paced defense
+
+ ### 3.2 Non-goals (explicitly out of scope)
+
+ - Production-ready security tool: this is a research environment, not a CI plugin
+ - Real-time exploit execution against arbitrary code: the v1 reward uses pattern matching, not sandboxed execution
+ - Multi-file / repo-level reasoning: v1 operates on single-file commits up to 80 lines
+ - Multi-agent self-play: listed in Future Work
+ - Pentesting beyond static code analysis: no network attacks, social engineering, or runtime probing
+ - Coverage of all CWEs: v1 focuses on the top 10 CWEs in Devign
+
+ ### 3.3 Non-goals from the rubric perspective
+
+ The rubric rewards ambition and storytelling more heavily than engineering polish. Therefore: not pursuing exhaustive test coverage, not optimizing for inference latency, not building a fancy frontend. The HF Space's default web UI is sufficient.
+
+ ---
+
+ ## 4. Target Users and Stakeholders
+
+ | Stakeholder | Role | What they care about |
+ |---|---|---|
+ | Hackathon judges (Meta partner engineers) | Primary audience | Innovation, story, training evidence, reward design |
+ | Meta Superintelligence Labs researchers | Aspirational audience | Frontier framing, RLVR alignment, paper-worthiness |
+ | HF community | Discovery audience | Reproducibility, runnable Space, clean README |
+ | Future contributors | Builder audience | Code clarity, extensibility hooks for v2 |
+
+ ---
+
+ ## 5. Solution Overview
+
+ ### 5.1 The environment
+
+ CommitGuard is an OpenEnv environment where an agent investigates code commits and decides whether they introduce exploitable vulnerabilities. The agent has a limited investigation budget (5 steps maximum per episode), forcing it to reason efficiently rather than brute-forcing context.
+
+ ### 5.2 The agent loop
+
+ 1. `reset()`: the env loads a commit (a `code_before`/`code_after` pair plus metadata) from a preprocessed Devign-derived dataset and returns the diff plus the list of available files in the repo
+ 2. `step(action)`: the agent emits one of three action types:
+    - `request_context(file_path)`: pull surrounding code (small reward penalty, encourages efficiency)
+    - `analyze(reasoning)`: write chain-of-thought, no reward effect, logged for traces
+    - `verdict(is_vulnerable, vuln_type, exploit_sketch)`: terminate the episode with a judgment
+ 3. Reward fires on verdict, computed server-side against ground truth the agent never sees; an example verdict action is shown below
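+
+ For reference, a verdict action in the XML-tag wire format (field values are illustrative):
+
+ ```xml
+ <action>
+   <action_type>verdict</action_type>
+   <is_vulnerable>true</is_vulnerable>
+   <vuln_type>CWE-89</vuln_type>
+   <exploit_sketch>user input is concatenated into the query; inject ' OR 1=1 --</exploit_sketch>
+ </action>
+ ```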
+
+ ### 5.3 Reward design (RLVR philosophy)
+
+ The reward is tiered and grounded in dataset truth, not in another LLM's opinion. This is deliberate: it follows the RLVR tradition (verifiable rewards from ground truth or executable checks) and prevents the reward hacking that plagues LLM-as-judge setups.
+
+ | Signal | Reward |
+ |---|---|
+ | Correct binary verdict (vulnerable vs. safe) | +1.0 |
+ | Correct CWE classification (when vulnerable) | +0.5 |
+ | Plausible exploit sketch (CWE-keyword match) | +0.5 |
+ | False positive (safe flagged as vulnerable) | -1.0 |
+ | False negative (real vuln missed) | -0.5 |
+ | Per-step context request | -0.05 |
+ | Episode step cap | 5 steps |
+
+ The shape is hard to game: flagging everything is punished by false positives, and never investigating forfeits the exploit-sketch bonus.
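+
+ Worked example (illustrative): a correct "vulnerable" verdict with the right CWE and a plausible exploit sketch, reached after one context request, scores 1.0 + 0.5 + 0.5 - 0.05 = 1.95; flagging a safe commit after the same single request scores -1.0 - 0.05 = -1.05.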
+
+ ---
+
+ ## 6. Technical Architecture
+
+ ### 6.1 System diagram
+
+ ```
+ +------------------+       HTTP/JSON         +----------------------+
+ | TRL + Unsloth    |  reset / step / state   | HF Space             |
+ | Llama-3.2-3B     | <---------------------> | FastAPI server       |
+ | GRPO trainer     |                         | (Docker)             |
+ | (HF Jobs A10G)   |                         |  +----------------+  |
+ +------------------+                         |  | Devign JSONL   |  |
+                                              |  +----------------+  |
+                                              |  | Reward function|  |
+                                              |  +----------------+  |
+                                              +----------------------+
+ ```
+
+ ### 6.2 Component breakdown
+
+ **Env server** (Python, FastAPI, Docker, OpenEnv 0.2.3+)
+ - `models.py`: Action, Observation, State dataclasses (extends OpenEnv base classes)
+ - `environment.py`: `reset()`, `step()`, `state()` methods on the `CommitGuardEnvironment` class
+ - `reward.py`: pure function `compute_reward(action, ground_truth, cwe_keywords) -> float`
+ - `parse_action.py`: XML-tag parser, robust to malformed model output
+ - `data/devign_filtered.jsonl`: preprocessed dataset, shipped in the image
+ - `data/cwe_keywords.json`: top-10 CWE exploit-pattern keyword map
+
+ **Env client** (auto-generated by OpenEnv CLI)
+ - `client.py`: `HTTPEnvClient` subclass, used by the training notebook
+ - Installable via `pip install git+https://huggingface.co/spaces/<user>/commitguard`
+
+ **Training pipeline** (Python, TRL, Unsloth, PEFT, Wandb)
+ - `train_grpo.py`: GRPOTrainer config + main loop
+ - `agent_prompt.py`: system prompt template with XML-tag action format
+ - `evaluate.py`: runs N samples through a model, returns accuracy stats
+
+ **Storytelling artifacts**
+ - `README.md`: pitch + results + links
+ - `demo_video.mp4`: 60-90 second before/after, hosted on YouTube unlisted
+ - `commitguard_hf_blog.md`: optional HF Hub blog post (page 26 bonus)
+ - `plots/`: reward_curve.png, baseline_vs_trained.png, per_cwe.png
+
+ ### 6.3 Data flow
+
+ 1. Preprocess Devign once at build time into `data/devign_filtered.jsonl` (~5000 samples, balanced, filtered to <80 LOC)
+ 2. Build Docker image with the JSONL embedded
+ 3. `openenv push` deploys to HF Space
+ 4. Training notebook connects to the HF Space URL via the OpenEnv HTTP client
+ 5. Each training step: GRPO generates 4 completions per prompt; each runs a full episode in the env; rewards are collected; the policy is updated via LoRA (see the sketch after this list)
+ 6. Wandb logs reward curves and training loss; checkpoints saved every 50 steps
+ 7. Final LoRA adapter saved to HF Hub for evaluation and demo
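+
+ A minimal sketch of how this loop might be wired (assumptions: the Space URL placeholder, a `train_prompts` dataset of `{"prompt": ...}` rows, and the `rollout_reward` helper are hypothetical here, not the final `train_grpo.py`):
+
+ ```python
+ import requests
+ from trl import GRPOConfig, GRPOTrainer
+
+ BASE = "https://<username>-commitguard.hf.space"  # placeholder Space URL
+
+ def rollout_reward(completions, **kwargs):
+     """Score each completion by replaying it as one env episode."""
+     rewards = []
+     for text in completions:
+         requests.post(f"{BASE}/reset", timeout=30)
+         step = requests.post(f"{BASE}/step", json={"action": text}, timeout=30).json()
+         rewards.append(float(step["reward"]))
+     return rewards
+
+ args = GRPOConfig(
+     output_dir="commitguard-llama-3b-grpo",
+     max_steps=300, num_generations=4, learning_rate=5e-6,
+     logging_steps=1, save_steps=50, report_to="wandb",
+ )
+ trainer = GRPOTrainer(
+     model="meta-llama/Llama-3.2-3B-Instruct",  # TRL also accepts a loaded model object
+     reward_funcs=rollout_reward,
+     args=args,
+     train_dataset=train_prompts,               # hypothetical dataset of {"prompt": ...}
+ )
+ trainer.train()
+ ```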
+
+ ### 6.4 Cheating prevention
+
+ The agent must never see ground truth. This is enforced by architecture:
+
+ - Ground truth lives only on the server, in the JSONL file the env loads from
+ - The Observation dataclass schema explicitly excludes `is_vulnerable`, `cwe_type`, and `target_file_with_label`
+ - A unit test (`test_no_leak.py`) asserts no observation contains forbidden fields (sketched below)
+ - The server returns only `reward` (a scalar) on each step, never the label that produced it
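+
+ A minimal sketch of the no-leak test (assumes the dataclasses in `commitguard_env/models.py`; the check is schema-level, not per-episode):
+
+ ```python
+ from dataclasses import fields
+
+ from commitguard_env.models import CommitGuardObservation
+
+ FORBIDDEN = {"is_vulnerable", "cwe", "cwe_type", "target_file", "target_file_with_label"}
+
+ def test_observation_schema_has_no_ground_truth():
+     # The observation type cannot carry label fields at all.
+     assert FORBIDDEN.isdisjoint({f.name for f in fields(CommitGuardObservation)})
+ ```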
+
+ ---
+
+ ## 7. Stack and Dependencies
+
+ ### 7.1 Locked technical decisions
+
+ | Decision | Choice | Rationale |
+ |---|---|---|
+ | Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
+ | Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
+ | Hosting | HF Space | Mandatory per submission rules, three-in-one (server + repo + registry) |
+ | Data source | Devign (DetectBERT subset) | Already on disk, real CWE labels, manageable size |
+ | Model | Llama-3.2-3B-Instruct | Meta-branded for the Meta hackathon, fits A10G with GRPO |
+ | Training framework | TRL with GRPO | Native OpenEnv integration via `reward_funcs` callback |
+ | Training optimization | Unsloth 4-bit + LoRA r=8 | 70% memory reduction, 2x speed (page 75 of opening deck) |
+ | Training infra | HF Jobs A10G | $0.40-1.50/hr, runs unattended, integrates with HF ecosystem |
+ | Dev infra | GCP VM with T4 | Stable, no Colab disconnects, leverages the 24,000 GCP credit |
+ | Action serialization | XML-tag free-text | Robust to small-model output variance, easier than JSON-mode |
+ | Logging | Wandb | TRL native, judges can view runs |
+
+ ### 7.2 Fallback decisions (pre-approved, no debate when triggered)
+
+ | If this fails | Fall back to | Trigger |
+ |---|---|---|
+ | Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
+ | HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
+ | 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Niti's checkpoint red |
+ | Tiered reward buggy | Binary correct/incorrect reward | Deepak's checkpoint red |
+ | Training curve flat | Ship with qualitative comparison only | Curve still flat at 10 AM Sunday |
+ | Demo video can't be cleanly recorded | Side-by-side text trace in README | Recording fails twice |
+
+ ---
+
+ ## 8. Functional Requirements
+
+ ### 8.1 Environment functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | F-1 | Env exposes `/health`, `/reset`, `/step`, `/state`, `/docs` endpoints | P0 |
+ | F-2 | `reset()` returns a random commit observation, never the same one twice in a single episode | P0 |
+ | F-3 | `step()` accepts XML-tagged action strings and parses them robustly | P0 |
+ | F-4 | `step()` returns reward, observation, and done flag | P0 |
+ | F-5 | Episode terminates on `verdict` action OR after 5 steps | P0 |
+ | F-6 | Observation never contains ground-truth labels | P0 |
+ | F-7 | Env handles malformed actions gracefully (returns -0.5 reward, doesn't crash) | P1 |
+ | F-8 | Env supports concurrent episodes (multiple training generations in parallel) | P1 |
+ | F-9 | Web UI on HF Space allows manual interaction for demo recording | P2 |
+
+ ### 8.2 Training functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | T-1 | Training notebook runs end-to-end on a single A10G | P0 |
+ | T-2 | Reward curve, training loss, and completions logged to Wandb | P0 |
+ | T-3 | LoRA adapter saved every 50 steps for resumability | P0 |
+ | T-4 | Baseline (untrained) evaluation on 100 held-out samples completes in <10 min | P0 |
+ | T-5 | Trained model evaluation produces per-CWE accuracy breakdown | P1 |
+ | T-6 | Notebook runnable from Colab via "Open in Colab" badge in README | P1 |
+
+ ### 8.3 Storytelling functional requirements
+
+ | ID | Requirement | Priority |
+ |---|---|---|
+ | S-1 | README explains problem, env, results, and motivation in <5 min read | P0 |
+ | S-2 | All plot PNGs committed to repo (not Wandb-only) | P0 |
+ | S-3 | Demo video 60-90 sec, before/after on a single SQL injection example | P0 |
+ | S-4 | Wandb run URL linked in README | P1 |
+ | S-5 | HF Hub blog post published and linked | P2 |
+
+ ---
+
+ ## 9. Non-Functional Requirements
+
+ | Aspect | Requirement |
+ |---|---|
+ | Performance | Single `step()` call returns in <2 seconds on HF Space free tier |
+ | Reliability | Env survives 100 random episodes without crash |
+ | Reproducibility | Training notebook produces a measurable learning curve when re-run with the same seed |
+ | Discoverability | HF Space tagged with `openenv`, `rl`, `security`, `code` |
+ | Documentation | README is self-contained; a judge can understand it without reading source |
+ | Licensing | Code MIT-licensed, dataset attribution to Devign authors |
+
+ ---
+
+ ## 10. Success Metrics
+
+ ### 10.1 Submission completeness (binary, must-pass)
+
+ - [ ] HF Space deployed and `/health` returns 200 OK
+ - [ ] Training notebook runs without crashes on a fresh Colab/VM
+ - [ ] README has all required links (HF Space, notebook, video, GitHub)
+ - [ ] At least one reward curve plot committed
+ - [ ] Demo video accessible via public URL
+
+ ### 10.2 Quality metrics (graded by rubric)
+
+ | Metric | Target | Stretch |
+ |---|---|---|
+ | Innovation framing recognized by mentor | "this is an interesting angle" feedback | "this is paper-worthy" feedback |
+ | Baseline accuracy (untrained Llama-3.2-3B) | Establishes a floor (likely 30-45%) | |
+ | Trained accuracy (after 300 GRPO steps) | Beats baseline by 10pp absolute | Beats baseline by 20pp |
+ | Reward curve | Bends upward visibly | Smooth monotonic increase |
+ | Per-CWE breakdown | At least 3 CWEs show improvement | All top-5 CWEs show improvement |
+ | Storytelling | Mentor at Round 3 can repeat the pitch back | Mentor offers to share with Meta team |
+
+ ### 10.3 Anti-metrics (things we explicitly don't optimize for)
+
+ - Number of features
+ - Number of CWEs covered (more is not better; depth beats breadth here)
+ - Lines of code
+ - Model size (going larger doesn't make a stronger submission, just slower training)
+
+ ---
+
+ ## 11. Risks and Mitigations
+
+ | Risk | Likelihood | Impact | Mitigation |
+ |---|---|---|---|
+ | Training run produces flat curve | Medium | High | Pre-approved pivot to qualitative-comparison narrative; baseline already establishes a contrast |
+ | HF Space deployment fails at 4 AM | Low | High | Fallback to Docker image with `docker run` instructions in README |
+ | Llama-3.2 license approval delayed | Low | Medium | Submit license request immediately at GCP setup; Qwen-1.5B fallback ready |
+ | Devign data has bad CWE labels | Medium | Medium | Filter aggressively; if too noisy, drop to top-5 cleanest CWEs only |
+ | One teammate falls behind their phase | Medium | High | Sync points at midnight, 9 AM, 3 PM allow scope cuts; mock-env pattern means training isn't blocked |
+ | Niti exhausted at Mentor Round 3 | High if no sleep | High | Mandatory sleep schedule 12:30 AM to 5:00 AM, non-negotiable |
+ | Demo video can't be cleanly recorded | Medium | Medium | Cherry-pick the best example; fall back to text trace if recording fails twice |
+ | HF Space rate limits during training | Low | Medium | Run training on local Docker if HF Space hits limits |
+
+ ---
+
+ ## 12. Timeline and Milestones
+
+ | Time (IST) | Milestone | Owner |
+ |---|---|---|
+ | Sat 8:00 PM | Mentor Round 2: pitch validation | Niti |
+ | Sat 9:30 PM | Phase 1 starts: env scaffolding, data prep, training scaffolding in parallel | All |
+ | Sat 11:59 PM | Phase 1 checkpoint: env runs, data ready, mock training works | All |
+ | Sun 12:00 AM | **Scope freeze**: no new features after this point | All |
+ | Sun 12:30 AM | Niti sleep starts | Niti |
+ | Sun 3:00 AM | HF Space live, Deepak sleep starts | Deepak |
+ | Sun 5:00 AM | Niti wakes, watches training | Niti |
+ | Sun 5:30 AM | Real training run launched on HF Jobs, Divyank sleep starts | Divyank |
+ | Sun 9:00 AM | Team sync: training results, plot status | All |
+ | Sun 10:00 AM | Mentor Round 3: final sharpening | Niti |
+ | Sun 11:30 AM | Demo video recorded and uploaded | Divyank |
+ | Sun 1:00 PM | README finalized | Niti |
+ | Sun 3:00 PM | **Feature freeze**: 2-hour reminder, no more changes | All |
+ | Sun 4:30 PM | Submission packaged | Niti |
+ | Sun 5:00 PM | **Submission deadline** | |
+
+ ---
+
+ ## 13. Open Questions and Assumptions
+
+ ### 13.1 Assumptions
+
+ - Devign dataset is on disk locally (or downloadable in <30 min); to be verified by Deepak at Phase 1 start
+ - HF Space free tier is sufficient for env hosting during the hackathon; backup plan: $9/mo upgrade if rate limited
+ - Llama-3.2-3B-Instruct license approval lands within 1 hour of request; Qwen fallback ready if not
+ - HF Jobs A10G availability at 5 AM Sunday; GCP A10G fallback if queued
+
+ ### 13.2 Open questions (to resolve during execution)
+
+ - Exact number of training steps to maximize curve visibility within budget; answered empirically by 9 AM Sunday based on observed loss
+ - Whether to ship a Colab-runnable notebook AND an HF Jobs notebook, or just one; defer to Divyank's call at Phase 2
+ - Whether to include a comparison against a non-RL baseline (pure SFT or zero-shot); stretch only
+
+ ---
+
+ ## 14. Future Work (Post-Hackathon)
+
+ This section becomes part of the README's "What's Next" pitch; it explicitly signals to judges that we understand the limitations and have a roadmap.
+
+ - **Sandboxed exploit execution**: replace pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
+ - **Multi-file commit reasoning**: extend the env to support diffs spanning multiple files, with a context budget
+ - **Self-play loop**: pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
+ - **Agentic harness integration**: wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
+ - **Real CVE corpus**: extend beyond Devign to recent CVE-tagged commits from major open-source repos
+ - **Multi-language support**: the current env is C-focused via Devign; extend to Python, JavaScript, Go
+ - **Reward shape ablations**: a formal study of how reward composition affects which vulnerability types the model learns fastest
+
+ ---
+
+ ## 15. Appendix
+
+ ### 15.1 Key reference URLs (for the team to bookmark)
+
+ - OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
+ - OpenEnv Scaler intro: https://tinyurl.com/openenv-scaler
+ - TRL OpenEnv docs: https://huggingface.co/docs/trl/en/openenv
+ - TRL Sudoku GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb
+ - TRL Wordle GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb
+ - Unsloth 2048 example: https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/examples/unsloth_2048.ipynb
+ - Llama-3.2-3B model card: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
+ - HF Jobs docs: https://huggingface.co/docs/hub/jobs
+ - Cursor credits: https://tinyurl.com/sclr-openenv-dashboard
+ - HF $30 credits: https://huggingface.co/coupons/claim/hf-openenv-community
+
+ ### 15.2 Document version
+
+ - v1.0: Saturday evening, Bangalore venue. Locked at midnight Saturday.
+ - Changes after lock require explicit team-wide sign-off and a documented rationale.
+
+ ---
+
+ ## 16. The 30-Second Pitch (For Mentor Rounds, Memorize This)
+
+ > "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it: defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
+ >
+ > CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR: verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
pyproject.toml ADDED
@@ -0,0 +1,39 @@
+ [project]
+ name = "commitguard"
+ version = "0.1.0"
+ description = "CommitGuard OpenEnv RL environment for commit-time vuln detection"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "fastapi>=0.110",
+     "uvicorn[standard]>=0.27",
+     "pydantic>=2.6",
+     "openenv>=0.1.13",
+ ]
+
+ [project.optional-dependencies]
+ train = [
+     "requests",
+     "torch>=2.4",
+     "transformers>=4.46",
+     "trl>=0.12",
+     "accelerate>=1.0",
+     "peft>=0.13",
+     "datasets>=3.0",
+     "wandb",
+     "matplotlib",
+     "unsloth",
+     "bitsandbytes>=0.44",
+     "jupyter",
+     "ipywidgets",
+ ]
+
+ [project.scripts]
+ server = "commitguard_env.server:main"
+
+ [tool.setuptools]
+ packages = ["commitguard_env"]
+
+ [build-system]
+ requires = ["setuptools>=68"]
+ build-backend = "setuptools.build_meta"
scripts/README.md ADDED
@@ -0,0 +1,7 @@
+ ## Scripts
+
+ This directory is for repeatable, CLI-first ops (dataset preprocessing, local smoke runs).
+
+ Primary expected script (Deepak):
+ - `preprocess_devign.py` → produces `data/devign_filtered.jsonl`
+
scripts/agent_prompt.py ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """System prompt and per-turn prompt for CommitGuard GRPO training."""
2
+
3
+ SYSTEM_PROMPT = """\
4
+ You are a security auditor. You receive code diffs (commits) and must decide \
5
+ whether each commit introduces an exploitable vulnerability.
6
+
7
+ You may take up to 5 actions per episode. Each action must be wrapped in XML tags.
8
+
9
+ Action types:
10
+
11
+ 1. Request additional file context:
12
+ <action><action_type>request_context</action_type><file_path>path/to/file.c</file_path></action>
13
+
14
+ 2. Analyze / think (chain-of-thought, no reward effect):
15
+ <action><action_type>analyze</action_type><reasoning>your reasoning here</reasoning></action>
16
+
17
+ 3. Submit a verdict (terminates the episode):
18
+ <action><action_type>verdict</action_type><is_vulnerable>true|false</is_vulnerable><vuln_type>CWE-XXX</vuln_type><exploit_sketch>describe how to exploit</exploit_sketch></action>
19
+
20
+ Rules:
21
+ - You MUST submit exactly one verdict before running out of budget.
22
+ - If the code is safe, set is_vulnerable to false and vuln_type to NONE.
23
+ - Be specific in exploit_sketch: name the attack vector (e.g., buffer overflow via unchecked memcpy).
24
+ - Common CWE types: CWE-79 (XSS), CWE-89 (SQL injection), CWE-22 (path traversal), \
25
+ CWE-78 (command injection), CWE-20 (input validation), CWE-125 (out-of-bounds read), \
26
+ CWE-787 (out-of-bounds write), CWE-190 (integer overflow), CWE-476 (null dereference), \
27
+ CWE-400 (resource exhaustion).
28
+ """
29
+
30
+
31
+ def get_agent_prompt(diff: str, available_files: list[str], step_idx: int) -> str:
32
+ files_str = ", ".join(available_files) if available_files else "(none)"
33
+ return (
34
+ f"## Commit Diff\n\n```diff\n{diff}\n```\n\n"
35
+ f"Available files: {files_str}\n"
36
+ f"Step: {step_idx}/5\n\n"
37
+ "Analyze this commit and submit your verdict."
38
+ )
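
To eyeball the rendered per-turn prompt, a quick one-off check (illustrative diff and file list; run from the repo root so `scripts` resolves as a package, the same way `evaluate.py` imports it below):

```bash
python - <<'EOF'
from scripts.agent_prompt import get_agent_prompt

diff = "--- a/source.c\n+++ b/source.c\n@@ -1,1 +1,1 @@\n+strcpy(buf, s);"
print(get_agent_prompt(diff, ["source.c"], step_idx=1))
EOF
```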
scripts/check_cuda.py ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ import torch
2
+ print(f'CUDA available: {torch.cuda.is_available()}')
3
+ if torch.cuda.is_available():
4
+ print(f'Device count: {torch.cuda.device_count()}')
5
+ print(f'Device name: {torch.cuda.get_device_name(0)}')
6
+ print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')
scripts/check_disjoint.py ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ from pathlib import Path
3
+
4
+ def get_ids(file_path):
5
+ ids = set()
6
+ with open(file_path, 'r', encoding='utf-8') as f:
7
+ for line in f:
8
+ obj = json.loads(line)
9
+ ids.add(obj.get('commit_id') or obj.get('sample_id'))
10
+ return ids
11
+
12
+ train_ids = get_ids('data/devign_filtered.jsonl')  # training split written by preprocess_devign.py
13
+ test_ids = get_ids('data/devign_test.jsonl')
14
+
15
+ overlap = train_ids.intersection(test_ids)
16
+ print(f"Train IDs: {len(train_ids)}")
17
+ print(f"Test IDs: {len(test_ids)}")
18
+ print(f"Overlap: {len(overlap)}")
19
+ if overlap:
20
+ print(f"Overlapping IDs: {list(overlap)[:5]}")
scripts/evaluate.py ADDED
@@ -0,0 +1,169 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import argparse
4
+ import re
5
+ import torch
6
+ from transformers import AutoModelForCausalLM, AutoTokenizer
7
+ from peft import PeftModel
8
+ from pathlib import Path
9
+ import sys
10
+
11
+ # Add project root to path for imports
12
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
13
+ from scripts.agent_prompt import SYSTEM_PROMPT
14
+
15
+ def parse_xml_action(text):
16
+ """Extract action fields from XML-tagged model output."""
17
+ def extract(tag, default=None):
18
+ match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
19
+ return match.group(1).strip() if match else default
20
+
21
+ is_vuln_str = extract("is_vulnerable", "false")
22
+ return {
23
+ "action_type": "verdict",
24
+ "is_vulnerable": is_vuln_str.lower() == "true",
25
+ "vuln_type": extract("vuln_type", "unknown"),
26
+ "exploit_sketch": extract("exploit_sketch", ""),
27
+ }
28
+
29
+ def format_eval_prompt(sample):
30
+ return (
31
+ f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
32
+ f"{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
33
+ f"Analyze this commit and submit your verdict.\n\n"
34
+ f"Code diff:\n```diff\n{sample['diff']}\n```<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
35
+ )
36
+
37
+ def evaluate(model_path, test_file, is_lora=False, base_model=None, output_file="eval_results.json"):
38
+ """
39
+ Run model on test samples, compute accuracy metrics.
40
+ """
41
+ print(f"Loading model from {model_path}...")
42
+ device = "cuda" if torch.cuda.is_available() else "cpu"
43
+
44
+ # Load model
45
+ if is_lora:
46
+ if not base_model:
47
+ raise ValueError("base_model is required if is_lora=True")
48
+ print(f"Loading LoRA adapter from {model_path} with base model {base_model}")
49
+ from unsloth import FastLanguageModel
50
+ model, tokenizer = FastLanguageModel.from_pretrained(
51
+ model_name = base_model,
52
+ max_seq_length = 2048,
53
+ load_in_4bit = True,
54
+ )
55
+ model = PeftModel.from_pretrained(model, model_path)
56
+ FastLanguageModel.for_inference(model)
57
+ else:
59
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
60
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
61
+
62
+ # Load test data
63
+ print(f"Loading test data from {test_file}...")
64
+ with open(test_file, "r", encoding="utf-8") as f:
65
+ samples = [json.loads(line) for line in f if line.strip()]
66
+
67
+ results = {
68
+ "summary": {
69
+ "total": len(samples),
70
+ "correct_binary": 0,
71
+ "correct_cwe": 0,
72
+ "false_positives": 0,
73
+ "false_negatives": 0,
74
+ "binary_accuracy": 0,
75
+ "cwe_accuracy": 0,
76
+ "false_positive_rate": 0,
77
+ "false_negative_rate": 0,
78
+ "cwe_breakdown": {},
79
+ },
80
+ "predictions": [],
81
+ }
82
+
83
+ print(f"Starting evaluation on {len(samples)} samples...")
84
+ for i, sample in enumerate(samples):
85
+ prompt = format_eval_prompt(sample)
86
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
87
+
88
+ with torch.no_grad():
89
+ output = model.generate(
90
+ **inputs,
91
+ max_new_tokens=256,
93
+ do_sample=False,
94
+ )
95
+
96
+ response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
97
+ prediction = parse_xml_action(response)
98
+
99
+ gt_vulnerable = bool(sample["is_vulnerable"])
100
+ pred_vulnerable = prediction.get("is_vulnerable", False)
101
+
102
+ correct = pred_vulnerable == gt_vulnerable
103
+ if correct:
104
+ results["summary"]["correct_binary"] += 1
105
+
106
+ if gt_vulnerable and not pred_vulnerable:
107
+ results["summary"]["false_negatives"] += 1
108
+ elif not gt_vulnerable and pred_vulnerable:
109
+ results["summary"]["false_positives"] += 1
110
+
111
+ cwe = sample.get("cwe") or "CWE-OTHER"
112
+ if cwe not in results["summary"]["cwe_breakdown"]:
113
+ results["summary"]["cwe_breakdown"][cwe] = {"total": 0, "correct": 0, "accuracy": 0}
114
+
115
+ results["summary"]["cwe_breakdown"][cwe]["total"] += 1
116
+ if correct:
117
+ results["summary"]["cwe_breakdown"][cwe]["correct"] += 1
118
+
119
+ if gt_vulnerable and correct and prediction.get("vuln_type") == cwe:
120
+ results["summary"]["correct_cwe"] += 1
121
+
122
+ results["predictions"].append({
123
+ "sample_id": sample["sample_id"],
124
+ "ground_truth": gt_vulnerable,
125
+ "predicted": pred_vulnerable,
126
+ "predicted_cwe": prediction.get("vuln_type"),
127
+ "actual_cwe": cwe,
128
+ "response": response,
129
+ })
130
+
131
+ if (i + 1) % 10 == 0:
132
+ print(f" Processed {i+1}/{len(samples)} samples...")
133
+
134
+ # Final summary stats
135
+ summary = results["summary"]
136
+ total = summary["total"]
137
+ vuln_count = sum(1 for s in samples if s["is_vulnerable"])
138
+ safe_count = total - vuln_count
139
+
140
+ summary["binary_accuracy"] = summary["correct_binary"] / total if total > 0 else 0
141
+ summary["cwe_accuracy"] = summary["correct_cwe"] / vuln_count if vuln_count > 0 else 0
142
+ summary["false_positive_rate"] = summary["false_positives"] / safe_count if safe_count > 0 else 0
143
+ summary["false_negative_rate"] = summary["false_negatives"] / vuln_count if vuln_count > 0 else 0
144
+
145
+ for cwe in summary["cwe_breakdown"]:
146
+ stats = summary["cwe_breakdown"][cwe]
147
+ stats["accuracy"] = stats["correct"] / stats["total"] if stats["total"] > 0 else 0
148
+
149
+ print(f"\nEvaluation Complete:")
150
+ print(f" Binary Accuracy: {summary['binary_accuracy']:.2%}")
151
+ print(f" CWE Accuracy: {summary['cwe_accuracy']:.2%}")
152
+ print(f" False Positives: {summary['false_positives']}")
153
+ print(f" False Negatives: {summary['false_negatives']}")
154
+
155
+ with open(output_file, "w", encoding="utf-8") as f:
156
+ json.dump(results, f, indent=2)
157
+ print(f"Results saved to {output_file}")
158
+ return results
159
+
160
+ if __name__ == "__main__":
161
+ parser = argparse.ArgumentParser()
162
+ parser.add_argument("--model-path", default="meta-llama/Llama-3.2-3B-Instruct")
163
+ parser.add_argument("--test-file", default="data/devign_test.jsonl")
164
+ parser.add_argument("--is-lora", action="store_true")
165
+ parser.add_argument("--base-model", default="meta-llama/Llama-3.2-3B-Instruct")
166
+ parser.add_argument("--output", default="eval_results.json")
167
+ args = parser.parse_args()
168
+
169
+ evaluate(args.model_path, args.test_file, args.is_lora, args.base_model, args.output)
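
Two typical invocations, mirroring the argparse defaults above (the adapter path is a placeholder for wherever training saved the LoRA):

```bash
# Baseline: untrained Llama-3.2-3B on the held-out test set.
python scripts/evaluate.py --test-file data/devign_test.jsonl --output eval_baseline.json

# Trained: LoRA adapter on top of the same base model.
python scripts/evaluate.py \
  --model-path outputs/commitguard-llama-3b/final \
  --is-lora \
  --base-model meta-llama/Llama-3.2-3B-Instruct \
  --output eval_trained.json
```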
scripts/gce_vm_runbook.md ADDED
@@ -0,0 +1,149 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ## GCE VM Runbook — CommitGuard GRPO Training
2
+
3
+ ### Step 1: Create VM
4
+
5
+ Run from your local machine (or use GCP Console):
6
+
7
+ ```bash
8
+ # Option A: L4 (24 GB VRAM, ~$0.70/hr) — RECOMMENDED
9
+ gcloud compute instances create commitguard-train \
10
+ --zone=us-central1-a \
11
+ --machine-type=g2-standard-8 \
12
+ --accelerator=type=nvidia-l4,count=1 \
13
+ --boot-disk-size=100GB \
14
+ --image-family=pytorch-latest-gpu \
15
+ --image-project=deeplearning-platform-release \
16
+ --maintenance-policy=TERMINATE \
17
+ --metadata="install-nvidia-driver=True"
18
+
19
+ # Option B: A100 (40 GB VRAM, ~$2.50/hr) — if L4 unavailable
20
+ gcloud compute instances create commitguard-train \
21
+ --zone=us-central1-a \
22
+ --machine-type=a2-highgpu-1g \
23
+ --accelerator=type=nvidia-tesla-a100,count=1 \
24
+ --boot-disk-size=100GB \
25
+ --image-family=pytorch-latest-gpu \
26
+ --image-project=deeplearning-platform-release \
27
+ --maintenance-policy=TERMINATE \
28
+ --metadata="install-nvidia-driver=True"
29
+
30
+ # Option C: T4 (16 GB VRAM, ~$0.35/hr) — budget fallback
31
+ gcloud compute instances create commitguard-train \
32
+ --zone=us-central1-b \
33
+ --machine-type=n1-standard-8 \
34
+ --accelerator=type=nvidia-tesla-t4,count=1 \
35
+ --boot-disk-size=100GB \
36
+ --image-family=pytorch-latest-gpu \
37
+ --image-project=deeplearning-platform-release \
38
+ --maintenance-policy=TERMINATE \
39
+ --metadata="install-nvidia-driver=True"
40
+ ```
41
+
42
+ ### Step 2: SSH into VM
43
+
44
+ ```bash
45
+ gcloud compute ssh commitguard-train --zone=us-central1-a
46
+ ```
47
+
48
+ ### Step 3: One-command setup
49
+
50
+ ```bash
51
+ curl -sSL https://raw.githubusercontent.com/NitishKumar-ai/commitguard/main/scripts/gcp_setup.sh | bash
52
+ ```
53
+
54
+ Or manually:
55
+
56
+ ```bash
57
+ git clone https://github.com/NitishKumar-ai/commitguard.git
58
+ cd commitguard
59
+ bash scripts/gcp_setup.sh
60
+ ```
61
+
62
+ ### Step 4: Start env server (in tmux)
63
+
64
+ ```bash
65
+ cd ~/commitguard && source .venv/bin/activate
66
+ tmux new -s server
67
+ server
68
+ # Ctrl-B D to detach
69
+ ```
70
+
71
+ Verify:
72
+
73
+ ```bash
74
+ curl -s http://localhost:8000/health
75
+ # → {"status":"healthy"}
76
+ ```
77
+
78
+ ### Step 5: Login to HuggingFace + Wandb
79
+
80
+ ```bash
81
+ source ~/commitguard/.venv/bin/activate
82
+ huggingface-cli login # paste your HF token (needed for Llama gated model)
83
+ wandb login # paste your wandb API key
84
+ ```
85
+
86
+ ### Step 6: Start training
87
+
88
+ ```bash
89
+ cd ~/commitguard && source .venv/bin/activate
90
+ export WANDB_PROJECT=commitguard
91
+
92
+ # Full run (~2-3 hours on L4)
93
+ python scripts/train_grpo.py \
94
+ --samples 200 \
95
+ --max-steps 300 \
96
+ --save-steps 50 \
97
+ --num-generations 4 \
98
+ --batch-size 1 \
99
+ --grad-accum 4
100
+
101
+ # Quick smoke test first (5 min)
102
+ python scripts/train_grpo.py \
103
+ --samples 20 \
104
+ --max-steps 10 \
105
+ --no-wandb
106
+ ```
107
+
108
+ ### Step 7: Monitor
109
+
110
+ ```bash
111
+ # In another tmux pane:
112
+ watch -n 30 nvidia-smi # GPU memory
113
+ # Wandb dashboard: https://wandb.ai/<your-user>/commitguard
114
+ ```
115
+
116
+ ### Step 8: Copy results back
117
+
118
+ ```bash
119
+ # From your LOCAL machine:
120
+ gcloud compute scp --recurse \
121
+ commitguard-train:~/commitguard/outputs/commitguard-llama-3b/final \
122
+ ./outputs/commitguard-llama-3b/final \
123
+ --zone=us-central1-a
124
+ ```
125
+
126
+ ### Step 9: Shut down VM
127
+
128
+ ```bash
129
+ gcloud compute instances stop commitguard-train --zone=us-central1-a
130
+ # or delete to stop billing entirely:
131
+ gcloud compute instances delete commitguard-train --zone=us-central1-a
132
+ ```
133
+
134
+ ### Cost estimate
135
+
136
+ | GPU | VRAM | $/hr | 300 steps (~3hr) |
137
+ |-----|------|------|-------------------|
138
+ | T4 | 16GB | $0.35 | ~$1.05 |
139
+ | L4 | 24GB | $0.70 | ~$2.10 |
140
+ | A100| 40GB | $2.50 | ~$7.50 |
141
+
142
+ ### Troubleshooting
143
+
144
+ - **OOM on T4**: reduce `--num-generations 2` and `--batch-size 1`
145
+ - **Llama access denied**: make sure you accepted the license at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
146
+ - **Env server not responding**: check `tmux attach -t server` for errors
147
+ - **Wandb not logging**: verify `wandb login` succeeded, or use `--no-wandb`
148
+ - **GPU quota error**: request GPU quota increase at https://console.cloud.google.com/iam-admin/quotas
149
+
scripts/gcp_setup.sh ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ # =============================================================================
3
+ # CommitGuard — GCP VM Setup Script
4
+ # Target: GCE VM with NVIDIA L4 (24 GB) or A100 (40/80 GB)
5
+ # =============================================================================
6
+ set -euo pipefail
7
+
8
+ echo "============================================"
9
+ echo " CommitGuard GCP Training VM Setup"
10
+ echo "============================================"
11
+
12
+ # --- 1. System packages ---
13
+ sudo apt-get update -qq
14
+ sudo apt-get install -y -qq git python3-venv python3-pip tmux htop
15
+
16
+ # --- 2. NVIDIA driver check ---
17
+ if ! command -v nvidia-smi &>/dev/null; then
18
+ echo "ERROR: nvidia-smi not found. Use a GCP image with pre-installed GPU drivers:"
19
+ echo " - Deep Learning VM (recommended)"
20
+ echo " - Or install manually: sudo apt install nvidia-driver-535"
21
+ exit 1
22
+ fi
23
+ echo "GPU detected:"
24
+ nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
25
+
26
+ # --- 3. Clone repo ---
27
+ REPO_DIR="$HOME/commitguard"
28
+ if [ ! -d "$REPO_DIR" ]; then
29
+ echo "Cloning repo..."
30
+ git clone https://github.com/NitishKumar-ai/commitguard.git "$REPO_DIR"
31
+ else
32
+ echo "Repo exists, pulling latest..."
33
+ cd "$REPO_DIR" && git pull
34
+ fi
35
+ cd "$REPO_DIR"
36
+
37
+ # --- 4. Python venv ---
38
+ if [ ! -d ".venv" ]; then
39
+ python3 -m venv .venv
40
+ fi
41
+ source .venv/bin/activate
42
+ pip install -U pip setuptools wheel -q
43
+
44
+ # --- 5. Install training dependencies ---
45
+ echo "Installing training dependencies..."
46
+ pip install -e . -q
47
+
48
+ pip install \
49
+ "torch>=2.4" \
50
+ "unsloth[cu124-torch240]" \
51
+ "trl>=0.12" \
52
+ "peft>=0.13" \
53
+ "bitsandbytes>=0.44" \
54
+ "transformers>=4.46" \
55
+ "datasets>=3.0" \
56
+ "accelerate>=1.0" \
57
+ "wandb" \
58
+ "requests" \
59
+ "matplotlib" \
60
+ "jupyter" \
61
+ "ipywidgets" \
62
+ -q
63
+
64
+ echo "Verifying installs..."
65
+ python -c "
66
+ import torch, trl, unsloth, peft, wandb, bitsandbytes
67
+ print(f'PyTorch: {torch.__version__}')
68
+ print(f'CUDA: {torch.cuda.is_available()} — {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')
69
+ print(f'TRL: {trl.__version__}')
70
+ print(f'PEFT: {peft.__version__}')
71
+ print(f'Wandb: {wandb.__version__}')
72
+ print('All training deps OK.')
73
+ "
74
+
75
+ echo ""
76
+ echo "============================================"
77
+ echo " Setup complete. Two options to train:"
78
+ echo "============================================"
79
+ echo ""
80
+ echo " ── OPTION A: Jupyter Notebook (recommended) ──"
81
+ echo ""
82
+ echo " # On the VM:"
83
+ echo " cd $REPO_DIR && source .venv/bin/activate"
84
+ echo " tmux new -s server -d 'source .venv/bin/activate && server'"
85
+ echo " jupyter notebook --no-browser --port=8888 --ip=0.0.0.0"
86
+ echo ""
87
+ echo " # On your LOCAL machine (new terminal):"
88
+ echo " gcloud compute ssh commitguard-train --zone=us-central1-a -- -NL 8888:localhost:8888"
89
+ echo ""
90
+ echo " # Then open in browser:"
91
+ echo " # http://localhost:8888 → notebooks/train_commitguard.ipynb"
92
+ echo ""
93
+ echo " ── OPTION B: CLI ──"
94
+ echo ""
95
+ echo " cd $REPO_DIR && source .venv/bin/activate"
96
+ echo " tmux new -s server -d 'source .venv/bin/activate && server'"
97
+ echo " huggingface-cli login"
98
+ echo " python scripts/train_grpo.py --samples 200 --max-steps 300"
99
+ echo ""
scripts/lightning_ai_runbook.md ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Training on Lightning AI
2
+
3
+ This guide explains how to run CommitGuard GRPO training on a Lightning AI GPU Studio.
4
+
5
+ ## Recommended Instance
6
+ - **GPU:** NVIDIA L4 (24GB) or A10G (24GB) is sufficient for Llama-3.2-3B with Unsloth 4-bit.
7
+ - **Image:** Default Linux / PyTorch images are fine; the setup script handles dependencies.
8
+
9
+ ## Setup & Train in One Step
10
+
11
+ 1. Open a terminal in your Lightning AI Studio.
12
+ 2. Run the setup script:
13
+ ```bash
14
+ bash scripts/lightning_setup.sh
15
+ ```
16
+
17
+ ## What the Script Does
18
+ 1. Installs `uv` for fast dependency management.
19
+ 2. Creates a virtual environment and installs all requirements (Unsloth, TRL, etc.).
20
+ 3. Starts the `commitguard_env` server in the background (via `tmux` if available).
21
+ 4. Runs `scripts/train_grpo.py`.
22
+
23
+ ## Manual Steps (Optional)
24
+
25
+ ### 1. View Training Logs
26
+ If you want to see the environment server logs:
27
+ ```bash
28
+ tmux attach -t env_server
29
+ ```
30
+ (Press `Ctrl+B`, then `D` to detach).
31
+
32
+ ### 2. Hugging Face Integration
33
+ To save your model to the Hugging Face Hub, login before training:
34
+ ```bash
35
+ huggingface-cli login
36
+ ```
37
+
38
+ ### 3. Checkpoints
39
+ Checkpoints and the final merged LoRA adapter will be saved to:
40
+ `outputs/commitguard-llama-3b/final`
41
+
42
+ ## Troubleshooting
43
+ - **OOM Error:** If you hit Out-Of-Memory, try reducing `--batch-size` or `--num-generations` in `scripts/train_grpo.py`.
44
+ - **Server Connection:** If training fails with connection errors, ensure the server started correctly by checking `curl http://localhost:8000/health`.
scripts/lightning_setup.sh ADDED
@@ -0,0 +1,61 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ # CommitGuard - Lightning AI Setup & Train
3
+ # This script prepares the environment and starts GRPO training.
4
+
5
+ set -e
6
+
7
+ echo "--- 1. Installing uv ---"
8
+ curl -LsSf https://astral.sh/uv/install.sh | sh
9
+ if [ -f "$HOME/.local/bin/env" ]; then
10
+ source "$HOME/.local/bin/env"
11
+ elif [ -f "$HOME/.cargo/env" ]; then
12
+ source "$HOME/.cargo/env"
13
+ fi
14
+ export PATH="$HOME/.local/bin:$PATH"
15
+
16
+ echo "--- 2. Setting up Workspace ---"
17
+ REPO_DIR="$HOME/commitguard"
18
+ if [ ! -d "$REPO_DIR" ]; then
19
+ echo "Cloning repo..."
20
+ git clone https://github.com/NitishKumar-ai/commitguard "$REPO_DIR"
21
+ fi
22
+ cd "$REPO_DIR"
23
+
24
+ echo "--- 3. Setting up Virtual Env ---"
25
+ if [ ! -d ".venv" ]; then
26
+ uv venv
27
+ fi
28
+ source .venv/bin/activate
29
+
30
+ echo "--- 4. Installing Dependencies ---"
31
+ uv sync --all-extras
32
+
33
+ echo "--- 5. Starting Environment Server ---"
34
+ # Use tmux to keep the server running in the background
35
+ if command -v tmux >/dev/null; then
36
+ tmux new -s env_server -d "source .venv/bin/activate && python -m commitguard_env.server"
37
+ else
38
+ python -m commitguard_env.server &
39
+ SERVER_PID=$!
40
+ fi
41
+
42
+ echo "Waiting for server to be healthy..."
43
+ max_retries=30
44
+ count=0
45
+ until curl --output /dev/null --silent --head --fail http://localhost:8000/health; do
46
+ printf '.'
47
+ sleep 2
48
+ count=$((count+1))
49
+ if [ $count -eq $max_retries ]; then
50
+ echo "Server failed to start."
51
+ exit 1
52
+ fi
53
+ done
54
+ echo "Server is healthy!"
55
+
56
+ echo "--- 5. Starting GRPO Training ---"
57
+ # Defaults: 200 samples, 300 steps.
58
+ # Increase samples for better stability, decrease for faster iteration.
59
+ python scripts/train_grpo.py --samples 200 --max-steps 300
60
+
61
+ echo "Training session finished."
scripts/plot_results.py ADDED
@@ -0,0 +1,103 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import matplotlib.pyplot as plt
2
+ import json
3
+ import os
4
+ import argparse
5
+
6
+ def plot_reward_curve(wandb_data_path, output_path="plots/reward_curve.png"):
7
+ """
8
+ Plots the training reward curve.
9
+ Expects a JSON file with 'step' and 'reward' keys (exported from Wandb).
10
+ """
11
+ if not os.path.exists(wandb_data_path):
12
+ print(f"Skipping: {wandb_data_path} not found.")
13
+ return
14
+
15
+ with open(wandb_data_path, "r") as f:
16
+ data = json.load(f)
17
+
18
+ steps = [d["step"] for d in data]
19
+ rewards = [d["reward"] for d in data]
20
+
21
+ plt.figure(figsize=(10, 6))
22
+ plt.plot(steps, rewards, label="GRPO Reward", color="#2ecc71", linewidth=2)
23
+ plt.xlabel("Training Step")
24
+ plt.ylabel("Mean Reward")
25
+ plt.title("CommitGuard — GRPO Training Reward Curve")
26
+ plt.grid(True, linestyle="--", alpha=0.7)
27
+ plt.legend()
28
+ plt.savefig(output_path)
29
+ print(f"Saved: {output_path}")
30
+
31
+ def plot_accuracy_comparison(baseline_acc, trained_acc, output_path="plots/baseline_vs_trained.png"):
32
+ """
33
+ Plots a bar chart comparing baseline vs trained accuracy.
34
+ """
35
+ labels = ['Baseline (Untrained)', 'CommitGuard (Trained)']
36
+ accuracies = [baseline_acc, trained_acc]
37
+ colors = ['#95a5a6', '#3498db']
38
+
39
+ plt.figure(figsize=(8, 6))
40
+ bars = plt.bar(labels, accuracies, color=colors)
41
+ plt.ylabel("Detection Accuracy (%)")
42
+ plt.title("Vulnerability Detection: Baseline vs. Trained")
43
+ plt.ylim(0, 100)
44
+
45
+ for bar in bars:
46
+ height = bar.get_height()
47
+ plt.text(bar.get_x() + bar.get_width()/2., height + 1,
48
+ f'{height}%', ha='center', va='bottom', fontweight='bold')
49
+
50
+ plt.savefig(output_path)
51
+ print(f"Saved: {output_path}")
52
+
53
+ def plot_per_cwe_breakdown(cwe_data, output_path="plots/per_cwe.png"):
54
+ """
55
+ Plots a grouped bar chart for per-CWE improvement.
56
+ cwe_data format: {"CWE-89": [baseline, trained], "CWE-119": [baseline, trained], ...}
57
+ """
58
+ cwes = list(cwe_data.keys())
59
+ baseline_vals = [v[0] for v in cwe_data.values()]
60
+ trained_vals = [v[1] for v in cwe_data.values()]
61
+
62
+ x = range(len(cwes))
63
+ width = 0.35
64
+
65
+ fig, ax = plt.subplots(figsize=(12, 6))
66
+ ax.bar([i - width/2 for i in x], baseline_vals, width, label='Baseline', color='#95a5a6')
67
+ ax.bar([i + width/2 for i in x], trained_vals, width, label='Trained', color='#e67e22')
68
+
69
+ ax.set_ylabel('Accuracy (%)')
70
+ ax.set_title('Detection Accuracy by CWE Type')
71
+ ax.set_xticks(x)
72
+ ax.set_xticklabels(cwes, rotation=45)
73
+ ax.legend()
74
+ ax.set_ylim(0, 100)
75
+
76
+ plt.tight_layout()
77
+ plt.savefig(output_path)
78
+ print(f"Saved: {output_path}")
79
+
80
+ if __name__ == "__main__":
81
+ parser = argparse.ArgumentParser()
82
+ parser.add_argument("--mode", choices=["reward", "accuracy", "cwe", "all"], default="all")
83
+ args = parser.parse_args()
84
+
85
+ os.makedirs("plots", exist_ok=True)
86
+
87
+ # Example usage for morning shift:
88
+ if args.mode in ["reward", "all"]:
89
+ plot_reward_curve("plots/wandb_simulated.json")
90
+
91
+ if args.mode in ["accuracy", "all"]:
92
+ # Placeholder numbers (to be updated by Divyank/Deepak's eval)
93
+ plot_accuracy_comparison(baseline_acc=32, trained_acc=68)
94
+
95
+ if args.mode in ["cwe", "all"]:
96
+ # Placeholder data
97
+ cwe_data = {
98
+ "CWE-89": [40, 85],
99
+ "CWE-119": [30, 60],
100
+ "CWE-79": [25, 70],
101
+ "CWE-20": [35, 55]
102
+ }
103
+ plot_per_cwe_breakdown(cwe_data)
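
Usage sketch (the accuracy and per-CWE numbers above are placeholders until the real eval results land):

```bash
python scripts/plot_results.py --mode all      # all three plots into plots/
python scripts/plot_results.py --mode reward   # just the reward curve
```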
scripts/preprocess_devign.py ADDED
@@ -0,0 +1,277 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import argparse
2
+ import json
3
+ import random
4
+ from collections import Counter
5
+ from pathlib import Path
6
+
7
+
8
+ def _read_jsonl(path: Path) -> list[dict]:
9
+ rows = []
10
+ for line in path.read_text(encoding="utf-8").splitlines():
11
+ line = line.strip()
12
+ if not line:
13
+ continue
14
+ rows.append(json.loads(line))
15
+ return rows
16
+
17
+
18
+ def _write_jsonl(path: Path, rows: list[dict]) -> None:
19
+ path.parent.mkdir(parents=True, exist_ok=True)
20
+ with path.open("w", encoding="utf-8", newline="\n") as f:
21
+ for r in rows:
22
+ f.write(json.dumps(r, ensure_ascii=False) + "\n")
23
+
24
+
25
+ # ---------------------------------------------------------------------------
26
+ # Fix 2: CWE classification using vulnerable lines, not the whole function.
27
+ # Scored rules — highest-scoring match wins. Falls back to CWE-OTHER.
28
+ # ---------------------------------------------------------------------------
29
+
30
+ _CWE_RULES: list[tuple[str, list[str], int]] = [
31
+ ("CWE-119", ["memcpy", "strcpy", "strcat", "strncpy", "memmove", "sprintf",
32
+ "gets(", "buffer", "overflow", "oob", "av_malloc", "av_realloc",
33
+ "realloc", "malloc", "alloc", "g_malloc", "g_realloc",
34
+ "qemu_malloc", "len ", "length", "copy_from", "copy_to"], 5),
35
+ ("CWE-476", ["null", "nullptr", "!= null", "== null", "if (!",
36
+ "dereference", "segfault", "!obj", "!ctx", "!s->", "!p"], 5),
37
+ ("CWE-189", ["integer overflow", "signedness", "truncat", "wrap",
38
+ "size_t", "underflow", "narrowing", "(int)", "(uint",
39
+ "(unsigned)", ">> ", "<< ", "0xffff", "max_", "min_"], 5),
40
+ ("CWE-78", ["system(", "popen(", "exec(", "execve", "shell",
41
+ "command", "subprocess"], 8),
42
+ ("CWE-22", ["../", "..\\", "traversal", "chroot", "realpath",
43
+ "canonicalize", "symlink", "path"], 7),
44
+ ("CWE-89", ["sql", "query", "select ", "insert ", "union ",
45
+ "prepared", "sqlite", "mysql"], 7),
46
+ ("CWE-79", ["xss", "innerhtml", "script", "sanitize", "escape",
47
+ "htmlentit", "content-type"], 6),
48
+ ("CWE-20", ["valid", "saniti", "untrusted", "input", "bounds",
49
+ "assert", "range", "check", "error", "return -1",
50
+ "goto fail", "goto err", "goto out"], 2),
51
+ ]
52
+
53
+
54
+ def infer_cwe(vul_lines_code: list[str], func: str) -> str:
55
+ vul_text = " ".join(vul_lines_code).lower() if vul_lines_code else ""
56
+ func_text = func.lower()
57
+
58
+ best_cwe, best_score = "CWE-OTHER", 0
59
+
60
+ for cwe, keywords, weight in _CWE_RULES:
61
+ vul_hits = sum(1 for k in keywords if k in vul_text) if vul_text else 0
62
+ func_hits = sum(1 for k in keywords if k in func_text)
63
+ score = vul_hits * weight + func_hits * (weight // 2)
64
+ if score > best_score:
65
+ best_cwe, best_score = cwe, score
66
+
67
+ if best_score < 2:
68
+ return "CWE-OTHER"
69
+ return best_cwe
70
+
71
+
72
+ # ---------------------------------------------------------------------------
73
+ # Fix 1: Real unified diffs from per-line vulnerability labels.
74
+ # ---------------------------------------------------------------------------
75
+
76
+ def _build_diff(func: str, label: list[int], rng: random.Random, is_vuln: bool) -> str:
77
+ lines = func.splitlines()
78
+
79
+ if is_vuln and label and (len(label) == len(lines) or any(l == 1 for l in label)):
80
+ changed_indices = {i for i, l in enumerate(label) if l == 1}
83
+ else:
84
+ block_size = max(1, min(5, len(lines) // 4))
85
+ start = rng.randint(0, max(0, len(lines) - block_size))
86
+ changed_indices = set(range(start, min(start + block_size, len(lines))))
87
+
88
+ if not changed_indices:
89
+ changed_indices = {0}
90
+
91
+ ctx = 3
92
+ visible: set[int] = set()
93
+ for ci in changed_indices:
94
+ for offset in range(-ctx, ctx + 1):
95
+ idx = ci + offset
96
+ if 0 <= idx < len(lines):
97
+ visible.add(idx)
98
+
99
+ sorted_visible = sorted(visible)
100
+ hunks: list[list[int]] = []
101
+ current_hunk: list[int] = []
102
+ for idx in sorted_visible:
103
+ if current_hunk and idx > current_hunk[-1] + 1:
104
+ hunks.append(current_hunk)
105
+ current_hunk = [idx]
106
+ else:
107
+ current_hunk.append(idx)
108
+ if current_hunk:
109
+ hunks.append(current_hunk)
110
+
111
+ diff_parts = ["--- a/source.c", "+++ b/source.c"]
112
+ for hunk in hunks:
113
+ start_line = hunk[0] + 1
114
+ hunk_size = len(hunk)
115
+ diff_parts.append(f"@@ -{start_line},{hunk_size} +{start_line},{hunk_size} @@")
116
+ for idx in hunk:
117
+ line = lines[idx]
118
+ if idx in changed_indices:
119
+ diff_parts.append(f"+{line}")
120
+ else:
121
+ diff_parts.append(f" {line}")
122
+
123
+ return "\n".join(diff_parts)
124
+
125
+
126
+ # ---------------------------------------------------------------------------
127
+ # Fix 3: CWE rebalancing — cap dominant CWEs, merge tiny ones.
128
+ # ---------------------------------------------------------------------------
129
+
130
+ _MAX_PER_CWE_FRAC = 0.25
131
+ _MIN_CWE_SAMPLES = 20
132
+
133
+
134
+ def _rebalance(samples: list[dict], rng: random.Random, limit: int) -> list[dict]:
135
+ by_cwe: dict[str, list[dict]] = {}
136
+ for s in samples:
137
+ by_cwe.setdefault(s["cwe"] or "CWE-OTHER", []).append(s)
138
+
139
+ for cwe, items in list(by_cwe.items()):
140
+ if len(items) < _MIN_CWE_SAMPLES and cwe != "CWE-OTHER":
141
+ by_cwe.setdefault("CWE-OTHER", []).extend(items)
142
+ for item in items:
143
+ item["cwe"] = "CWE-OTHER"
144
+ del by_cwe[cwe]
145
+
146
+ cap = int(limit * _MAX_PER_CWE_FRAC)
147
+ kept: list[dict] = []
148
+ for cwe, items in by_cwe.items():
149
+ rng.shuffle(items)
150
+ kept.extend(items[:cap])
151
+
152
+ rng.shuffle(kept)
153
+ return kept[:limit]
154
+
155
+
156
+ def main() -> None:
157
+ ap = argparse.ArgumentParser(description="Preprocess Devign-derived samples into CommitGuard JSONL.")
158
+ ap.add_argument("--in", dest="inp", type=Path, default=None, help="Optional input JSONL.")
159
+ ap.add_argument("--out", dest="out", type=Path, default=Path("data/devign_filtered.jsonl"))
160
+ ap.add_argument("--test-out", dest="test_out", type=Path, default=Path("data/devign_test.jsonl"))
161
+ ap.add_argument("--limit", type=int, default=5000)
162
+ ap.add_argument("--test-limit", type=int, default=100)
163
+ ap.add_argument("--seed", type=int, default=42)
164
+ args = ap.parse_args()
165
+
166
+ rng = random.Random(args.seed)
167
+
168
+ if args.inp is None:
169
+ try:
170
+ from datasets import load_dataset
171
+ print("Loading DetectVul/devign from Hugging Face...")
172
+ ds = load_dataset('DetectVul/devign', split='train')
173
+ raw_rows = list(ds)
174
+ print(f"Loaded {len(raw_rows)} rows from HF.")
175
+ except Exception as e:
176
+ print(f"Failed to load from HF: {e}")
177
+ return
178
+ else:
179
+ raw_rows = _read_jsonl(args.inp)
180
+
181
+ all_samples: list[dict] = []
182
+
183
+ # Process all rows first
184
+ seen_ids = set()
185
+ for i, r in enumerate(raw_rows):
186
+ func = r.get("func")
187
+ if not func:
188
+ continue
189
+ if len(func.split("\n")) > 80:
190
+ continue
191
+
192
+ target = bool(r.get("target", False))
193
+ label = r.get("label", [])
194
+ vul_lines_code = []
195
+ vl = r.get("vul_lines")
196
+ if vl and isinstance(vl, dict):
197
+ vul_lines_code = vl.get("code", [])
198
+
199
+ cwe = infer_cwe(vul_lines_code, func) if target else None
200
+ diff = _build_diff(func, label, rng, target)
201
+
202
+ # Ensure unique sample_id
203
+ original_id = str(r.get("commit_id") or r.get("id") or f"row-{i}")
204
+ sample_id = original_id
205
+ suffix = 0
206
+ while sample_id in seen_ids:
207
+ suffix += 1
208
+ sample_id = f"{original_id}_{suffix}"
209
+ seen_ids.add(sample_id)
210
+
211
+ target_file = "source.c"
212
+
213
+ sample = {
214
+ "sample_id": sample_id,
215
+ "diff": diff,
216
+ "available_files": [target_file],
217
+ "is_vulnerable": target,
218
+ "cwe": cwe,
219
+ "target_file": target_file,
220
+ "files": {target_file: func},
221
+ }
222
+ all_samples.append(sample)
223
+
224
+ print(f"Total processed samples: {len(all_samples)}")
225
+
226
+ # Shuffle and split to ensure NO overlap
227
+ rng.shuffle(all_samples)
228
+
229
+ # We want to ensure test set has all CWEs if possible
230
+ # Let's pick test set first by picking a few from each CWE
231
+ test_samples: list[dict] = []
232
+
233
+ vuln_all = [s for s in all_samples if s["is_vulnerable"]]
234
+ safe_all = [s for s in all_samples if not s["is_vulnerable"]]
235
+
236
+ by_cwe: dict[str, list[dict]] = {}
237
+ for s in vuln_all:
238
+ by_cwe.setdefault(s["cwe"] or "CWE-OTHER", []).append(s)
239
+
240
+ # Try to pick 5 from each CWE for test set
241
+ for cwe in by_cwe:
242
+ test_samples.extend(by_cwe[cwe][:5])
243
+ by_cwe[cwe] = by_cwe[cwe][5:]
244
+
245
+ # Fill the rest of test set with random samples (half vuln, half safe)
246
+ remaining_vuln = [s for items in by_cwe.values() for s in items]
247
+ needed_vuln = (args.test_limit // 2) - sum(1 for s in test_samples if s["is_vulnerable"])
248
+ if needed_vuln > 0:
249
+ test_samples.extend(remaining_vuln[:needed_vuln])
250
+ remaining_vuln = remaining_vuln[needed_vuln:]
251
+
252
+ needed_safe = args.test_limit - len(test_samples)
253
+ test_samples.extend(safe_all[:needed_safe])
254
+ safe_all = safe_all[needed_safe:]
255
+
256
+ # Now remaining samples go to train
257
+ train_pool_vuln = remaining_vuln
258
+ train_pool_safe = safe_all
259
+
260
+ print(f"Test set: {len(test_samples)} samples")
261
+ _write_jsonl(args.test_out, test_samples)
262
+
263
+ # Rebalance training set
264
+ target_each = args.limit // 2
265
+ vuln_keep = _rebalance(train_pool_vuln, rng, target_each)
266
+ safe_keep = rng.sample(train_pool_safe, min(target_each, len(train_pool_safe)))
267
+
268
+ train_rows = vuln_keep + safe_keep
269
+ rng.shuffle(train_rows)
270
+
271
+ _write_jsonl(args.out, train_rows)
272
+
273
+ print(f"Wrote {len(train_rows)} training samples to {args.out}")
274
+ print(f"Wrote {len(test_samples)} test samples to {args.test_out}")
275
+
276
+ if __name__ == "__main__":
277
+ main()
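
For intuition about `_build_diff`: labeled lines get a `+` prefix, up to three unlabeled neighbors on each side become context, and each contiguous visible run becomes one hunk. For a hypothetical five-line function whose third line is labeled vulnerable, the synthesized diff would look roughly like:

```diff
--- a/source.c
+++ b/source.c
@@ -1,5 +1,5 @@
 static int copy_input(char *dst, const char *src) {
     char buf[8];
+    strcpy(buf, src);
     return 0;
 }
```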
scripts/run_and_plot_baseline.py ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import json
5
+ from pathlib import Path
6
+ import sys
7
+
8
+
9
+ def main() -> None:
10
+ ap = argparse.ArgumentParser(description="Run a tiny baseline and save a reward-curve PNG.")
11
+ ap.add_argument("--episodes", type=int, default=200)
12
+ ap.add_argument("--out-dir", type=Path, default=Path("plots"))
13
+ args = ap.parse_args()
14
+
15
+ # Allow running from a fresh clone without `pip install -e .`.
16
+ repo_root = Path(__file__).resolve().parent.parent
17
+ sys.path.insert(0, str(repo_root))
18
+
19
+ # Local, in-process baseline (no server needed).
20
+ from commitguard_env.environment import CommitGuardEnvironment
21
+ from commitguard_env.models import CommitGuardAction
22
+
23
+ data_path = repo_root / "data" / "devign_filtered.jsonl"
24
+ env = CommitGuardEnvironment(data_path=data_path)
25
+
26
+ rewards: list[float] = []
27
+ for _ in range(args.episodes):
28
+ _ = env.reset()
29
+ # Naive always-vulnerable verdict baseline (intentionally dumb).
30
+ action = CommitGuardAction(
31
+ action_type="verdict",
32
+ is_vulnerable=True,
33
+ vuln_type="CWE-89",
34
+ exploit_sketch="sql select where concat injection",
35
+ )
36
+ _obs, reward, _done = env.step(action)
37
+ rewards.append(float(reward))
38
+
39
+ args.out_dir.mkdir(parents=True, exist_ok=True)
40
+ (args.out_dir / "baseline_rewards.json").write_text(json.dumps(rewards), encoding="utf-8")
41
+
42
+ import matplotlib.pyplot as plt
43
+
44
+ plt.figure(figsize=(8, 4))
45
+ plt.plot(rewards, linewidth=1)
46
+ plt.title("CommitGuard baseline reward curve (naive always-vulnerable)")
47
+ plt.xlabel("Episode")
48
+ plt.ylabel("Reward")
49
+ plt.tight_layout()
50
+ plt.savefig(args.out_dir / "baseline_reward_curve.png", dpi=180)
51
+
52
+
53
+ if __name__ == "__main__":
54
+ main()
55
+
scripts/train_grpo.py ADDED
@@ -0,0 +1,173 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import sys
3
+ import json
4
+ import argparse
5
+ from pathlib import Path
6
+
7
+ # Unsloth must be imported (and GRPO patched) before trl, per Unsloth's GRPO guidance.
8
+ from unsloth import FastLanguageModel, PatchFastRL
9
+ PatchFastRL("GRPO", FastLanguageModel)
10
+
11
+ import torch
12
+ import wandb
13
+ from datasets import Dataset, load_dataset
14
+ from trl import GRPOConfig, GRPOTrainer
12
+
13
+ sys.path.insert(0, str(Path(__file__).resolve().parent))
14
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
15
+ from agent_prompt import SYSTEM_PROMPT, get_agent_prompt
16
+ from commitguard_env.parse_action import parse_action
17
+ from commitguard_env.reward import compute_reward
18
+
20
+
21
+ # --- Configuration ---
22
+ MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.2-3B-Instruct")
23
+ OUTPUT_DIR = os.getenv("OUTPUT_DIR", "outputs/commitguard-llama-3b-grpo")
24
+ WANDB_PROJECT = os.getenv("WANDB_PROJECT", "commitguard")
25
+
26
+ REPO_ROOT = Path(__file__).resolve().parent.parent
27
+ CWE_KEYWORDS_PATH = REPO_ROOT / "data" / "cwe_keywords.json"
28
+ CWE_KEYWORDS: dict[str, list[str]] = {}
29
+ if CWE_KEYWORDS_PATH.exists():
30
+ CWE_KEYWORDS = json.loads(CWE_KEYWORDS_PATH.read_text(encoding="utf-8"))
31
+
32
+ # Pre-built lookup: sample_id -> ground truth fields (loaded in build_dataset)
33
+ SAMPLE_LABELS: dict[str, dict] = {}
34
+
35
+
36
+ # --- Local reward: no HTTP, no latency ---
37
+ def get_reward_local(prompts, completions, sample_id, **kwargs) -> list[float]:
38
+ rewards = []
39
+ for p_id, completion in zip(sample_id, completions):
40
+ text = completion[-1]["content"] if isinstance(completion, list) else str(completion)
41
+ action = parse_action(text)
42
+ labels = SAMPLE_LABELS.get(p_id, {})
43
+ reward = compute_reward(
44
+ action=action,
45
+ is_vulnerable=labels.get("is_vulnerable"),
46
+ cwe=labels.get("cwe"),
47
+ target_file=labels.get("target_file"),
48
+ cwe_keywords=CWE_KEYWORDS,
49
+ context_requests=0,
50
+ )
51
+ rewards.append(reward)
52
+ return rewards
53
+
54
+
55
+ def format_prompt(sample):
56
+ # Using the Llama-3.2 prompt template from the plan
57
+ return {
58
+ "prompt": [
59
+ {"role": "system", "content": SYSTEM_PROMPT},
60
+ {"role": "user", "content": f"Analyze this commit and submit your verdict.\n\nCode diff:\n```diff\n{sample['diff']}\n```"},
61
+ ],
62
+ "sample_id": sample["sample_id"],
63
+ }
64
+
65
+
66
+ def build_dataset(n_samples: int) -> Dataset:
67
+ data_path = REPO_ROOT / "data" / "devign_filtered.jsonl"
68
+ if not data_path.exists():
69
+ print(f"Dataset file {data_path} not found.")
70
+ return Dataset.from_list([])
71
+
72
+ print(f"Loading training samples from {data_path}...")
73
+ raw_dataset = load_dataset("json", data_files=str(data_path), split="train")
74
+ raw_dataset = raw_dataset.select(range(min(n_samples, len(raw_dataset))))
75
+
76
+ for row in raw_dataset:
77
+ sid = row["sample_id"]
78
+ SAMPLE_LABELS[sid] = {
79
+ "is_vulnerable": row.get("is_vulnerable"),
80
+ "cwe": row.get("cwe"),
81
+ "target_file": row.get("target_file"),
82
+ }
83
+
84
+ dataset = raw_dataset.map(format_prompt)
85
+ print(f"Loaded {len(dataset)} samples ({len(SAMPLE_LABELS)} labels cached in-process).")
86
+ return dataset
87
+
88
+
89
+ def main():
90
+ ap = argparse.ArgumentParser()
91
+ ap.add_argument("--samples", type=int, default=200)
92
+ ap.add_argument("--max-steps", type=int, default=300)
93
+ ap.add_argument("--save-steps", type=int, default=50)
94
+ ap.add_argument("--num-generations", type=int, default=8)
95
+ ap.add_argument("--batch-size", type=int, default=1)
96
+ ap.add_argument("--grad-accum", type=int, default=4)
97
+ ap.add_argument("--lr", type=float, default=5e-6)
98
+ ap.add_argument("--no-wandb", action="store_true")
99
+ ap.add_argument("--push-to-hub", action="store_true")
100
+ ap.add_argument("--hub-model-id", type=str, default="inmodel-labs/commitguard-llama-3b")
101
+ args = ap.parse_args()
102
+
103
+ if not args.no_wandb:
104
+ wandb.init(project=WANDB_PROJECT, name=f"grpo-{MODEL_NAME.split('/')[-1]}-run1")
105
+
106
+ # 1. Load Model
107
+ print(f"Loading {MODEL_NAME} with Unsloth 4-bit...")
108
+ model, tokenizer = FastLanguageModel.from_pretrained(
109
+ model_name=MODEL_NAME,
110
+ max_seq_length=2048,
111
+ load_in_4bit=True,
112
+ fast_inference=True,
113
+ max_lora_rank=16,
114
+ )
115
+
116
+ model = FastLanguageModel.get_peft_model(
117
+ model,
118
+ r=8,
119
+ target_modules=[
120
+ "q_proj", "k_proj", "v_proj", "o_proj",
121
+ "gate_proj", "up_proj", "down_proj",
122
+ ],
123
+ lora_alpha=16,
124
+ lora_dropout=0,
125
+ bias="none",
126
+ use_gradient_checkpointing="unsloth",
127
+ random_state=3407,
128
+ )
129
+
130
+ # 2. Build dataset
131
+ dataset = build_dataset(args.samples)
132
+
133
+ # 3. GRPO config
134
+ training_args = GRPOConfig(
135
+ output_dir=OUTPUT_DIR,
136
+ num_generations=args.num_generations,
137
+ max_completion_length=256,
138
+ per_device_train_batch_size=args.batch_size,
139
+ gradient_accumulation_steps=args.grad_accum,
140
+ learning_rate=args.lr,
141
+ logging_steps=1,
142
+ save_steps=args.save_steps,
143
+ max_steps=args.max_steps,
144
+ report_to="none" if args.no_wandb else "wandb",
145
+ bf16=torch.cuda.is_bf16_supported(),
146
+ fp16=not torch.cuda.is_bf16_supported(),
147
+ )
148
+
149
+ # 4. Train
150
+ trainer = GRPOTrainer(
151
+ model=model,
152
+ processing_class=tokenizer,
153
+ reward_funcs=[get_reward_local],
154
+ args=training_args,
155
+ train_dataset=dataset,
156
+ )
157
+
158
+ print("Starting GRPO training...")
159
+ trainer.train()
160
+
161
+ # 5. Save
162
+ final_dir = f"{OUTPUT_DIR}/final"
163
+ model.save_pretrained_merged(final_dir, tokenizer, save_method="lora")
164
+ print(f"Training complete. LoRA adapter saved to {final_dir}")
165
+
166
+ if args.push_to_hub:
167
+ print(f"Pushing to HF Hub: {args.hub_model_id}")
168
+ model.push_to_hub(args.hub_model_id, token=True)
169
+ tokenizer.push_to_hub(args.hub_model_id, token=True)
170
+
171
+
172
+ if __name__ == "__main__":
173
+ main()
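
The reward path is fully local, so it can be sanity-checked without the trainer. A minimal sketch, assuming `compute_reward` keeps the signature used in `get_reward_local` above and the package is installed (`pip install -e .`); the sample values are made up:

```bash
python - <<'EOF'
from commitguard_env.parse_action import parse_action
from commitguard_env.reward import compute_reward

# A well-formed verdict in the XML action format from agent_prompt.py.
text = (
    "<action><action_type>verdict</action_type>"
    "<is_vulnerable>true</is_vulnerable>"
    "<vuln_type>CWE-119</vuln_type>"
    "<exploit_sketch>buffer overflow via unchecked memcpy</exploit_sketch></action>"
)
action = parse_action(text)
print(compute_reward(
    action=action,
    is_vulnerable=True,      # hypothetical ground-truth label
    cwe="CWE-119",
    target_file="source.c",
    cwe_keywords={},
    context_requests=0,
))
EOF
```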
scripts/verify_3_action_loop.py ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import requests
2
+ import json
3
+ import sys
4
+
5
+ def test_loop():
6
+ base_url = "http://localhost:8000"
7
+
8
+ print("--- Phase 1: Reset ---")
9
+ r = requests.post(f"{base_url}/reset")
10
+ if r.status_code != 200:
11
+ print(f"FAILED: Reset returned {r.status_code}")
12
+ return
13
+ data = r.json()
14
+ print(f"Full response keys: {list(data.keys())}")
15
+ obs = data["observation"]
16
+ print(f"Observation value: {obs}")
17
+ episode_id = obs["episode_id"]
18
+ print(f"Observation keys: {list(obs.keys())}")
19
+ print(f"Episode ID: {episode_id}")
20
+ print(f"Diff length: {len(obs['diff'])}")
21
+
22
+ # Verify no leak
23
+ forbidden = ["is_vulnerable", "cwe", "cwe_type", "label"]
24
+ for f in forbidden:
25
+ if f in obs:
26
+ print(f"CRITICAL LEAK: '{f}' found in observation!")
27
+ sys.exit(1)
28
+
29
+ print("\n--- Phase 2: Action 'request_context' ---")
30
+ # Using the first available file if any
31
+ file_to_req = obs["available_files"][0] if obs["available_files"] else "unknown.c"
32
+ action = {
33
+ "action": f"<action><action_type>request_context</action_type><file_path>{file_to_req}</file_path></action>"
34
+ }
35
+ r = requests.post(f"{base_url}/step", json=action)
36
+ res = r.json()
37
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
38
+ print(f"Context snippets returned: {len(res['observation'].get('context_snippets', []))}")
39
+
40
+ print("\n--- Phase 3: Action 'analyze' ---")
41
+ action = {
42
+ "action": "<action><action_type>analyze</action_type><reasoning>Thinking about the pointer arithmetic in the diff...</reasoning></action>"
43
+ }
44
+ r = requests.post(f"{base_url}/step", json=action)
45
+ res = r.json()
46
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
47
+
48
+ print("\n--- Phase 4: Action 'verdict' ---")
49
+ action = {
50
+ "action": "<action><action_type>verdict</action_type><is_vulnerable>true</is_vulnerable><vuln_type>CWE-119</vuln_type><exploit_sketch>buffer overflow via unchecked memcpy</exploit_sketch></action>"
51
+ }
52
+ r = requests.post(f"{base_url}/step", json=action)
53
+ res = r.json()
54
+ print(f"Status: {r.status_code}, Reward: {res['reward']}, Done: {res['done']}")
55
+ print(f"Final Info: {res.get('info', 'No info')}")
56
+
57
+ print("\n--- Phase 5: Verify State (No Leaks) ---")
58
+ r = requests.get(f"{base_url}/state")
59
+ data = r.json()
60
+ state = data["state"]
61
+ print(f"State Episode ID: {state['episode_id']}")
62
+ print(f"Step Count: {state['step_count']}")
63
+ for f in forbidden:
64
+ if f in state:
65
+ # state() may carry internal metadata, but the PRD says labels must never leak to the agent.
66
+ # environment.py says: "state() must not leak labels; returning empty is fine"
67
+ print(f"LEAK WARNING: '{f}' found in state output!")
68
+
69
+ if __name__ == "__main__":
70
+ test_loop()
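
To run the loop check end to end (server in one terminal, probe in another):

```bash
# Terminal 1: start the env server (console script from pyproject.toml).
server

# Terminal 2: drive reset → request_context → analyze → verdict.
python scripts/verify_3_action_loop.py
```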
server/__init__.py ADDED
File without changes
server/app.py ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ from commitguard_env.server import app, main as server_main
2
+
3
+ def main():
4
+ server_main()
5
+
6
+ if __name__ == "__main__":
7
+ main()