---
title: CommitGuard
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
---

# CommitGuard

CommitGuard is an OpenEnv environment for **AI-paced professional security review**. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.

Primary hackathon theme: **Theme #3.1 - World Modeling / Professional Tasks**.  
Secondary theme: **Theme #2 - Long-Horizon Planning & Instruction Following**.

## Problem

AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or a slow manual PR review cannot keep pace with a world where code is generated, modified, and shipped continuously.

CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.

## Environment

Each episode is a single commit-level investigation.

1. `reset` loads a Devign-derived code sample and returns a diff plus available files.
2. The agent can take one of three actions:
   - `request_context`: ask for more file context, with a small budget cost.
   - `analyze`: write intermediate reasoning for traceability.
   - `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
3. `step` returns the next observation, scalar reward, and done flag.
4. `state` returns episode metadata without leaking labels.

The agent never sees ground-truth labels: they stay server-side, and the client receives only observations and scalar rewards.
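The episode loop above can be sketched as a minimal stdlib-only client. The endpoint names (`/reset`, `/step`) come from the API section below; the JSON field names (`"action"`, `"reward"`) and the helper names are illustrative assumptions, not the environment's actual client API.

```python
# Minimal episode-loop sketch using only the standard library.
# Endpoint paths are from the API section; payload field names are assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local server; swap in the Space URL if deployed


def make_verdict_action(is_vulnerable, vuln_type, sketch):
    """Build an XML action string in the format shown in the API section."""
    flag = "true" if is_vulnerable else "false"
    return (
        "<action>"
        "<action_type>verdict</action_type>"
        f"<is_vulnerable>{flag}</is_vulnerable>"
        f"<vuln_type>{vuln_type}</vuln_type>"
        f"<exploit_sketch>{sketch}</exploit_sketch>"
        "</action>"
    )


def post_json(path, payload):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_episode():
    obs = post_json("/reset", {})  # diff plus available files
    action = make_verdict_action(True, "CWE-119", "unchecked buffer copy overflows dest")
    result = post_json("/step", {"action": action})  # observation, reward, done
    return result.get("reward", 0.0)
```

A real agent would interleave `request_context` and `analyze` actions before committing to a verdict; this sketch jumps straight to the final step for brevity.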

## Reward

CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.

| Signal | Reward |
|---|---:|
| Correct vulnerable/safe verdict | +1.0 |
| Correct CWE classification | up to +0.5 |
| Plausible exploit sketch keyword match | up to +0.5 |
| False positive | -1.0 |
| False negative | -0.5 |
| Extra context requests | -0.05 each after the first |
| Malformed action | -0.5 |

This makes the task harder than static classification: the agent must manage investigation budget and produce structured, parseable actions.
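The reward table above can be sketched as a single scoring function. The function name, signature, and the continuous 0-to-1 match scores are illustrative assumptions; the actual server-side implementation may differ.

```python
# Illustrative scoring sketch of the reward table; names and shapes are assumed,
# not the environment's real API.
def score_verdict(
    predicted_vuln,          # agent's vulnerable/safe verdict
    true_vuln,               # ground-truth label (server-side only)
    cwe_match,               # 0.0..1.0 quality of the CWE classification
    sketch_match,            # 0.0..1.0 exploit-sketch keyword overlap
    extra_context_requests,  # context requests beyond the first
    malformed=False,
):
    if malformed:
        return -0.5
    reward = 0.0
    if predicted_vuln == true_vuln:
        reward += 1.0                      # correct vulnerable/safe verdict
        if true_vuln:
            reward += 0.5 * cwe_match      # up to +0.5 for the right CWE
            reward += 0.5 * sketch_match   # up to +0.5 for a plausible sketch
    elif predicted_vuln:
        reward -= 1.0                      # false positive
    else:
        reward -= 0.5                      # false negative
    reward -= 0.05 * max(0, extra_context_requests)  # budget cost after the first
    return reward
```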

Naive baseline strategies (always_vuln, always_safe, random) achieve near-zero precision, recall, and F1, confirming that no trivial strategy can game the reward signal.

![Baseline evaluation metrics](plots/readme_eval_baselines.gif)

## Results

We evaluated the baseline and the trained agent on 100 held-out samples.

| Run | Correct | Accuracy |
|---|---:|---:|
| Baseline | 50 / 100 | 50% |
| Trained | 74 / 100 | 74% |

![Vulnerability detection baseline vs trained](plots/readme_baseline_vs_trained.gif)

Cumulative mean reward across 500 episodes shows all naive strategies (always_vuln, always_safe, random) plateau at low reward, while the trained agent learns to do better.

![Cumulative mean reward by strategy](plots/readme_cumulative_mean_reward.gif)

The trained agent improves over the baseline on held-out commit-level vulnerability detection.

Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20).

![Per-CWE breakdown](plots/readme_per_cwe.gif)

## Training

The judge-runnable training path is the Colab-ready notebook:

- [Training notebook](notebooks/train_commitguard.ipynb)

A script-based path is also available:

```bash
python scripts/train_grpo.py \
  --env-url https://nitishkumar-ai-commitguard-env.hf.space \
  --samples 200 \
  --max-steps 300 \
  --num-generations 4 \
  --batch-size 1 \
  --grad-accum 4
```

If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
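The fallback order described above can be sketched as a small dispatch helper. The function name and return shape are illustrative; only the flag/env-var precedence comes from the text.

```python
# Sketch of the reward-path selection: the --env-url flag wins, then the
# COMMITGUARD_ENV_URL environment variable, else the local debug path.
# Function name and return shape are illustrative assumptions.
def choose_reward_path(cli_url, environ):
    url = cli_url or environ.get("COMMITGUARD_ENV_URL")
    return ("remote", url) if url else ("local", None)
```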

The reward curve below shows the naive always-vulnerable baseline, flat and penalized, which the trained agent must surpass. Training reward improves steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.

![Baseline reward curve](plots/readme_baseline_reward_curve.gif)

![GRPO training reward curve](plots/readme_grpo_reward_curve.gif)

## Links

- **Hugging Face Space:** [Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
- **Training notebook:** [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
- **Mini-blog / short writeup:** [commitguard_hf_blog.md](commitguard_hf_blog.md)
- **Trained model target:** [inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
- **GCE training runbook:** [scripts/gce_vm_runbook.md](scripts/gce_vm_runbook.md)

## Project Structure

```text
commitguard/
β”œβ”€β”€ commitguard_env/    # Core logic (environment, server, model)
β”œβ”€β”€ docs/               # Detailed documentation and guides
β”œβ”€β”€ data/               # Devign-derived datasets
β”œβ”€β”€ scripts/            # Training and evaluation entrypoints
β”œβ”€β”€ results/            # Evaluation artifacts and JSON reports
β”œβ”€β”€ notebooks/          # Interactive training notebooks
β”œβ”€β”€ plots/              # Visualization artifacts
β”œβ”€β”€ tests/              # Comprehensive test suite
└── configs/            # Configuration files
```

## Quickstart

Install locally:

```bash
python -m pip install -e ".[dev]"
server  # console-script entrypoint installed with the package
```

Health check:

```bash
curl http://localhost:8000/health
```

Run with Docker:

```bash
docker build -t commitguard .
docker run -p 7860:7860 commitguard
curl http://localhost:7860/health
```

## API

- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /docs`

Example action:

```xml
<action>
  <action_type>verdict</action_type>
  <is_vulnerable>true</is_vulnerable>
  <vuln_type>CWE-119</vuln_type>
  <exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
```
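For reference, the action above can be parsed with the standard library. This is one way a client or server might decode it; the environment's actual parser may be stricter, and the helper name is an assumption.

```python
# One way to parse the example action XML with the standard library; the
# environment's real parser may validate more strictly.
import xml.etree.ElementTree as ET


def parse_action(xml_text):
    """Decode an <action> element into a flat dict of its child fields."""
    root = ET.fromstring(xml_text)
    if root.tag != "action":
        raise ValueError("expected <action> root element")
    fields = {child.tag: (child.text or "").strip() for child in root}
    if "is_vulnerable" in fields:
        # Coerce the textual flag into a real boolean.
        fields["is_vulnerable"] = fields["is_vulnerable"].lower() == "true"
    return fields
```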

## Validation

Before submission:

```bash
pytest tests/test_action_parser.py
pytest tests/test_reward.py
pytest tests/test_no_leak.py
pytest tests/test_env_smoke.py
```

Also smoke-test the public Space:

```bash
curl https://nitishkumar-ai-commitguard-env.hf.space/health
```

## Scope

This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.