---
title: CommitGuard
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
---
# CommitGuard
CommitGuard is an OpenEnv environment for **AI-paced professional security review**. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.
Primary hackathon theme: **Theme #3.1 - World Modeling / Professional Tasks**.
Secondary theme: **Theme #2 - Long-Horizon Planning & Instruction Following**.
## Problem
AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or slow manual PR review does not match a world where code can be generated, modified, and shipped continuously.
CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.
## Environment
Each episode is a single commit-level investigation.
1. `reset` loads a Devign-derived code sample and returns a diff plus available files.
2. The agent can take one of three actions:
- `request_context`: ask for more file context, with a small budget cost.
- `analyze`: write intermediate reasoning for traceability.
- `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
3. `step` returns the next observation, scalar reward, and done flag.
4. `state` returns episode metadata without leaking labels.
The agent never sees ground truth labels. Ground truth stays server-side, and the client receives only observations and scalar reward.
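The episode loop above can be sketched as follows. This is a hedged illustration: `StubEnv` is a hypothetical local stand-in for the real CommitGuard HTTP server (a real client would POST the same payloads to `/reset` and `/step`), and the stubbed observations and rewards are placeholders, not the server's actual values.

```python
# Hypothetical sketch of the reset/step episode loop. StubEnv stands in
# for the real CommitGuard server; ground truth stays server-side there.

class StubEnv:
    """Minimal stand-in mimicking the reset/step contract."""

    def reset(self):
        self.steps = 0
        return {"diff": "+ strcpy(dst, src);", "available_files": ["util.c"]}

    def step(self, action):
        self.steps += 1
        done = False
        if action["action_type"] == "verdict":
            done = True
            reward = 1.0  # stubbed; the real reward is label-grounded
        elif action["action_type"] == "request_context":
            reward = -0.05 if self.steps > 1 else 0.0  # first request is free
        else:  # analyze
            reward = 0.0
        return {"note": "stub observation"}, reward, done


def run_episode(env):
    env.reset()
    total = 0.0
    # Spend one (free) context request, then issue a verdict.
    _, r, _ = env.step({"action_type": "request_context", "file": "util.c"})
    total += r
    _, r, done = env.step({
        "action_type": "verdict",
        "is_vulnerable": True,
        "vuln_type": "CWE-119",
        "exploit_sketch": "unchecked buffer copy can overflow the destination",
    })
    return total + r, done
```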
## Reward
CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.
| Signal | Reward |
|---|---:|
| Correct vulnerable/safe verdict | +1.0 |
| Correct CWE classification | up to +0.5 |
| Plausible exploit sketch keyword match | up to +0.5 |
| False positive | -1.0 |
| False negative | -0.5 |
| Extra context requests | -0.05 each after the first |
| Malformed action | -0.5 |
This makes the task harder than static classification: the agent must manage investigation budget and produce structured, parseable actions.
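The table above can be turned into a scoring sketch. The scalar weights come from the table; the partial-credit formulas (all-or-nothing CWE match, keyword-overlap fraction for the exploit sketch) are illustrative assumptions, not the server's actual implementation.

```python
# Hedged sketch of the episode reward from the table above. Weights are
# from the README; the partial-credit logic is an assumption.

def episode_reward(verdict_vulnerable, truth_vulnerable,
                   predicted_cwe=None, truth_cwe=None,
                   sketch_keywords=(), truth_keywords=(),
                   extra_context_requests=0):
    reward = 0.0
    if verdict_vulnerable == truth_vulnerable:
        reward += 1.0  # correct vulnerable/safe verdict
        if truth_vulnerable:
            if predicted_cwe == truth_cwe:
                reward += 0.5  # CWE credit (up to +0.5)
            if truth_keywords:
                overlap = set(sketch_keywords) & set(truth_keywords)
                reward += 0.5 * len(overlap) / len(truth_keywords)
    elif verdict_vulnerable:   # said vulnerable, commit was safe
        reward -= 1.0          # false positive
    else:                      # said safe, commit was vulnerable
        reward -= 0.5          # false negative
    reward -= 0.05 * extra_context_requests  # each request after the first
    return reward
```

For example, a correct vulnerable verdict with the right CWE, half the expected keywords, and one extra context request would score 1.0 + 0.5 + 0.25 - 0.05 = 1.70 under these assumptions.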
Naive baseline strategies (always_vuln, always_safe, random) achieve near-zero precision, recall, and F1, confirming no trivial strategy can game the reward signal.

## Results
We evaluated both the naive baseline and the trained agent on 100 held-out samples.
| Run | Correct | Accuracy |
|---|---:|---:|
| Baseline | 50 / 100 | 50% |
| Trained | 74 / 100 | 74% |

Cumulative mean reward across 500 episodes shows all naive strategies (always_vuln, always_safe, random) plateau at low reward, while the trained agent learns to do better.

The trained agent improves over the baseline on held-out commit-level vulnerability detection.
Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20).

## Training
The judge-runnable training path is the Colab-ready notebook:
- [Training notebook](notebooks/train_commitguard.ipynb)
The script path is also available:
```bash
python scripts/train_grpo.py \
  --env-url https://nitishkumar-ai-commitguard-env.hf.space \
  --samples 200 \
  --max-steps 300 \
  --num-generations 4 \
  --batch-size 1 \
  --grad-accum 4
```
If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
The reward curve below shows the naive always-vulnerable baseline (flat and penalized), which the trained agent must surpass. Training reward improves steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.


## Links
- **Hugging Face Space:** [Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
- **Training notebook:** [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
- **Mini-blog / short writeup:** [commitguard_hf_blog.md](commitguard_hf_blog.md)
- **Trained model target:** [inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
- **GCE training runbook:** [scripts/gce_vm_runbook.md](scripts/gce_vm_runbook.md)
## Project Structure
```text
commitguard/
├── commitguard_env/   # Core logic (environment, server, model)
├── docs/              # Detailed documentation and guides
├── data/              # Devign-derived datasets
├── scripts/           # Training and evaluation entrypoints
├── results/           # Evaluation artifacts and JSON reports
├── notebooks/         # Interactive training notebooks
├── plots/             # Visualization artifacts
├── tests/             # Comprehensive test suite
└── configs/           # Configuration files
```
## Quickstart
Install locally:
```bash
python -m pip install -e ".[dev]"
server
```
Health check:
```bash
curl http://localhost:8000/health
```
Run with Docker:
```bash
docker build -t commitguard .
docker run -p 7860:7860 commitguard
curl http://localhost:7860/health
```
## API
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /docs`
Example action:
```xml
<action>
<action_type>verdict</action_type>
<is_vulnerable>true</is_vulnerable>
<vuln_type>CWE-119</vuln_type>
<exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
```
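An action in this format can be parsed with the standard library. The tag names below follow the example above; the `parse_action` helper itself is hypothetical and the server's actual parser may differ.

```python
# Hypothetical parser for the <action> XML format shown above.
import xml.etree.ElementTree as ET


def parse_action(xml_text):
    """Parse an <action> document into a plain dict."""
    root = ET.fromstring(xml_text)
    action = {"action_type": root.findtext("action_type")}
    if action["action_type"] == "verdict":
        action["is_vulnerable"] = root.findtext("is_vulnerable") == "true"
        action["vuln_type"] = root.findtext("vuln_type")
        action["exploit_sketch"] = root.findtext("exploit_sketch")
    return action
```

A malformed document raises `ET.ParseError`, which the environment would presumably map to the -0.5 malformed-action penalty.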
## Validation
Before submission:
```bash
pytest tests/test_action_parser.py
pytest tests/test_reward.py
pytest tests/test_no_leak.py
pytest tests/test_env_smoke.py
```
Also smoke-test the public Space:
```bash
curl https://nitishkumar-ai-commitguard-env.hf.space/health
```
## Scope
This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.