---
title: CommitGuard
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
---
# CommitGuard
CommitGuard is an OpenEnv environment for **AI-paced professional security review**. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.
Primary hackathon theme: **Theme #3.1 - World Modeling / Professional Tasks**.
Secondary theme: **Theme #2 - Long-Horizon Planning & Instruction Following**.
## Problem
AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or slow manual PR review does not match a world where code can be generated, modified, and shipped continuously.
CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.
## Environment
Each episode is a single commit-level investigation.
1. `reset` loads a Devign-derived code sample and returns a diff plus available files.
2. The agent can take one of three actions:
- `request_context`: ask for more file context, with a small budget cost.
- `analyze`: write intermediate reasoning for traceability.
- `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
3. `step` returns the next observation, scalar reward, and done flag.
4. `state` returns episode metadata without leaking labels.
The agent never sees ground truth labels. Ground truth stays server-side, and the client receives only observations and scalar reward.
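All three actions are sent as small XML payloads (the exact format is shown in the API section). As an illustration only, a helper like the following, which is not part of the released client, could serialize one possible episode; the `file` and `notes` field names are assumptions:

```python
def make_action(action_type: str, **fields) -> str:
    """Serialize an agent action into the XML format the server parses.

    Hypothetical helper for illustration; field names beyond action_type
    mirror the README's example action but may differ from the real schema.
    """
    inner = "".join(f"<{k}>{v}</{k}>" for k, v in fields.items())
    return f"<action><action_type>{action_type}</action_type>{inner}</action>"

# One possible episode: one context request, one analysis note, then a verdict.
episode = [
    make_action("request_context", file="src/net/parse.c"),
    make_action("analyze", notes="memcpy length looks attacker-controlled"),
    make_action(
        "verdict",
        is_vulnerable="true",
        vuln_type="CWE-119",
        exploit_sketch="unchecked buffer copy can overflow the destination",
    ),
]

for xml in episode:
    print(xml)
```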
## Reward
CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.
| Signal | Reward |
|---|---:|
| Correct vulnerable/safe verdict | +1.0 |
| Correct CWE classification | up to +0.5 |
| Plausible exploit sketch keyword match | up to +0.5 |
| False positive | -1.0 |
| False negative | -0.5 |
| Extra context requests | -0.05 each after the first |
| Malformed action | -0.5 |
This makes the task harder than static classification: the agent must manage investigation budget and produce structured, parseable actions.
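The schedule above can be written down as a small scoring function. This is a simplified sketch; the real scorer lives server-side in `commitguard_env`, and how partial CWE and exploit credit is computed there is an assumption here:

```python
def score_verdict(pred_vuln: bool, true_vuln: bool,
                  cwe_correct: bool = False,
                  exploit_keyword_frac: float = 0.0,
                  extra_context_requests: int = 0) -> float:
    """Toy reward mirroring the table above (weights from the table;
    the partial-credit details are illustrative assumptions)."""
    if pred_vuln and not true_vuln:        # false positive
        reward = -1.0
    elif not pred_vuln and true_vuln:      # false negative
        reward = -0.5
    else:                                  # correct vulnerable/safe verdict
        reward = 1.0
        if pred_vuln:                      # partial credit only for vuln cases
            if cwe_correct:
                reward += 0.5              # up to +0.5 for the right CWE
            reward += 0.5 * exploit_keyword_frac  # up to +0.5 for the sketch
    # each context request beyond the first costs 0.05
    reward -= 0.05 * max(0, extra_context_requests)
    return reward

# Full credit on a vulnerable sample: 1.0 + 0.5 + 0.5 = 2.0
print(score_verdict(True, True, cwe_correct=True, exploit_keyword_frac=1.0))
```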
Naive baseline strategies (always_vuln, always_safe, random) achieve near-zero precision, recall, and F1, confirming that no trivial strategy can game the reward signal.
![Baseline evaluation metrics](plots/readme_eval_baselines.gif)
## Results
We evaluated the naive baseline and the trained agent on 100 held-out samples.
| Run | Correct | Accuracy |
|---|---:|---:|
| Baseline | 50 / 100 | 50% |
| Trained | 74 / 100 | 74% |
![Vulnerability detection baseline vs trained](plots/readme_baseline_vs_trained.gif)
Cumulative mean reward across 500 episodes shows all naive strategies (always_vuln, always_safe, random) plateau at low reward, while the trained agent learns to do better.
![Cumulative mean reward by strategy](plots/readme_cumulative_mean_reward.gif)
The trained agent improves over the baseline on held-out commit-level vulnerability detection.
Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20).
![Per-CWE breakdown](plots/readme_per_cwe.gif)
## Training
The judge-runnable training path is the Colab-ready notebook:
- [Training notebook](notebooks/train_commitguard.ipynb)
A script-based training path is also available:
```bash
python scripts/train_grpo.py \
--env-url https://nitishkumar-ai-commitguard-env.hf.space \
--samples 200 \
--max-steps 300 \
--num-generations 4 \
--batch-size 1 \
--grad-accum 4
```
If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
The first curve below shows the naive always-vulnerable baseline, flat and penalized, which the trained agent must surpass. The second shows training reward improving steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.
![Baseline reward curve](plots/readme_baseline_reward_curve.gif)
![GRPO training reward curve](plots/readme_grpo_reward_curve.gif)
## Links
- **Hugging Face Space:** [Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
- **Training notebook:** [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
- **Mini-blog / short writeup:** [commitguard_hf_blog.md](commitguard_hf_blog.md)
- **Trained model target:** [inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
- **GCE training runbook:** [scripts/gce_vm_runbook.md](scripts/gce_vm_runbook.md)
## Project Structure
```text
commitguard/
β”œβ”€β”€ commitguard_env/ # Core logic (environment, server, model)
β”œβ”€β”€ docs/ # Detailed documentation and guides
β”œβ”€β”€ data/ # Devign-derived datasets
β”œβ”€β”€ scripts/ # Training and evaluation entrypoints
β”œβ”€β”€ results/ # Evaluation artifacts and JSON reports
β”œβ”€β”€ notebooks/ # Interactive training notebooks
β”œβ”€β”€ plots/ # Visualization artifacts
β”œβ”€β”€ tests/ # Comprehensive test suite
└── configs/ # Configuration files
```
## Quickstart
Install locally and start the server:
```bash
python -m pip install -e ".[dev]"
server
```
Health check:
```bash
curl http://localhost:8000/health
```
Run with Docker:
```bash
docker build -t commitguard .
docker run -p 7860:7860 commitguard
curl http://localhost:7860/health
```
## API
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /docs`
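The endpoints above can be exercised with a short standard-library client. This is a sketch, not the released client: the JSON field names (`action`, `reward`) are assumptions, so check `GET /docs` for the actual request and response schema:

```python
import json
import urllib.request

# Base URL for a locally running server; the public Space serves on port 7860.
BASE_URL = "http://localhost:8000"

# A minimal "safe" verdict payload in the environment's XML action format.
SAFE_VERDICT = ("<action><action_type>verdict</action_type>"
                "<is_vulnerable>false</is_vulnerable></action>")

def post_json(path: str, payload: dict) -> dict:
    """POST a JSON body to the environment and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_episode() -> float:
    """Reset, submit one verdict, and return the scalar reward.

    Requires a running server; the 'action' and 'reward' field names
    are assumptions to be checked against GET /docs.
    """
    post_json("/reset", {})
    result = post_json("/step", {"action": SAFE_VERDICT})
    return result.get("reward", 0.0)
```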
Example action:
```xml
<action>
<action_type>verdict</action_type>
<is_vulnerable>true</is_vulnerable>
<vuln_type>CWE-119</vuln_type>
<exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
```
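On the server side, a payload like the one above can be decoded with the standard library. The environment's actual parser lives in `commitguard_env` and may differ; this standalone sketch only shows the shape of the job:

```python
import xml.etree.ElementTree as ET

# The example action from the section above.
ACTION_XML = """<action>
<action_type>verdict</action_type>
<is_vulnerable>true</is_vulnerable>
<vuln_type>CWE-119</vuln_type>
<exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>"""

def parse_action(xml_text: str) -> dict:
    """Decode an action payload into a flat tag -> text dict.

    A malformed payload (penalized at -0.5 by the environment)
    raises ET.ParseError here.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

action = parse_action(ACTION_XML)
print(action["action_type"], action["vuln_type"])
```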
## Validation
Run the test suite before submission:
```bash
pytest tests/test_action_parser.py
pytest tests/test_reward.py
pytest tests/test_no_leak.py
pytest tests/test_env_smoke.py
```
Also smoke-test the public Space:
```bash
curl https://nitishkumar-ai-commitguard-env.hf.space/health
```
## Scope
This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.