---
title: CommitGuard
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
---
# CommitGuard
CommitGuard is an OpenEnv environment for AI-paced professional security review. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.
Primary hackathon theme: Theme #3.1 - World Modeling / Professional Tasks.
Secondary theme: Theme #2 - Long-Horizon Planning & Instruction Following.
## Problem
AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or slow manual PR review does not match a world where code can be generated, modified, and shipped continuously.
CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.
## Environment
Each episode is a single commit-level investigation.
- `reset` loads a Devign-derived code sample and returns a diff plus the available files.
- The agent can take one of three actions:
  - `request_context`: ask for more file context, at a small budget cost.
  - `analyze`: write intermediate reasoning for traceability.
  - `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
- `step` returns the next observation, a scalar reward, and a done flag.
- `state` returns episode metadata without leaking labels.
The agent never sees ground truth labels. Ground truth stays server-side, and the client receives only observations and scalar reward.
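The episode lifecycle above can be sketched as a small driver loop. This is an illustrative sketch, not the actual client: the `reset`/`step` callables stand in for HTTP calls to the environment, and the result field names (`observation`, `reward`, `done`) are assumptions matching the description above.

```python
from typing import Callable, Dict, Tuple

def run_episode(
    reset: Callable[[], Dict],
    step: Callable[[str], Dict],
    policy: Callable[[Dict], str],
    max_steps: int = 10,
) -> Tuple[float, int]:
    """Drive one commit-level investigation; return (total_reward, steps taken)."""
    obs = reset()
    total, n = 0.0, 0
    for _ in range(max_steps):
        action_xml = policy(obs)      # an <action>...</action> string
        result = step(action_xml)
        total += result["reward"]     # scalar reward only; labels stay server-side
        n += 1
        if result["done"]:            # a verdict (or step limit) ends the episode
            break
        obs = result["observation"]
    return total, n
```

Note that the loop only ever sees observations and scalar rewards, mirroring the no-leak guarantee: ground truth never crosses the client boundary.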
## Reward
CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.
| Signal | Reward |
|---|---|
| Correct vulnerable/safe verdict | +1.0 |
| Correct CWE classification | up to +0.5 |
| Plausible exploit sketch keyword match | up to +0.5 |
| False positive | -1.0 |
| False negative | -0.5 |
| Extra context requests | -0.05 each after the first |
| Malformed action | -0.5 |
This makes the task harder than static classification: the agent must manage investigation budget and produce structured, parseable actions.
Naive baseline strategies (`always_vuln`, `always_safe`, `random`) achieve near-zero precision, recall, and F1, confirming that no trivial strategy can game the reward signal.
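The reward table above can be condensed into a small scoring sketch. This is a minimal illustration, not the server's actual implementation: the partial credit for CWE classification and exploit-sketch keyword match is modeled here as a fraction in [0, 1] scaled to the 0.5 cap, which is an assumption.

```python
def score_verdict(
    predicted_vulnerable: bool,
    true_vulnerable: bool,
    cwe_credit: float = 0.0,      # fraction of CWE-classification credit earned
    sketch_credit: float = 0.0,   # fraction of exploit-sketch keyword credit earned
    context_requests: int = 0,    # total request_context actions this episode
) -> float:
    if predicted_vulnerable != true_vulnerable:
        # False positives (-1.0) are penalized harder than false negatives (-0.5).
        reward = -1.0 if predicted_vulnerable else -0.5
    else:
        reward = 1.0  # correct vulnerable/safe verdict
        if true_vulnerable:
            reward += 0.5 * cwe_credit      # up to +0.5
            reward += 0.5 * sketch_credit   # up to +0.5
    # -0.05 per context request after the first.
    reward -= 0.05 * max(0, context_requests - 1)
    return reward
```

The asymmetry between false positives and false negatives is what defeats the `always_vuln` baseline: every wrong "vulnerable" call costs the full -1.0.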
## Results
We evaluated a baseline against the trained agent on 100 held-out samples.
| Run | Correct | Accuracy |
|---|---|---|
| Baseline | 50 / 100 | 50% |
| Trained | 74 / 100 | 74% |
Cumulative mean reward across 500 episodes shows all naive strategies (`always_vuln`, `always_safe`, `random`) plateauing at low reward, while the trained agent learns to do better.
The trained agent improves over the baseline on held-out commit-level vulnerability detection.
Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20).
## Training
The judge-runnable training path is the Colab-ready notebook (`notebooks/train_commitguard.ipynb`). A script-based path is also available:
```shell
python scripts/train_grpo.py \
  --env-url https://nitishkumar-ai-commitguard-env.hf.space \
  --samples 200 \
  --max-steps 300 \
  --num-generations 4 \
  --batch-size 1 \
  --grad-accum 4
```
If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
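The backend selection described above can be sketched as a small routing function. This is illustrative only; the function and scorer names are hypothetical, not the training script's internals. Only the precedence (CLI flag, then `COMMITGUARD_ENV_URL`, then local fallback) comes from the description above.

```python
import os
from typing import Callable, Optional

def choose_reward_backend(
    cli_env_url: Optional[str],
    remote_scorer: Callable[[str, str], float],  # (env_url, completion) -> reward
    local_scorer: Callable[[str, str], float],   # (prompt, completion) -> reward
) -> Callable[[str, str], float]:
    """Pick the env-backed scorer when a URL is available, else the local debug path."""
    env_url = cli_env_url or os.environ.get("COMMITGUARD_ENV_URL")
    if env_url:
        # Score completions through the running CommitGuard environment.
        return lambda prompt, completion: remote_scorer(env_url, completion)
    # Debug fallback: local label-grounded reward.
    return local_scorer
```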
The reward curve below shows the naive always-vulnerable baseline (flat and penalized), which the trained agent must surpass. Training reward improves steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.
## Links
- Hugging Face Space: `Nitishkumar-ai/commitguard-env`
- Training notebook: `notebooks/train_commitguard.ipynb`
- Mini-blog / short writeup: `commitguard_hf_blog.md`
- Trained model target: `inmodel-labs/commitguard-llama-3b`
- GCE training runbook: `scripts/gce_vm_runbook.md`
## Project Structure
```
commitguard/
├── commitguard_env/   # Core logic (environment, server, model)
├── docs/              # Detailed documentation and guides
├── data/              # Devign-derived datasets
├── scripts/           # Training and evaluation entrypoints
├── results/           # Evaluation artifacts and JSON reports
├── notebooks/         # Interactive training notebooks
├── plots/             # Visualization artifacts
├── tests/             # Comprehensive test suite
└── configs/           # Configuration files
```
## Quickstart
Install locally:

```shell
python -m pip install -e ".[dev]"
```
Start the server, then run a health check:

```shell
curl http://localhost:8000/health
```
Run with Docker:

```shell
docker build -t commitguard .
docker run -p 7860:7860 commitguard
curl http://localhost:7860/health
```
## API
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /docs`
Example action:
```xml
<action>
  <action_type>verdict</action_type>
  <is_vulnerable>true</is_vulnerable>
  <vuln_type>CWE-119</vuln_type>
  <exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
```
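A minimal sketch of parsing such an action into a dict, using the standard-library XML parser. The field names match the example above; the specific validation rules (and returning `None` to signal the malformed-action penalty from the reward table) are assumptions for illustration, not the environment's actual parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def parse_action(xml_text: str) -> Optional[Dict]:
    """Parse an <action> payload; return None for a malformed action."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError:
        return None  # unparseable XML counts as a malformed action
    if root.tag != "action":
        return None
    fields = {child.tag: (child.text or "").strip() for child in root}
    if "action_type" not in fields:
        return None
    if "is_vulnerable" in fields:
        fields["is_vulnerable"] = fields["is_vulnerable"].lower() == "true"
    return fields
```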
## Validation
Before submission:
```shell
pytest tests/test_action_parser.py
pytest tests/test_reward.py
pytest tests/test_no_leak.py
pytest tests/test_env_smoke.py
```
Also smoke-test the public Space:
```shell
curl https://nitishkumar-ai-commitguard-env.hf.space/health
```
## Scope
This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.