---
title: CommitGuard
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
---
# CommitGuard

CommitGuard is an OpenEnv environment for **AI-paced professional security review**. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.

Primary hackathon theme: **Theme #3.1 - World Modeling / Professional Tasks**.
Secondary theme: **Theme #2 - Long-Horizon Planning & Instruction Following**.
## Problem

AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or a slow manual PR review does not match a world where code can be generated, modified, and shipped continuously.

CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.
## Environment

Each episode is a single commit-level investigation.

1. `reset` loads a Devign-derived code sample and returns a diff plus the available files.
2. The agent can take one of three actions:
   - `request_context`: ask for more file context, at a small budget cost.
   - `analyze`: write intermediate reasoning for traceability.
   - `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
3. `step` returns the next observation, a scalar reward, and a done flag.
4. `state` returns episode metadata without leaking labels.

The agent never sees ground-truth labels. Ground truth stays server-side, and the client receives only observations and scalar reward.
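The episode contract above can be sketched with a toy in-process stub. Note that `StubEnv` and its observation fields are illustrative assumptions for this sketch, not the real CommitGuard server schema:

```python
# Toy stand-in for the CommitGuard environment, illustrating the
# reset / step / state contract described above. Field names are assumed.

class StubEnv:
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        # Observation: a diff plus the files the agent may request.
        return {"diff": "--- a/f.c\n+++ b/f.c", "available_files": ["f.c"]}

    def step(self, action):
        self.steps += 1
        done = action.get("action_type") == "verdict"
        reward = 1.0 if done else 0.0  # toy reward; real scoring is server-side
        return {"observation": {}, "reward": reward, "done": done}

    def state(self):
        # Episode metadata only; ground-truth labels never leave the server.
        return {"steps": self.steps}


env = StubEnv()
obs = env.reset()
result = env.step({"action_type": "request_context", "file": "f.c"})
result = env.step({"action_type": "verdict", "is_vulnerable": True})
print(result["done"], env.state()["steps"])  # prints: True 2
```

The same loop runs against the real server over HTTP via `POST /reset` and `POST /step`.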
## Reward

CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.

| Signal | Reward |
|---|---:|
| Correct vulnerable/safe verdict | +1.0 |
| Correct CWE classification | up to +0.5 |
| Plausible exploit-sketch keyword match | up to +0.5 |
| False positive | -1.0 |
| False negative | -0.5 |
| Extra context requests | -0.05 each after the first |
| Malformed action | -0.5 |

This makes the task harder than static classification: the agent must manage its investigation budget and produce structured, parseable actions.
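As a rough sketch, the signals in the table might combine as follows. This assumes partial CWE and exploit credit applies only on correctly flagged vulnerable commits; the actual server-side scorer (keyword matching, CWE credit) may differ:

```python
# Hedged sketch of the reward table above; mirrors only the signs and
# magnitudes listed in the README. The malformed-action penalty (-0.5)
# would be applied earlier, at parse time, and is not modeled here.

def score_verdict(pred_vuln, true_vuln, cwe_match=0.0, exploit_match=0.0,
                  context_requests=1):
    """pred_vuln/true_vuln are bools; cwe_match/exploit_match in [0, 1]."""
    if pred_vuln and true_vuln:
        # Correct vulnerable verdict, plus partial CWE and exploit credit.
        reward = 1.0 + 0.5 * cwe_match + 0.5 * exploit_match
    elif not pred_vuln and not true_vuln:
        reward = 1.0   # correct safe verdict
    elif pred_vuln and not true_vuln:
        reward = -1.0  # false positive
    else:
        reward = -0.5  # false negative
    # Budget penalty: -0.05 per context request after the first.
    reward -= 0.05 * max(0, context_requests - 1)
    return reward


print(score_verdict(True, True, cwe_match=1.0, exploit_match=0.5))  # 1.75
print(score_verdict(True, False))                                   # -1.0
```

Asymmetric penalties (false positives cost more than false negatives) discourage the trivial always-vulnerable policy.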
Naive baseline strategies (always_vuln, always_safe, random) achieve near-zero precision, recall, and F1, confirming that no trivial strategy can game the reward signal.
|  | |
| ## Results | |
| We evaluated a baseline against the trained agent on 100 held-out samples. | |
| | Run | Correct | Accuracy | | |
| |---|---:|---:| | |
| | Baseline | 50 / 100 | 50% | | |
| | Trained | 74 / 100 | 74% | | |
|  | |
| Cumulative mean reward across 500 episodes shows all naive strategies (always_vuln, always_safe, random) plateau at low reward, while the trained agent learns to do better. | |
|  | |
| The trained agent improves over the baseline on held-out commit-level vulnerability detection. | |
| Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20). | |
|  | |
## Training

The judge-runnable training path is the Colab-ready notebook:

- [Training notebook](notebooks/train_commitguard.ipynb)

The script path is also available:

```bash
python scripts/train_grpo.py \
  --env-url https://nitishkumar-ai-commitguard-env.hf.space \
  --samples 200 \
  --max-steps 300 \
  --num-generations 4 \
  --batch-size 1 \
  --grad-accum 4
```

If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
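The remote-vs-local fallback can be illustrated as follows; the function name `pick_reward_path` is hypothetical, not the actual helper in `scripts/train_grpo.py`:

```python
# Sketch of the reward-path selection described above: score through the
# remote environment when a URL is configured, otherwise fall back to the
# local label-grounded reward used for debugging.
import os


def pick_reward_path(cli_env_url=None):
    # CLI flag takes precedence over the environment variable.
    url = cli_env_url or os.environ.get("COMMITGUARD_ENV_URL")
    return ("remote", url) if url else ("local", None)


mode, url = pick_reward_path("https://nitishkumar-ai-commitguard-env.hf.space")
print(mode)  # prints: remote
```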
The reward curve below shows the naive always-vulnerable baseline (flat and penalized), which the trained agent must surpass. Training reward improves steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.



## Links

- **Hugging Face Space:** [Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
- **Training notebook:** [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
- **Mini-blog / short writeup:** [commitguard_hf_blog.md](commitguard_hf_blog.md)
- **Trained model target:** [inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
- **GCE training runbook:** [scripts/gce_vm_runbook.md](scripts/gce_vm_runbook.md)
## Project Structure

```text
commitguard/
├── commitguard_env/   # Core logic (environment, server, model)
├── docs/              # Detailed documentation and guides
├── data/              # Devign-derived datasets
├── scripts/           # Training and evaluation entrypoints
├── results/           # Evaluation artifacts and JSON reports
├── notebooks/         # Interactive training notebooks
├── plots/             # Visualization artifacts
├── tests/             # Comprehensive test suite
└── configs/           # Configuration files
```
## Quickstart

Install and run locally:

```bash
python -m pip install -e ".[dev]"
server  # console-script entry point installed by the package
```

Health check:

```bash
curl http://localhost:8000/health
```

Run with Docker:

```bash
docker build -t commitguard .
docker run -p 7860:7860 commitguard
curl http://localhost:7860/health
```
## API

- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
- `GET /docs`

Example action:

```xml
<action>
  <action_type>verdict</action_type>
  <is_vulnerable>true</is_vulnerable>
  <vuln_type>CWE-119</vuln_type>
  <exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
```
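One way a client might parse such an action into a dict, assuming well-formed XML as shown above; real agent output can be noisier, so production parsing would need to be more defensive:

```python
# Parse a CommitGuard-style <action> payload with the standard library.
import xml.etree.ElementTree as ET


def parse_action(xml_text):
    root = ET.fromstring(xml_text)
    # Collect each child tag's text, e.g. action_type, vuln_type, ...
    fields = {child.tag: (child.text or "").strip() for child in root}
    # Coerce the boolean field from its string form.
    fields["is_vulnerable"] = fields.get("is_vulnerable") == "true"
    return fields


action = parse_action("""
<action>
  <action_type>verdict</action_type>
  <is_vulnerable>true</is_vulnerable>
  <vuln_type>CWE-119</vuln_type>
  <exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
</action>
""")
print(action["action_type"], action["vuln_type"])  # prints: verdict CWE-119
```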
## Validation

Before submission:

```bash
pytest tests/test_action_parser.py
pytest tests/test_reward.py
pytest tests/test_no_leak.py
pytest tests/test_env_smoke.py
```

Also smoke-test the public Space:

```bash
curl https://nitishkumar-ai-commitguard-env.hf.space/health
```
## Scope

This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.