	# Tasks: Divyank (Evaluation + Storytelling)

	Project: CommitGuard OpenEnv Hackathon Submission
	Submission deadline: Sunday 5:00 PM IST
	Your role: Own the evaluation pipeline and the storytelling assets (demo video, HF blog post). You are the "Quality & Communications" lead.

	---

	## Phase 1 & 2: Foundation & Integration (Saturday Night)

	### Task 1.1: Evaluation Script Hardening (2 hours)
	Goal: Take the baseline `scripts/evaluate.py` and make it a robust testing tool.
	- [x] Update `scripts/evaluate.py` to support multi-step episodes (up to 5 steps).
	- [x] Implement logic to handle the agent's `<action>` XML outputs sequentially.
	- [x] Add support for loading a PEFT/LoRA adapter (which Niti will provide after training).
	- [x] Ensure it generates `eval_results.json` with accuracy broken down by CWE type.
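
A minimal sketch of the multi-step handling and per-CWE breakdown listed above. Only the `<action>` tag, the 5-step cap, and the `eval_results.json` file name come from the task list. Everything else is an assumption made for illustration: the function names, the `agent_step` callable, the `verdict:` prefix that ends an episode early, and the layout of the results file. The real `scripts/evaluate.py` may be organized differently.

```python
import json
import re
from collections import defaultdict

MAX_STEPS = 5  # episode cap taken from the task list

# Assumed format: the agent wraps each action in <action>...</action>.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def parse_action(model_output):
    """Return the text of the first <action> block, or None if there is none."""
    match = ACTION_RE.search(model_output)
    return match.group(1).strip() if match else None


def run_episode(agent_step, observation):
    """Call the agent up to MAX_STEPS times and collect its actions in order.

    `agent_step` is a hypothetical callable: (observation, history) -> raw text.
    A hypothetical "verdict:" action ends the episode early.
    """
    history = []
    for _ in range(MAX_STEPS):
        action = parse_action(agent_step(observation, history))
        if action is None:  # malformed output: stop rather than guess
            break
        history.append(action)
        if action.startswith("verdict:"):
            break
    return history


def accuracy_by_cwe(records):
    """Aggregate dicts with 'cwe_type' and boolean 'correct' into per-CWE stats."""
    totals = defaultdict(lambda: {"correct": 0, "total": 0})
    for rec in records:
        bucket = totals[rec["cwe_type"]]
        bucket["total"] += 1
        bucket["correct"] += int(rec["correct"])
    return {
        cwe: {**counts, "accuracy": counts["correct"] / counts["total"]}
        for cwe, counts in sorted(totals.items())
    }


def write_results(records, path="eval_results.json"):
    """Write overall and per-CWE accuracy to a JSON file (assumed layout)."""
    records = list(records)
    overall = sum(r["correct"] for r in records) / len(records) if records else 0.0
    results = {"overall_accuracy": overall, "by_cwe": accuracy_by_cwe(records)}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2)
    return results
```

Stopping on the first malformed output keeps a broken generation from being scored as if it were a valid action; whether that is the right policy depends on how the environment treats invalid steps.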

	### Task 1.2: Dataset Spot-Check (30 min)
	Goal: Verify the quality of Deepak's 5000-sample dataset in the `mvd` branch.
	- [x] Manually review 20 random samples from `data/devign_filtered.jsonl`.
	- [x] Ensure each sample's `diff` is consistent with its `cwe_type` label and that the sample is reasonable for a 3B model.
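
Part of the spot-check can be scripted so that the manual review starts from a reproducible draw. This is a sketch, assuming each line of `data/devign_filtered.jsonl` is a JSON object carrying the `diff` and `cwe_type` fields named above. The helper names, the fixed seed, and the 400-character preview are illustrative choices, not part of the project.

```python
import json
import random


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def pick_samples(rows, k=20, seed=0):
    """Draw k distinct rows reproducibly (fewer if the file is smaller)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def missing_fields(row, required=("diff", "cwe_type")):
    """Return the required fields that are absent or empty in a row."""
    return [name for name in required if not row.get(name)]


def print_for_review(samples, preview_chars=400):
    """Print each sample's label and the start of its diff for manual reading."""
    for i, row in enumerate(samples, start=1):
        flags = missing_fields(row)
        print(f"--- sample {i} | cwe_type={row.get('cwe_type')} | missing={flags}")
        print(str(row.get("diff", ""))[:preview_chars])
```

Calling `print_for_review(pick_samples(load_jsonl("data/devign_filtered.jsonl")))` would print 20 samples. The script only catches missing fields; judging whether a diff actually matches its label still needs a human read.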

	---

	## Phase 3: Demo + Storytelling (Sunday Morning)

	### Task 3.1: Baseline vs. Trained Evaluation (P0)
	Goal: Produce the data that Niti needs for the final plots.
	- [x] Run the hardened `evaluate.py` against 100 held-out samples using the untrained model.
	- [x] Run it again on the same samples once Niti provides the trained LoRA adapter.
	- [x] Capture the delta: "Detection accuracy: X% -> Y%".
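
The delta line can be generated from the two result files rather than typed by hand. This is a sketch, assuming each run wrote a JSON file with a top-level `overall_accuracy` fraction between 0 and 1. That key, the function names, and the file names in the usage note are hypothetical and not confirmed by the project.

```python
import json


def load_accuracy(path, key="overall_accuracy"):
    """Read a single accuracy fraction from a results file (assumed key)."""
    with open(path, encoding="utf-8") as fh:
        return float(json.load(fh)[key])


def format_delta(baseline, trained):
    """Format two accuracy fractions as the headline string plus the change in points."""
    change = (trained - baseline) * 100
    return (
        f"Detection accuracy: {baseline * 100:.1f}% -> {trained * 100:.1f}% "
        f"({change:+.1f} points)"
    )
```

For example, `format_delta(load_accuracy("baseline.json"), load_accuracy("trained.json"))` would build the string from two hypothetical result files. Reporting the change in percentage points avoids confusion with a relative improvement.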

	### Task 3.2: Demo Video Recording (P0)
	Goal: Create the visceral "Emotional Hook" for the judges.
	- [ ] Pick one "Hero Case" (e.g., a clear SQL Injection).
	- [ ] Record a 90-second side-by-side: the untrained model fumbling vs. the trained model reasoning through and identifying the vulnerability.
	- [ ] Upload to YouTube as Unlisted and provide the link for the README.

	### Task 3.3: HF Hub Blog Post (P1)
	Goal: Hit the rubric requirement for community outreach.
	- [ ] Write a post on the HF Hub explaining the project, the RLVR reward design, and the results.
	- [ ] Embed the demo video and the reward plots generated by Niti.

	---

	## Sync Points
	- [x] Midnight Saturday: Confirm `evaluate.py` can handle multi-step XML traces.
	- [x] 9:00 AM Sunday: Report final accuracy numbers (Baseline vs. Trained).
	- [ ] 3:00 PM Sunday: Final link check for Video and Blog post.