Spaces:

build-small-hackathon
/

workbench

Running on Zero

App Files Files Community

workbench / docs /PRD_IMPLEMENTATION_MATRIX.md

GitHub Actions

Initial ZeroGPU deployment with spaces shim

7f9dfed 9 days ago

preview code

Raw

History Blame Contribute Delete

7.85 kB

	# PRD Implementation Matrix

	This file maps the main PRD and extension PRD to current implementation status.

	## Summary

	The full PRD and extension PRD are not fully implemented yet.

	Current state:

	- Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant
	Discovery reference app exist.
	- Shared app state, service registries, local event logging, tab-level error events, and local
	trace preview exist.
	- Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads.
	- GGUF export planning exists with tool detection and explicit non-executing command plans.
	- Local JSONL tracing and optional Trackio wrapper exist.
	- Dataset statistics and local MCP tool functions exist.
	- OCR correction loop exists locally for CSV/JSONL prediction imports into Field Notes.
	- VINDEX integration boundary exists locally as non-executing MCP-style planning tools.
	- Local non-autonomous agent mode exists with trace export.
	- Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama,
	OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified
	local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text, and
	OpenBMB MiniCPM-V Plant image inference. The Status
	tab includes llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup,
	and Ollama local model listing plus explicit pull-command planning. LM Studio text generation is
	live-verified with `llama-3.2-1b-instruct`; the other real backends still need local verification.
	- `WORKBENCH_DEPLOYMENT=space` now hides placeholder backend choices and refuses placeholder/demo
	service creation for deployed app paths.
	- LoRA training execution, served MCP endpoint, deployment, and most extensions are not implemented.
	- Placeholder services remain intentionally visible so the app never pretends to be real inference.

	## Main PRD

	\| PRD Area \| Status \| Evidence / Next Step \|
	\| --- \| --- \| --- \|
	\| Purpose and design philosophy \| Documented \| `README.md`, `docs/ROADMAP.md` \|
	\| Template architecture \| Partial \| Config-driven model catalog exists; `docs/TEMPLATE_HOWTO.md` and `plant/` show the first domain-app pattern \|
	\| System architecture \| Partial \| `app.py`, `core/`, `models/`, `ui/`, `datasets/`, local app state/events \|
	\| Model registry \| Partial \| `config/models.yaml`, `models/model_catalog.py`; includes GGUF and backend capability metadata \|
	\| Five inference modes \| Partial \| llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified \|
	\| Trackio \| Partial \| Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing \|
	\| MCP layer \| Partial \| Local tool functions, Gradio-native MCP path metadata, `mcp_server=True` launch flag, and local invocation tests exist; full external client verification still missing \|
	\| Training pipeline \| Partial \| `training/` package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing \|
	\| Export and quantization \| Partial \| `training/export.py` and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing \|
	\| Agent mode \| Partial \| Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing \|
	\| UI tabs \| Partial \| Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders \|
	\| Field notes \| Partial \| CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing \|
	\| Directory structure \| Partial \| Foundation exists; many PRD packages missing \|
	\| Configuration schema \| Partial \| Model/training config plus ignored local backend config exists; validation is lightweight \|
	\| Dependencies \| Partial \| Runtime/dev deps exist for scaffold; full model/training deps not added \|
	\| Hackathon demo flow \| Partial \| `docs/HACKATHON_SUBMISSION.md` drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing \|
	\| Corrections from PRD v1 \| Documented in PRD \| Not all implemented \|
	\| Roadmap and extension points \| Documented \| `docs/ROADMAP.md`, `docs/TASKS.md` \|

	## Extension PRD

	\| Extension \| Status \| Evidence / Next Step \|
	\| --- \| --- \| --- \|
	\| vLLM serving tab \| Implemented locally, not locally verified \| `models/vllm_runner.py` and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving \|
	\| Ollama quick-start \| Partial \| Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing \|
	\| Reward model eval \| Implemented locally \| `training/reward_eval.py` provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports \|
	\| Synthetic data generation \| Implemented locally \| `datasets/synthetic.py` provides deterministic generation, validation, filtering, augmentation, and JSONL export \|
	\| Paper-to-code agent \| Implemented locally \| Agent tab and `agent/runner.py` support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution \|
	\| HF Spaces deploy \| Partial \| README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing \|
	\| VINDEX integration \| Implemented locally, execution disabled \| `mcp_tools/vindex_tool.py` validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install \|
	\| OCR pipeline hook \| Implemented locally \| `datasets/ocr.py` and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export \|
	\| MiniCPM Desk-Pet \| Not implemented \| Needs persona schema/export \|
	\| MiniCPM-o audio tab \| Not implemented \| Needs audio tab and omnimodal backend \|
	\| Cross-extension wiring \| Partial \| OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules \|

	## Quality Coverage

	Current verified gates:

	- Structure check passes.
	- 187 unit/user-story tests pass.
	- Coverage report passes at 68%, above the current 60% configured threshold.
	- 2 lightweight performance tests pass.
	- Ruff passes.
	- Mypy passes.
	- Pylint passes at 10/10.
	- Bandit reports no issues.
	- Pip-audit reports no known vulnerabilities in `.venv`.
	- LM Studio `/v1/models` and `/v1/chat/completions` are verified locally for
	`llama-3.2-1b-instruct`.
	- Workbench and Plant Playwright screenshot flows pass through `npm run e2e`.
	- CI workflow exists but has not run remotely.
	- App launch has been verified locally through Playwright, but the server is not currently left running.

	## No Pretend-Done Rule

	Any row marked `Partial`, `Placeholder`, or `Not implemented` must not be described as complete.
	When a row is implemented, update this file, `docs/TASKS.md`, and `docs/IMPLEMENTATION_STATUS.md`.