Spaces:

build-small-hackathon
/

workbench

Running on Zero

App Files Files Community

workbench / docs /PRD_IMPLEMENTATION_MATRIX.md

GitHub Actions

Initial ZeroGPU deployment with spaces shim

7f9dfed 5 days ago

preview code

raw

history blame contribute delete

7.85 kB

A newer version of the Gradio SDK is available: 6.18.0

Upgrade

PRD Implementation Matrix

This file maps the main PRD and extension PRD to current implementation status.

Summary

The full PRD and extension PRD are not fully implemented yet.

Current state:

Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant Discovery reference app exist.
Shared app state, service registries, local event logging, tab-level error events, and local trace preview exist.
Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads.
GGUF export planning exists with tool detection and explicit non-executing command plans.
Local JSONL tracing and optional Trackio wrapper exist.
Dataset statistics and local MCP tool functions exist.
OCR correction loop exists locally for CSV/JSONL prediction imports into Field Notes.
VINDEX integration boundary exists locally as non-executing MCP-style planning tools.
Local non-autonomous agent mode exists with trace export.
Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text, and OpenBMB MiniCPM-V Plant image inference. The Status tab includes llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup, and Ollama local model listing plus explicit pull-command planning. LM Studio text generation is live-verified with llama-3.2-1b-instruct; the other real backends still need local verification.
WORKBENCH_DEPLOYMENT=space now hides placeholder backend choices and refuses placeholder/demo service creation for deployed app paths.
LoRA training execution, served MCP endpoint, deployment, and most extensions are not implemented.
Placeholder services remain intentionally visible so the app never pretends to be real inference.

Main PRD

PRD Area	Status	Evidence / Next Step
Purpose and design philosophy	Documented	`README.md`, `docs/ROADMAP.md`
Template architecture	Partial	Config-driven model catalog exists; `docs/TEMPLATE_HOWTO.md` and `plant/` show the first domain-app pattern
System architecture	Partial	`app.py`, `core/`, `models/`, `ui/`, `datasets/`, local app state/events
Model registry	Partial	`config/models.yaml`, `models/model_catalog.py`; includes GGUF and backend capability metadata
Five inference modes	Partial	llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified
Trackio	Partial	Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing
MCP layer	Partial	Local tool functions, Gradio-native MCP path metadata, `mcp_server=True` launch flag, and local invocation tests exist; full external client verification still missing
Training pipeline	Partial	`training/` package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing
Export and quantization	Partial	`training/export.py` and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing
Agent mode	Partial	Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing
UI tabs	Partial	Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders
Field notes	Partial	CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing
Directory structure	Partial	Foundation exists; many PRD packages missing
Configuration schema	Partial	Model/training config plus ignored local backend config exists; validation is lightweight
Dependencies	Partial	Runtime/dev deps exist for scaffold; full model/training deps not added
Hackathon demo flow	Partial	`docs/HACKATHON_SUBMISSION.md` drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing
Corrections from PRD v1	Documented in PRD	Not all implemented
Roadmap and extension points	Documented	`docs/ROADMAP.md`, `docs/TASKS.md`

Extension PRD

Extension	Status	Evidence / Next Step
vLLM serving tab	Implemented locally, not locally verified	`models/vllm_runner.py` and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving
Ollama quick-start	Partial	Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing
Reward model eval	Implemented locally	`training/reward_eval.py` provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports
Synthetic data generation	Implemented locally	`datasets/synthetic.py` provides deterministic generation, validation, filtering, augmentation, and JSONL export
Paper-to-code agent	Implemented locally	Agent tab and `agent/runner.py` support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution
HF Spaces deploy	Partial	README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing
VINDEX integration	Implemented locally, execution disabled	`mcp_tools/vindex_tool.py` validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install
OCR pipeline hook	Implemented locally	`datasets/ocr.py` and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export
MiniCPM Desk-Pet	Not implemented	Needs persona schema/export
MiniCPM-o audio tab	Not implemented	Needs audio tab and omnimodal backend
Cross-extension wiring	Partial	OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules

Quality Coverage

Current verified gates:

Structure check passes.
187 unit/user-story tests pass.
Coverage report passes at 68%, above the current 60% configured threshold.
2 lightweight performance tests pass.
Ruff passes.
Mypy passes.
Pylint passes at 10/10.
Bandit reports no issues.
Pip-audit reports no known vulnerabilities in .venv.
LM Studio /v1/models and /v1/chat/completions are verified locally for llama-3.2-1b-instruct.
Workbench and Plant Playwright screenshot flows pass through npm run e2e.
CI workflow exists but has not run remotely.
App launch has been verified locally through Playwright, but the server is not currently left running.

No Pretend-Done Rule

Any row marked Partial, Placeholder, or Not implemented must not be described as complete. When a row is implemented, update this file, docs/TASKS.md, and docs/IMPLEMENTATION_STATUS.md.