workbench / docs /PRD_IMPLEMENTATION_MATRIX.md
GitHub Actions
Initial ZeroGPU deployment with spaces shim
7f9dfed

A newer version of the Gradio SDK is available: 6.18.0

Upgrade

PRD Implementation Matrix

This file maps the main PRD and extension PRD to current implementation status.

Summary

The full PRD and extension PRD are not fully implemented yet.

Current state:

  • Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant Discovery reference app exist.
  • Shared app state, service registries, local event logging, tab-level error events, and local trace preview exist.
  • Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads.
  • GGUF export planning exists with tool detection and explicit non-executing command plans.
  • Local JSONL tracing and optional Trackio wrapper exist.
  • Dataset statistics and local MCP tool functions exist.
  • OCR correction loop exists locally for CSV/JSONL prediction imports into Field Notes.
  • VINDEX integration boundary exists locally as non-executing MCP-style planning tools.
  • Local non-autonomous agent mode exists with trace export.
  • Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text, and OpenBMB MiniCPM-V Plant image inference. The Status tab includes llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup, and Ollama local model listing plus explicit pull-command planning. LM Studio text generation is live-verified with llama-3.2-1b-instruct; the other real backends still need local verification.
  • WORKBENCH_DEPLOYMENT=space now hides placeholder backend choices and refuses placeholder/demo service creation for deployed app paths.
  • LoRA training execution, served MCP endpoint, deployment, and most extensions are not implemented.
  • Placeholder services remain intentionally visible so the app never pretends to be real inference.

Main PRD

PRD Area Status Evidence / Next Step
Purpose and design philosophy Documented README.md, docs/ROADMAP.md
Template architecture Partial Config-driven model catalog exists; docs/TEMPLATE_HOWTO.md and plant/ show the first domain-app pattern
System architecture Partial app.py, core/, models/, ui/, datasets/, local app state/events
Model registry Partial config/models.yaml, models/model_catalog.py; includes GGUF and backend capability metadata
Five inference modes Partial llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified
Trackio Partial Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing
MCP layer Partial Local tool functions, Gradio-native MCP path metadata, mcp_server=True launch flag, and local invocation tests exist; full external client verification still missing
Training pipeline Partial training/ package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing
Export and quantization Partial training/export.py and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing
Agent mode Partial Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing
UI tabs Partial Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders
Field notes Partial CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing
Directory structure Partial Foundation exists; many PRD packages missing
Configuration schema Partial Model/training config plus ignored local backend config exists; validation is lightweight
Dependencies Partial Runtime/dev deps exist for scaffold; full model/training deps not added
Hackathon demo flow Partial docs/HACKATHON_SUBMISSION.md drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing
Corrections from PRD v1 Documented in PRD Not all implemented
Roadmap and extension points Documented docs/ROADMAP.md, docs/TASKS.md

Extension PRD

Extension Status Evidence / Next Step
vLLM serving tab Implemented locally, not locally verified models/vllm_runner.py and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving
Ollama quick-start Partial Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing
Reward model eval Implemented locally training/reward_eval.py provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports
Synthetic data generation Implemented locally datasets/synthetic.py provides deterministic generation, validation, filtering, augmentation, and JSONL export
Paper-to-code agent Implemented locally Agent tab and agent/runner.py support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution
HF Spaces deploy Partial README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing
VINDEX integration Implemented locally, execution disabled mcp_tools/vindex_tool.py validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install
OCR pipeline hook Implemented locally datasets/ocr.py and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export
MiniCPM Desk-Pet Not implemented Needs persona schema/export
MiniCPM-o audio tab Not implemented Needs audio tab and omnimodal backend
Cross-extension wiring Partial OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules

Quality Coverage

Current verified gates:

  • Structure check passes.
  • 187 unit/user-story tests pass.
  • Coverage report passes at 68%, above the current 60% configured threshold.
  • 2 lightweight performance tests pass.
  • Ruff passes.
  • Mypy passes.
  • Pylint passes at 10/10.
  • Bandit reports no issues.
  • Pip-audit reports no known vulnerabilities in .venv.
  • LM Studio /v1/models and /v1/chat/completions are verified locally for llama-3.2-1b-instruct.
  • Workbench and Plant Playwright screenshot flows pass through npm run e2e.
  • CI workflow exists but has not run remotely.
  • App launch has been verified locally through Playwright, but the server is not currently left running.

No Pretend-Done Rule

Any row marked Partial, Placeholder, or Not implemented must not be described as complete. When a row is implemented, update this file, docs/TASKS.md, and docs/IMPLEMENTATION_STATUS.md.