# PRD Implementation Matrix

This file maps the main PRD and extension PRD to current implementation status.

## Summary

Neither the main PRD nor the extension PRD is fully implemented yet. Current state:

- Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant Discovery reference app exist.
- Shared app state, service registries, local event logging, tab-level error events, and local trace preview exist.
- Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads.
- GGUF export planning exists with tool detection and explicit non-executing command plans.
- Local JSONL tracing and an optional Trackio wrapper exist.
- Dataset statistics and local MCP tool functions exist.
- A local OCR correction loop exists for CSV/JSONL prediction imports into Field Notes.
- The VINDEX integration boundary exists locally as non-executing MCP-style planning tools.
- A local non-autonomous agent mode exists with trace export.
- Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama, OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text (live-verified with `llama-3.2-1b-instruct`), and OpenBMB MiniCPM-V Plant image inference; the remaining real backends still need local verification. The Status tab includes llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup, and Ollama local model listing plus explicit pull-command planning.
- `WORKBENCH_DEPLOYMENT=space` now hides placeholder backend choices and refuses placeholder/demo service creation for deployed app paths.
- LoRA training execution, a served MCP endpoint, deployment, and most extensions are not implemented.
- Placeholder services remain intentionally visible so the app never passes placeholder output off as real inference.
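The `WORKBENCH_DEPLOYMENT=space` guard summarized above can be illustrated with a short sketch. This is not the project's code: the backend identifiers, the `PLACEHOLDER_BACKENDS` set, and the `is_space_deployment`, `visible_backends`, and `create_service` helpers are hypothetical. Only the environment variable, its `space` value, and the two behaviors (hide placeholder choices, refuse placeholder/demo service creation) come from this matrix.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical backend identifiers; the real catalog lives in config/models.yaml.
PLACEHOLDER_BACKENDS = frozenset({"placeholder", "demo"})
ALL_BACKENDS = (
    "llama.cpp",
    "llama-cpp-python",
    "ollama",
    "openai-compatible",
    "sglang",
    "placeholder",
    "demo",
)


def is_space_deployment(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when WORKBENCH_DEPLOYMENT is exactly 'space'."""
    env = os.environ if env is None else env
    return env.get("WORKBENCH_DEPLOYMENT") == "space"


def visible_backends(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List selectable backends, hiding placeholder/demo choices on a Space."""
    if is_space_deployment(env):
        return [b for b in ALL_BACKENDS if b not in PLACEHOLDER_BACKENDS]
    return list(ALL_BACKENDS)


def create_service(backend: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Refuse placeholder/demo service creation on a Space deployment."""
    if backend not in ALL_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if is_space_deployment(env) and backend in PLACEHOLDER_BACKENDS:
        raise RuntimeError(
            f"{backend!r} services are disabled when WORKBENCH_DEPLOYMENT=space"
        )
    # Stand-in for constructing a real service object (hypothetical).
    return f"service:{backend}"
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the guard testable: `visible_backends({"WORKBENCH_DEPLOYMENT": "space"})` omits `placeholder` and `demo`, while `visible_backends({})` keeps them for local development.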
## Main PRD

| PRD Area | Status | Evidence / Next Step |
| --- | --- | --- |
| Purpose and design philosophy | Documented | `README.md`, `docs/ROADMAP.md` |
| Template architecture | Partial | Config-driven model catalog exists; `docs/TEMPLATE_HOWTO.md` and `plant/` show the first domain-app pattern |
| System architecture | Partial | `app.py`, `core/`, `models/`, `ui/`, `datasets/`, local app state/events |
| Model registry | Partial | `config/models.yaml`, `models/model_catalog.py`; includes GGUF and backend capability metadata |
| Five inference modes | Partial | llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified |
| Trackio | Partial | Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing |
| MCP layer | Partial | Local tool functions, Gradio-native MCP path metadata, `mcp_server=True` launch flag, and local invocation tests exist; full external client verification still missing |
| Training pipeline | Partial | `training/` package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing |
| Export and quantization | Partial | `training/export.py` and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing |
| Agent mode | Partial | Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing |
| UI tabs | Partial | Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders |
| Field notes | Partial | CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing |
| Directory structure | Partial | Foundation exists; many PRD packages missing |
| Configuration schema | Partial | Model/training config plus ignored local backend config exists; validation is lightweight |
| Dependencies | Partial | Runtime/dev deps exist for the scaffold; full model/training deps not added |
| Hackathon demo flow | Partial | `docs/HACKATHON_SUBMISSION.md` drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing |
| Corrections from PRD v1 | Documented in PRD | Not all implemented |
| Roadmap and extension points | Documented | `docs/ROADMAP.md`, `docs/TASKS.md` |

## Extension PRD

| Extension | Status | Evidence / Next Step |
| --- | --- | --- |
| vLLM serving tab | Implemented locally, not locally verified | `models/vllm_runner.py` and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving |
| Ollama quick-start | Partial | Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing |
| Reward model eval | Implemented locally | `training/reward_eval.py` provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports |
| Synthetic data generation | Implemented locally | `datasets/synthetic.py` provides deterministic generation, validation, filtering, augmentation, and JSONL export |
| Paper-to-code agent | Implemented locally | Agent tab and `agent/runner.py` support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution |
| HF Spaces deploy | Partial | README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing |
| VINDEX integration | Implemented locally, execution disabled | `mcp_tools/vindex_tool.py` validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install |
| OCR pipeline hook | Implemented locally | `datasets/ocr.py` and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export |
| MiniCPM Desk-Pet | Not implemented | Needs persona schema/export |
| MiniCPM-o audio tab | Not implemented | Needs audio tab and omnimodal backend |
| Cross-extension wiring | Partial | OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules |

## Quality Coverage

Current verified gates:

- Structure check passes.
- 187 unit/user-story tests pass.
- Coverage report passes at 68%, above the currently configured 60% threshold.
- 2 lightweight performance tests pass.
- Ruff passes.
- Mypy passes.
- Pylint passes at 10/10.
- Bandit reports no issues.
- pip-audit reports no known vulnerabilities in `.venv`.
- LM Studio `/v1/models` and `/v1/chat/completions` are verified locally for `llama-3.2-1b-instruct`.
- Workbench and Plant Playwright screenshot flows pass through `npm run e2e`.
- CI workflow exists but has not run remotely.
- App launch has been verified locally through Playwright, but the server is not currently left running.

## No Pretend-Done Rule

Any row marked `Partial`, `Placeholder`, or `Not implemented` must not be described as complete. When a row is implemented, update this file, `docs/TASKS.md`, and `docs/IMPLEMENTATION_STATUS.md`.