# PRD Implementation Matrix

This file maps the main PRD and extension PRD to current implementation status.

## Summary

Neither the main PRD nor the extension PRD is fully implemented yet.

Current state:

- Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant
  Discovery reference app exist.
- Shared app state, service registries, local event logging, tab-level error events, and local
  trace preview exist.
- Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads.
- GGUF export planning exists with tool detection and explicit non-executing command plans.
- Local JSONL tracing and an optional Trackio wrapper exist.
- Dataset statistics and local MCP tool functions exist.
- OCR correction loop exists locally for CSV/JSONL prediction imports into Field Notes.
- VINDEX integration boundary exists locally as non-executing MCP-style planning tools.
- Local non-autonomous agent mode exists with trace export.
- Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama,
  OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified
  local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text
  (live-verified with `llama-3.2-1b-instruct`), and OpenBMB MiniCPM-V Plant image inference; the
  remaining real backends still need local verification. The Status tab includes llama.cpp setup,
  LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup, and Ollama local model
  listing plus explicit pull-command planning.
- `WORKBENCH_DEPLOYMENT=space` now hides placeholder backend choices and refuses placeholder/demo
  service creation for deployed app paths.
- LoRA training execution, served MCP endpoint, deployment, and most extensions are not implemented.
- Placeholder services remain intentionally visible so the app never pretends to be real inference.
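
The `WORKBENCH_DEPLOYMENT=space` behavior described above could be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the function names, the `PLACEHOLDER_BACKENDS` set, and the exact-match comparison on the environment value are all assumptions.

```python
import os

# Hypothetical backend identifiers; the real names live in the project's registry.
PLACEHOLDER_BACKENDS = {"placeholder", "demo"}


def is_space_deployment(env=None):
    """Return True when WORKBENCH_DEPLOYMENT is set to 'space' (assumed exact match)."""
    env = os.environ if env is None else env
    return env.get("WORKBENCH_DEPLOYMENT", "") == "space"


def visible_backends(all_backends, env=None):
    """Hide placeholder choices on deployed paths; keep them visible locally."""
    if is_space_deployment(env):
        return [b for b in all_backends if b not in PLACEHOLDER_BACKENDS]
    return list(all_backends)


def create_service(backend, env=None):
    """Refuse to build a placeholder/demo service when deployed."""
    if is_space_deployment(env) and backend in PLACEHOLDER_BACKENDS:
        raise RuntimeError(
            f"Backend {backend!r} is not allowed when WORKBENCH_DEPLOYMENT=space"
        )
    return {"backend": backend}  # stand-in for a real service object
```

Passing `env` explicitly keeps the guard testable without mutating the process environment.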

## Main PRD

| PRD Area | Status | Evidence / Next Step |
| --- | --- | --- |
| Purpose and design philosophy | Documented | `README.md`, `docs/ROADMAP.md` |
| Template architecture | Partial | Config-driven model catalog exists; `docs/TEMPLATE_HOWTO.md` and `plant/` show the first domain-app pattern |
| System architecture | Partial | `app.py`, `core/`, `models/`, `ui/`, `datasets/`, local app state/events |
| Model registry | Partial | `config/models.yaml`, `models/model_catalog.py`; includes GGUF and backend capability metadata |
| Five inference modes | Partial | llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified |
| Trackio | Partial | Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing |
| MCP layer | Partial | Local tool functions, Gradio-native MCP path metadata, `mcp_server=True` launch flag, and local invocation tests exist; full external client verification still missing |
| Training pipeline | Partial | `training/` package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing |
| Export and quantization | Partial | `training/export.py` and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing |
| Agent mode | Partial | Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing |
| UI tabs | Partial | Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders |
| Field notes | Partial | CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing |
| Directory structure | Partial | Foundation exists; many PRD packages missing |
| Configuration schema | Partial | Model/training config plus ignored local backend config exists; validation is lightweight |
| Dependencies | Partial | Runtime/dev deps exist for scaffold; full model/training deps not added |
| Hackathon demo flow | Partial | `docs/HACKATHON_SUBMISSION.md` drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing |
| Corrections from PRD v1 | Documented in PRD | Not all implemented |
| Roadmap and extension points | Documented | `docs/ROADMAP.md`, `docs/TASKS.md` |
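
The "plan, do not execute" pattern used for export and quantization might look like the sketch below. It is a minimal illustration under stated assumptions: `CommandPlan` and `plan_gguf_export` are hypothetical names, and the llama.cpp script and binary names and arguments should be checked against the local llama.cpp checkout before use.

```python
import shlex
import shutil
from dataclasses import dataclass, field


@dataclass
class CommandPlan:
    """Describes commands a user could run; nothing in this class executes them."""

    steps: list = field(default_factory=list)
    missing_tools: list = field(default_factory=list)
    executes: bool = False  # the plan is informational only

    def render(self):
        """Return each step as a shell-quoted string for display."""
        return [shlex.join(step) for step in self.steps]


def plan_gguf_export(model_dir, out_path, quant="Q4_K_M", which=shutil.which):
    """Build a conversion + quantization plan and record which tools are missing."""
    plan = CommandPlan()
    f16_path = out_path + ".f16.gguf"
    # Assumed llama.cpp tooling; verify names and flags locally.
    plan.steps.append(
        ["python", "convert_hf_to_gguf.py", model_dir,
         "--outfile", f16_path, "--outtype", "f16"]
    )
    plan.steps.append(["llama-quantize", f16_path, out_path, quant])
    # Tool detection only checks the PATH; it never launches anything.
    for tool in ("python", "llama-quantize"):
        if which(tool) is None:
            plan.missing_tools.append(tool)
    return plan
```

Calling `plan_gguf_export("models/demo", "out/demo.gguf").render()` yields display strings only; running them stays an explicit user action.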

## Extension PRD

| Extension | Status | Evidence / Next Step |
| --- | --- | --- |
| vLLM serving tab | Implemented locally, not locally verified | `models/vllm_runner.py` and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving |
| Ollama quick-start | Partial | Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing |
| Reward model eval | Implemented locally | `training/reward_eval.py` provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports |
| Synthetic data generation | Implemented locally | `datasets/synthetic.py` provides deterministic generation, validation, filtering, augmentation, and JSONL export |
| Paper-to-code agent | Implemented locally | Agent tab and `agent/runner.py` support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution |
| HF Spaces deploy | Partial | README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing |
| VINDEX integration | Implemented locally, execution disabled | `mcp_tools/vindex_tool.py` validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install |
| OCR pipeline hook | Implemented locally | `datasets/ocr.py` and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export |
| MiniCPM Desk-Pet | Not implemented | Needs persona schema/export |
| MiniCPM-o audio tab | Not implemented | Needs audio tab and omnimodal backend |
| Cross-extension wiring | Partial | OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules |
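
As an illustration of the OCR pipeline hook row, the confidence-threshold step and corrected JSONL export could be sketched as below. The column names (`text`, `confidence`, `corrected_text`), the default threshold, and the function names are assumptions for this example and are not taken from `datasets/ocr.py`.

```python
import csv
import io
import json


def load_csv_predictions(text):
    """Parse OCR predictions from CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def split_by_confidence(predictions, threshold=0.8):
    """Split rows into (confident, uncertain) around a confidence threshold."""
    confident, uncertain = [], []
    for row in predictions:
        target = confident if float(row["confidence"]) >= threshold else uncertain
        target.append(row)
    return confident, uncertain


def export_corrected_jsonl(rows):
    """Serialize only rows that carry a human correction, one JSON object per line."""
    return "\n".join(
        json.dumps(row, sort_keys=True) for row in rows if row.get("corrected_text")
    )
```

Rows in the uncertain list are the ones a reviewer would see in Field Notes; once a `corrected_text` value is attached, they become eligible for export.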

## Quality Coverage

Current verified gates:

- Structure check passes.
- 187 unit/user-story tests pass.
- Coverage report passes at 68%, above the current 60% configured threshold.
- 2 lightweight performance tests pass.
- Ruff passes.
- Mypy passes.
- Pylint passes at 10/10.
- Bandit reports no issues.
- Pip-audit reports no known vulnerabilities in `.venv`.
- LM Studio `/v1/models` and `/v1/chat/completions` are verified locally for
  `llama-3.2-1b-instruct`.
- Workbench and Plant Playwright screenshot flows pass through `npm run e2e`.
- CI workflow exists but has not run remotely.
- App launch has been verified locally through Playwright; no server is left running afterward.

## No Pretend-Done Rule

Any row marked `Partial`, `Placeholder`, or `Not implemented` must not be described as complete.
When a row is implemented, update this file, `docs/TASKS.md`, and `docs/IMPLEMENTATION_STATUS.md`.
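
One way to enforce this rule mechanically is to scan the status tables and list every row that is not yet complete. The sketch below is a hypothetical helper, not an existing project script; it assumes the status sits in the second column of each table.

```python
INCOMPLETE = {"partial", "placeholder", "not implemented"}


def parse_status_rows(markdown):
    """Return (area, status) pairs from markdown table rows, skipping headers."""
    rows = []
    for line in markdown.splitlines():
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        if set(cells[1]) <= set("-: "):
            continue  # separator row such as | --- | --- |
        if cells[1] == "Status":
            continue  # header row
        rows.append((cells[0], cells[1]))
    return rows


def incomplete_rows(markdown):
    """List rows whose status must not be described as complete."""
    return [
        (area, status)
        for area, status in parse_status_rows(markdown)
        if status.lower() in INCOMPLETE
    ]
```

A check like this could run in CI and compare its output with claims made in other docs.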