Spaces:
Running on Zero
Running on Zero
| # PRD Implementation Matrix | |
| This file maps the main PRD and extension PRD to current implementation status. | |
| ## Summary | |
| The full PRD and extension PRD are not fully implemented yet. | |
| Current state: | |
| - Foundation, docs, test policy, quality gates, CI, placeholder Gradio surfaces, and a Plant | |
| Discovery reference app exist. | |
| - Shared app state, service registries, local event logging, tab-level error events, and local | |
| trace preview exist. | |
| - Local llama.cpp settings, GGUF/mmproj pickers, and command generation exist without startup downloads. | |
| - GGUF export planning exists with tool detection and explicit non-executing command plans. | |
| - Local JSONL tracing and optional Trackio wrapper exist. | |
| - Dataset statistics and local MCP tool functions exist. | |
| - OCR correction loop exists locally for CSV/JSONL prediction imports into Field Notes. | |
| - VINDEX integration boundary exists locally as non-executing MCP-style planning tools. | |
| - Local non-autonomous agent mode exists with trace export. | |
| - Real local model inference is partially implemented through llama.cpp, llama-cpp-python, Ollama, | |
| OpenAI-compatible/LM Studio, SGLang, Transformers text, and MiniCPM vision services. Verified | |
| local paths now include llama.cpp CLI text, llama-cpp-python GGUF text, LM Studio text, and | |
| OpenBMB MiniCPM-V Plant image inference. The Status | |
| tab includes llama.cpp setup, LM Studio/OpenAI-compatible setup, SGLang command/check/stop setup, | |
| and Ollama local model listing plus explicit pull-command planning. LM Studio text generation is | |
| live-verified with `llama-3.2-1b-instruct`; the other real backends still need local verification. | |
| - `WORKBENCH_DEPLOYMENT=space` now hides placeholder backend choices and refuses placeholder/demo | |
| service creation for deployed app paths. | |
| - LoRA training execution, served MCP endpoint, deployment, and most extensions are not implemented. | |
| - Placeholder services remain intentionally visible so the app never pretends to be real inference. | |
| ## Main PRD | |
| | PRD Area | Status | Evidence / Next Step | | |
| | --- | --- | --- | | |
| | Purpose and design philosophy | Documented | `README.md`, `docs/ROADMAP.md` | | |
| | Template architecture | Partial | Config-driven model catalog exists; `docs/TEMPLATE_HOWTO.md` and `plant/` show the first domain-app pattern | | |
| | System architecture | Partial | `app.py`, `core/`, `models/`, `ui/`, `datasets/`, local app state/events | | |
| | Model registry | Partial | `config/models.yaml`, `models/model_catalog.py`; includes GGUF and backend capability metadata | | |
| | Five inference modes | Partial | llama.cpp, llama-cpp-python, OpenAI-compatible/LM Studio, and MiniCPM-V Plant image inference are locally verified; Ollama generation, SGLang server generation, vLLM server generation, and llama.cpp mmproj vision remain unverified | | |
| | Trackio | Partial | Local traces, optional Trackio wrapper, and HF Space sync docs exist; credentials/package setup still missing | | |
| | MCP layer | Partial | Local tool functions, Gradio-native MCP path metadata, `mcp_server=True` launch flag, and local invocation tests exist; full external client verification still missing | | |
| | Training pipeline | Partial | `training/` package supports dry-run planning, non-executing LoRA request planning, export planning, exact-match/perplexity evaluation, and local logging; real PEFT/TRL execution missing | | |
| | Export and quantization | Partial | `training/export.py` and Export tab plan downloads/conversion/quantization and expose existing exported files for download; execution still missing | | |
| | Agent mode | Partial | Local deterministic task and paper-to-code trace loops exist with safety gates; autonomous execution and remote uploads missing | | |
| | UI tabs | Partial | Tabs exist; Chat/Vision/Dataset/Field Notes/Status have behavior; Status includes SGLang setup; tab actions have Gradio progress indicators; Chat/Vision/Dataset have tab-level status/error messages; compact responsive CSS exists; several tabs are still placeholders | | |
| | Field notes | Partial | CSV save, SQLite store, corrected/tag/training filters, media paths, OCR uncertain import, JSONL export, and local HF Dataset export exist; remote HF upload missing | | |
| | Directory structure | Partial | Foundation exists; many PRD packages missing | | |
| | Configuration schema | Partial | Model/training config plus ignored local backend config exists; validation is lightweight | | |
| | Dependencies | Partial | Runtime/dev deps exist for scaffold; full model/training deps not added | | |
| | Hackathon demo flow | Partial | `docs/HACKATHON_SUBMISSION.md` drafts story, user, demo flow, script, social post, and URLs; real backend and Space URL still missing | | |
| | Corrections from PRD v1 | Documented in PRD | Not all implemented | | |
| | Roadmap and extension points | Documented | `docs/ROADMAP.md`, `docs/TASKS.md` | | |
| ## Extension PRD | |
| | Extension | Status | Evidence / Next Step | | |
| | --- | --- | --- | | |
| | vLLM serving tab | Implemented locally, not locally verified | `models/vllm_runner.py` and vLLM tab provide command planning, health checks, metrics parsing, benchmark logging, and OpenAI-compatible chat client; needs installed/running vLLM for real serving | | |
| | Ollama quick-start | Partial | Service, UI backend selector, local model listing, explicit pull-command planning, and setup docs exist; local Ollama install/real model verification missing | | |
| | Reward model eval | Implemented locally | `training/reward_eval.py` provides deterministic reward scoring, best-of-N, DPO pairs, and LoRA-vs-base reward reports | | |
| | Synthetic data generation | Implemented locally | `datasets/synthetic.py` provides deterministic generation, validation, filtering, augmentation, and JSONL export | | |
| | Paper-to-code agent | Implemented locally | Agent tab and `agent/runner.py` support paper input, research/plan/implementation/verify trace, and safety gates without autonomous execution | | |
| | HF Spaces deploy | Partial | README metadata, deployment helper, command plan, required-file validation, Workbench/Plant target URLs, and remote/build status checks exist; HF auth/remote/push/build verification still missing | | |
| | VINDEX integration | Implemented locally, execution disabled | `mcp_tools/vindex_tool.py` validates VINDEX methods, builds safe call plans, reports dependency/server status, and documents that actual edits require a verified local VINDEX install | | |
| | OCR pipeline hook | Implemented locally | `datasets/ocr.py` and Field Notes tab support local OCR prediction loading, confidence thresholds, uncertain import, human correction, and corrected JSONL export | | |
| | MiniCPM Desk-Pet | Not implemented | Needs persona schema/export | | |
| | MiniCPM-o audio tab | Not implemented | Needs audio tab and omnimodal backend | | |
| | Cross-extension wiring | Partial | OCR -> Field Notes -> Training, Synthetic Gen -> Reward Eval -> DPO, Agent -> Desk-Pet Persona, and HF Spaces -> Trackio are documented; remaining wiring depends on unimplemented runtime modules | | |
| ## Quality Coverage | |
| Current verified gates: | |
| - Structure check passes. | |
| - 187 unit/user-story tests pass. | |
| - Coverage report passes at 68%, above the current 60% configured threshold. | |
| - 2 lightweight performance tests pass. | |
| - Ruff passes. | |
| - Mypy passes. | |
| - Pylint passes at 10/10. | |
| - Bandit reports no issues. | |
| - Pip-audit reports no known vulnerabilities in `.venv`. | |
| - LM Studio `/v1/models` and `/v1/chat/completions` are verified locally for | |
| `llama-3.2-1b-instruct`. | |
| - Workbench and Plant Playwright screenshot flows pass through `npm run e2e`. | |
| - CI workflow exists but has not run remotely. | |
| - App launch has been verified locally through Playwright, but the server is not currently left running. | |
| ## No Pretend-Done Rule | |
| Any row marked `Partial`, `Placeholder`, or `Not implemented` must not be described as complete. | |
| When a row is implemented, update this file, `docs/TASKS.md`, and `docs/IMPLEMENTATION_STATUS.md`. | |