# Collab Editor - Test Plan
Companion document to [SPECIFICATION.md](./SPECIFICATION.md). Covers the critical paths that, if broken, would cause data loss, publishing failures, or security holes. Intentionally **not** exhaustive - the project is evolving fast, and these tests are chosen to survive refactors.
## Principles
- Test **behavior**, not implementation details.
- Focus on what **breaks silently** (data loss, corrupt publish, auth bypass).
- Keep tests **decoupled from internals** so they survive refactors.
- Fast to run - no heavy dependencies (Playwright PDF, real HF API) in the main suite.
## Stack
| Layer | Tool | Scope |
|-------|------|-------|
| Backend unit/integration | Vitest + Supertest | Publisher, API routes, auth guards |
| Yjs helpers | In-memory Y.Doc fixtures | Publisher extraction, storage logic |
| Frontend unit | Vitest + Testing Library | Agent tool execution, undo batching |
| E2E | Playwright (`backend/e2e`, `frontend/tests`) | Critical editor flows: load, edit, comments, publish round-trip |

E2E is opt-in (`npm run test:e2e` in `backend/`, plus Playwright specs in `frontend/tests/`); the fast suite (`npm run test`) stays hermetic and Playwright-free so it can run in CI without a browser install.

---
## 1. Publisher Pipeline (P0)
The core value of the product. If publishing breaks, the article disappears.
### 1.1 Y.Doc extraction
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 1.1.1 | Extract frontmatter from Y.Doc | Given a Y.Doc with title, authors, affiliations in `Y.Map("frontmatter")` / When `extractFromYDoc` runs / Then returned object contains all frontmatter fields with correct types | P0 |
| 1.1.2 | Extract content from empty doc | Given a Y.Doc with an empty `Y.XmlFragment("default")` / When extracted / Then returns empty content without throwing | P0 |
| 1.1.3 | Extract with citations | Given a Y.Doc with entries in `Y.Map("citations")` / When extracted / Then citations map is included in output as CSL-JSON | P0 |
### 1.2 HTML generation
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 1.2.1 | Produces valid self-contained HTML | Given extracted doc data / When `renderArticleHTML` runs / Then output is valid HTML with inline CSS, no external stylesheet links | P0 |
| 1.2.2 | Custom media resolved | Given template CSS with `@custom-media` / When `resolveCustomMedia` runs / Then output contains only standard `@media` rules, no `@custom-media` | P0 |
| 1.2.3 | TOC generated from headings | Given content with h2 and h3 / When rendered / Then HTML contains TOC nav with matching anchor links | P1 |
| 1.2.4 | Theme toggle present | Given any doc / When rendered / Then output contains theme toggle SVG (sun/moon) and associated script | P1 |
| 1.2.5 | Bibliography injected | Given doc with citations / When rendered / Then HTML contains bibliography section with formatted entries | P0 |
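
Test 1.2.2 can be pinned with a string-level assertion. The sketch below is a hypothetical stand-in for `resolveCustomMedia`, written only to show the contract the test expects: every `@custom-media` declaration is dropped and each `(--name)` reference is replaced by its query. The real function may work differently (for example through a CSS parser), and `resolveCustomMediaSketch`, `templateCss`, and `resolvedCss` are illustrative names.

```typescript
// Hypothetical stand-in for the publisher's resolveCustomMedia.
// Only the observable contract of test 1.2.2 is modelled here.
function resolveCustomMediaSketch(css: string): string {
  const definitions = new Map<string, string>();

  // Pass 1: collect every `@custom-media --name <query>;` and remove it.
  const withoutDeclarations = css.replace(
    /@custom-media\s+(--[\w-]+)\s+([^;]+);/g,
    (_match: string, name: string, query: string) => {
      definitions.set(name, query.trim());
      return "";
    },
  );

  // Pass 2: replace each `(--name)` reference with the stored query.
  // Unknown names (e.g. inside `var(--x)`) are left untouched.
  return withoutDeclarations.replace(
    /\((--[\w-]+)\)/g,
    (reference: string, name: string) => definitions.get(name) ?? reference,
  );
}

const templateCss = [
  "@custom-media --narrow (max-width: 768px);",
  "@media (--narrow) { .toc { display: none; } }",
].join("\n");

const resolvedCss = resolveCustomMediaSketch(templateCss);
```

A test built on this shape would assert that `resolvedCss` contains no `@custom-media` and does contain `@media (max-width: 768px)`.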
### 1.3 Post-processing
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 1.3.1 | Accordion to details/summary | Given HTML with accordion div / When post-processed / Then output uses `<details>` and `<summary>` tags | P1 |
| 1.3.2 | htmlEmbed to iframe | Given HTML with htmlEmbed node / When post-processed / Then output contains `<iframe>` with correct src | P0 |
| 1.3.3 | Mermaid to pre block | Given HTML with mermaid node / When post-processed / Then output contains `<pre class="mermaid">` | P1 |
### 1.4 Publish idempotency and restore
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 1.4.1 | Publish twice gives same result | Given a Y.Doc / When published twice / Then both HTML outputs are byte-identical (no timestamps or random IDs) | P0 |
| 1.4.2 | Published article restored on boot | Given published assets in HF dataset but empty local FS / When `ensurePublishedRestored` runs / Then `data/published/default/index.html` exists locally | P0 |
| 1.4.3 | GET / serves published article | Given local published index.html exists / When GET / / Then response is the published HTML with 200 | P0 |
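
Test 1.4.1 fails as soon as the renderer emits anything that varies between runs. As an illustration of the "no random IDs" constraint, the sketch below derives heading anchors from the heading text and disambiguates repeats with a counter. It is a toy model, not the project's renderer: `stableHeadingIds` and `renderHeadingsSketch` are hypothetical names, and the slug rules are an assumption.

```typescript
// Hypothetical helper: derive heading anchors from the heading text so that
// two publishes of the same document produce identical IDs.
function stableHeadingIds(headings: string[]): string[] {
  const seen = new Map<string, number>();
  return headings.map((heading) => {
    const base =
      heading
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/^-+|-+$/g, "") || "section";
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    // Repeated headings get a numeric suffix instead of a random one.
    return count === 0 ? base : `${base}-${count + 1}`;
  });
}

// Toy renderer standing in for the real publish pipeline.
function renderHeadingsSketch(headings: string[]): string {
  const ids = stableHeadingIds(headings);
  return headings.map((text, i) => `<h2 id="${ids[i]}">${text}</h2>`).join("\n");
}

const sampleHeadings = ["Intro", "Results", "Results"];
const firstPublish = renderHeadingsSketch(sampleHeadings);
const secondPublish = renderHeadingsSketch(sampleHeadings);

// The idempotency test reduces to a plain string comparison.
const isIdempotent = firstPublish === secondPublish;
```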
---
## 2. Persistence and HF Storage (P0)
Data loss is game over for a collaborative editor.
### 2.1 Local persistence
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 2.1.1 | Store writes .yjs file | Given a Y.Doc update triggers `debouncedSave` via `onChange` / When the debounce (2s) elapses / Then `data/<name>.yjs` exists and contains valid Yjs binary | P0 |
| 2.1.2 | Fetch reads local file | Given `data/default.yjs` exists / When Database.fetch / Then Y.Doc is hydrated with stored content | P0 |
| 2.1.3 | Fetch falls back to HF pull | Given no local .yjs file but HF dataset has one / When Database.fetch / Then file is pulled from HF and Y.Doc is hydrated | P0 |
### 2.2 HF dataset sync
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 2.2.1 | Push debounced at 10s | Given two rapid store calls / When 10s elapses / Then only one HF push is made | P1 |
| 2.2.2 | flushAll on SIGTERM | Given pending debounced pushes / When SIGTERM received / Then all pending data is pushed before exit | P0 |
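
Both sync tests are easier to keep fast and deterministic if time is injected rather than real. The sketch below assumes a trailing debounce that restarts on every call and a `flushAll` that pushes whatever is still pending; the plan only states the 10s window and the SIGTERM flush, so `DebouncedPusher`, `TimerApi`, and `FakeTimers` are hypothetical names, not the project's API. In a Vitest suite the hand-rolled clock could be replaced by the framework's fake timers.

```typescript
// All names below are hypothetical; only the timing contract comes from the plan.
interface TimerApi {
  setTimer(callback: () => void, delayMs: number): number;
  clearTimer(handle: number): void;
}

class DebouncedPusher {
  private pending = new Map<string, number>();
  private timers: TimerApi;
  private push: (docName: string) => void;
  private delayMs: number;

  constructor(timers: TimerApi, push: (docName: string) => void, delayMs: number) {
    this.timers = timers;
    this.push = push;
    this.delayMs = delayMs;
  }

  // Each call restarts the timer for that document (trailing debounce).
  schedule(docName: string): void {
    const existing = this.pending.get(docName);
    if (existing !== undefined) this.timers.clearTimer(existing);
    const handle = this.timers.setTimer(() => {
      this.pending.delete(docName);
      this.push(docName);
    }, this.delayMs);
    this.pending.set(docName, handle);
  }

  // Intended for the SIGTERM handler: push everything still waiting.
  flushAll(): void {
    const names: string[] = [];
    this.pending.forEach((handle, docName) => {
      this.timers.clearTimer(handle);
      names.push(docName);
    });
    this.pending.clear();
    names.forEach((docName) => this.push(docName));
  }
}

// Fake clock: timers fire only when advance() moves time past their deadline.
class FakeTimers implements TimerApi {
  private now = 0;
  private nextHandle = 1;
  private scheduled = new Map<number, { due: number; callback: () => void }>();

  setTimer(callback: () => void, delayMs: number): number {
    const handle = this.nextHandle++;
    this.scheduled.set(handle, { due: this.now + delayMs, callback });
    return handle;
  }

  clearTimer(handle: number): void {
    this.scheduled.delete(handle);
  }

  advance(ms: number): void {
    this.now += ms;
    const dueHandles: number[] = [];
    this.scheduled.forEach((timer, handle) => {
      if (timer.due <= this.now) dueHandles.push(handle);
    });
    for (const handle of dueHandles) {
      const timer = this.scheduled.get(handle);
      this.scheduled.delete(handle);
      if (timer) timer.callback();
    }
  }
}

const fakeTimers = new FakeTimers();
const pushedDocs: string[] = [];
const hfPusher = new DebouncedPusher(fakeTimers, (docName) => { pushedDocs.push(docName); }, 10000);

// 2.2.1: two rapid store calls collapse into one push.
hfPusher.schedule("default");
fakeTimers.advance(3000);
hfPusher.schedule("default");
fakeTimers.advance(10000);
const pushesAfterDebounce = pushedDocs.length;

// 2.2.2: a pending push is flushed immediately on shutdown.
hfPusher.schedule("default");
hfPusher.flushAll();
const pushesAfterFlush = pushedDocs.length;
```

The final check worth adding in a real test is that advancing the clock after `flushAll` produces no further push, which guards against a double write on shutdown.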
### 2.3 Image upload
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 2.3.1 | Upload returns proxy URL | Given a valid image file / When POST /api/upload / Then response contains a `/d/images/...` URL routed through the editor's dataset proxy (the underlying HF dataset is private) | P1 |
| 2.3.2 | Reject oversized file | Given a file > 10MB / When POST /api/upload / Then response is 413 or 400 with error message | P1 |
| 2.3.3 | Proxy serves images | Given an uploaded image / When GET `/d/images/<file>` / Then 200 + image bytes (proxy attaches a server-side token to fetch the private HF dataset) | P1 |
| 2.3.4 | Proxy whitelist | Given any path under `/d/articles/...` (raw Y.js drafts) / When GET / Then 404 - never expose drafts via the proxy | P0 |
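
The allowlist in 2.3.4 is worth testing as a pure function, separate from the HTTP layer, because traversal and percent-encoding tricks are easy to miss in route-level tests. The following is a minimal sketch under the assumption that only `/d/images/` is exposed; `isProxyPathAllowed` and `PROXY_ALLOWED_PREFIXES` are hypothetical names.

```typescript
// Hypothetical allowlist for the dataset proxy; the real route may differ.
const PROXY_ALLOWED_PREFIXES = ["/d/images/"];

function isProxyPathAllowed(requestPath: string): boolean {
  let decoded: string;
  try {
    // Decode first so `%2e%2e` cannot slip past the traversal check.
    decoded = decodeURIComponent(requestPath);
  } catch {
    return false; // malformed percent-encoding
  }
  // Reject any `..` segment before prefix matching.
  if (decoded.split("/").indexOf("..") !== -1) return false;
  return PROXY_ALLOWED_PREFIXES.some((prefix) => decoded.startsWith(prefix));
}

const proxyChecks = {
  image: isProxyPathAllowed("/d/images/cat.png"),
  draft: isProxyPathAllowed("/d/articles/default.yjs"),
  traversal: isProxyPathAllowed("/d/images/../articles/default.yjs"),
  encodedTraversal: isProxyPathAllowed("/d/images/%2e%2e/articles/default.yjs"),
};
```

In the route handler, a `false` result would map to the 404 required by 2.3.4.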
### 2.4 Storage status & disaster recovery
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 2.4.1 | Status surfaces dataset error | Given `createRepo` returns 403 / When GET /api/storage/status / Then response `lastError.stage === "dataset-create"` with `statusCode: 403` | P0 |
| 2.4.2 | Status clears on recovery | Given a previous push error / When the next push succeeds / Then `lastError` is null and `lastCloudPushAt` is updated | P1 |
| 2.4.3 | Status auth-gated | Given an anonymous request (oauthEnabled) / When GET /api/storage/status / Then 403 - don't leak dataset error details | P1 |
| 2.4.4 | Eager creation on login | Given a successful /api/auth/status with canEdit / When the request completes / Then `ensureDatasetExists` has been attempted (success surfaces within one storage-status poll, failure surfaces too) | P0 |
| 2.4.5 | Admin export streams .yjs | Given an editor user / When GET /api/admin/export-doc?name=default / Then 200 + `Content-Disposition: attachment` + raw .yjs body | P1 |
| 2.4.6 | Admin export auth-gated | Given a non-canEdit request / When GET /api/admin/export-doc / Then 403 | P0 |
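
Tests 2.4.1 and 2.4.2 describe a small state machine: an error sticks until the next successful push clears it. The sketch below models that with the field names used in the table (`lastError`, `stage`, `statusCode`, `lastCloudPushAt`). The class name, the `"push"` stage value, and storing the timestamp as epoch milliseconds are assumptions.

```typescript
// Field names follow the table above; everything else is a hypothetical shape.
interface StorageErrorInfo {
  stage: string;
  statusCode: number;
}

interface StorageStatusSketch {
  lastError: StorageErrorInfo | null;
  lastCloudPushAt: number | null;
}

class StorageStatusTracker {
  private current: StorageStatusSketch = { lastError: null, lastCloudPushAt: null };

  // A failure keeps the last successful push time but records the error.
  recordFailure(stage: string, statusCode: number): void {
    this.current = { ...this.current, lastError: { stage, statusCode } };
  }

  // A successful push clears the error and refreshes the timestamp.
  recordPushSuccess(nowMs: number): void {
    this.current = { lastError: null, lastCloudPushAt: nowMs };
  }

  snapshot(): StorageStatusSketch {
    return { ...this.current };
  }
}

const statusTracker = new StorageStatusTracker();
statusTracker.recordFailure("dataset-create", 403);
const statusAfterFailure = statusTracker.snapshot();

statusTracker.recordPushSuccess(1700000000000);
const statusAfterRecovery = statusTracker.snapshot();
```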
---
## 3. API Routes - HTTP Contracts (P1)
Test the request/response shape, not the internal logic.
### 3.1 Publish
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 3.1.1 | Publish returns success | Given a valid Y.Doc in Hocuspocus / When POST /api/publish / Then response contains `{ success: true, htmlUrl }` | P0 |
| 3.1.2 | Publish writes local HTML | Given POST /api/publish succeeds / When checking local FS / Then `data/published/default/index.html` exists | P0 |
### 3.2 Chat (AI Agent)
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 3.2.1 | Stream returns valid SSE | Given a chat message / When POST /api/chat / Then response is `text/event-stream` with parseable SSE events | P1 |
| 3.2.2 | Tool calls included in stream | Given a prompt that triggers a tool / When streamed / Then SSE contains tool_call events with name and arguments | P1 |
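
"Parseable SSE events" in 3.2.1 needs a concrete parser in the test helper. The sketch below handles the `event:` and `data:` fields and comment lines of the `text/event-stream` format, which is enough for a contract test; it ignores `id:` and `retry:` and does not handle bare `\r` line endings. Whether the backend signals tool calls through the `event:` field or inside the JSON payload is not stated in this plan, so the sample stream is an assumption, as are the names `parseSseBody` and `ParsedSseEvent`.

```typescript
interface ParsedSseEvent {
  event: string;
  data: string;
}

// Simplified text/event-stream parser for test assertions.
function parseSseBody(body: string): ParsedSseEvent[] {
  const events: ParsedSseEvent[] = [];
  // Events are separated by a blank line; normalise CRLF first.
  const blocks = body.replace(/\r\n/g, "\n").split("\n\n");
  for (const block of blocks) {
    let eventName = "message"; // default event type when no `event:` field is sent
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") eventName = value;
      else if (field === "data") dataLines.push(value);
    }
    // Blocks without data are not dispatched as events.
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join("\n") });
    }
  }
  return events;
}

// Hypothetical stream shape: one text delta, then one tool call.
const sampleStream =
  'data: {"delta":"Hello"}\n\n' +
  ": keep-alive\n\n" +
  'event: tool_call\ndata: {"name":"applyDiff","arguments":{"search":"world","replace":"editor"}}\n\n';

const parsedEvents = parseSseBody(sampleStream);
```

For 3.2.2, the test would then `JSON.parse` the `data` of the tool-call event and assert that `name` and `arguments` are present.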
### 3.3 Citations
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 3.3.1 | Resolve DOI to CSL-JSON | Given a valid DOI string / When POST /api/citations/resolve / Then response contains CSL-JSON with title and authors | P1 |
| 3.3.2 | Import BibTeX | Given a valid BibTeX string / When POST /api/citations/import-bib / Then response contains CSL-JSON entries | P1 |
| 3.3.3 | Format to HTML bibliography | Given CSL-JSON entries + style / When POST /api/citations/format / Then response contains HTML string | P1 |
### 3.4 Auth status
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 3.4.1 | Unauthenticated status | Given no cookie / When GET /api/auth/status / Then `{ authenticated: false, canEdit: false }` | P1 |
| 3.4.2 | Authenticated status | Given valid hf_access_token cookie / When GET /api/auth/status / Then `{ authenticated: true, canEdit: true, user: {...} }` | P1 |
---
## 4. Auth and Security (P0)
### 4.1 Route protection
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 4.1.1 | Publish requires auth | Given OAuth enabled + no cookie / When POST /api/publish / Then 401 or 403 | P0 |
| 4.1.2 | Reset-document requires auth | Given OAuth enabled + no cookie / When POST /api/admin/reset-document / Then 401 or 403 | P0 |
| 4.1.3 | Upload works without auth | Given no cookie / When POST /api/upload with valid image / Then 200 (upload is not auth-gated per spec) | P1 |
| 4.1.4 | Upload rate-limited per IP | Given more than 30 uploads within 60s from the same IP / When POST /api/upload / Then 429 with `Retry-After` header (anti-abuse since upload is anonymous) | P1 |
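
The limit in 4.1.4 can be exercised without waiting a real minute if the limiter takes the current time as an argument. The sketch below is a sliding-window limiter with the numbers from the table (30 uploads per 60s per IP); the class name and the choice of a sliding rather than fixed window are assumptions, since the plan only specifies the 429 and the `Retry-After` header.

```typescript
// Hypothetical sliding-window limiter; the project's middleware may differ.
class UploadRateLimiterSketch {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns null if the upload may proceed, otherwise the number of seconds
  // to send in the Retry-After header of the 429 response.
  check(ip: string, nowMs: number): number | null {
    const recent = (this.hits.get(ip) ?? []).filter((t) => nowMs - t < this.windowMs);
    this.hits.set(ip, recent);
    const oldest = recent[0];
    if (recent.length >= this.limit && oldest !== undefined) {
      return Math.ceil((oldest + this.windowMs - nowMs) / 1000);
    }
    recent.push(nowMs);
    return null;
  }
}

const uploadLimiter = new UploadRateLimiterSketch(30, 60000);
const uploadDecisions: Array<number | null> = [];
// 31 uploads, one per second, from the same (documentation-range) IP.
for (let i = 0; i < 31; i++) {
  uploadDecisions.push(uploadLimiter.check("203.0.113.7", i * 1000));
}
```

The first 30 entries of `uploadDecisions` are `null` (allowed) and the 31st carries the retry delay, which is the boundary the test should assert on.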
### 4.2 OAuth flow
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 4.2.1 | CSRF state validated | Given an OAuth callback with wrong state / When GET /auth/callback / Then rejected (400 or redirect to error) | P0 |
| 4.2.2 | Cookie set on success | Given valid OAuth callback / When processed / Then `hf_access_token` cookie is set (httpOnly, secure, sameSite: none) | P0 |
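
For 4.2.2, asserting on the raw `Set-Cookie` string is brittle because attribute order and casing can vary. A small helper that extracts the flags keeps the test readable. This is a sketch with a hypothetical name (`parseSetCookieFlags`) and a made-up header value; it reads only the attributes the test cares about.

```typescript
interface CookieFlags {
  name: string;
  httpOnly: boolean;
  secure: boolean;
  sameSite: string | null;
}

// Extracts the security-relevant attributes from one Set-Cookie header value.
function parseSetCookieFlags(header: string): CookieFlags {
  const parts = header.split(";").map((part) => part.trim());
  const first = parts[0] ?? "";
  const eq = first.indexOf("=");
  const flags: CookieFlags = {
    name: eq === -1 ? first : first.slice(0, eq),
    httpOnly: false,
    secure: false,
    sameSite: null,
  };
  for (const attribute of parts.slice(1)) {
    const lower = attribute.toLowerCase(); // attribute names are case-insensitive
    if (lower === "httponly") flags.httpOnly = true;
    else if (lower === "secure") flags.secure = true;
    else if (lower.startsWith("samesite=")) flags.sameSite = lower.slice("samesite=".length);
  }
  return flags;
}

// Illustrative header; the token value is a placeholder.
const callbackCookie = parseSetCookieFlags(
  "hf_access_token=placeholder; Path=/; HttpOnly; Secure; SameSite=None",
);
```

With Supertest, the input would come from the response's `set-cookie` header, and the assertions would be `httpOnly === true`, `secure === true`, and `sameSite === "none"`.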
### 4.3 WebSocket auth
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 4.3.1 | WS rejected without token | Given OAuth enabled / When WebSocket connects to /collab without token / Then connection rejected | P0 |
| 4.3.2 | WS accepted with valid token | Given OAuth enabled + valid token / When WebSocket connects to /collab / Then connection accepted, sync starts | P1 |
### 4.4 XSS (known risk from spec)
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 4.4.1 | Licence field escaped | Given `meta.licence` containing `<script>alert(1)</script>` / When published HTML rendered / Then script tag is escaped or stripped | P0 |
| 4.4.2 | Bibliography HTML sanitized | Given biblioHtml containing malicious script / When injected into published page / Then script is escaped or stripped | P0 |
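
For plain-text fields such as the licence in 4.4.1, escaping on output is sufficient, and the test only needs to confirm that no raw `<script` survives. The sketch below shows that case with a hypothetical `escapeHtmlText` helper. It is not a fix for 4.4.2: bibliography HTML has to keep its markup, so it would need an allowlist-based sanitizer rather than blanket escaping.

```typescript
// Hypothetical helper: escape a plain-text value for insertion into HTML.
function escapeHtmlText(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const hostileLicence = "<script>alert(1)</script>";
const renderedLicence = `<p class="licence">${escapeHtmlText(hostileLicence)}</p>`;
```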
---
## 5. AI Agent - Plumbing (P1)
We do NOT test LLM output quality. We test the infrastructure around it.
### 5.1 Context building
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 5.1.1 | Context includes doc text | Given a doc with content / When building chat context / Then context object contains document text | P1 |
| 5.1.2 | Context includes selection | Given a text selection / When building chat context / Then context object contains selection text and position | P1 |
| 5.1.3 | Context includes frontmatter | Given doc with title and authors / When building chat context / Then context object contains frontmatter fields | P1 |
### 5.2 Tool execution (client-side)
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 5.2.1 | replaceSelection applies | Given a selection in the editor / When replaceSelection tool called with new text / Then selection is replaced in the TipTap doc | P1 |
| 5.2.2 | applyDiff does search/replace | Given doc with "Hello world" / When applyDiff called with search="world" replace="editor" / Then doc contains "Hello editor" | P1 |
| 5.2.3 | updateFrontmatter modifies map | Given frontmatter with title "Old" / When updateFrontmatter called with title="New" / Then `Y.Map("frontmatter").get("title")` is "New" | P1 |
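
The search/replace semantics in 5.2.2 can be pinned on plain strings before involving TipTap. The sketch below assumes that only the first occurrence is replaced and that a missing match is reported rather than thrown; neither behaviour is specified in this plan, and `applyDiffSketch` is a hypothetical name. It uses `indexOf` and `slice` instead of `String.prototype.replace` so that `$` sequences in model output are inserted literally.

```typescript
interface DiffResult {
  applied: boolean;
  text: string;
}

// Hypothetical string-level model of the applyDiff tool.
function applyDiffSketch(text: string, search: string, replacement: string): DiffResult {
  const index = search === "" ? -1 : text.indexOf(search);
  if (index === -1) {
    // Assumption: a failed match leaves the document untouched.
    return { applied: false, text };
  }
  return {
    applied: true,
    text: text.slice(0, index) + replacement + text.slice(index + search.length),
  };
}

const diffResult = applyDiffSketch("Hello world", "world", "editor");
```

Here `diffResult.text` is `"Hello editor"`, matching the Then clause of 5.2.2.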
### 5.3 Undo batching
| # | Test | Given / When / Then | Priority |
|---|------|---------------------|----------|
| 5.3.1 | Agent edits batch into one undo | Given agent executes 3 tool calls (replaceSelection + applyDiff + updateFrontmatter) / When user presses Cmd+Z once / Then all 3 changes are reverted | P0 |
| 5.3.2 | Manual edits not in agent batch | Given user types text then agent edits / When user presses Cmd+Z / Then only agent edits are reverted, user text remains | P1 |
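
The two undo tests describe one rule: everything the agent does in a turn lands in a single undo step, and edits made before the turn stay in their own steps. The toy model below illustrates that rule on a plain object. It is not how the editor does it (a Yjs-backed editor would more likely rely on its undo manager), and `BatchingUndoStack`, `draftDoc`, and `setField` are hypothetical names.

```typescript
type UndoStep = () => void;

// Toy undo stack: steps recorded inside a batch are undone together.
class BatchingUndoStack {
  private stack: UndoStep[][] = [];
  private openBatch: UndoStep[] | null = null;

  record(undoStep: UndoStep): void {
    if (this.openBatch) this.openBatch.push(undoStep);
    else this.stack.push([undoStep]);
  }

  beginBatch(): void {
    this.openBatch = [];
  }

  endBatch(): void {
    if (this.openBatch && this.openBatch.length > 0) this.stack.push(this.openBatch);
    this.openBatch = null;
  }

  // Reverts the most recent step or batch, newest change first.
  undo(): void {
    const steps = this.stack.pop();
    if (!steps) return;
    for (let i = steps.length - 1; i >= 0; i--) {
      const step = steps[i];
      if (step) step();
    }
  }
}

const draftDoc = { body: "Hello world", title: "Old" };
const undoStack = new BatchingUndoStack();

function setField(field: "body" | "title", value: string): void {
  const previous = draftDoc[field];
  draftDoc[field] = value;
  undoStack.record(() => {
    draftDoc[field] = previous;
  });
}

setField("body", "Hello world!"); // manual edit: its own undo step

undoStack.beginBatch(); // agent turn starts
setField("body", "Hello editor!");
setField("body", "Hello editor! Thanks.");
setField("title", "New");
undoStack.endBatch(); // agent turn ends

undoStack.undo(); // one Cmd+Z reverts all three agent edits, keeps the manual one
```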
---
## What we do NOT test (and why)
| Area | Reason |
|------|--------|
| Yjs real-time sync | Yjs/Hocuspocus are mature libs - testing their CRDT sync is noise |
| Visual rendering of components | Project evolves too fast, snapshot tests would break constantly |
| PDF generation (Playwright) | Heavy dependency, visual output - better as a manual smoke test |
| Slash commands / Bubble toolbar | UI that will change - covered by the opt-in Playwright suite instead of the hermetic one |
| CSS architecture | Not meaningfully unit-testable |
| TipTap extension registration | Framework internals, not business logic |
---
## Summary
| Section | P0 tests | P1 tests | Total |
|---------|----------|----------|-------|
| 1. Publisher Pipeline | 10 | 4 | 14 |
| 2. Persistence / Storage | 8 | 7 | 15 |
| 3. API Routes | 2 | 7 | 9 |
| 4. Auth / Security | 7 | 3 | 10 |
| 5. AI Agent | 1 | 7 | 8 |
| **Total** | **28** | **28** | **56** |

Start with the 28 P0 tests. They cover the "if this breaks, we lose data or trust" surface. Add P1 tests as the codebase stabilizes.