Collab Editor - Test Plan
Companion document to SPECIFICATION.md. Covers the critical paths that, if broken, would cause data loss, publishing failures, or security holes. Intentionally not exhaustive - the project is evolving fast, and these tests are chosen to survive refactors.
Principles
- Test behavior, not implementation details.
- Focus on what breaks silently (data loss, corrupt publish, auth bypass).
- Keep tests decoupled from internals so they survive refactors.
- Fast to run - no heavy dependencies (Playwright PDF, real HF API) in the main suite.
Stack
| Layer | Tool | Scope |
|---|---|---|
| Backend unit/integration | Vitest + Supertest | Publisher, API routes, auth guards |
| Yjs helpers | In-memory Y.Doc fixtures | Publisher extraction, storage logic |
| Frontend unit | Vitest + Testing Library | Agent tool execution, undo batching |
| E2E | Playwright (backend/e2e, frontend/tests) | Critical editor flows: load, edit, comments, publish round-trip |

E2E is opt-in (npm run test:e2e in backend/, plus Playwright specs in frontend/tests/); the fast suite (npm run test) stays hermetic and Playwright-free so it can run in CI without a browser install.
1. Publisher Pipeline (P0)
The core value of the product. If publishing breaks, the article disappears.
1.1 Y.Doc extraction
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 1.1.1 | Extract frontmatter from Y.Doc | Given a Y.Doc with title, authors, affiliations in Y.Map("frontmatter") / When extractFromYDoc runs / Then returned object contains all frontmatter fields with correct types | P0 |
| 1.1.2 | Extract content from empty doc | Given a Y.Doc with an empty Y.XmlFragment("default") / When extracted / Then returns empty content without throwing | P0 |
| 1.1.3 | Extract with citations | Given a Y.Doc with entries in Y.Map("citations") / When extracted / Then citations map is included in output as CSL-JSON | P0 |

1.2 HTML generation
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 1.2.1 | Produces valid self-contained HTML | Given extracted doc data / When renderArticleHTML runs / Then output is valid HTML with inline CSS, no external stylesheet links | P0 |
| 1.2.2 | Custom media queries resolved | Given template CSS with @custom-media / When resolveCustomMedia runs / Then output contains only standard @media rules, no @custom-media | P0 |
| 1.2.3 | TOC generated from headings | Given content with h2 and h3 / When rendered / Then HTML contains TOC nav with matching anchor links | P1 |
| 1.2.4 | Theme toggle present | Given any doc / When rendered / Then output contains theme toggle SVG (sun/moon) and associated script | P1 |
| 1.2.5 | Bibliography injected | Given doc with citations / When rendered / Then HTML contains bibliography section with formatted entries | P0 |

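
Test 1.2.2 can be exercised against a plain CSS string. The sketch below is a hypothetical stand-in for resolveCustomMedia (the name resolveCustomMediaSketch and its regex-based approach are assumptions, and the real function may handle more syntax); it only covers the simple `@custom-media --name (query);` form.

```typescript
// Hypothetical stand-in for the publisher's resolveCustomMedia.
// Assumption: definitions use the simple form "@custom-media --name (query);".
function resolveCustomMediaSketch(css: string): string {
  const definitions = new Map<string, string>();

  // Collect each definition and strip it from the stylesheet.
  const withoutDefs = css.replace(
    /@custom-media\s+(--[\w-]+)\s+([^;]+);\s*/g,
    (_match: string, name: string, query: string) => {
      definitions.set(name, query.trim());
      return "";
    },
  );

  // Replace usages such as "@media (--narrow)" with the stored query.
  // Unknown names (for example var(--color)) are left untouched.
  return withoutDefs.replace(/\((--[\w-]+)\)/g, (match: string, name: string) => {
    return definitions.get(name) ?? match;
  });
}

const input = `@custom-media --narrow (max-width: 600px);
@media (--narrow) { .toc { display: none; } }`;

const output = resolveCustomMediaSketch(input);
```

The assertion for 1.2.2 then reduces to checking that the output no longer contains `@custom-media` and that the `@media` rule carries the expanded query.
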
1.3 Post-processing
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 1.3.1 | Accordion to details/summary | Given HTML with accordion div / When post-processed / Then output uses `<details>` and `<summary>` tags | P1 |
| 1.3.2 | htmlEmbed to iframe | Given HTML with htmlEmbed node / When post-processed / Then output contains `<iframe>` with correct src | P0 |
| 1.3.3 | Mermaid to pre block | Given HTML with mermaid node / When post-processed / Then output contains `<pre class="mermaid">` | P1 |

1.4 Publish idempotency and restore
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 1.4.1 | Publish twice gives same result | Given a Y.Doc / When published twice / Then both HTML outputs are byte-identical (no timestamps or random IDs) | P0 |
| 1.4.2 | Published article restored on boot | Given published assets in HF dataset but empty local FS / When ensurePublishedRestored runs / Then data/published/default/index.html exists locally | P0 |
| 1.4.3 | GET / serves published article | Given local published index.html exists / When GET / / Then response is the published HTML with 200 | P0 |

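
When the idempotency check in 1.4.1 fails, a bare "strings differ" message is hard to act on. The helper below is a small sketch (firstDifference and the sample HTML strings are hypothetical, not taken from the codebase) that reports where two publish outputs first diverge, which usually points straight at the timestamp or random ID responsible.

```typescript
// Returns the index of the first differing character, or -1 when the
// two strings are identical. If one string is a prefix of the other,
// the length of the shorter string is returned.
function firstDifference(a: string, b: string): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : shared;
}

// Hypothetical outputs of consecutive publishes of the same Y.Doc.
const publishA = '<article id="a1"><h1>Demo</h1></article>';
const publishB = '<article id="a1"><h1>Demo</h1></article>';
const publishWithRandomId = '<article id="z9"><h1>Demo</h1></article>';

const stableOffset = firstDifference(publishA, publishB); // identical outputs
const driftOffset = firstDifference(publishA, publishWithRandomId); // diverges inside the id
```

Comparing strings is equivalent to comparing bytes as long as both outputs are serialized with the same encoding.
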
2. Persistence and HF Storage (P0)
Data loss is game over for a collaborative editor.
2.1 Local persistence
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 2.1.1 | Store writes .yjs file | Given a Y.Doc update triggers debouncedSave via onChange / When the debounce (2s) elapses / Then `data/<name>.yjs` exists and contains valid Yjs binary | P0 |
| 2.1.2 | Fetch reads local file | Given data/default.yjs exists / When Database.fetch / Then Y.Doc is hydrated with stored content | P0 |
| 2.1.3 | Fetch falls back to HF pull | Given no local .yjs file but HF dataset has one / When Database.fetch / Then file is pulled from HF and Y.Doc is hydrated | P0 |

2.2 HF dataset sync
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 2.2.1 | Push debounced at 10s | Given two rapid store calls / When 10s elapses / Then only one HF push is made | P1 |
| 2.2.2 | flushAll on SIGTERM | Given pending debounced pushes / When SIGTERM received / Then all pending data is pushed before exit | P0 |

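
The behavior asserted in 2.2.1 and 2.2.2 can be illustrated without real timers. The sketch below assumes a trailing-edge debounce (each store call restarts the 10s window); ManualClock and DebouncedPusher are hypothetical names, not the project's classes. In the Vitest suite the manual clock would normally be replaced by the runner's fake timers.

```typescript
type Task = { dueAt: number; run: () => void; cancelled: boolean };

// Minimal manual clock, a stand-in for a test runner's fake timers.
class ManualClock {
  private now = 0;
  private tasks: Task[] = [];

  schedule(delayMs: number, run: () => void): Task {
    const task: Task = { dueAt: this.now + delayMs, run, cancelled: false };
    this.tasks.push(task);
    return task;
  }

  // Moves time forward and runs every task that has become due.
  advance(ms: number): void {
    this.now += ms;
    const due = this.tasks.filter((t) => !t.cancelled && t.dueAt <= this.now);
    this.tasks = this.tasks.filter((t) => !t.cancelled && t.dueAt > this.now);
    for (const task of due) task.run();
  }
}

// Hypothetical debounced pusher: each store() restarts the delay window.
class DebouncedPusher {
  private pending: Task | null = null;
  private readonly clock: ManualClock;
  private readonly delayMs: number;
  private readonly push: () => void;

  constructor(clock: ManualClock, delayMs: number, push: () => void) {
    this.clock = clock;
    this.delayMs = delayMs;
    this.push = push;
  }

  store(): void {
    if (this.pending) this.pending.cancelled = true;
    this.pending = this.clock.schedule(this.delayMs, () => {
      this.pending = null;
      this.push();
    });
  }

  // Mirrors the flushAll-on-SIGTERM idea: push pending data immediately.
  flush(): void {
    if (!this.pending) return;
    this.pending.cancelled = true;
    this.pending = null;
    this.push();
  }
}

let pushes = 0;
const clock = new ManualClock();
const pusher = new DebouncedPusher(clock, 10_000, () => { pushes += 1; });

pusher.store();
clock.advance(1_000);
pusher.store(); // second rapid call restarts the 10s window
clock.advance(9_999); // t = 10_999, the push is due at t = 11_000
const pushesBeforeDeadline = pushes;
clock.advance(1); // t = 11_000
const pushesAfterDeadline = pushes;
```

Injecting the clock keeps the test hermetic: no real 10s wait, and the flush path can be checked by calling flush() while a push is still pending.
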
2.3 Image upload
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 2.3.1 | Upload returns proxy URL | Given a valid image file / When POST /api/upload / Then response contains a /d/images/... URL routed through the editor's dataset proxy (the underlying HF dataset is private) | P1 |
| 2.3.2 | Reject oversized file | Given a file > 10MB / When POST /api/upload / Then response is 413 or 400 with error message | P1 |
| 2.3.3 | Proxy serves images | Given an uploaded image / When GET `/d/images/<file>` / Then 200 + image bytes (proxy attaches a server-side token to fetch the private HF dataset) | P1 |
| 2.3.4 | Proxy whitelist | Given any path under /d/articles/... (raw Y.js drafts) / When GET / Then 404 - never expose drafts via the proxy | P0 |

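
The whitelist rule in 2.3.4 is easy to get subtly wrong when paths are percent-encoded or contain `..` segments. The sketch below shows one way to express it; isProxyPathAllowed, PUBLIC_PREFIXES, and the sample file names are assumptions for illustration, not the project's actual routing code.

```typescript
// Assumption: only image paths are public through the proxy.
const PUBLIC_PREFIXES = ["/d/images/"];

function isProxyPathAllowed(rawPath: string): boolean {
  let decoded: string;
  try {
    decoded = decodeURIComponent(rawPath);
  } catch {
    return false; // malformed percent-encoding
  }
  // Reject traversal such as /d/images/../articles/example.yjs.
  if (decoded.split("/").includes("..")) return false;
  return PUBLIC_PREFIXES.some((prefix) => decoded.startsWith(prefix));
}

// Hypothetical request paths.
const imageAllowed = isProxyPathAllowed("/d/images/fig1.png");
const draftAllowed = isProxyPathAllowed("/d/articles/example.yjs");
const traversalAllowed = isProxyPathAllowed("/d/images/%2e%2e/articles/example.yjs");
```

An allowlist of public prefixes fails closed: a new dataset folder stays private until someone explicitly exposes it.
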
2.4 Storage status & disaster recovery
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 2.4.1 | Status surfaces dataset error | Given createRepo returns 403 / When GET /api/storage/status / Then response has lastError.stage === "dataset-create" with statusCode: 403 | P0 |
| 2.4.2 | Status clears on recovery | Given a previous push error / When the next push succeeds / Then lastError is null and lastCloudPushAt is updated | P1 |
| 2.4.3 | Status auth-gated | Given an anonymous request (oauthEnabled) / When GET /api/storage/status / Then 403 - don't leak dataset error details | P1 |
| 2.4.4 | Eager creation on login | Given a successful /api/auth/status with canEdit / When the request completes / Then ensureDatasetExists has been attempted (success surfaces within one storage-status poll, failure surfaces too) | P0 |
| 2.4.5 | Admin export streams .yjs | Given an editor user / When GET /api/admin/export-doc?name=default / Then 200 + Content-Disposition: attachment + raw .yjs body | P1 |
| 2.4.6 | Admin export auth-gated | Given a non-canEdit request / When GET /api/admin/export-doc / Then 403 | P0 |

3. API Routes - HTTP Contracts (P1)
Test the request/response shape, not the internal logic.
3.1 Publish
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 3.1.1 | Publish returns success | Given a valid Y.Doc in Hocuspocus / When POST /api/publish / Then response contains { success: true, htmlUrl } | P0 |
| 3.1.2 | Publish writes local HTML | Given POST /api/publish succeeds / When checking local FS / Then data/published/default/index.html exists | P0 |

3.2 Chat (AI Agent)
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 3.2.1 | Stream returns valid SSE | Given a chat message / When POST /api/chat / Then response is text/event-stream with parseable SSE events | P1 |
| 3.2.2 | Tool calls included in stream | Given a prompt that triggers a tool / When streamed / Then SSE contains tool_call events with name and arguments | P1 |

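
"Parseable SSE events" in 3.2.1 can be made concrete with a small parser for the text/event-stream format. The sketch below is an illustration under assumptions: parseSse is a hypothetical helper, and it is assumed that tool_call travels in the `event:` field with a JSON payload in `data:` (the real stream may encode it differently).

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Minimal text/event-stream parser: events are separated by a blank line,
// multiple data lines are joined with "\n", and lines starting with ":"
// are comments. Blocks without any data line are not dispatched.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}

// Hypothetical stream body; the payload shape is an assumption.
const sample =
  'event: tool_call\ndata: {"name":"applyDiff","arguments":{"search":"world","replace":"editor"}}\n\n' +
  "data: hello\ndata: world\n\n";

const parsed = parseSse(sample);
const toolCall = JSON.parse(parsed[0]?.data ?? "{}");
```

A test for 3.2.2 can then assert on the parsed objects (name and arguments present) instead of matching raw substrings in the response body.
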
3.3 Citations
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 3.3.1 | Resolve DOI to CSL-JSON | Given a valid DOI string / When POST /api/citations/resolve / Then response contains CSL-JSON with title and authors | P1 |
| 3.3.2 | Import BibTeX | Given a valid BibTeX string / When POST /api/citations/import-bib / Then response contains CSL-JSON entries | P1 |
| 3.3.3 | Format to HTML bibliography | Given CSL-JSON entries + style / When POST /api/citations/format / Then response contains HTML string | P1 |

3.4 Auth status
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 3.4.1 | Unauthenticated status | Given no cookie / When GET /api/auth/status / Then { authenticated: false, canEdit: false } | P1 |
| 3.4.2 | Authenticated status | Given valid hf_access_token cookie / When GET /api/auth/status / Then { authenticated: true, canEdit: true, user: {...} } | P1 |

4. Auth and Security (P0)
4.1 Route protection
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 4.1.1 | Publish requires auth | Given OAuth enabled + no cookie / When POST /api/publish / Then 401 or 403 | P0 |
| 4.1.2 | Reset-document requires auth | Given OAuth enabled + no cookie / When POST /api/admin/reset-document / Then 401 or 403 | P0 |
| 4.1.3 | Upload works without auth | Given no cookie / When POST /api/upload with valid image / Then 200 (upload is not auth-gated per spec) | P1 |
| 4.1.4 | Upload rate-limited per IP | Given more than 30 uploads within 60s from the same IP / When POST /api/upload / Then 429 with Retry-After header (anti-abuse since upload is anonymous) | P1 |

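
The limit in 4.1.4 (30 uploads per 60s per IP, then 429 with Retry-After) can be modeled as a sliding window. This is a sketch under assumptions: SlidingWindowLimiter is a hypothetical class, and the real middleware may use a fixed window or a library instead. Time is passed in explicitly so the test needs no real clock.

```typescript
interface RateDecision {
  allowed: boolean;
  retryAfterSeconds: number; // value for the Retry-After header when rejected
}

// Hypothetical sliding-window limiter keyed by client IP.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number): RateDecision {
    // Keep only the hits that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      // The oldest hit leaves the window first; the client can retry then.
      const oldest = recent[0] ?? nowMs;
      const waitMs = oldest + this.windowMs - nowMs;
      return { allowed: false, retryAfterSeconds: Math.ceil(waitMs / 1000) };
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return { allowed: true, retryAfterSeconds: 0 };
  }
}

const limiter = new SlidingWindowLimiter(30, 60_000);
const decisions: RateDecision[] = [];
// 31 uploads from one (documentation-range) IP, one per second.
for (let i = 0; i < 31; i++) {
  decisions.push(limiter.check("203.0.113.7", i * 1000));
}
const otherClient = limiter.check("203.0.113.8", 30_000);
```

The first 30 calls are allowed and the 31st is rejected, while a second IP is unaffected, which is the per-IP isolation the test should also assert.
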
4.2 OAuth flow
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 4.2.1 | CSRF state validated | Given an OAuth callback with wrong state / When GET /auth/callback / Then rejected (400 or redirect to error) | P0 |
| 4.2.2 | Cookie set on success | Given valid OAuth callback / When processed / Then hf_access_token cookie is set (httpOnly, secure, sameSite: none) | P0 |

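
For 4.2.2, the cookie attributes can be asserted by parsing the Set-Cookie response header rather than matching the whole string, so attribute order and casing do not matter. parseSetCookie below is a hypothetical test helper and the cookie value is made up for illustration.

```typescript
interface CookieFlags {
  name: string;
  httpOnly: boolean;
  secure: boolean;
  sameSite: string | null; // lowercased value, or null when absent
}

// Splits a single Set-Cookie header into its name and security attributes.
function parseSetCookie(header: string): CookieFlags {
  const parts = header.split(";").map((part) => part.trim());
  const pair = parts[0] ?? "";
  const eq = pair.indexOf("=");
  const attrs = parts.slice(1).map((attr) => attr.toLowerCase());
  const sameSiteAttr = attrs.find((attr) => attr.startsWith("samesite="));
  return {
    name: eq === -1 ? pair : pair.slice(0, eq),
    httpOnly: attrs.includes("httponly"),
    secure: attrs.includes("secure"),
    sameSite: sameSiteAttr ? sameSiteAttr.slice("samesite=".length) : null,
  };
}

// Hypothetical header as it might appear on the OAuth callback response.
const flags = parseSetCookie(
  "hf_access_token=abc123; Path=/; HttpOnly; Secure; SameSite=None",
);
```
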
4.3 WebSocket auth
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 4.3.1 | WS rejected without token | Given OAuth enabled / When WebSocket connects to /collab without token / Then connection rejected | P0 |
| 4.3.2 | WS accepted with valid token | Given OAuth enabled + valid token / When WebSocket connects to /collab / Then connection accepted, sync starts | P1 |

4.4 XSS (known risk from spec)
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 4.4.1 | Licence field escaped | Given meta.licence containing `<script>alert(1)</script>` / When published HTML rendered / Then script tag is escaped or stripped | P0 |
| 4.4.2 | Bibliography HTML sanitized | Given biblioHtml containing malicious script / When injected into published page / Then script is escaped or stripped | P0 |

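
The "escaped" branch of 4.4.1 can be pinned down with a plain escaping function. The sketch below is an assumption about how the licence field might be rendered (escapeHtml and the footer markup are hypothetical); it does not cover 4.4.2, where biblioHtml is intentionally HTML and needs a sanitizer rather than blanket escaping.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escapes the five characters that can break out of HTML text or attributes.
function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Payload from test 4.4.1, rendered into a hypothetical footer element.
const licence = "<script>alert(1)</script>";
const footer = `<p class="licence">${escapeHtml(licence)}</p>`;
```

The test then asserts that the published HTML contains no literal `<script>` originating from the licence field.
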
5. AI Agent - Plumbing (P1)
We do NOT test LLM output quality. We test the infrastructure around it.
5.1 Context building
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 5.1.1 | Context includes doc text | Given a doc with content / When building chat context / Then context object contains document text | P1 |
| 5.1.2 | Context includes selection | Given a text selection / When building chat context / Then context object contains selection text and position | P1 |
| 5.1.3 | Context includes frontmatter | Given doc with title and authors / When building chat context / Then context object contains frontmatter fields | P1 |

5.2 Tool execution (client-side)
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 5.2.1 | replaceSelection applies | Given a selection in the editor / When replaceSelection tool called with new text / Then selection is replaced in the TipTap doc | P1 |
| 5.2.2 | applyDiff does search/replace | Given doc with "Hello world" / When applyDiff called with search="world" replace="editor" / Then doc contains "Hello editor" | P1 |
| 5.2.3 | updateFrontmatter modifies map | Given frontmatter with title "Old" / When updateFrontmatter called with title="New" / Then Y.Map("frontmatter").get("title") is "New" | P1 |

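
The search/replace contract in 5.2.2 is shown below on a plain string. This is a sketch, not the real tool: applyDiffSketch is a hypothetical name, it assumes only the first occurrence is replaced, and the actual applyDiff operates on the TipTap document rather than a string.

```typescript
interface DiffResult {
  text: string;
  applied: boolean; // false when the search text was not found
}

// Replaces the first occurrence of `search` with `replace`.
// Slicing is used instead of String.prototype.replace so that "$&" or "$1"
// in LLM-provided replacement text is inserted literally.
function applyDiffSketch(text: string, search: string, replace: string): DiffResult {
  const index = search === "" ? -1 : text.indexOf(search);
  if (index === -1) return { text, applied: false };
  return {
    text: text.slice(0, index) + replace + text.slice(index + search.length),
    applied: true,
  };
}

const result = applyDiffSketch("Hello world", "world", "editor");
const missing = applyDiffSketch("Hello world", "planet", "editor");
```

Returning an explicit applied flag lets the agent plumbing report a failed match back to the model instead of silently doing nothing.
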
5.3 Undo batching
| # | Test | Given / When / Then | Priority |
|---|---|---|---|
| 5.3.1 | Agent edits batch into one undo | Given agent executes 3 tool calls (replaceSelection + applyDiff + updateFrontmatter) / When user presses Cmd+Z once / Then all 3 changes are reverted | P0 |
| 5.3.2 | Manual edits not in agent batch | Given user types text then agent edits / When user presses Cmd+Z / Then only agent edits are reverted, user text remains | P1 |

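
The batching semantics in 5.3.1 and 5.3.2 can be modeled with a tiny undo stack over a plain object. This is only a model of the expected behavior: UndoStack, setField, and the state shape are hypothetical, and the editor's real undo mechanism is not shown here.

```typescript
type Undo = () => void;

// Each stack entry is a group of undo callbacks reverted together.
class UndoStack {
  private stack: Undo[][] = [];
  private openBatch: Undo[] | null = null;

  record(undo: Undo): void {
    if (this.openBatch) this.openBatch.push(undo);
    else this.stack.push([undo]); // outside a batch: its own undo step
  }

  // Everything recorded while `run` executes becomes one undo step.
  batch(run: () => void): void {
    const group: Undo[] = [];
    this.openBatch = group;
    try {
      run();
    } finally {
      this.openBatch = null;
      if (group.length > 0) this.stack.push(group);
    }
  }

  undo(): void {
    const group = this.stack.pop();
    if (!group) return;
    for (const revert of group.reverse()) revert(); // newest change first
  }
}

const state = { body: "Hello world", title: "Old" };
const history = new UndoStack();

function setField(key: "body" | "title", value: string): void {
  const previous = state[key];
  state[key] = value;
  history.record(() => { state[key] = previous; });
}

setField("body", "Hello world!"); // manual edit by the user
history.batch(() => { // agent run: three tool calls
  setField("body", "Hello editor!");
  setField("body", "Hello editor!!");
  setField("title", "New");
});
history.undo(); // one Cmd+Z
```

After the single undo, the three agent changes are gone while the user's manual edit is still in place; a second undo would revert that manual edit on its own.
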
What we do NOT test (and why)
| Area | Reason |
|---|---|
| Yjs real-time sync | Yjs/Hocuspocus are mature libs - testing their CRDT sync is noise |
| Visual rendering of components | Project evolves too fast, snapshot tests would break constantly |
| PDF generation (Playwright) | Heavy dependency, visual output - better as a manual smoke test |
| Slash commands / Bubble toolbar | UI that will change - covered by the opt-in Playwright suite instead of the hermetic one |
| CSS architecture | Not meaningfully unit-testable |
| TipTap extension registration | Framework internals, not business logic |

Summary
| Section | P0 tests | P1 tests | Total |
|---|---|---|---|
| 1. Publisher Pipeline | 10 | 4 | 14 |
| 2. Persistence / Storage | 8 | 7 | 15 |
| 3. API Routes | 2 | 7 | 9 |
| 4. Auth / Security | 7 | 3 | 10 |
| 5. AI Agent | 1 | 7 | 8 |
| Total | 28 | 28 | 56 |

Start with the 28 P0 tests. They cover the "if this breaks, we lose data or trust" surface. Add P1 tests as the codebase stabilizes.