tfrere HF Staff

feat(storage): first-class data - no silent failures in the persistence pipeline

7a42df5 22 days ago

Collab Editor - Test Plan

Companion document to SPECIFICATION.md. Covers the critical paths that, if broken, would cause data loss, publishing failures, or security holes. Intentionally not exhaustive - the project is evolving fast, and these tests are chosen to survive refactors.

Principles

Test behavior, not implementation details.
Focus on what breaks silently (data loss, corrupt publish, auth bypass).
Keep tests decoupled from internals so they survive refactors.
Fast to run - no heavy dependencies (Playwright PDF, real HF API) in the main suite.

Stack

Layer	Tool	Scope
Backend unit/integration	Vitest + Supertest	Publisher, API routes, auth guards
Yjs helpers	In-memory Y.Doc fixtures	Publisher extraction, storage logic
Frontend unit	Vitest + Testing Library	Agent tool execution, undo batching
E2E	Playwright (`backend/e2e`, `frontend/tests`)	Critical editor flows: load, edit, comments, publish round-trip

E2E is opt-in (`npm run test:e2e` in `backend/`, plus Playwright specs in `frontend/tests/`); the fast suite (`npm run test`) stays hermetic and Playwright-free so it can run in CI without a browser install.

1. Publisher Pipeline (P0)

The core value of the product. If publishing breaks, the article disappears.

1.1 Y.Doc extraction

#	Test	Given / When / Then	Priority
1.1.1	Extract frontmatter from Y.Doc	Given a Y.Doc with title, authors, affiliations in `Y.Map("frontmatter")` / When `extractFromYDoc` runs / Then returned object contains all frontmatter fields with correct types	P0
1.1.2	Extract content from empty doc	Given a Y.Doc with an empty `Y.XmlFragment("default")` / When extracted / Then returns empty content without throwing	P0
1.1.3	Extract with citations	Given a Y.Doc with entries in `Y.Map("citations")` / When extracted / Then citations map is included in output as CSL-JSON	P0

1.2 HTML generation

#	Test	Given / When / Then	Priority
1.2.1	Produces valid self-contained HTML	Given extracted doc data / When `renderArticleHTML` runs / Then output is valid HTML with inline CSS, no external stylesheet links	P0
1.2.2	Custom media queries resolved	Given template CSS with `@custom-media` / When `resolveCustomMedia` runs / Then output contains only standard `@media` rules, no `@custom-media`	P0
1.2.3	TOC generated from headings	Given content with h2 and h3 / When rendered / Then HTML contains TOC nav with matching anchor links	P1
1.2.4	Theme toggle present	Given any doc / When rendered / Then output contains theme toggle SVG (sun/moon) and associated script	P1
1.2.5	Bibliography injected	Given doc with citations / When rendered / Then HTML contains bibliography section with formatted entries	P0
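
Test 1.2.2 can be exercised against a pure string transform. The following is a minimal sketch, assuming `resolveCustomMedia` takes a CSS string and returns a CSS string; the real signature and parsing strategy are not shown in this plan, so the body below is a hypothetical stand-in, not the project's implementation.

```typescript
// Hypothetical stand-in for resolveCustomMedia (assumed signature: string -> string).
function resolveCustomMedia(css: string): string {
  const definitions = new Map<string, string>();

  // Collect and strip `@custom-media --name <query>;` declarations.
  const withoutDefinitions = css.replace(
    /@custom-media\s+(--[\w-]+)\s+([^;]+);\s*/g,
    (_match: string, name: string, query: string) => {
      definitions.set(name, query.trim());
      return "";
    },
  );

  // Substitute `(--name)` references inside each @media prelude.
  return withoutDefinitions.replace(/@media([^{]+)\{/g, (_match: string, prelude: string) => {
    const resolved = prelude.replace(/\((--[\w-]+)\)/g, (reference: string, name: string) => {
      const query = definitions.get(name);
      // Unknown names are left untouched so the failure stays visible in the output.
      return query ?? reference;
    });
    return `@media${resolved}{`;
  });
}
```

A test can then assert that the output contains no `@custom-media` at all and that every `(--name)` reference has been replaced by its query.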

1.3 Post-processing

#	Test	Given / When / Then	Priority
1.3.1	Accordion to details/summary	Given HTML with accordion div / When post-processed / Then output uses `<details>` and `<summary>` tags	P1
1.3.2	htmlEmbed to iframe	Given HTML with htmlEmbed node / When post-processed / Then output contains `<iframe>` with correct src	P0
1.3.3	Mermaid to pre block	Given HTML with mermaid node / When post-processed / Then output contains `<pre class="mermaid">`	P1

1.4 Publish idempotency and restore

#	Test	Given / When / Then	Priority
1.4.1	Publish twice gives same result	Given a Y.Doc / When published twice / Then both HTML outputs are byte-identical (no timestamps or random IDs)	P0
1.4.2	Published article restored on boot	Given published assets in HF dataset but empty local FS / When `ensurePublishedRestored` runs / Then `data/published/default/index.html` exists locally	P0
1.4.3	GET / serves published article	Given local published index.html exists / When `GET /` is requested / Then response is the published HTML with 200	P0
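
Test 1.4.1 only passes if nothing in the render path depends on time or randomness. One common way to get there is to derive element IDs from content instead of generating them. The sketch below illustrates that idea for heading anchors; `slugify` and `assignAnchors` are hypothetical names, and the publisher may well use a different scheme.

```typescript
// Hypothetical helper: lowercases the text and collapses everything that is
// not a-z or 0-9 into single dashes.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Derives one anchor per heading from the heading text alone, so two renders
// of the same document always produce the same IDs.
function assignAnchors(headings: string[]): string[] {
  const seen = new Map<string, number>();
  return headings.map((heading) => {
    const base = slugify(heading) || "section";
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    // First occurrence keeps the bare slug; repeats get a numeric suffix.
    return count === 0 ? base : `${base}-${count + 1}`;
  });
}
```

Calling `assignAnchors` twice on the same heading list yields identical arrays, which is the property the byte-identical publish check relies on.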

2. Persistence and HF Storage (P0)

Data loss is game over for a collaborative editor.

2.1 Local persistence

#	Test	Given / When / Then	Priority
2.1.1	Store writes .yjs file	Given a Y.Doc update that triggers `debouncedSave` via `onChange` / When the debounce (2s) elapses / Then `data/<name>.yjs` exists and contains valid Yjs binary	P0
2.1.2	Fetch reads local file	Given `data/default.yjs` exists / When `Database.fetch` runs / Then Y.Doc is hydrated with stored content	P0
2.1.3	Fetch falls back to HF pull	Given no local .yjs file but HF dataset has one / When `Database.fetch` runs / Then file is pulled from HF and Y.Doc is hydrated	P0

2.2 HF dataset sync

#	Test	Given / When / Then	Priority
2.2.1	Push debounced at 10s	Given two rapid store calls / When 10s elapses / Then only one HF push is made	P1
2.2.2	flushAll on SIGTERM	Given pending debounced pushes / When SIGTERM received / Then all pending data is pushed before exit	P0
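
Tests 2.2.1 and 2.2.2 are easiest to keep hermetic when the debounce logic does not own a real timer. The sketch below assumes a design where the current time is passed in by the caller; `DebouncedPusher`, `schedule`, and `tick` are hypothetical names, and only the 10s delay and the `flushAll` name come from this plan.

```typescript
// Hypothetical debouncer with an injected clock, so tests never sleep.
class DebouncedPusher {
  private pending = new Map<string, number>(); // document name -> due time (ms)
  private delayMs: number;
  private push: (name: string) => void;

  constructor(delayMs: number, push: (name: string) => void) {
    this.delayMs = delayMs;
    this.push = push;
  }

  // Each call restarts the countdown for that document.
  schedule(name: string, now: number): void {
    this.pending.set(name, now + this.delayMs);
  }

  // Stands in for real timers firing: pushes every document whose delay has elapsed.
  tick(now: number): void {
    const due: string[] = [];
    this.pending.forEach((dueAt, name) => {
      if (dueAt <= now) due.push(name);
    });
    for (const name of due) {
      this.pending.delete(name);
      this.push(name);
    }
  }

  // Shutdown path: push everything still pending, regardless of due time.
  flushAll(): void {
    const names = Array.from(this.pending.keys());
    this.pending.clear();
    for (const name of names) this.push(name);
  }
}
```

With this shape, "two rapid store calls, one push" becomes two `schedule` calls followed by one `tick`, and the SIGTERM case becomes a `flushAll` call with work still pending.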

2.3 Image upload

#	Test	Given / When / Then	Priority
2.3.1	Upload returns proxy URL	Given a valid image file / When POST /api/upload / Then response contains a `/d/images/...` URL routed through the editor's dataset proxy (the underlying HF dataset is private)	P1
2.3.2	Reject oversized file	Given a file > 10MB / When POST /api/upload / Then response is 413 or 400 with error message	P1
2.3.3	Proxy serves images	Given an uploaded image / When GET `/d/images/<file>` / Then 200 + image bytes (proxy attaches a server-side token to fetch the private HF dataset)	P1
2.3.4	Proxy whitelist	Given any path under `/d/articles/...` (raw Yjs drafts) / When requested via GET / Then 404 - never expose drafts via the proxy	P0
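
The whitelist in 2.3.4 is worth testing as a pure function, including encoded traversal attempts. This is a minimal sketch under the assumption that only `/d/images/` should ever be proxied; `isProxyPathAllowed` is a hypothetical name and the real route may be structured differently.

```typescript
// Only this prefix is taken from the plan; everything else is refused.
const ALLOWED_PREFIX = "/d/images/";

// Hypothetical guard for the dataset proxy route.
function isProxyPathAllowed(rawPath: string): boolean {
  let decoded: string;
  try {
    decoded = decodeURIComponent(rawPath);
  } catch {
    return false; // malformed percent-encoding
  }
  // Reject traversal and backslash tricks outright instead of normalizing them.
  const segments = decoded.split("/");
  if (decoded.indexOf("\\") !== -1 || segments.indexOf("..") !== -1 || segments.indexOf(".") !== -1) {
    return false;
  }
  // Require a non-empty file name after the prefix.
  return decoded.startsWith(ALLOWED_PREFIX) && decoded.length > ALLOWED_PREFIX.length;
}
```

Refusing `..` segments rather than resolving them keeps the check easy to reason about: a path is either plainly under the image prefix or it is rejected.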

2.4 Storage status & disaster recovery

#	Test	Given / When / Then	Priority
2.4.1	Status surfaces dataset error	Given `createRepo` returns 403 / When GET /api/storage/status / Then response `lastError.stage === "dataset-create"` with `statusCode: 403`	P0
2.4.2	Status clears on recovery	Given a previous push error / When the next push succeeds / Then `lastError` is null and `lastCloudPushAt` is updated	P1
2.4.3	Status auth-gated	Given an anonymous request (oauthEnabled) / When GET /api/storage/status / Then 403 - don't leak dataset error details	P1
2.4.4	Eager creation on login	Given a successful /api/auth/status with canEdit / When the request completes / Then `ensureDatasetExists` has been attempted and its outcome, success or failure, surfaces within one storage-status poll	P0
2.4.5	Admin export streams .yjs	Given an editor user / When GET /api/admin/export-doc?name=default / Then 200 + `Content-Disposition: attachment` + raw .yjs body	P1
2.4.6	Admin export auth-gated	Given a non-canEdit request / When GET /api/admin/export-doc / Then 403	P0
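
Tests 2.4.1 and 2.4.2 describe a small state machine: a failure records `lastError`, and the next successful push clears it. The sketch below models that with an in-memory tracker. The field names `lastError`, `stage`, `statusCode`, and `lastCloudPushAt` come from this plan; the class, its methods, and the `message` field are assumptions made for illustration.

```typescript
type StorageError = { stage: string; statusCode: number; message: string };

type StorageStatus = {
  lastError: StorageError | null;
  lastCloudPushAt: number | null;
};

// Hypothetical in-memory tracker that a status endpoint could serialize.
class StorageStatusTracker {
  private status: StorageStatus = { lastError: null, lastCloudPushAt: null };

  // Keeps the last successful push time while recording the new error.
  recordFailure(stage: string, statusCode: number, message: string): void {
    this.status = { ...this.status, lastError: { stage, statusCode, message } };
  }

  // A successful push clears the previous error rather than leaving it stale.
  recordPushSuccess(now: number): void {
    this.status = { lastError: null, lastCloudPushAt: now };
  }

  // Returns a copy so callers cannot mutate the tracker's state.
  snapshot(): StorageStatus {
    return { ...this.status };
  }
}
```

A route-level test would wrap the same assertions in an HTTP call, but the transitions themselves can be checked without a server.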

3. API Routes - HTTP Contracts (P1)

Test the request/response shape, not the internal logic.

3.1 Publish

#	Test	Given / When / Then	Priority
3.1.1	Publish returns success	Given a valid Y.Doc in Hocuspocus / When POST /api/publish / Then response contains `{ success: true, htmlUrl }`	P0
3.1.2	Publish writes local HTML	Given POST /api/publish succeeds / When checking local FS / Then `data/published/default/index.html` exists	P0

3.2 Chat (AI Agent)

#	Test	Given / When / Then	Priority
3.2.1	Stream returns valid SSE	Given a chat message / When POST /api/chat / Then response is `text/event-stream` with parseable SSE events	P1
3.2.2	Tool calls included in stream	Given a prompt that triggers a tool / When streamed / Then SSE contains tool_call events with name and arguments	P1
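
"Parseable SSE events" in 3.2.1 needs a parser on the test side. Below is a minimal sketch of one that handles a complete response body following the standard `text/event-stream` field rules. Whether the backend sends `tool_call` as the SSE `event:` field or inside the JSON payload is not stated in this plan, so the usage note assumes the former.

```typescript
type SseEvent = { event: string; data: string };

// Minimal parser for a complete text/event-stream body (not a streaming parser).
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message"; // default event type in the SSE format
  let dataLines: string[] = [];

  for (const line of body.split(/\r\n|\n|\r/)) {
    if (line === "") {
      // A blank line dispatches the accumulated event, if it carries data.
      if (dataLines.length > 0) {
        events.push({ event: eventName, data: dataLines.join("\n") });
      }
      eventName = "message";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment or keep-alive line

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space

    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
  }
  return events;
}
```

For 3.2.2, a test can filter the parsed events for `event === "tool_call"` and `JSON.parse` each `data` field to check for `name` and `arguments`.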

3.3 Citations

#	Test	Given / When / Then	Priority
3.3.1	Resolve DOI to CSL-JSON	Given a valid DOI string / When POST /api/citations/resolve / Then response contains CSL-JSON with title and authors	P1
3.3.2	Import BibTeX	Given a valid BibTeX string / When POST /api/citations/import-bib / Then response contains CSL-JSON entries	P1
3.3.3	Format to HTML bibliography	Given CSL-JSON entries + style / When POST /api/citations/format / Then response contains HTML string	P1

3.4 Auth status

#	Test	Given / When / Then	Priority
3.4.1	Unauthenticated status	Given no cookie / When GET /api/auth/status / Then `{ authenticated: false, canEdit: false }`	P1
3.4.2	Authenticated status	Given valid hf_access_token cookie / When GET /api/auth/status / Then `{ authenticated: true, canEdit: true, user: {...} }`	P1

4. Auth and Security (P0)

4.1 Route protection

#	Test	Given / When / Then	Priority
4.1.1	Publish requires auth	Given OAuth enabled + no cookie / When POST /api/publish / Then 401 or 403	P0
4.1.2	Reset-document requires auth	Given OAuth enabled + no cookie / When POST /api/admin/reset-document / Then 401 or 403	P0
4.1.3	Upload works without auth	Given no cookie / When POST /api/upload with valid image / Then 200 (upload is not auth-gated per spec)	P1
4.1.4	Upload rate-limited per IP	Given more than 30 uploads within 60s from the same IP / When POST /api/upload / Then 429 with `Retry-After` header (anti-abuse since upload is anonymous)	P1
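
The budget in 4.1.4 (30 uploads per 60s per IP) can be checked without waiting a real minute if the limiter takes the current time as an argument. This is a sliding-window sketch under that assumption; `UploadRateLimiter` and `check` are hypothetical names, and the backend may use a fixed window or a middleware instead.

```typescript
type Decision = { allowed: true } | { allowed: false; retryAfterSeconds: number };

// Hypothetical sliding-window limiter; defaults mirror the 30-per-60s budget in test 4.1.4.
class UploadRateLimiter {
  private hits = new Map<string, number[]>(); // IP -> timestamps of accepted uploads
  private limit: number;
  private windowMs: number;

  constructor(limit: number = 30, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(ip: string, now: number): Decision {
    // Keep only the hits that still fall inside the window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(ip, recent);
      // The oldest hit in the window determines when a slot frees up.
      const oldest = recent[0] ?? now;
      const retryAfterMs = oldest + this.windowMs - now;
      return { allowed: false, retryAfterSeconds: Math.ceil(retryAfterMs / 1000) };
    }
    recent.push(now);
    this.hits.set(ip, recent);
    return { allowed: true };
  }
}
```

A route handler would map a refused decision to a 429 response and copy `retryAfterSeconds` into the `Retry-After` header.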

4.2 OAuth flow

#	Test	Given / When / Then	Priority
4.2.1	CSRF state validated	Given an OAuth callback with wrong state / When GET /auth/callback / Then rejected (400 or redirect to error)	P0
4.2.2	Cookie set on success	Given valid OAuth callback / When processed / Then `hf_access_token` cookie is set (httpOnly, secure, sameSite: none)	P0
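
For 4.2.2, the cookie attributes can be asserted by parsing the `Set-Cookie` response header. The helper below is a small test-side sketch; `parseSetCookie` is a hypothetical name, and it only reads the three attributes this plan cares about.

```typescript
type CookieFlags = { name: string; httpOnly: boolean; secure: boolean; sameSite: string | null };

// Hypothetical test helper: extracts the cookie name and security attributes
// from a single Set-Cookie header value.
function parseSetCookie(header: string): CookieFlags {
  const [pair = "", ...attributes] = header.split(";").map((part) => part.trim());
  const name = pair.split("=")[0] ?? "";
  const flags: CookieFlags = { name, httpOnly: false, secure: false, sameSite: null };

  for (const attribute of attributes) {
    const [key = "", value = ""] = attribute.split("=");
    const lower = key.trim().toLowerCase(); // attribute names are case-insensitive
    if (lower === "httponly") flags.httpOnly = true;
    else if (lower === "secure") flags.secure = true;
    else if (lower === "samesite") flags.sameSite = value.trim().toLowerCase();
  }
  return flags;
}
```

The test then expects `name` to be `hf_access_token`, both boolean flags to be true, and `sameSite` to be `"none"`.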

4.3 WebSocket auth

#	Test	Given / When / Then	Priority
4.3.1	WS rejected without token	Given OAuth enabled / When WebSocket connects to /collab without token / Then connection rejected	P0
4.3.2	WS accepted with valid token	Given OAuth enabled + valid token / When WebSocket connects to /collab / Then connection accepted, sync starts	P1

4.4 XSS (known risk from spec)

#	Test	Given / When / Then	Priority
4.4.1	Licence field escaped	Given `meta.licence` containing `<script>alert(1)</script>` / When published HTML rendered / Then script tag is escaped or stripped	P0
4.4.2	Bibliography HTML sanitized	Given biblioHtml containing malicious script / When injected into published page / Then script is escaped or stripped	P0
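
Both XSS tests reduce to one property: user-controlled text never reaches the published page unescaped. A minimal sketch of the escaping side is shown below. Only the `meta.licence` field name comes from this plan; `escapeHtml`, `renderLicence`, and the surrounding markup are hypothetical.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escapes the five characters that can break out of HTML text or attribute context.
function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Hypothetical footer renderer; the class name and element are assumptions.
function renderLicence(licence: string): string {
  return `<p class="licence">${escapeHtml(licence)}</p>`;
}
```

Test 4.4.1 can then assert that the rendered output contains no literal `<script` substring. Bibliography HTML (4.4.2) is different in kind, since it is markup rather than text, so it would need a sanitizer rather than plain escaping.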

5. AI Agent - Plumbing (P1)

We do NOT test LLM output quality. We test the infrastructure around it.

5.1 Context building

#	Test	Given / When / Then	Priority
5.1.1	Context includes doc text	Given a doc with content / When building chat context / Then context object contains document text	P1
5.1.2	Context includes selection	Given a text selection / When building chat context / Then context object contains selection text and position	P1
5.1.3	Context includes frontmatter	Given doc with title and authors / When building chat context / Then context object contains frontmatter fields	P1

5.2 Tool execution (client-side)

#	Test	Given / When / Then	Priority
5.2.1	replaceSelection applies	Given a selection in the editor / When replaceSelection tool called with new text / Then selection is replaced in the TipTap doc	P1
5.2.2	applyDiff does search/replace	Given doc with "Hello world" / When applyDiff called with search="world" replace="editor" / Then doc contains "Hello editor"	P1
5.2.3	updateFrontmatter modifies map	Given frontmatter with title "Old" / When updateFrontmatter called with title="New" / Then `Y.Map("frontmatter").get("title")` is "New"	P1
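
The search/replace contract in 5.2.2 can be pinned down on plain strings before it is wired to the editor. The function below is a plain-string stand-in, not the real tool: the actual `applyDiff` edits a TipTap document, and its behavior on missing or ambiguous matches is not specified here, so the refusals below are assumptions.

```typescript
type DiffResult = { ok: true; text: string } | { ok: false; reason: string };

// Plain-string stand-in for the applyDiff tool (assumed behavior on edge cases).
function applyDiff(text: string, search: string, replace: string): DiffResult {
  if (search === "") return { ok: false, reason: "empty search string" };
  const first = text.indexOf(search);
  if (first === -1) return { ok: false, reason: "search text not found" };
  if (text.indexOf(search, first + 1) !== -1) {
    // Ambiguous matches are refused rather than guessing which one the agent meant.
    return { ok: false, reason: "search text is ambiguous" };
  }
  // Splice by index so `$` sequences in the replacement are inserted literally.
  return { ok: true, text: text.slice(0, first) + replace + text.slice(first + search.length) };
}
```

Splicing by index instead of calling `String.prototype.replace` with a string pattern avoids surprises when the model's replacement text contains `$&` or `$1`.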

5.3 Undo batching

#	Test	Given / When / Then	Priority
5.3.1	Agent edits batch into one undo	Given agent executes 3 tool calls (replaceSelection + applyDiff + updateFrontmatter) / When user presses Cmd+Z once / Then all 3 changes are reverted	P0
5.3.2	Manual edits not in agent batch	Given user types text then agent edits / When user presses Cmd+Z once / Then only agent edits are reverted, user text remains	P1
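
The batching behavior in 5.3.1 and 5.3.2 can be described with a generic undo stack in which a batch counts as one entry. The sketch below is that generic model only; the editor's real undo manager is not shown in this plan, and `UndoStack`, `beginBatch`, and `endBatch` are hypothetical names.

```typescript
type Change = { apply: () => void; revert: () => void };

// Generic undo stack used as a stand-in for the editor's undo manager.
class UndoStack {
  private entries: Change[][] = [];
  private openBatch: Change[] | null = null;

  // Runs a change and records it, either in the open batch or as its own entry.
  run(change: Change): void {
    change.apply();
    if (this.openBatch) this.openBatch.push(change);
    else this.entries.push([change]);
  }

  beginBatch(): void {
    this.openBatch = [];
  }

  // Closes the batch; an empty batch leaves no entry behind.
  endBatch(): void {
    if (this.openBatch && this.openBatch.length > 0) this.entries.push(this.openBatch);
    this.openBatch = null;
  }

  // One undo pops one entry and reverts its changes in reverse order.
  undo(): boolean {
    const entry = this.entries.pop();
    if (!entry) return false;
    entry.slice().reverse().forEach((change) => change.revert());
    return true;
  }
}
```

In this model the agent wraps its tool calls in `beginBatch` / `endBatch`, so a single undo reverts all of them while an earlier manual edit stays on the stack as its own entry.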

What we do NOT test (and why)

Area	Reason
Yjs real-time sync	Yjs/Hocuspocus are mature libs - testing their CRDT sync is noise
Visual rendering of components	Project evolves too fast, snapshot tests would break constantly
PDF generation (Playwright)	Heavy dependency, visual output - better as a manual smoke test
Slash commands / Bubble toolbar	UI that will change - covered by the opt-in Playwright suite instead of the hermetic one
CSS architecture	Not meaningfully unit-testable
TipTap extension registration	Framework internals, not business logic

Summary

Section	P0 tests	P1 tests	Total
1. Publisher Pipeline	10	4	14
2. Persistence / Storage	8	7	15
3. API Routes	2	7	9
4. Auth / Security	7	3	10
5. AI Agent	1	7	8
Total	28	28	56

Start with the 28 P0 tests. They cover the "if this breaks, we lose data or trust" surface. Add P1 tests as the codebase stabilizes.