# VideoGuideMaker — User Guide
Turn a recorded lecture into an accessible, navigable study guide in a few minutes.
You hand it a video and a transcript. It picks the visually distinct moments (slide changes, demos, board work), pulls the on-screen text, and produces a polished HTML document — one section per slide — with timestamps, narration, key terms, and equations. You can review and edit before publishing.
---
## Table of contents
1. [Quick start](#quick-start)
2. [What you'll need before you upload](#what-youll-need-before-you-upload)
3. [The upload page, field by field](#the-upload-page-field-by-field)
4. [What happens after you click Generate](#what-happens-after-you-click-generate)
5. [The review & edit screen](#the-review--edit-screen)
6. [Output formats](#output-formats)
7. [Tips for the best results](#tips-for-the-best-results)
8. [Troubleshooting](#troubleshooting)
---
## Quick start
1. Open the site.
2. Drop in your video and your transcript.
3. (Optional) Paste an Anthropic API key and tick the AI features you want.
4. Click **Generate study guide →**.
5. Wait for the progress bar.
6. On the editor screen, fix any rough spots, then click **Download final HTML** or **Download ZIP**.
That's it. The rest of this guide is detail.
---
## What you'll need before you upload
| You need | What it looks like | Notes |
|---|---|---|
| **A lecture video** | `.mp4`, `.mov`, `.mkv`, `.webm`, `.m4v`, `.avi` | Up to 500 MB. A 30-minute lecture is typical. |
| **A transcript** | `.srt` or `.vtt` (subtitle file) | Up to 5 MB. Required β€” the app aligns narration to slides using your transcript's timestamps. |
| **(Optional) an Anthropic API key** | Starts with `sk-ant-…` | Only needed if you want the AI features. The key is used per-job and is **never stored**. |
> **Don't have a transcript yet?** YouTube auto-captions, Descript, Otter, and Zoom all export `.srt` or `.vtt`. Quality matters — bad captions = bad section alignment.
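For orientation, an `.srt` file is just plain text: numbered cues, a `HH:MM:SS,mmm` time range, then the caption text. (A `.vtt` file is the same idea with a `WEBVTT` header line and dots instead of commas in the timestamps.)

```
1
00:00:01,000 --> 00:00:04,500
Welcome back. Today we're covering scene detection.

2
00:00:04,600 --> 00:00:09,200
Last time we stopped at feature extraction.
```

If your export opens in a text editor and looks like this, it will upload fine.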
---
## The upload page, field by field
The page is divided into numbered sections.
### 1. Upload
**Study guide title** — the heading of the finished document. Defaults to *"Lecture Study Guide"*. Change it.
**Video file** — drag and drop, or click to browse.
**Transcript** — same.
### 2. AI assist (optional)
Everything in this section requires an Anthropic API key. **Skip this section entirely if you don't have one** — the rest of the app still works.
**Anthropic API key** — paste your key here. Stored only for this job, never logged or saved on the server.
The five toggles below decide what Claude does for each frame. Each adds a small per-frame cost (a few cents for a typical lecture).
| Toggle | What Claude produces | Recommendation |
|---|---|---|
| **Section titles** | A short, descriptive title for each segment ("Combining AI models") instead of the default "Segment 3 — 4:25". | Tick this — it makes the table of contents readable. |
| **Alt-text drafts** | Accessible alt text describing the *purpose* of each visual, not just a transcription. | Tick if you'll publish the guide where accessibility matters. |
| **Key terms** | A list of named concepts the lecturer is teaching, with short definitions. | Tick for content-rich lectures. |
| **Math equations** | LaTeX for equations on the slide, plus a screen-reader description. | Tick if your slides contain math. |
| **On-screen text (Claude)** | Replaces the local OCR engine with Claude's reading. Much better on coloured callouts, decorative fonts, and rendered math. | Tick if your slides have any visual flair beyond plain black-on-white text. |
You can pick any combination. Toggles are independent.
### 3. Media extras
**Per-segment audio** — slices the lecture's narration into one MP3 clip per segment so learners can re-listen. Adds processing time and ~500 KB per clip. No API key needed. Off by default.
### 4. Output
Pick exactly one format:
| Format | Best for | Notes |
|---|---|---|
| **Review & edit** | Most cases. | Opens an in-browser editor; download from there once you're happy. |
| **Single HTML** | Quick share, no review. | One self-contained `.html` file with images and audio embedded; easy to email. |
| **Zip bundle** | Hosting on a website. | HTML plus a `static/` folder of images and audio. |
> 💡 **You can pick "Review & edit" and still get any of the other formats from the editor's toolbar — no need to re-run the pipeline.**
### Advanced settings (collapsed by default)
Only open this if the defaults don't work for your video.
**Scene-change sensitivity** (default 27)
How visually different two consecutive frames must be before the app counts a new slide. **Lower** (5–15) catches every animation step — good if your lecturer uses progressive reveals. **Higher** (35+) catches only major slide changes — good if you want one frame per slide.
**Minimum gap between scenes** (default off)
Drops new scenes that arrive less than X seconds after the previous one. Useful when a slide builds up over several seconds — set to ~5 s to keep just the final state.
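Mechanically, this setting is a simple debounce. A minimal sketch of the rule in Python (an illustration, not the app's actual code):

```python
def apply_min_gap(scene_times, min_gap):
    """Suppress any scene change that fires less than `min_gap`
    seconds after the last scene change that was kept."""
    kept = []
    for t in scene_times:
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept

# A slide that builds up fires changes at 10.0, 11.2, 12.5 and 13.9 s;
# with a 5-second gap, the whole burst collapses into one scene.
print(apply_min_gap([10.0, 11.2, 12.5, 13.9, 62.0], 5))  # [10.0, 62.0]
```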
**Max frames** (default unlimited)
Hard cap on how many frames the guide will contain. Useful for very long lectures or to bound API cost.
**Instructor-frame face threshold** (default 0.12)
The app automatically drops frames that look like a talking-head shot of the instructor (no slide content). This slider controls how aggressive that filter is. Lower = drops more. Raise it if your slides have an inset webcam you want to keep.
**Skip OCR** (off)
Faster, but on-screen text won't appear in the guide. Doesn't affect the AI on-screen-text toggle (Claude reads the image directly).
**Document language** (default `en`)
A BCP-47 code like `en`, `en-US`, `fr`, `de`. Sets the document language and selects local OCR language packs.
---
## What happens after you click Generate
You'll see a progress bar with stages:
1. **Detecting scenes** — finds visually distinct moments in the video.
2. **Filtering instructor frames** — drops talking-head shots.
3. **Loading transcript** — reads your captions.
4. **Running OCR** — extracts on-screen text from each kept frame.
5. **LLM extraction** *(only if you ticked any AI toggle)* — sends each frame to Claude.
6. **Slicing audio** *(only if you ticked Per-segment audio)* — extracts MP3 clips.
7. **Building segments** / **Rendering** — produces the final HTML.
Typical wall time: **about 30 s per minute of video** without AI; **about 60 s per minute** with AI features on. A 30-minute lecture therefore finishes in roughly 15 minutes, or 30 minutes with AI.
If anything goes wrong, you'll see a red error banner with a message. Click **Try again** to retry without re-uploading (your form values are preserved).
---
## The review & edit screen
If you picked the **Review & edit** output, this is where you'll spend most of your time.
The screen has three areas:
### The toolbar (top, sticky)
| Button | What it does |
|---|---|
| **Save edits (JSON)** | Downloads a small `.edits.json` file with everything you've changed. Useful as a checkpoint or for handing the edits to someone else to re-render with the CLI. |
| **Load edits (JSON)** | Re-applies a previously-saved `.edits.json` to the current page. |
| **Preview** | Opens a clean copy of the guide (no editor chrome) in a new tab so you can see the final look. |
| **Download final HTML** | Saves the finished guide as a single self-contained `.html` file. Images and audio inlined. |
| **Download ZIP** | Saves the finished guide as a zip with separate asset files. Better for hosting on a website. |
Status messages appear next to the buttons.
### The document body
Edit anything that's underlined or has a placeholder. Specifically, per segment you can:
#### **Section title**
The heading, as an editable text field. If you ticked the **Section titles** AI toggle, it holds Claude's draft; otherwise it's a default like "Segment 3 — 4:25".
#### **Alt-text**
The accessible description of the slide image. Editable text field. Drafts marked "needs review" trigger a warning banner at the top of the page until you confirm them.
#### **Choose frame** *(when alternates exist)*
A row of thumbnails under the slide. The app captures up to three candidate frames per scene; click any thumbnail to swap which one is the canonical image. The on-screen text panel updates automatically to match the picked frame.
#### **On-screen text from video**
A collapsible panel under the slide. The OCR'd (or Claude-read, if you enabled it) text from the slide. Editable — fix mis-reads here.
#### **(Audio clip)**
An audio player, only present if you ticked **Per-segment audio**. Plays the lecturer's narration for just this segment.
#### **Narration**
The transcript text aligned to this segment. Editable — clean up speech disfluencies, fix homophone errors, or rewrite into prose.
#### **Key terms** *(when AI key terms is on)*
A list of term + definition rows. Add, edit, or remove. Empty rows are dropped on download.
#### **Math equations** *(when AI math is on)*
A list of LaTeX expressions with optional labels and captions. Live-preview MathJax rendering on the right.
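If you add or fix an equation by hand, the expression field takes plain LaTeX math notation. A generic example (not syntax specific to this app), the sample mean:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

If the live preview shows the raw source instead of rendered math, check the expression for unbalanced braces.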
#### **Include in final guide** (checkbox per section)
Untick to drop a segment from the final document entirely. Useful for transitions, intro slides, or instructor-only frames the filter let through.
### Per-segment layout
The order on every segment is:
```
[Slide image]
↓
[Choose frame thumbnails] (if alternates)
↓
[On-screen text from video]
↓
[🔊 Audio clip] (if Per-segment audio is on)
↓
[Narration]
↓
[Key terms] (if AI key terms is on)
↓
[Math equations] (if AI math equations is on)
```
---
## Output formats
Whatever output format you pick (or whatever you click in the editor's toolbar), the result is **valid HTML you can email, host, or print**. Specifics:
### Single HTML
- One `.html` file. Open in any browser.
- Images and audio are encoded as base64 data URIs.
- File can be 10–50 MB depending on slide count and audio.
- Best for: emailing, archiving.
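The self-containment comes from standard base64 data URIs, which inflate each asset by roughly a third (part of why the single file runs large). A minimal sketch of the technique, not the app's exact code:

```python
import base64

def to_data_uri(jpeg_bytes: bytes) -> str:
    """Inline a JPEG as a data URI, the way a self-contained
    HTML export embeds its images in <img src="..."> attributes."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"

# The first four bytes of a JPEG header, just to show the URI's shape:
print(to_data_uri(b"\xff\xd8\xff\xe0"))  # data:image/jpeg;base64,/9j/4A==
```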
### Zip bundle
- A `.zip` containing `study-guide.html` and a `static/` folder.
- Images stay as `.jpg` files; audio as `.mp3`.
- Best for: hosting on a static site, GitHub Pages, an LMS.
### Review & edit (in the editor)
- The editor page itself. Won't be downloaded as-is.
- From here, click any of the toolbar's download buttons.
---
## Tips for the best results
**Use a clean transcript.**
Auto-generated captions are usually fine, but they tend to misspell technical terms. A 5-minute pass over the `.srt` before uploading saves a lot of editing later.
**Set scene-change sensitivity to your lecture style.**
Hand-drawn whiteboard work needs lower (10–15). Slide-only decks work fine at 27. Talking-head with a single shared screen needs 35+.
**Tick AI on-screen text for slides with colour callouts.**
The local OCR engine struggles with white-text-on-coloured-background. Claude reads them far more reliably.
**Don't tick every AI feature on the first run.**
Each costs API tokens. Section titles and alt text are the two highest-leverage options — start with those. Add key terms / equations / OCR if specific gaps appear.
**The "Include" checkbox is your friend.**
It's faster to drop bad segments than to re-run the pipeline with stricter settings.
**Save the JSON before downloading.**
If you spend 20 minutes editing and then your browser tab closes, the JSON checkpoint is the only way to recover.
---
## Troubleshooting
**"No usable visual segments found"**
The scene detector found nothing or every frame was filtered as instructor-only. Try (a) lowering the scene-change sensitivity, or (b) raising the instructor-frame face threshold.
**"Server is at the 8-job concurrency cap"**
Someone else's job is currently running on the same server. Wait a minute and click **Generate** again.
**"Video exceeds the 500 MB upload limit" / "Transcript exceeds the 5 MB upload limit"**
Compress the video before uploading (HandBrake, ffmpeg) or split it. The transcript limit is huge β€” if you're hitting it, your captions file is likely a binary blob, not a real `.srt`.
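For the video, one common ffmpeg recipe looks like this (generic flags, nothing app-specific; raise `-crf` for a smaller, lower-quality file):

```shell
# Re-encode to H.264 at moderate quality (CRF 28) and compress the
# audio to 96 kbps AAC. This typically shrinks a screen-recorded
# lecture to a fraction of its original size.
ffmpeg -i lecture.mp4 -c:v libx264 -crf 28 -preset medium -c:a aac -b:a 96k lecture-small.mp4
```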
**Progress stalls on OCR for a long time**
Tesseract is slow on the free CPU tier. A 30-min lecture takes 1–2 minutes for OCR. If it's been 5+ minutes, tick **Skip OCR** in advanced settings and re-run, then optionally turn on the AI **On-screen text** toggle instead — Claude is much faster.
**The OCR text reads like garbage**
Coloured callouts and decorative fonts confuse Tesseract. Tick **On-screen text (Claude)** under AI assist.
**Equations render as raw LaTeX**
Make sure MathJax has loaded — check the browser console. If you're behind a corporate firewall that blocks `cdn.jsdelivr.net`, MathJax won't load.
**The instructor's face is showing as the slide image**
Click an alternate thumbnail under that slide. The app keeps up to two alternates per scene (three candidate frames in total) precisely for this case.
**Edits got lost when I refreshed**
Always **Save edits (JSON)** before refreshing, closing the tab, or stepping away. The editor doesn't persist to localStorage.
**Downloaded HTML won't open**
Some email clients block `.html` attachments. Zip the file or use the Zip bundle output.
**Audio plays in the editor but not in the downloaded file**
The single-HTML download inlines audio as data URIs. If you downloaded the zip bundle, the audio lives in `static/segment_*.mp3` — check your browser console for missing-file errors.
---
If you hit something this guide doesn't cover, hand the error message to whoever set up the app for you. Most issues are resolved by lowering the scene-change sensitivity or ticking the **On-screen text (Claude)** toggle.