# VideoGuideMaker — User Guide

Turn a recorded lecture into an accessible, navigable study guide in a few minutes. You hand it a video and a transcript. It picks the visually distinct moments (slide changes, demos, board work), pulls the on-screen text, and produces a polished HTML document — one section per slide — with timestamps, narration, key terms, and equations. You can review and edit before publishing.

---

## Table of contents

1. [Quick start](#quick-start)
2. [What you'll need before you upload](#what-youll-need-before-you-upload)
3. [The upload page, field by field](#the-upload-page-field-by-field)
4. [What happens after you click Generate](#what-happens-after-you-click-generate)
5. [The review & edit screen](#the-review--edit-screen)
6. [Output formats](#output-formats)
7. [Tips for the best results](#tips-for-the-best-results)
8. [Troubleshooting](#troubleshooting)

---

## Quick start

1. Open the site.
2. Drop in your video and your transcript.
3. (Optional) paste an Anthropic API key and tick the AI features you want.
4. Click **Generate study guide →**.
5. Wait for the progress bar.
6. On the editor screen, fix any rough spots, then click **Download final HTML** or **Download ZIP**.

That's it. The rest of this guide is detail.

---

## What you'll need before you upload

| You need | What it looks like | Notes |
|---|---|---|
| **A lecture video** | `.mp4`, `.mov`, `.mkv`, `.webm`, `.m4v`, `.avi` | Up to 500 MB. A 30-minute lecture is typical. |
| **A transcript** | `.srt` or `.vtt` (subtitle file) | Up to 5 MB. Required — the app aligns narration to slides using your transcript's timestamps. |
| **(Optional) an Anthropic API key** | Starts with `sk-ant-…` | Only needed if you want the AI features. The key is used per-job and is **never stored**. |

> **Don't have a transcript yet?** YouTube auto-captions, Descript, Otter, and Zoom all export `.srt` or `.vtt`. Quality matters — bad captions = bad section alignment.
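Since alignment quality depends entirely on your transcript's timestamps, it can be worth sanity-checking the file before you upload. The snippet below is an illustrative sketch, not part of the app: the sample cues and the `check_srt` helper are invented for the example, but the timestamp syntax shown is standard SubRip.

```python
import re

# A minimal two-cue SubRip (.srt) file in the standard format.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,500
Welcome to today's lecture on neural networks.

2
00:00:04,500 --> 00:00:09,000
Let's start with the perceptron.
"""

TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def check_srt(text: str) -> int:
    """Count well-formed cues; raise if any timestamps run backwards."""
    last_end = -1.0
    cues = TIMESTAMP.findall(text)
    for h1, m1, s1, ms1, h2, m2, s2, ms2 in cues:
        start = int(h1) * 3600 + int(m1) * 60 + int(s1) + int(ms1) / 1000
        end = int(h2) * 3600 + int(m2) * 60 + int(s2) + int(ms2) / 1000
        if start < last_end or end < start:
            raise ValueError("out-of-order timestamps — fix the file before uploading")
        last_end = end
    return len(cues)

print(check_srt(SAMPLE_SRT))  # 2
```

If the cue count is zero or the check raises, the file is probably a `.vtt`, a different encoding, or not a subtitle file at all.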
---

## The upload page, field by field

The page is divided into numbered sections.

### 1. Upload

**Study guide title** — the heading of the finished document. Defaults to *"Lecture Study Guide"*. Change it.

**Video file** — drag and drop, or click to browse.

**Transcript** — same.

### 2. AI assist (optional)

Everything in this section requires an Anthropic API key. **Skip this section entirely if you don't have one** — the rest of the app still works.

**Anthropic API key** — paste your key here. Stored only for this job, never logged or saved on the server.

The five toggles below decide what Claude does for each frame. Each adds a small per-frame cost (a few cents for a typical lecture).

| Toggle | What Claude produces | Recommendation |
|---|---|---|
| **Section titles** | A short, descriptive title for each segment ("Combining AI models") instead of the default "Segment 3 — 4:25". | Tick this — it makes the table of contents readable. |
| **Alt-text drafts** | Accessible alt text describing the *purpose* of each visual, not just a transcription. | Tick if you'll publish the guide where accessibility matters. |
| **Key terms** | A list of named concepts the lecturer is teaching, with short definitions. | Tick for content-rich lectures. |
| **Math equations** | LaTeX for equations on the slide, plus a screen-reader description. | Tick if your slides contain math. |
| **On-screen text (Claude)** | Replaces the local OCR engine with Claude's reading. Much better on coloured callouts, decorative fonts, and rendered math. | Tick if your slides have any visual flair beyond plain black-on-white text. |

You can pick any combination. Toggles are independent.

### 3. Media extras

**Per-segment audio** — slices the lecture's narration into one MP3 clip per segment so learners can re-listen. Adds processing time and ~500 KB per clip. No API key needed. Off by default.

### 4. Output

Pick exactly one format:

| Format | Best for | Notes |
|---|---|---|
| **Review & edit** | Most cases. | Opens an in-browser editor; download from there once you're happy. |
| **Single HTML** | Quick share, no review. | One self-contained `.html` file with images and audio embedded. Easy to email. |
| **Zip bundle** | Hosting on a website. | HTML plus a `static/` folder of images and audio. |

> 💡 **You can pick "Review & edit" and still get any of the other formats from the editor's toolbar — no need to re-run the pipeline.**

### Advanced settings (collapsed by default)

Only open this if the defaults don't work for your video. All of these have sane defaults.

**Scene-change sensitivity** (default 27)
How visually different two frames must be before the app counts a new slide. **Lower** (5–15) catches every animation step — good if your lecturer uses progressive reveals. **Higher** (35+) catches only major slide changes — good if you want one frame per slide.

**Minimum gap between scenes** (default off)
Drops new scenes that arrive less than X seconds after the previous one. Useful when a slide builds up over several seconds — set to ~5 s to keep just the final state.

**Max frames** (default unlimited)
Hard cap on how many frames the guide will contain. Useful for very long lectures or to bound API cost.

**Instructor-frame face threshold** (default 0.12)
The app automatically drops frames that look like a talking-head shot of the instructor (no slide content). This slider controls how aggressive that filter is. Lower = drops more. Raise it if your slides have an inset webcam you want to keep.

**Skip OCR** (off)
Faster, but on-screen text won't appear in the guide. Doesn't affect the AI on-screen-text toggle (Claude reads the image directly).

**Document language** (default `en`)
A BCP-47 code like `en`, `en-US`, `fr`, `de`. Sets the document language and selects local OCR language packs.
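To build intuition for the scene-change sensitivity number, the toy sketch below treats it as a threshold on the average per-pixel difference between consecutive frames (0–255 grayscale). This is an assumed illustration of the general idea, not the app's actual detector.

```python
def mean_pixel_diff(frame_a, frame_b):
    """Average absolute difference between two same-sized grayscale frames (0-255)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def is_new_scene(prev_frame, frame, sensitivity=27):
    # Lower sensitivity => smaller visual changes count as a new scene.
    return mean_pixel_diff(prev_frame, frame) >= sensitivity

slide      = [200] * 16               # a bright slide, flattened to 16 pixels
one_bullet = [200] * 14 + [40] * 2    # a progressive-reveal step (small change)
new_slide  = [40] * 16                # a completely different slide

print(is_new_scene(slide, one_bullet))      # False at the default of 27
print(is_new_scene(slide, one_bullet, 15))  # True at a lower setting
print(is_new_scene(slide, new_slide))       # True either way
```

Real detectors are more sophisticated, but the knob behaves the same way: raise it to merge small changes into one scene, lower it to split them apart.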
---

## What happens after you click Generate

You'll see a progress bar with stages:

1. **Detecting scenes** — finds visually distinct moments in the video.
2. **Filtering instructor frames** — drops talking-head shots.
3. **Loading transcript** — reads your captions.
4. **Running OCR** — extracts on-screen text from each kept frame.
5. **LLM extraction** *(only if you ticked any AI toggle)* — sends each frame to Claude.
6. **Slicing audio** *(only if you ticked Per-segment audio)* — extracts MP3 clips.
7. **Building segments** / **Rendering** — produces the final HTML.

Typical wall time: **about 30 s per minute of video** without AI; **about 60 s per minute** with AI features on.

If anything goes wrong, you'll see a red error banner with a message. Click **Try again** to retry without re-uploading (your form values are preserved).

---

## The review & edit screen

If you picked the **Review & edit** output, this is where you'll spend most of your time. The screen has three areas:

### The toolbar (top, sticky)

| Button | What it does |
|---|---|
| **Save edits (JSON)** | Downloads a small `.edits.json` file with everything you've changed. Useful as a checkpoint or for handing the edits to someone else to re-render with the CLI. |
| **Load edits (JSON)** | Re-applies a previously-saved `.edits.json` to the current page. |
| **Preview** | Opens a clean copy of the guide (no editor chrome) in a new tab so you can see the final look. |
| **Download final HTML** | Saves the finished guide as a single self-contained `.html` file. Images and audio inlined. |
| **Download ZIP** | Saves the finished guide as a zip with separate asset files. Better for hosting on a website. |

Status messages appear next to the buttons.

### The document body

Edit anything that's underlined or has a placeholder. Specifically, per segment you can edit:

#### **Section title**

The heading. Editable text field.
If you ticked the **Section titles** AI toggle, it'll have Claude's draft; otherwise it's a default like "Segment 3 — 4:25".

#### **Alt-text**

The accessible description of the slide image. Editable text field. Drafts marked as "needs review" produce an explicit author-warning banner at the top of the page until you confirm them.

#### **Choose frame** *(when alternates exist)*

A row of thumbnails under the slide. The app captures up to three candidate frames per scene; click any thumbnail to swap which one is the canonical image. The on-screen text panel updates automatically to match the picked frame.

#### **On-screen text from video**

A collapsible panel under the slide. The OCR'd (or Claude-read, if you enabled it) text from the slide. Editable — fix mis-reads here.

#### **Audio clip**

An audio player, only present if you ticked **Per-segment audio**. Plays the lecturer's narration for just this segment.

#### **Narration**

The transcript text aligned to this segment. Editable — clean up speech disfluencies, fix homophone errors, or rewrite into prose.

#### **Key terms** *(when AI key terms is on)*

A list of term + definition rows. Add, edit, or remove. Empty rows are dropped on download.

#### **Math equations** *(when AI math is on)*

A list of LaTeX expressions with optional labels and captions. Live-preview MathJax rendering on the right.

#### **Include in final guide** *(checkbox per section)*

Untick to drop a segment from the final document entirely. Useful for transitions, intro slides, or instructor-only frames the filter let through.
### Per-section layout

The order of elements in every segment is:

```
[Slide image]
  ↓
[Choose frame thumbnails]   (if alternates)
  ↓
[On-screen text from video]
  ↓
[🔊 Audio clip]             (if Per-segment audio is on)
  ↓
[Narration]
  ↓
[Key terms]                 (if AI key terms is on)
  ↓
[Math equations]            (if AI math equations is on)
```

---

## Output formats

Whatever output format you pick (or whichever button you click in the editor's toolbar), the result is **valid HTML you can email, host, or print**. Specifics:

### Single HTML

- One `.html` file. Open in any browser.
- Images and audio are encoded as base64 data URIs.
- The file can be 10–50 MB depending on slide count and audio.
- Best for: emailing, archiving.

### Zip bundle

- A `.zip` containing `study-guide.html` and a `static/` folder.
- Images stay as `.jpg` files; audio as `.mp3`.
- Best for: hosting on a static site, GitHub Pages, an LMS.

### Review & edit (in the editor)

- The editor page itself. Won't be downloaded as-is.
- From here, click any of the toolbar's download buttons.

---

## Tips for the best results

**Use a clean transcript.** Auto-generated captions are usually fine, but they tend to misspell technical terms. A 5-minute pass over the `.srt` before uploading saves a lot of editing later.

**Set scene-change sensitivity to your lecture style.** Hand-drawn whiteboard work needs lower (10–15). Slide-only decks work fine at 27. Talking-head with a single shared screen needs 35+.

**Tick AI on-screen text for slides with colour callouts.** The local OCR engine struggles with white-text-on-coloured-background. Claude reads them far more reliably.

**Don't tick every AI feature on the first run.** Each costs API tokens. Section titles and alt text are the two highest-leverage features — start with those. Add key terms / equations / OCR if specific gaps appear.

**The "Include" checkbox is your friend.** It's faster to drop bad segments than to re-run the pipeline with stricter settings.
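An aside on the Single HTML format described above: "base64 data URIs" means each image and audio clip is re-encoded as text and pasted directly into the `src` attribute, which is why one file can carry everything (and why it grows by roughly a third per asset). The helper below is an illustrative sketch, not the app's code, and the bytes and alt text are invented for the example.

```python
import base64

def to_data_uri(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw bytes as a self-contained data: URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

# Stand-in bytes starting with the JPEG magic number; not a real image.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8
tag = f'<img src="{to_data_uri(fake_jpeg)}" alt="Slide 3: perceptron diagram">'
print(tag[:33])  # <img src="data:image/jpeg;base64,
```

A browser renders such a tag with no network request and no separate file, which is exactly what makes the single-file guide email-friendly.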
**Save the JSON before downloading.** If you spend 20 minutes editing and then your browser tab closes, the JSON checkpoint is the only way to recover.

---

## Troubleshooting

**"No usable visual segments found"**
The scene detector found nothing, or every frame was filtered as instructor-only. Try (a) lowering the scene-change sensitivity, or (b) raising the instructor-frame face threshold.

**"Server is at the 8-job concurrency cap"**
Someone else's job is currently running on the same server. Wait a minute and click **Generate** again.

**"Video exceeds the 500 MB upload limit" / "Transcript exceeds the 5 MB upload limit"**
Compress the video before uploading (HandBrake, ffmpeg) or split it. The transcript limit is huge — if you're hitting it, your captions file is likely a binary blob, not a real `.srt`.

**Progress stalls on OCR for a long time**
Tesseract is slow on the free CPU tier. A 30-min lecture takes 1–2 minutes for OCR. If it's been 5+ minutes, tick **Skip OCR** in advanced settings and re-run, then optionally turn on the AI **On-screen text** toggle instead — Claude is much faster.

**The OCR text reads like garbage**
Coloured callouts and decorative fonts confuse Tesseract. Tick **On-screen text (Claude)** under AI assist.

**Equations render as raw LaTeX**
Make sure MathJax loaded — check the browser console. If you're behind a corporate firewall that blocks `cdn.jsdelivr.net`, MathJax won't load.

**The instructor's face is showing as the slide image**
Click an alternate thumbnail under that slide. The app keeps up to two alternates per scene precisely for this case.

**Edits got lost when I refreshed**
Always **Save edits (JSON)** before refreshing, closing the tab, or stepping away. The editor doesn't persist to localStorage.

**Downloaded HTML won't open**
Some email clients block `.html` attachments. Zip the file or use the Zip bundle output.

**Audio plays in the editor but not in the downloaded file**
The single-file HTML download inlines audio as data URIs.
If you downloaded the zip bundle, the audio is in `static/segment_*.mp3` — check your browser console for missing-file errors.

---

If you hit something this guide doesn't cover, hand the error message to whoever set up the app for you. Most issues are resolved by lowering the scene-change sensitivity or ticking the **On-screen text (Claude)** toggle.