# VideoGuideMaker — User Guide

Turn a recorded lecture into an accessible, navigable study guide in a few minutes. You hand it a video and a transcript. It picks the visually distinct moments (slide changes, demos, board work), pulls the on-screen text, and produces a polished HTML document — one section per slide — with timestamps, narration, key terms, and equations. You can review and edit before publishing.

---

## Table of contents

1. [Quick start](#quick-start)
2. [What you'll need before you upload](#what-youll-need-before-you-upload)
3. [The upload page, field by field](#the-upload-page-field-by-field)
4. [What happens after you click Generate](#what-happens-after-you-click-generate)
5. [The review & edit screen](#the-review--edit-screen)
6. [Output formats](#output-formats)
7. [Tips for the best results](#tips-for-the-best-results)
8. [Troubleshooting](#troubleshooting)

---

## Quick start

1. Open the site.
2. Drop in your video and your transcript.
3. (Optional) paste an Anthropic API key and tick the AI features you want.
4. Click **Generate study guide →**.
5. Wait for the progress bar.
6. On the editor screen, fix any rough spots, then click **Download final HTML** or **Download ZIP**.

That's it. The rest of this guide is detail.

---

## What you'll need before you upload

| You need | What it looks like | Notes |
|---|---|---|
| **A lecture video** | `.mp4`, `.mov`, `.mkv`, `.webm`, `.m4v`, `.avi` | Up to 500 MB. A 30-minute lecture is typical. |
| **A transcript** | `.srt` or `.vtt` (subtitle file) | Up to 5 MB. Required — the app aligns narration to slides using your transcript's timestamps. |
| **(Optional) an Anthropic API key** | Starts with `sk-ant-…` | Only needed if you want the AI features. The key is used per-job and is **never stored**. |

> **Don't have a transcript yet?** YouTube auto-captions, Descript, Otter, and Zoom all export `.srt` or `.vtt`. Quality matters — bad captions = bad section alignment.
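Since alignment quality depends entirely on your transcript's timestamps, it can be worth sanity-checking the file before you upload. The snippet below is an illustrative sketch, not part of the app: the sample cues and the `check_srt` helper are invented for the example, but the timestamp syntax shown is standard SubRip.

```python
import re

# A minimal two-cue SubRip (.srt) file in the standard format.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,500
Welcome to today's lecture on neural networks.

2
00:00:04,500 --> 00:00:09,000
Let's start with the perceptron.
"""

TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def check_srt(text: str) -> int:
    """Count well-formed cues; raise if any timestamps run backwards."""
    last_end = -1.0
    cues = TIMESTAMP.findall(text)
    for h1, m1, s1, ms1, h2, m2, s2, ms2 in cues:
        start = int(h1) * 3600 + int(m1) * 60 + int(s1) + int(ms1) / 1000
        end = int(h2) * 3600 + int(m2) * 60 + int(s2) + int(ms2) / 1000
        if start < last_end or end < start:
            raise ValueError("out-of-order timestamps — fix the file before uploading")
        last_end = end
    return len(cues)

print(check_srt(SAMPLE_SRT))  # 2
```

If the cue count is zero or the check raises, the file is probably a `.vtt`, a different encoding, or not a subtitle file at all.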
---

## The upload page, field by field

The page is divided into numbered sections.

### 1. Upload

**Study guide title** — the heading of the finished document. Defaults to *"Lecture Study Guide"*. Change it.

**Video file** — drag and drop, or click to browse.

**Transcript** — same.

### 2. AI assist (optional)

Everything in this section requires an Anthropic API key. **Skip this section entirely if you don't have one** — the rest of the app still works.

**Anthropic API key** — paste your key here. Stored only for this job, never logged or saved on the server.

The five toggles below decide what Claude does for each frame. Each adds a small per-frame cost (a few cents for a typical lecture).

| Toggle | What Claude produces | Recommendation |
|---|---|---|
| **Section titles** | A short, descriptive title for each segment ("Combining AI models") instead of the default "Segment 3 — 4:25". | Tick this — it makes the table of contents readable. |
| **Alt-text drafts** | Accessible alt text describing the *purpose* of each visual, not just a transcription. | Tick if you'll publish the guide where accessibility matters. |
| **Key terms** | A list of named concepts the lecturer is teaching, with short definitions. | Tick for content-rich lectures. |
| **Math equations** | LaTeX for equations on the slide, plus a screen-reader description. | Tick if your slides contain math. |
| **On-screen text (Claude)** | Replaces the local OCR engine with Claude's reading. Much better on coloured callouts, decorative fonts, and rendered math. | Tick if your slides have any visual flair beyond plain black-on-white text. |

You can pick any combination. Toggles are independent.

### 3. Media extras

**Per-segment audio** — slices the lecture's narration into one MP3 clip per segment so learners can re-listen. Adds processing time and ~500 KB per clip. No API key needed. Off by default.

### 4. Output

Pick exactly one format:

| Format | Best for | Notes |
|---|---|---|
| **Review & edit** | Most cases. | Opens an in-browser editor; download from there once you're happy. |
| **Single HTML** | Quick share, no review. | One self-contained `.html` file with images and audio embedded. Easy to email. |
| **Zip bundle** | Hosting on a website. | HTML plus a `static/` folder of images and audio. |

> 💡 **You can pick "Review & edit" and still get any of the other formats from the editor's toolbar — no need to re-run the pipeline.**

### Advanced settings (collapsed by default)

Only open this if the defaults don't work for your video. All of these have sane defaults.

**Scene-change sensitivity** (default 27)
How visually different two frames must be before the app counts a new slide. **Lower** (5–15) catches every animation step — good if your lecturer uses progressive reveals. **Higher** (35+) catches only major slide changes — good if you want one frame per slide.

**Minimum gap between scenes** (default off)
Drops new scenes that arrive less than X seconds after the previous one. Useful when a slide builds up over several seconds — set to ~5 s to keep just the final state.

**Max frames** (default unlimited)
Hard cap on how many frames the guide will contain. Useful for very long lectures or to bound API cost.

**Instructor-frame face threshold** (default 0.12)
The app automatically drops frames that look like a talking-head shot of the instructor (no slide content). This slider controls how aggressive that filter is. Lower = drops more. Raise it if your slides have an inset webcam you want to keep.

**Skip OCR** (off)
Faster, but on-screen text won't appear in the guide. Doesn't affect the AI on-screen-text toggle (Claude reads the image directly).

**Document language** (default `en`)
A BCP-47 code like `en`, `en-US`, `fr`, `de`. Sets the document language and selects local OCR language packs.
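To build intuition for the scene-change sensitivity number, the toy sketch below treats it as a threshold on the average per-pixel difference between consecutive frames (0–255 grayscale). This is an assumed illustration of the general idea, not the app's actual detector.

```python
def mean_pixel_diff(frame_a, frame_b):
    """Average absolute difference between two same-sized grayscale frames (0-255)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def is_new_scene(prev_frame, frame, sensitivity=27):
    # Lower sensitivity => smaller visual changes count as a new scene.
    return mean_pixel_diff(prev_frame, frame) >= sensitivity

slide      = [200] * 16               # a bright slide, flattened to 16 pixels
one_bullet = [200] * 14 + [40] * 2    # a progressive-reveal step (small change)
new_slide  = [40] * 16                # a completely different slide

print(is_new_scene(slide, one_bullet))      # False at the default of 27
print(is_new_scene(slide, one_bullet, 15))  # True at a lower setting
print(is_new_scene(slide, new_slide))       # True either way
```

Real detectors are more sophisticated, but the knob behaves the same way: raise it to merge small changes into one scene, lower it to split them apart.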
---

## What happens after you click Generate

You'll see a progress bar with stages:

1. **Detecting scenes** — finds visually distinct moments in the video.
2. **Filtering instructor frames** — drops talking-head shots.
3. **Loading transcript** — reads your captions.
4. **Running OCR** — extracts on-screen text from each kept frame.
5. **LLM extraction** *(only if you ticked any AI toggle)* — sends each frame to Claude.
6. **Slicing audio** *(only if you ticked Per-segment audio)* — extracts MP3 clips.
7. **Building segments** / **Rendering** — produces the final HTML.

Typical wall time: **about 30 s per minute of video** without AI; **about 60 s per minute** with AI features on.

If anything goes wrong, you'll see a red error banner with a message. Click **Try again** to retry without re-uploading (your form values are preserved).

---

## The review & edit screen

If you picked the **Review & edit** output, this is where you'll spend most of your time. The screen has three areas:

### The toolbar (top, sticky)

| Button | What it does |
|---|---|
| **Save edits (JSON)** | Downloads a small `.edits.json` file with everything you've changed. Useful as a checkpoint or for handing the edits to someone else to re-render with the CLI. |
| **Load edits (JSON)** | Re-applies a previously-saved `.edits.json` to the current page. |
| **Preview** | Opens a clean copy of the guide (no editor chrome) in a new tab so you can see the final look. |
| **Download final HTML** | Saves the finished guide as a single self-contained `.html` file. Images and audio inlined. |
| **Download ZIP** | Saves the finished guide as a zip with separate asset files. Better for hosting on a website. |

Status messages appear next to the buttons.

### The document body

Edit anything that's underlined or has a placeholder. Specifically, per segment you can edit:

#### **Section title**

The heading. Editable text field.
If you ticked the **Section titles** AI toggle, it'll have Claude's draft; otherwise it's a default like "Segment 3 — 4:25".

#### **Alt-text**

The accessible description of the slide image. Editable text field. Drafts marked as "needs review" produce an explicit author-warning banner at the top of the page until you confirm them.

#### **Choose frame** *(when alternates exist)*

A row of thumbnails under the slide. The app captures up to three candidate frames per scene; click any thumbnail to swap which one is the canonical image. The on-screen text panel updates automatically to match the picked frame.

#### **On-screen text from video**

A collapsible panel under the slide. The OCR'd (or Claude-read, if you enabled it) text from the slide. Editable — fix mis-reads here.

#### **Audio clip**

An audio player, only present if you ticked **Per-segment audio**. Plays the lecturer's narration for just this segment.

#### **Narration**

The transcript text aligned to this segment. Editable — clean up speech disfluencies, fix homophone errors, or rewrite into prose.

#### **Key terms** *(when AI key terms is on)*

A list of term + definition rows. Add, edit, or remove. Empty rows are dropped on download.

#### **Math equations** *(when AI math is on)*

A list of LaTeX expressions with optional labels and captions. Live-preview MathJax rendering on the right.

#### **Include in final guide** *(checkbox per section)*

Untick to drop a segment from the final document entirely. Useful for transitions, intro slides, or instructor-only frames the filter let through.
### Per-section layout

The order of elements in every segment is:

```
[Slide image]
  ↓
[Choose frame thumbnails]   (if alternates)
  ↓
[On-screen text from video]
  ↓
[🔊 Audio clip]             (if Per-segment audio is on)
  ↓
[Narration]
  ↓
[Key terms]                 (if AI key terms is on)
  ↓
[Math equations]            (if AI math equations is on)
```

---

## Output formats

Whatever output format you pick (or whichever button you click in the editor's toolbar), the result is **valid HTML you can email, host, or print**. Specifics:

### Single HTML

- One `.html` file. Open in any browser.
- Images and audio are encoded as base64 data URIs.
- The file can be 10–50 MB depending on slide count and audio.
- Best for: emailing, archiving.

### Zip bundle

- A `.zip` containing `study-guide.html` and a `static/` folder.
- Images stay as `.jpg` files; audio as `.mp3`.
- Best for: hosting on a static site, GitHub Pages, an LMS.

### Review & edit (in the editor)

- The editor page itself. Won't be downloaded as-is.
- From here, click any of the toolbar's download buttons.

---

## Tips for the best results

**Use a clean transcript.** Auto-generated captions are usually fine, but they tend to misspell technical terms. A 5-minute pass over the `.srt` before uploading saves a lot of editing later.

**Set scene-change sensitivity to your lecture style.** Hand-drawn whiteboard work needs lower (10–15). Slide-only decks work fine at 27. Talking-head with a single shared screen needs 35+.

**Tick AI on-screen text for slides with colour callouts.** The local OCR engine struggles with white-text-on-coloured-background. Claude reads them far more reliably.

**Don't tick every AI feature on the first run.** Each costs API tokens. Section titles and alt text are the two highest-leverage features — start with those. Add key terms / equations / OCR if specific gaps appear.

**The "Include" checkbox is your friend.** It's faster to drop bad segments than to re-run the pipeline with stricter settings.
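An aside on the Single HTML format described above: "base64 data URIs" means each image and audio clip is re-encoded as text and pasted directly into the `src` attribute, which is why one file can carry everything (and why it grows by roughly a third per asset). The helper below is an illustrative sketch, not the app's code, and the bytes and alt text are invented for the example.

```python
import base64

def to_data_uri(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw bytes as a self-contained data: URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

# Stand-in bytes starting with the JPEG magic number; not a real image.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8
tag = f'<img src="{to_data_uri(fake_jpeg)}" alt="Slide 3: perceptron diagram">'
print(tag[:33])  # <img src="data:image/jpeg;base64,
```

A browser renders such a tag with no network request and no separate file, which is exactly what makes the single-file guide email-friendly.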
**Save the JSON before downloading.** If you spend 20 minutes editing and then your browser tab closes, the JSON checkpoint is the only way to recover.

---

## Troubleshooting

**"No usable visual segments found"**
The scene detector found nothing, or every frame was filtered as instructor-only. Try (a) lowering the scene-change sensitivity, or (b) raising the instructor-frame face threshold.

**"Server is at the 8-job concurrency cap"**
Someone else's job is currently running on the same server. Wait a minute and click **Generate** again.

**"Video exceeds the 500 MB upload limit" / "Transcript exceeds the 5 MB upload limit"**
Compress the video before uploading (HandBrake, ffmpeg) or split it. The transcript limit is huge — if you're hitting it, your captions file is likely a binary blob, not a real `.srt`.

**Progress stalls on OCR for a long time**
Tesseract is slow on the free CPU tier. A 30-min lecture takes 1–2 minutes for OCR. If it's been 5+ minutes, tick **Skip OCR** in advanced settings and re-run, then optionally turn on the AI **On-screen text** toggle instead — Claude is much faster.

**The OCR text reads like garbage**
Coloured callouts and decorative fonts confuse Tesseract. Tick **On-screen text (Claude)** under AI assist.

**Equations render as raw LaTeX**
Make sure MathJax loaded — check the browser console. If you're behind a corporate firewall that blocks `cdn.jsdelivr.net`, MathJax won't load.

**The instructor's face is showing as the slide image**
Click an alternate thumbnail under that slide. The app keeps up to two alternates per scene precisely for this case.

**Edits got lost when I refreshed**
Always **Save edits (JSON)** before refreshing, closing the tab, or stepping away. The editor doesn't persist to localStorage.

**Downloaded HTML won't open**
Some email clients block `.html` attachments. Zip the file or use the Zip bundle output.

**Audio plays in the editor but not in the downloaded file**
The single-file HTML download inlines audio as data URIs.
If you downloaded the zip bundle, the audio is in `static/segment_*.mp3` — check your browser console for missing-file errors.

---

If you hit something this guide doesn't cover, hand the error message to whoever set up the app for you. Most issues are resolved by lowering the scene-change sensitivity or ticking the **On-screen text (Claude)** toggle.