| # VideoGuideMaker: User Guide | |
| Turn a recorded lecture into an accessible, navigable study guide in a few minutes. | |
| You hand it a video and a transcript. It picks the visually distinct moments (slide changes, demos, board work), pulls the on-screen text, and produces a polished HTML document, one section per slide, with timestamps, narration, key terms, and equations. You can review and edit before publishing. | |
| --- | |
| ## Table of contents | |
| 1. [Quick start](#quick-start) | |
| 2. [What you'll need before you upload](#what-youll-need-before-you-upload) | |
| 3. [The upload page, field by field](#the-upload-page-field-by-field) | |
| 4. [What happens after you click Generate](#what-happens-after-you-click-generate) | |
| 5. [The review & edit screen](#the-review--edit-screen) | |
| 6. [Output formats](#output-formats) | |
| 7. [Tips for the best results](#tips-for-the-best-results) | |
| 8. [Troubleshooting](#troubleshooting) | |
| --- | |
| ## Quick start | |
| 1. Open the site. | |
| 2. Drop in your video and your transcript. | |
| 3. (Optional) paste an Anthropic API key and tick the AI features you want. | |
| 4. Click **Generate study guide →**. | |
| 5. Wait for the progress bar. | |
| 6. On the editor screen, fix any rough spots, then click **Download final HTML** or **Download ZIP**. | |
| That's it. The rest of this guide is detail. | |
| --- | |
| ## What you'll need before you upload | |
| | You need | What it looks like | Notes | | |
| |---|---|---| | |
| | **A lecture video** | `.mp4`, `.mov`, `.mkv`, `.webm`, `.m4v`, `.avi` | Up to 500 MB. A 30-minute lecture is typical. | | |
| | **A transcript** | `.srt` or `.vtt` (subtitle file) | Up to 5 MB. Required: the app aligns narration to slides using your transcript's timestamps. | |
| | **(Optional) an Anthropic API key** | Starts with `sk-ant-…` | Only needed if you want the AI features. The key is used per-job and is **never stored**. | |
| > **Don't have a transcript yet?** YouTube auto-captions, Descript, Otter, and Zoom all export `.srt` or `.vtt`. Quality matters: poor captions produce poor section alignment. | |
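Because alignment depends on the transcript's timestamps, it can help to sanity-check a captions file before uploading. The sketch below is a minimal SRT reader written for illustration; it is not the app's own parser, and the names `parse_srt` and `to_seconds` and the sample text are assumptions.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert timestamp fields (strings) to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (start_s, end_s, caption) tuples from SRT text."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                caption = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), caption))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:04,500
Welcome to the lecture.

2
00:00:04,500 --> 00:00:09,250
Today we combine two models.
"""

cues = parse_srt(sample)
```

If `parse_srt` returns an empty list for your file, the captions are probably not in a format the upload will align well.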
| --- | |
| ## The upload page, field by field | |
| The page is divided into numbered sections. | |
| ### 1. Upload | |
| **Study guide title**: the heading of the finished document. Defaults to *"Lecture Study Guide"*. Change it to something descriptive. | |
| **Video file**: drag and drop, or click to browse. | |
| **Transcript**: drag and drop, or click to browse. | |
| ### 2. AI assist (optional) | |
| Everything in this section requires an Anthropic API key. **Skip this section entirely if you don't have one**; the rest of the app still works. | |
| **Anthropic API key**: paste your key here. It is used only for this job and is never logged or saved on the server. | |
| The five toggles below decide what Claude does for each frame. Each adds a small per-frame cost (a few cents for a typical lecture). | |
| | Toggle | What Claude produces | Recommendation | | |
| |---|---|---| | |
| | **Section titles** | A short, descriptive title for each segment ("Combining AI models") instead of the default "Segment 3 – 4:25". | Tick this; it makes the table of contents readable. | |
| | **Alt-text drafts** | Accessible alt text describing the *purpose* of each visual, not just a transcription. | Tick if you'll publish the guide where accessibility matters. | | |
| | **Key terms** | A list of named concepts the lecturer is teaching, with short definitions. | Tick for content-rich lectures. | | |
| | **Math equations** | LaTeX for equations on the slide, plus a screen-reader description. | Tick if your slides contain math. | | |
| | **On-screen text (Claude)** | Replaces the local OCR engine with Claude's reading. Much better on coloured callouts, decorative fonts, and rendered math. | Tick if your slides have any visual flair beyond plain black-on-white text. | | |
| You can pick any combination. Toggles are independent. | |
| ### 3. Media extras | |
| **Per-segment audio**: slices the lecture's narration into one MP3 clip per segment so learners can re-listen. Adds processing time and ~500 KB per clip. No API key needed. Off by default. | |
| ### 4. Output | |
| Pick exactly one format: | |
| | Format | Best for | Notes | | |
| |---|---|---| | |
| | **Review & edit** | Most cases. | Opens an in-browser editor; download from there once you're happy. | | |
| | **Single HTML** | Quick share, no review. | One self-contained `.html` file with images and audio embedded. Easy to email. | |
| | **Zip bundle** | Hosting on a website. | HTML plus a `static/` folder of images and audio. | | |
| > 💡 **You can pick "Review & edit" and still get any of the other formats from the editor's toolbar, with no need to re-run the pipeline.** | |
| ### Advanced settings (collapsed by default) | |
| Only open this if the defaults don't work for your video. All of these have sane defaults. | |
| **Scene-change sensitivity** (default 27) | |
| How visually different two frames must be before the app counts a new slide. **Lower** (5–15) catches every animation step, which is good if your lecturer uses progressive reveals. **Higher** (35+) catches only major slide changes, which is good if you want one frame per slide. | |
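To build intuition for the threshold, here is a toy sketch. It assumes the detector scores each pair of consecutive frames by mean absolute pixel difference and starts a new scene when the score exceeds the threshold. The app's actual scoring method is not described in this guide, and `frame_difference` and `detect_cuts` are hypothetical names.

```python
def frame_difference(a, b):
    """Mean absolute difference between two equally sized grayscale frames.

    Frames are flat lists of pixel values in the range 0-255.
    """
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cuts(frames, threshold=27.0):
    """Return the indices of frames that start a new scene."""
    cuts = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


# Four tiny 4-pixel "frames": a slide, a small animation step, then a new slide.
frames = [
    [10, 10, 10, 10],
    [10, 10, 10, 50],       # one bullet revealed: difference 10
    [10, 10, 10, 50],       # no change: difference 0
    [200, 200, 200, 200],   # new slide: difference 180
]

default_cuts = detect_cuts(frames)           # only the full slide change
sensitive_cuts = detect_cuts(frames, 5.0)    # also the animation step
```

With the default of 27 only the full slide change registers; dropping the threshold to 5 also picks up the single-bullet reveal.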
| **Minimum gap between scenes** (default off) | |
| Drops new scenes that arrive less than X seconds after the previous one. Useful when a slide builds up over several seconds: set to ~5 s to keep just the final state. | |
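The filtering rule can be sketched as follows. This is an illustration only: it assumes the gap is measured from the last scene that was kept, and `apply_min_gap` is a hypothetical name. Which frame ends up representing a merged scene is not covered here.

```python
def apply_min_gap(scene_starts, min_gap_s):
    """Drop scene starts that fall within min_gap_s of the last kept scene."""
    kept = []
    for t in sorted(scene_starts):
        if not kept or t - kept[-1] >= min_gap_s:
            kept.append(t)
    return kept


# A slide that builds in three steps at 62 s, 64 s, and 66 s.
starts = [0.0, 62.0, 64.0, 66.0, 130.0]
merged = apply_min_gap(starts, 5.0)
```

With a 5-second gap, the three build steps collapse into a single scene, so the guide gets one section for that slide instead of three.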
| **Max frames** (default unlimited) | |
| Hard cap on how many frames the guide will contain. Useful for very long lectures or to bound API cost. | |
| **Instructor-frame face threshold** (default 0.12) | |
| The app automatically drops frames that look like a talking-head shot of the instructor (no slide content). This slider controls how aggressive that filter is. Lower = drops more. Raise it if your slides have an inset webcam you want to keep. | |
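One way to picture the slider is sketched below. It assumes the value is the fraction of the frame area covered by a detected face and that frames at or above the threshold are dropped; the guide does not define the number precisely, so treat both the interpretation and the names `is_instructor_frame` and `keep_slide_frames` as assumptions.

```python
def is_instructor_frame(face_fraction, threshold=0.12):
    """True if the face covers enough of the frame to count as a talking-head shot."""
    return face_fraction >= threshold


def keep_slide_frames(face_fractions, threshold=0.12):
    """Return the indices of frames that survive the instructor filter."""
    return [
        i for i, fraction in enumerate(face_fractions)
        if not is_instructor_frame(fraction, threshold)
    ]


# Face coverage for four frames: none, small inset webcam, larger inset, full talking head.
fractions = [0.00, 0.04, 0.15, 0.40]

kept_default = keep_slide_frames(fractions)         # threshold 0.12
kept_raised = keep_slide_frames(fractions, 0.20)    # keeps the larger inset too
kept_lowered = keep_slide_frames(fractions, 0.03)   # drops anything with a visible face
```

Under this reading, lowering the threshold drops more frames and raising it keeps slides that contain a webcam inset.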
| **Skip OCR** (off) | |
| Faster, but on-screen text won't appear in the guide. Doesn't affect the AI on-screen-text toggle (Claude reads the image directly). | |
| **Document language** (default `en`) | |
| A BCP-47 code like `en`, `en-US`, `fr`, `de`. Sets the document language and selects local OCR language packs. | |
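As a rough illustration of how a document language could select an OCR pack, the sketch below maps the primary subtag of a BCP-47 code to a Tesseract language code (Tesseract is the local OCR engine named under Troubleshooting). The mapping table, the fallback to English, and the name `ocr_pack_for` are assumptions, not the app's actual logic.

```python
# Tesseract names its language packs with three-letter codes.
TESSERACT_PACKS = {
    "en": "eng",
    "fr": "fra",
    "de": "deu",
    "es": "spa",
}


def ocr_pack_for(tag, default="eng"):
    """Map a BCP-47 tag such as 'en-US' to a Tesseract pack name."""
    primary = tag.replace("_", "-").split("-")[0].lower()
    return TESSERACT_PACKS.get(primary, default)


pack = ocr_pack_for("en-US")
```

Region subtags such as `-US` do not change the OCR pack in this sketch; only the primary language does.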
| --- | |
| ## What happens after you click Generate | |
| You'll see a progress bar with stages: | |
| 1. **Detecting scenes**: finds visually distinct moments in the video. | |
| 2. **Filtering instructor frames**: drops talking-head shots. | |
| 3. **Loading transcript**: reads your captions. | |
| 4. **Running OCR**: extracts on-screen text from each kept frame. | |
| 5. **LLM extraction** *(only if you ticked any AI toggle)*: sends each frame to Claude. | |
| 6. **Slicing audio** *(only if you ticked Per-segment audio)*: extracts MP3 clips. | |
| 7. **Building segments** / **Rendering**: produces the final HTML. | |
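The segment-building stage pairs narration with slides by timestamp. A minimal sketch of that idea follows, assuming each caption is assigned to the scene that started most recently before the caption begins; the app's real alignment rules may differ, and `align_cues` is a hypothetical name.

```python
from bisect import bisect_right


def align_cues(scene_starts, cues):
    """Group caption text by scene.

    scene_starts: sorted scene start times in seconds.
    cues: (start_s, text) pairs from the transcript.
    Returns one narration string per scene.
    """
    segments = [[] for _ in scene_starts]
    for start_s, text in cues:
        idx = bisect_right(scene_starts, start_s) - 1
        idx = max(idx, 0)  # captions before the first scene go to the first segment
        segments[idx].append(text)
    return [" ".join(parts) for parts in segments]


scene_starts = [0.0, 65.0, 130.0]
cues = [
    (1.0, "Welcome."),
    (60.0, "Let's begin."),
    (65.0, "First, the model."),
    (140.0, "Now the demo."),
]

narration = align_cues(scene_starts, cues)
```

This is also why caption timing matters: a caption whose timestamp is off by a few seconds can land in the neighbouring section.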
| Typical wall time: **about 30 s per minute of video** without AI; **about 60 s per minute** with AI features on. | |
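As a quick worked example of those rates, the helper below turns a lecture length into a rough wait time. The function name `estimate_wall_time_s` is illustrative, and real times will vary with server load.

```python
def estimate_wall_time_s(video_minutes, ai_enabled=False):
    """Rough processing time in seconds, using the typical rates quoted above."""
    seconds_per_video_minute = 60 if ai_enabled else 30
    return video_minutes * seconds_per_video_minute


without_ai = estimate_wall_time_s(30)                  # 900 s, about 15 minutes
with_ai = estimate_wall_time_s(30, ai_enabled=True)    # 1800 s, about 30 minutes
```

So a typical 30-minute lecture takes roughly 15 minutes without AI features and roughly 30 minutes with them.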
| If anything goes wrong, you'll see a red error banner with a message. Click **Try again** to retry without re-uploading (your form values are preserved). | |
| --- | |
| ## The review & edit screen | |
| If you picked the **Review & edit** output, this is where you'll spend most of your time. | |
| The screen has three areas: | |
| ### The toolbar (top, sticky) | |
| | Button | What it does | | |
| |---|---| | |
| | **Save edits (JSON)** | Downloads a small `.edits.json` file with everything you've changed. Useful as a checkpoint or for handing the edits to someone else to re-render with the CLI. | | |
| | **Load edits (JSON)** | Re-applies a previously-saved `.edits.json` to the current page. | | |
| | **Preview** | Opens a clean copy of the guide (no editor chrome) in a new tab so you can see the final look. | | |
| | **Download final HTML** | Saves the finished guide as a single self-contained `.html` file. Images and audio inlined. | | |
| | **Download ZIP** | Saves the finished guide as a zip with separate asset files. Better for hosting on a website. | | |
| Status messages appear next to the buttons. | |
| ### The document body | |
| Edit anything that's underlined or has a placeholder. Specifically, per segment you can: | |
| #### **Section title** | |
| The heading. Editable text field. If you ticked **Section titles** AI, it'll have Claude's draft; otherwise it's a default like "Segment 3 – 4:25". | |
| #### **Alt-text** | |
| The accessible description of the slide image. Editable text field. Drafts marked as "needs review" produce an explicit author-warning banner at the top of the page until you confirm them. | |
| #### **Choose frame** *(when alternates exist)* | |
| A row of thumbnails under the slide. The app captures up to three candidate frames per scene; click any thumbnail to swap which one is the canonical image. The on-screen text panel updates automatically to match the picked frame. | |
| #### **On-screen text from video** | |
| A collapsible panel under the slide. The OCR'd (or Claude-read, if you enabled it) text from the slide. Editable, so you can fix mis-reads here. | |
| #### **(Audio clip)** | |
| An audio player, only present if you ticked **Per-segment audio**. Plays the lecturer's narration for just this segment. | |
| #### **Narration** | |
| The transcript text aligned to this segment. Editable: clean up speech disfluencies, fix homophone errors, or rewrite into prose. | |
| #### **Key terms** *(when AI key terms is on)* | |
| A list of term + definition rows. Add, edit, or remove. Empty rows are dropped on download. | |
| #### **Math equations** *(when AI math is on)* | |
| A list of LaTeX expressions with optional labels and captions. Live-preview MathJax rendering on the right. | |
| #### **Include in final guide** (checkbox per section) | |
| Untick to drop a segment from the final document entirely. Useful for transitions, intro slides, or instructor-only frames the filter let through. | |
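The effect of the **Include** checkbox and of empty key-term rows on the downloaded guide can be sketched like this. The dictionary layout (`include`, `key_terms`, `term`, `definition`) and the name `finalize` are hypothetical; the editor's real data model is not documented in this guide.

```python
def finalize(segments):
    """Apply download-time cleanup to a list of segment dictionaries.

    - Segments with include=False are dropped entirely.
    - Key-term rows whose term and definition are both blank are dropped.
    """
    final = []
    for segment in segments:
        if not segment.get("include", True):
            continue
        terms = [
            row for row in segment.get("key_terms", [])
            if row["term"].strip() or row["definition"].strip()
        ]
        final.append({**segment, "key_terms": terms})
    return final


segments = [
    {"title": "Intro slide", "include": False, "key_terms": []},
    {
        "title": "Combining AI models",
        "include": True,
        "key_terms": [
            {"term": "Ensemble", "definition": "Several models combined."},
            {"term": "  ", "definition": ""},
        ],
    },
]

published = finalize(segments)
```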
| ### Per-section layout | |
| Within every segment, the elements appear in this order: | |
| ``` | |
| [Slide image] | |
| ↓ | |
| [Choose frame thumbnails] (if alternates) | |
| ↓ | |
| [On-screen text from video] | |
| ↓ | |
| [Audio clip] (if Per-segment audio is on) | |
| ↓ | |
| [Narration] | |
| ↓ | |
| [Key terms] (if AI key terms is on) | |
| ↓ | |
| [Math equations] (if AI math equations is on) | |
| ``` | |
| --- | |
| ## Output formats | |
| Whatever output format you pick (or whatever you click in the editor's toolbar), the result is **valid HTML you can email, host, or print**. Specifics: | |
| ### Single HTML | |
| - One `.html` file. Open in any browser. | |
| - Images and audio are encoded as base64 data URIs. | |
| - File can be 10–50 MB depending on slide count and audio. | |
| - Best for: emailing, archiving. | |
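For readers curious about what "encoded as base64 data URIs" means in practice, here is a small sketch of the general technique using Python's standard library. It illustrates the encoding itself, not the app's code, and `to_data_uri` is a hypothetical name.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Embed raw bytes in a data URI that can be used as an src attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


uri = to_data_uri(b"hello", "text/plain")

# Base64 turns every 3 bytes into 4 characters, so embedded media grows by about a third.
encoded_size = len(base64.b64encode(bytes(3000)))
```

That one-third overhead is part of why a single-file guide with many slides and audio clips can reach tens of megabytes.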
| ### Zip bundle | |
| - A `.zip` containing `study-guide.html` and a `static/` folder. | |
| - Images stay as `.jpg` files; audio as `.mp3`. | |
| - Best for: hosting on a static site, GitHub Pages, an LMS. | |
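The bundle layout described above can be reproduced with Python's `zipfile` module, which is handy if you want to repackage a guide after editing assets by hand. This is a sketch under assumptions: `build_bundle` is a hypothetical helper and the image file name is invented for the example.

```python
import io
import zipfile


def build_bundle(html: str, assets: dict) -> bytes:
    """Return zip bytes containing study-guide.html plus a static/ folder."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("study-guide.html", html)
        for name, data in assets.items():
            archive.writestr(f"static/{name}", data)
    return buffer.getvalue()


bundle = build_bundle(
    "<!doctype html><title>Guide</title>",
    {"segment_001.jpg": b"\xff\xd8", "segment_001.mp3": b"ID3"},
)

# List the archive contents to confirm the layout.
names = zipfile.ZipFile(io.BytesIO(bundle)).namelist()
```

When hosting, keep `static/` in the same directory as `study-guide.html` so the relative links resolve.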
| ### Review & edit (in the editor) | |
| - The editor page itself. Won't be downloaded as-is. | |
| - From here, click any of the toolbar's download buttons. | |
| --- | |
| ## Tips for the best results | |
| **Use a clean transcript.** | |
| Auto-generated captions are usually fine, but they tend to misspell technical terms. A 5-minute pass over the `.srt` before uploading saves a lot of editing later. | |
| **Set scene-change sensitivity to your lecture style.** | |
| Hand-drawn whiteboard work needs lower (10–15). Slide-only decks work fine at 27. Talking-head with a single shared screen needs 35+. | |
| **Tick AI on-screen text for slides with colour callouts.** | |
| The local OCR engine struggles with white-text-on-coloured-background. Claude reads such text far more reliably. | |
| **Don't tick every AI feature on the first run.** | |
| Each costs API tokens. Section titles and alt text are the highest-leverage two, so start with those. Add key terms / equations / OCR if specific gaps appear. | |
| **The "Include" checkbox is your friend.** | |
| It's faster to drop bad segments than to re-run the pipeline with stricter settings. | |
| **Save the JSON before downloading.** | |
| If you spend 20 minutes editing and then your browser tab closes, the JSON checkpoint is the only way to recover. | |
| --- | |
| ## Troubleshooting | |
| **"No usable visual segments found"** | |
| The scene detector found nothing or every frame was filtered as instructor-only. Try (a) lowering the scene-change sensitivity, or (b) raising the instructor-frame face threshold. | |
| **"Server is at the 8-job concurrency cap"** | |
| Eight jobs are already running on the same server, which is the maximum. Wait a minute and click **Generate** again. | |
| **"Video exceeds the 500 MB upload limit" / "Transcript exceeds the 5 MB upload limit"** | |
| Compress the video before uploading (HandBrake, ffmpeg) or split it. The transcript limit is huge; if you're hitting it, your captions file is likely a binary blob, not a real `.srt`. | |
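A pre-upload check and an ffmpeg invocation might look like the sketch below. It assumes "500 MB" and "5 MB" mean 500 MiB and 5 MiB, uses standard ffmpeg options (`-c:v libx264`, `-crf`, `-c:a aac`, `-b:a`), and the helper names are hypothetical. The CRF value of 28 is only a starting point; lower values give better quality and larger files.

```python
MAX_VIDEO_BYTES = 500 * 1024 * 1024      # assumption: "500 MB" means 500 MiB
MAX_TRANSCRIPT_BYTES = 5 * 1024 * 1024   # assumption: "5 MB" means 5 MiB


def within_limit(size_bytes, limit_bytes):
    """True if a file of this size fits under the upload limit."""
    return size_bytes <= limit_bytes


def ffmpeg_compress_command(src, dst, crf=28):
    """Build an ffmpeg argument list that re-encodes video as H.264 with AAC audio."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-crf", str(crf),
        "-c:a", "aac", "-b:a", "96k",
        dst,
    ]


command = ffmpeg_compress_command("lecture.mov", "lecture-small.mp4")
too_big = not within_limit(700 * 1024 * 1024, MAX_VIDEO_BYTES)
```

The list can be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed; use `os.path.getsize` to get the real file size for `within_limit`.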
| **Progress stalls on OCR for a long time** | |
| Tesseract is slow on the free CPU tier. A 30-min lecture takes 1–2 minutes for OCR. If it's been 5+ minutes, tick **Skip OCR** in advanced settings and re-run, then optionally turn on the AI **On-screen text** toggle instead; Claude is much faster. | |
| **The OCR text reads like garbage** | |
| Coloured callouts and decorative fonts confuse Tesseract. Tick **On-screen text (Claude)** under AI assist. | |
| **Equations render as raw LaTeX** | |
| Make sure MathJax loaded β check the browser console. If you're behind a corporate firewall that blocks `cdn.jsdelivr.net`, MathJax won't load. | |
| **The instructor's face is showing as the slide image** | |
| Click an alternate thumbnail under that slide. The app keeps up to two alternates per scene precisely for this case. | |
| **Edits got lost when I refreshed** | |
| Always **Save edits (JSON)** before refreshing, closing the tab, or stepping away. The editor doesn't persist to localStorage. | |
| **Downloaded HTML won't open** | |
| Some email clients block `.html` attachments. Zip the file or use the Zip bundle output. | |
| **Audio plays in the editor but not in the downloaded file** | |
| The single-HTML download inlines audio as data URIs, so it needs no extra files. If you downloaded the zip bundle, the audio is in `static/segment_*.mp3`; keep the `static/` folder next to the HTML file and check your browser console for missing-file errors. | |
| --- | |
| If you hit something this guide doesn't cover, hand the error message to whoever set up the app for you. Most issues are resolved by lowering the scene-change sensitivity or ticking the **On-screen text (Claude)** toggle. | |