
VideoGuideMaker — User Guide

Turn a recorded lecture into an accessible, navigable study guide in a few minutes.

You hand it a video and a transcript. It picks the visually distinct moments (slide changes, demos, board work), pulls the on-screen text, and produces a polished HTML document — one section per slide — with timestamps, narration, key terms, and equations. You can review and edit before publishing.


Table of contents

  1. Quick start
  2. What you'll need before you upload
  3. The upload page, field by field
  4. What happens after you click Generate
  5. The review & edit screen
  6. Output formats
  7. Tips for the best results
  8. Troubleshooting

Quick start

  1. Open the site.
  2. Drop in your video and your transcript.
  3. (Optional) paste an Anthropic API key and tick the AI features you want.
  4. Click Generate study guide →.
  5. Wait for the progress bar.
  6. On the editor screen, fix any rough spots, then click Download final HTML or Download ZIP.

That's it. The rest of this guide is detail.


What you'll need before you upload

| You need | What it looks like | Notes |
| --- | --- | --- |
| A lecture video | .mp4, .mov, .mkv, .webm, .m4v, .avi | Up to 500 MB. A 30-minute lecture is typical. |
| A transcript | .srt or .vtt (subtitle file) | Up to 5 MB. Required — the app aligns narration to slides using your transcript's timestamps. |
| (Optional) an Anthropic API key | Starts with sk-ant-… | Only needed if you want the AI features. The key is used per-job and is never stored. |

Don't have a transcript yet? YouTube auto-captions, Descript, Otter, and Zoom all export .srt or .vtt. Quality matters — bad captions = bad section alignment.
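If you end up fixing captions by hand, .srt is plain text: a cue number, a start --> end line with comma-separated milliseconds, the caption text, and a blank line between cues. For example:

```
1
00:00:01,000 --> 00:00:04,500
Welcome to lecture three.

2
00:00:04,500 --> 00:00:09,000
Today we'll cover scene detection.
```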


The upload page, field by field

The page is divided into numbered sections.

1. Upload

Study guide title — the heading of the finished document. Defaults to "Lecture Study Guide". Change it.

Video file — drag and drop, or click to browse.

Transcript — same.

2. AI assist (optional)

Everything in this section requires an Anthropic API key. Skip this section entirely if you don't have one — the rest of the app still works.

Anthropic API key — paste your key here. Stored only for this job, never logged or saved on the server.

The five toggles below decide what Claude does for each frame. Each adds a small per-frame cost (a few cents for a typical lecture).

| Toggle | What Claude produces | Recommendation |
| --- | --- | --- |
| Section titles | A short, descriptive title for each segment ("Combining AI models") instead of the default "Segment 3 — 4:25". | Tick this — it makes the table of contents readable. |
| Alt-text drafts | Accessible alt text describing the purpose of each visual, not just a transcription. | Tick if you'll publish the guide where accessibility matters. |
| Key terms | A list of named concepts the lecturer is teaching, with short definitions. | Tick for content-rich lectures. |
| Math equations | LaTeX for equations on the slide, plus a screen-reader description. | Tick if your slides contain math. |
| On-screen text (Claude) | Replaces the local OCR engine with Claude's reading. Much better on coloured callouts, decorative fonts, and rendered math. | Tick if your slides have any visual flair beyond plain black-on-white text. |

You can pick any combination. Toggles are independent.

3. Media extras

Per-segment audio — slices the lecture's narration into one MP3 clip per segment so learners can re-listen. Adds processing time and ~500 KB per clip. No API key needed. Off by default.

4. Output

Pick exactly one format:

| Format | Best for | Notes |
| --- | --- | --- |
| Review & edit | Most cases. | Opens an in-browser editor; download from there once you're happy. |
| Single HTML | Quick share, no review. | One self-contained .html file with images and audio embedded. Just email it. |
| Zip bundle | Hosting on a website. | HTML plus a static/ folder of images and audio. |

💡 You can pick "Review & edit" and still get any of the other formats from the editor's toolbar — no need to re-run the pipeline.

Advanced settings (collapsed by default)

Only open this if the defaults don't work for your video. All of these have sane defaults.

Scene-change sensitivity (default 27) How visually different two frames must be before the app counts a new slide. Lower (5–15) catches every animation step — good if your lecturer uses progressive reveals. Higher (35+) catches only major slide changes — good if you want one frame per slide.

Minimum gap between scenes (default off) Drops new scenes that arrive less than X seconds after the previous one. Useful when a slide builds up over several seconds — set to ~5 s to keep just the final state.
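To make the effect concrete, here is an illustrative sketch (not the app's actual code) of the drop rule this setting applies: a new scene is kept only if it starts at least the gap's length after the last kept scene.

```python
def apply_min_gap(scene_times, min_gap):
    """Illustrative sketch of a minimum-gap filter: keep a scene only
    if it starts at least `min_gap` seconds after the most recently
    kept one. Times are scene-start offsets in seconds."""
    kept = []
    for t in scene_times:
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept

# A slide that "builds" at 60 s, 62 s, and 64 s collapses to one scene:
print(apply_min_gap([0, 60, 62, 64, 120], min_gap=5))  # [0, 60, 120]
```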

Max frames (default unlimited) Hard cap on how many frames the guide will contain. Useful for very long lectures or to bound API cost.

Instructor-frame face threshold (default 0.12) The app automatically drops frames that look like a talking-head shot of the instructor (no slide content). This slider controls how aggressive that filter is. Lower = drops more. Raise it if your slides have an inset webcam you want to keep.

Skip OCR (off) Faster, but on-screen text won't appear in the guide. Doesn't affect the AI on-screen-text toggle (Claude reads the image directly).

Document language (default en) A BCP-47 code like en, en-US, fr, de. Sets the document language and selects local OCR language packs.


What happens after you click Generate

You'll see a progress bar with stages:

  1. Detecting scenes — finds visually distinct moments in the video.
  2. Filtering instructor frames — drops talking-head shots.
  3. Loading transcript — reads your captions.
  4. Running OCR — extracts on-screen text from each kept frame.
  5. LLM extraction (only if you ticked any AI toggle) — sends each frame to Claude.
  6. Slicing audio (only if you ticked Per-segment audio) — extracts MP3 clips.
  7. Building segments / Rendering — produces the final HTML.

Typical wall time: about 30 s per minute of video without AI; about 60 s per minute with AI features on.
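As a back-of-the-envelope check of those numbers (the per-minute rates are the rule of thumb above; real times vary with server load and video content):

```python
def estimate_wall_minutes(video_minutes, ai_enabled=False):
    # Rule of thumb: ~30 s of processing per minute of video,
    # ~60 s per minute with AI features switched on.
    seconds_per_video_minute = 60 if ai_enabled else 30
    return video_minutes * seconds_per_video_minute / 60

print(estimate_wall_minutes(30))                   # 15.0 -> ~15 minutes
print(estimate_wall_minutes(30, ai_enabled=True))  # 30.0 -> ~30 minutes
```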

If anything goes wrong, you'll see a red error banner with a message. Click Try again to retry without re-uploading (your form values are preserved).


The review & edit screen

If you picked the Review & edit output, this is where you'll spend most of your time.

The screen has three areas:

The toolbar (top, sticky)

| Button | What it does |
| --- | --- |
| Save edits (JSON) | Downloads a small .edits.json file with everything you've changed. Useful as a checkpoint or for handing the edits to someone else to re-render with the CLI. |
| Load edits (JSON) | Re-applies a previously-saved .edits.json to the current page. |
| Preview | Opens a clean copy of the guide (no editor chrome) in a new tab so you can see the final look. |
| Download final HTML | Saves the finished guide as a single self-contained .html file. Images and audio inlined. |
| Download ZIP | Saves the finished guide as a zip with separate asset files. Better for hosting on a website. |
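The exact .edits.json schema is internal to the app and isn't documented in this guide. Purely as a hypothetical illustration (every field name below is invented), a checkpoint of this kind maps segments to the fields you changed:

```json
{
  "segment-1": { "title": "Combining AI models", "include": true },
  "segment-3": { "narration": "Cleaned-up narration text.", "include": false }
}
```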

Status messages appear next to the buttons.

The document body

Edit anything that's underlined or has a placeholder. Specifically, per segment you can:

Section title

The heading. Editable text field. If you ticked Section titles AI, it'll have Claude's draft; otherwise it's a default like "Segment 3 — 4:25".

Alt-text

The accessible description of the slide image. Editable text field. Drafts marked as "needs review" produce an explicit author-warning banner at the top of the page until you confirm them.

Choose frame (when alternates exist)

A row of thumbnails under the slide. The app captures up to three candidate frames per scene; click any thumbnail to swap which one is the canonical image. The on-screen text panel updates automatically to match the picked frame.

On-screen text from video

A collapsible panel under the slide. The OCR'd (or Claude-read, if you enabled it) text from the slide. Editable — fix mis-reads here.

(Audio clip)

An audio player, only present if you ticked Per-segment audio. Plays the lecturer's narration for just this segment.

Narration

The transcript text aligned to this segment. Editable — clean up speech disfluencies, fix homophone errors, or rewrite into prose.

Key terms (when AI key terms is on)

A list of term + definition rows. Add, edit, or remove. Empty rows are dropped on download.

Math equations (when AI math is on)

A list of LaTeX expressions with optional labels and captions. Live-preview MathJax rendering on the right.

Include in final guide (checkbox per section)

Untick to drop a segment from the final document entirely. Useful for transitions, intro slides, or instructor-only frames the filter let through.

Per-section layout

The order of elements in every segment is:

```
[Slide image]
   ↓
[Choose frame thumbnails]   (if alternates)
   ↓
[On-screen text from video]
   ↓
[🔊 Audio clip]              (if Per-segment audio is on)
   ↓
[Narration]
   ↓
[Key terms]                  (if AI key terms is on)
   ↓
[Math equations]             (if AI math equations is on)
```

Output formats

Whatever output format you pick (or whatever you click in the editor's toolbar), the result is valid HTML you can email, host, or print. Specifics:

Single HTML

  • One .html file. Open in any browser.
  • Images and audio are encoded as base64 data URIs.
  • File can be 10–50 MB depending on slide count and audio.
  • Best for: emailing, archiving.
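"Encoded as base64 data URIs" means each asset's bytes are written directly into the HTML instead of shipped as separate files. A JPEG slide and an MP3 clip embedded this way look roughly like this (payloads truncated; the alt text is just an example):

```html
<img alt="Slide 3: scene detection overview"
     src="data:image/jpeg;base64,/9j/4AAQSkZJRg…">
<audio controls src="data:audio/mpeg;base64,SUQzBAAAAA…"></audio>
```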

Zip bundle

  • A .zip containing study-guide.html and a static/ folder.
  • Images stay as .jpg files; audio as .mp3.
  • Best for: hosting on a static site, GitHub Pages, an LMS.

Review & edit (in the editor)

  • The editor page itself. Won't be downloaded as-is.
  • From here, click any of the toolbar's download buttons.

Tips for the best results

Use a clean transcript. Auto-generated captions are usually fine, but they tend to misspell technical terms. A 5-minute pass over the .srt before uploading saves a lot of editing later.

Set scene-change sensitivity to your lecture style. Hand-drawn whiteboard work needs lower (10–15). Slide-only decks work fine at 27. Talking-head with a single shared screen needs 35+.

Tick AI on-screen text for slides with colour callouts. The local OCR engine struggles with white-text-on-coloured-background. Claude reads them far more reliably.

Don't tick every AI feature on the first run. Each costs API tokens. Section titles and alt text are the highest-leverage two — start with those. Add key terms / equations / OCR if specific gaps appear.

The "Include" checkbox is your friend. It's faster to drop bad segments than to re-run the pipeline with stricter settings.

Save the JSON before downloading. If you spend 20 minutes editing and then your browser tab closes, the JSON checkpoint is the only way to recover.


Troubleshooting

"No usable visual segments found" The scene detector found nothing or every frame was filtered as instructor-only. Try (a) lowering the scene-change sensitivity, or (b) raising the instructor-frame face threshold.

"Server is at the 8-job concurrency cap" Someone else's job is currently running on the same server. Wait a minute and click Generate again.

"Video exceeds the 500 MB upload limit" / "Transcript exceeds the 5 MB upload limit" Compress the video before uploading (HandBrake, ffmpeg) or split it. The transcript limit is huge β€” if you're hitting it, your captions file is likely a binary blob, not a real .srt.
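One common ffmpeg recipe for shrinking a screen recording before upload (re-encodes the video as H.264 at a higher compression level and copies the audio unchanged; the filenames are placeholders, and raising -crf trades quality for size):

```shell
ffmpeg -i lecture.mp4 -c:v libx264 -crf 28 -preset slower -c:a copy lecture-small.mp4
```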

Progress stalls on OCR for a long time Tesseract is slow on the free CPU tier. A 30-min lecture takes 1–2 minutes for OCR. If it's been 5+ minutes, tick Skip OCR in advanced settings and re-run, then optionally turn on the AI On-screen text toggle instead — Claude is much faster.

The OCR text reads like garbage Coloured callouts and decorative fonts confuse Tesseract. Tick On-screen text (Claude) under AI assist.

Equations render as raw LaTeX Make sure MathJax loaded — check the browser console. If you're behind a corporate firewall that blocks cdn.jsdelivr.net, MathJax won't load.

The instructor's face is showing as the slide image Click an alternate thumbnail under that slide. The app keeps up to two alternates per scene precisely for this case.

Edits got lost when I refreshed Always Save edits (JSON) before refreshing, closing the tab, or stepping away. The editor doesn't persist to localStorage.

Downloaded HTML won't open Some email clients block .html attachments. Zip the file or use the Zip bundle output.

Audio plays in the editor but not in the downloaded file The download path inlines audio as data URIs. If you downloaded the zip bundle, the audio is in static/segment_*.mp3 — check your browser console for missing-file errors.


If you hit something this guide doesn't cover, hand the error message to whoever set up the app for you. Most issues are resolved by lowering the scene-change sensitivity or ticking the On-screen text (Claude) toggle.