
data/

Layout

One directory per contributor: collected_<name>/, holding one .npz file per recording session.
collect_features.py writes a new timestamped file each time someone records again (e.g. collected_Kexin/ has two sessions).

Each .npz holds:

  • features — N×17 (training uses 10 of these for the face_orientation set; see data_preparation/)
  • labels — 0 = unfocused, 1 = focused (live key presses while recording)
  • feature_names — names for all 17 columns
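As a minimal sketch of reading one session file, the snippet below loads the three arrays listed above and checks that their shapes agree. The helper name `load_session` and the synthetic demo file are illustrative assumptions, not part of the repository; it also assumes `feature_names` is stored as a plain string array (an object array would need `allow_pickle=True`) and that `labels` is one-dimensional.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_session(path):
    """Load one session .npz and sanity-check the three documented arrays."""
    with np.load(path) as npz:
        features = npz["features"]          # expected shape: (N, 17)
        labels = npz["labels"]              # expected shape: (N,), 0 = unfocused, 1 = focused
        feature_names = [str(n) for n in npz["feature_names"]]
    if features.ndim != 2 or features.shape[1] != len(feature_names):
        raise ValueError("features must be N x len(feature_names)")
    if labels.shape != (features.shape[0],):
        raise ValueError("labels must have one entry per feature row")
    return features, labels, feature_names


# Demo on a synthetic stand-in file (hypothetical data, not a real recording).
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_session.npz"
    np.savez(
        str(demo_path),
        features=rng.random((100, 17)),
        labels=rng.integers(0, 2, size=100),
        feature_names=np.array([f"feature_{i}" for i in range(17)]),
    )
    features, labels, feature_names = load_session(demo_path)
    print(features.shape, labels.shape, len(feature_names))
```

For a real recording, pass the path of any file under a collected_<name>/ directory instead of the synthetic one.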

What we have (pooled)

Roughly 144.8k samples from 10 .npz sessions across 9 people. Session sizes vary widely (~8.7k–17.6k samples), so the pool is not one uniform block: it mixes different setups, days, and recording lengths.
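The pooled figures can be recomputed from the files on disk. The sketch below walks the collected_<name>/ layout described earlier and reports session, contributor, and sample counts; the function names `session_sizes` and `summarize` are hypothetical, and it assumes each file stores a `labels` array with one entry per sample.

```python
from pathlib import Path

import numpy as np


def session_sizes(data_dir):
    """Map 'collected_<name>/<file>.npz' to its sample count for every session found."""
    sizes = {}
    for path in sorted(Path(data_dir).glob("collected_*/*.npz")):
        with np.load(path) as npz:
            sizes[f"{path.parent.name}/{path.name}"] = int(npz["labels"].shape[0])
    return sizes


def summarize(sizes):
    """Aggregate per-session sizes into pool-level counts."""
    counts = list(sizes.values())
    contributors = {key.split("/")[0] for key in sizes}
    return {
        "sessions": len(counts),
        "contributors": len(contributors),
        "total_samples": sum(counts),
        "min_session": min(counts) if counts else 0,
        "max_session": max(counts) if counts else 0,
    }


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to data/.
    print(summarize(session_sizes("data")))
```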

| Aspect | Snapshot |
|---|---|
| Labels | 55.8k unfocused / ~89.0k focused (39% / ~61%) |
| Temporal mix | Hundreds of focus ↔ unfocus transitions in the pooled timeline (not one long stuck label) |
| Signals | Same 10 inference features as in production: head deviation, face/eye scores, horizontal gaze, pitch, EAR (left/avg/right), gaze offset, PERCLOS (pose + eyes + short-window drowsiness) |

Run data_preparation/data_exploration.ipynb for histograms, label-over-time plots, feature–label correlations, correlation matrix, and the small quality checklist (sample count, class balance band, transition count).
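For a quick check outside the notebook, the three checklist items (sample count, class balance band, transition count) can be approximated as below. This is a sketch, not the notebook's actual code: the function name `quality_checklist` and all three thresholds are hypothetical defaults chosen for illustration. Transitions are counted per label sequence, so run it per session if joins between sessions should not count as transitions.

```python
import numpy as np


def quality_checklist(labels, min_samples=5000, balance_band=(0.3, 0.7), min_transitions=10):
    """Summarize one label sequence; all thresholds are illustrative assumptions."""
    labels = np.asarray(labels).ravel().astype(int)
    n = int(labels.size)
    focused_fraction = float(labels.mean()) if n else 0.0
    # A transition is any position where the label differs from the previous one.
    n_transitions = int(np.count_nonzero(np.diff(labels))) if n > 1 else 0
    low, high = balance_band
    return {
        "n_samples": n,
        "focused_fraction": focused_fraction,
        "n_transitions": n_transitions,
        "enough_samples": n >= min_samples,
        "balanced": low <= focused_fraction <= high,
        "enough_transitions": n_transitions >= min_transitions,
    }


if __name__ == "__main__":
    # Tiny hand-made sequence: 6 samples, half focused, 3 label changes.
    print(quality_checklist([0, 0, 1, 1, 0, 1], min_samples=5, min_transitions=2))
```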

Collect more

python -m models.collect_features --name yourname

This opens the webcam with an overlay. Keys: 1 = focused, 0 = unfocused, p = pause, q = save and quit.