data/
Layout
One directory per contributor: collected_<name>/ with one or more .npz files per session. collect_features.py appends timestamped files when someone records again (e.g. collected_Kexin/ has two sessions).
Each .npz holds:
- features: N×17 (training uses 10 of these for the face_orientation set; see data_preparation/)
- labels: 0 = unfocused, 1 = focused (live key presses while recording)
- feature_names: names for all 17 columns
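To make the file layout concrete, here is a minimal sketch of loading one session and checking that the three arrays agree with each other. The helper name `load_session` is hypothetical and not part of this repo; the sketch also assumes `feature_names` is stored as a plain string array (an object-dtype array would need `allow_pickle=True`).

```python
import numpy as np


def load_session(path):
    """Load one .npz session and sanity-check the three documented arrays.

    Hypothetical helper for illustration; not part of the repo.
    """
    # Assumption: feature_names is a plain string array, so the default
    # allow_pickle=False is enough.
    with np.load(path) as data:
        features = data["features"]
        labels = data["labels"]
        feature_names = [str(name) for name in data["feature_names"]]

    # features should be N x (number of named columns), 17 in this dataset.
    if features.ndim != 2 or features.shape[1] != len(feature_names):
        raise ValueError(
            f"features has shape {features.shape}, "
            f"but there are {len(feature_names)} feature names"
        )
    # One label per feature row.
    if labels.shape[0] != features.shape[0]:
        raise ValueError(
            f"{labels.shape[0]} labels for {features.shape[0]} feature rows"
        )
    return features, labels, feature_names
```

Calling `load_session` on any file under a collected_<name>/ directory should return an N×17 array, N labels, and 17 names if the file follows the layout above.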
What we have (pooled)
Roughly 144.8k samples from 10 .npz sessions across 9 people. Session sizes vary a lot (~8.7k–17.6k samples), so the pool isn’t one uniform block — different setups, days, and recording lengths.
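The pooled counts can be re-derived from the directory layout. The following is a rough sketch, assuming every session sits at collected_<name>/*.npz under a single data root; `session_sizes` and `summarize_pool` are hypothetical names used only for illustration.

```python
from pathlib import Path

import numpy as np


def session_sizes(data_root):
    """Map 'collected_<name>/<file>.npz' to the number of samples in that file."""
    sizes = {}
    for path in sorted(Path(data_root).glob("collected_*/*.npz")):
        with np.load(path) as data:
            # One label per sample, so the label count is the session size.
            sizes[f"{path.parent.name}/{path.name}"] = int(data["labels"].shape[0])
    return sizes


def summarize_pool(sizes):
    """Count sessions, contributors, and total samples from session_sizes()."""
    contributors = {key.split("/")[0] for key in sizes}
    return {
        "sessions": len(sizes),
        "contributors": len(contributors),
        "samples": sum(sizes.values()),
    }
```

Printing `session_sizes("data")` is also a quick way to see how uneven the sessions are before deciding how to split by contributor.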
| Aspect | Snapshot |
|---|---|
| Labels | |
| Temporal mix | Hundreds of focus ↔ unfocus transitions in the pooled timeline (not one long stuck label) |
| Signals | Same 10 inference features as in production: head deviation, face/eye scores, horizontal gaze, pitch, EAR (left/avg/right), gaze offset, PERCLOS — pose + eyes + short-window drowsiness |
Run data_preparation/data_exploration.ipynb for histograms, label-over-time plots, feature–label correlations, correlation matrix, and the small quality checklist (sample count, class balance band, transition count).
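For a quick check outside the notebook, the three checklist quantities can be computed directly from a label array. This is a minimal sketch rather than the notebook's own code; `label_summary` is a hypothetical name, and the acceptable class balance band is left to the caller because it is not stated here.

```python
import numpy as np


def label_summary(labels):
    """Return sample count, focus/unfocus transition count, and focused fraction."""
    labels = np.asarray(labels).astype(int)
    # A transition is any position where the label differs from the previous one.
    transitions = int(np.count_nonzero(np.diff(labels)))
    # With 0/1 labels, the mean is the fraction of focused samples.
    focused_fraction = float(labels.mean()) if labels.size else float("nan")
    return {
        "samples": int(labels.size),
        "transitions": transitions,
        "focused_fraction": focused_fraction,
    }
```

When sessions are concatenated before counting, a label change at a session boundary is counted as a transition too, so per-session counts are the safer number to compare.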
Collect more
python -m models.collect_features --name yourname
Webcam + overlay: 1 = focused, 0 = unfocused, p = pause, q = save and quit.