# data/

## Layout

One directory per contributor: `collected_<name>/`, holding one `.npz` file per recording session.
`collect_features.py` adds a new timestamped file each time someone records again (e.g. `collected_Kexin/` has two sessions).

Each `.npz` holds:

- `features`: N×17 array (training uses **10** of these columns for the `face_orientation` set; see `data_preparation/`)
- `labels`: 0 = unfocused, 1 = focused (from live key presses while recording)
- `feature_names`: names for all 17 columns
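
The layout above can be read with plain NumPy. The following is a minimal sketch, not the project's own loader: `load_session` and `select_columns` are hypothetical helpers, the column names `f0`…`f16` are placeholders for the real `feature_names`, and the demo file is synthetic. It assumes `feature_names` is stored as a string array; if the real files use an object array, `np.load` would need `allow_pickle=True`.

```python
import os
import tempfile

import numpy as np


def load_session(path):
    """Load one session file and return (features, labels, feature_names)."""
    with np.load(path, allow_pickle=False) as data:
        features = data["features"]
        labels = data["labels"]
        names = [str(n) for n in data["feature_names"]]
    # Basic consistency checks on the three arrays.
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels disagree on sample count")
    if features.shape[1] != len(names):
        raise ValueError("feature_names does not match the column count")
    return features, labels, names


def select_columns(features, names, wanted):
    """Return only the columns listed in `wanted`, in that order."""
    idx = [names.index(w) for w in wanted]
    return features[:, idx]


# Synthetic stand-in for a real session file (placeholder column names).
rng = np.random.default_rng(0)
demo_names = np.array([f"f{i}" for i in range(17)])
demo_path = os.path.join(tempfile.mkdtemp(), "demo_session.npz")
np.savez(
    demo_path,
    features=rng.random((50, 17)),
    labels=rng.integers(0, 2, size=50),
    feature_names=demo_names,
)

X, y, names = load_session(demo_path)
X_sub = select_columns(X, names, ["f0", "f3", "f16"])
print(X.shape, y.shape, X_sub.shape)
```

Selecting by name rather than by position keeps the subset stable if the column order ever changes; the actual 10-column list lives in `data_preparation/`.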
## What we have (pooled)

Roughly **144.8k** samples from **10** `.npz` sessions across **9** people. Session sizes vary a lot (~8.7k–17.6k samples), so the pool is not one uniform block: it mixes different setups, days, and recording lengths.

| Aspect | Snapshot |
|--------|----------|
| **Labels** | ~55.8k unfocused / ~89.0k focused (~39% / ~61%) |
| **Temporal mix** | Hundreds of focused ↔ unfocused **transitions** in the pooled timeline (not one long stuck label) |
| **Signals** | Same 10 inference features as in production: head deviation, face/eye scores, horizontal gaze, pitch, EAR (left/avg/right), gaze offset, PERCLOS. Together these cover pose, eyes, and short-window drowsiness |

Run **`data_preparation/data_exploration.ipynb`** for histograms, label-over-time plots, feature–label correlations, a correlation matrix, and the small quality checklist (sample count, class balance band, transition count).
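
The checklist quantities (sample count, class balance, transition count) can be approximated in a few lines. This is a rough sketch under stated assumptions, not the notebook's code: `summarize_labels` and `summarize_sessions` are hypothetical names, and transitions are counted within each session and then summed, so the boundary between two recordings is not counted as a label change. The notebook may count them differently.

```python
import numpy as np


def summarize_labels(labels):
    """Count samples, classes, and label changes within one session."""
    labels = np.asarray(labels).astype(int).ravel()
    n = int(labels.size)
    focused = int((labels == 1).sum())
    unfocused = int((labels == 0).sum())
    # A transition is any position where the label differs from the previous one.
    transitions = int(np.count_nonzero(np.diff(labels)))
    return {
        "samples": n,
        "focused": focused,
        "unfocused": unfocused,
        "transitions": transitions,
    }


def summarize_sessions(sessions):
    """Sum per-session summaries and add the pooled focused fraction."""
    total = {"samples": 0, "focused": 0, "unfocused": 0, "transitions": 0}
    for labels in sessions:
        for key, value in summarize_labels(labels).items():
            total[key] += value
    total["focused_fraction"] = (
        total["focused"] / total["samples"] if total["samples"] else 0.0
    )
    return total


# Two tiny made-up sessions, only to show the arithmetic.
session_a = [1, 1, 0, 0, 1]
session_b = [0, 0, 0, 1]
pooled = summarize_sessions([session_a, session_b])
print(pooled)
```

Concatenating the two demo sessions first would report four transitions instead of three, because the jump from the end of one recording to the start of the next would look like a label change.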
## Collect more

```bash
python -m models.collect_features --name yourname
```

Webcam + overlay controls: **1** = focused, **0** = unfocused, **p** = pause, **q** = save and quit.