
k22056537

feat: UI nav, onboarding, L2CS weights path + torch.load; trim dev files

a75bb5a 4 days ago


	# data/

	## Layout

	One directory per contributor: `collected_<name>/` with one or more `.npz` files per session.
	`collect_features.py` adds a new timestamped file each time someone records again (e.g. `collected_Kexin/` has two sessions).

	Each `.npz` holds:

	- `features` — N×17 (training uses 10 of these for the `face_orientation` set; see `data_preparation/`)
	- `labels` — 0 = unfocused, 1 = focused (live key presses while recording)
	- `feature_names` — names for all 17 columns
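A quick way to sanity-check one session file is to load it with NumPy and compare the three arrays against the layout above. The sketch below is illustrative only: `summarize_session` and the demo file name are hypothetical, and it assumes `feature_names` is stored as a plain string array (an object array would need `allow_pickle=True`). The demo writes a small synthetic file so the snippet runs without the real recordings.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarize_session(npz_path):
    """Return sample count, column names, and label counts for one session file."""
    with np.load(npz_path) as data:
        features = data["features"]      # expected shape (N, 17)
        labels = data["labels"]          # expected shape (N,), values 0 or 1
        names = [str(n) for n in data["feature_names"]]

    # Basic consistency checks between the three arrays.
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels disagree on sample count")
    if features.shape[1] != len(names):
        raise ValueError("feature_names does not match the column count")

    return {
        "samples": int(features.shape[0]),
        "columns": names,
        "unfocused": int(np.sum(labels == 0)),
        "focused": int(np.sum(labels == 1)),
    }


# Demo on a synthetic file (hypothetical name and placeholder column names).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_session.npz"
    np.savez(
        demo_path,
        features=np.zeros((5, 17), dtype=np.float32),
        labels=np.array([0, 1, 1, 0, 1]),
        feature_names=np.array([f"f{i}" for i in range(17)]),
    )
    summary = summarize_session(demo_path)

print(summary["samples"], summary["unfocused"], summary["focused"])
```

To use it on real data, pass the path of any file under a `collected_<name>/` directory instead of the synthetic one.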

	## What we have (pooled)

	Roughly 144.8k samples from 10 `.npz` sessions across 9 people. Session sizes range from about 8.7k to 17.6k samples, so the pool is not one uniform block: it mixes different setups, days, and recording lengths.

	| Aspect | Snapshot |
	|--------|----------|
	| Labels | ~55.8k unfocused / ~89.0k focused (~39% / ~61%) |
	| Temporal mix | Hundreds of focus ↔ unfocus transitions in the pooled timeline (not one long stuck label) |
	| Signals | Same 10 inference features as in production: head deviation, face/eye scores, horizontal gaze, pitch, EAR (left/avg/right), gaze offset, PERCLOS; together these cover pose, eyes, and short-window drowsiness |

	Run `data_preparation/data_exploration.ipynb` for histograms, label-over-time plots, feature–label correlations, correlation matrix, and the small quality checklist (sample count, class balance band, transition count).
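The three checklist numbers (sample count, class balance, transition count) can also be computed directly from the label arrays. This is a minimal sketch of that arithmetic, not the notebook's actual code; `quality_snapshot` is a hypothetical helper, and it assumes labels are 0/1 integers as described in the layout section.

```python
import numpy as np


def quality_snapshot(label_arrays):
    """Pool per-session label arrays and compute simple quality numbers."""
    pooled = np.concatenate([np.asarray(a).ravel() for a in label_arrays])
    total = int(pooled.size)
    focused = int(np.sum(pooled == 1))
    # A transition is any position where the label differs from the previous one.
    transitions = int(np.count_nonzero(np.diff(pooled) != 0))
    return {
        "samples": total,
        "focused_fraction": focused / total if total else 0.0,
        "transitions": transitions,
    }


# Two tiny made-up sessions, only to show the call shape.
demo = quality_snapshot([np.array([0, 0, 1, 1]), np.array([1, 0, 1])])
print(demo)
```

Because the sessions are simply concatenated, a label change at the boundary between two sessions would also count as a transition. Count per session and sum the results if that distinction matters.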

	## Collect more

	```bash
	python -m models.collect_features --name yourname
	```

	This opens the webcam with an overlay. Keys: `1` = focused, `0` = unfocused, `p` = pause, `q` = save and quit.