test_final / evaluation /THRESHOLD_JUSTIFICATION.md

k22056537

feat: sync integration updates across app and ML pipeline

eb4abb8 about 1 month ago

Threshold Justification Report

Auto-generated by `evaluation/justify_thresholds.py` using leave-one-participant-out (LOPO) cross-validation over 9 participants (~145k samples).

0. Latest random split checkpoints (15% test split)

From the latest training runs:

Model	Accuracy	F1	ROC-AUC
XGBoost	95.87%	0.9585	0.9908
MLP	92.92%	0.9287	0.9714

1. ML Model Decision Thresholds

XGBoost config used for this report: `{'n_estimators': 600, 'max_depth': 8, 'learning_rate': 0.1489, 'subsample': 0.9625, 'colsample_bytree': 0.9013, 'reg_alpha': 1.1407, 'reg_lambda': 2.4181, 'eval_metric': 'logloss'}`.

Thresholds selected via Youden's J statistic (J = sensitivity + specificity - 1) on pooled LOPO held-out predictions.

Model	LOPO AUC	Optimal Threshold (Youden's J)	F1 @ Optimal	F1 @ 0.50
MLP	0.8624	0.228	0.8578	0.8149
XGBoost	0.8695	0.280	0.8549	0.8324
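
The selection rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `evaluation/justify_thresholds.py`: the function name `youden_threshold` and the toy labels and scores are hypothetical, and the quadratic scan over candidate thresholds is chosen for readability rather than speed.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= the threshold.
    Ties in J are resolved in favour of the lowest threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):  # every observed score is a candidate
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy pooled held-out predictions (illustrative only, not project data).
labels = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.25, 0.30, 0.60, 0.35, 0.90]
threshold, j_stat = youden_threshold(labels, scores)
print(threshold, j_stat)
```

On roughly 145k pooled samples a sorted, cumulative-count scan (or an ROC routine such as scikit-learn's `roc_curve`) would be the practical choice; the selection criterion stays the same.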

2. Geometric Pipeline Weights (s_face vs s_eye)

Grid search over face weight alpha from 0.2 to 0.8 in steps of 0.1, with eye weight = 1 - alpha. The decision threshold for each fold is chosen via Youden's J.

Face Weight (alpha)	Mean LOPO F1
0.2	0.7926
0.3	0.8002
0.4	0.7719
0.5	0.7868
0.6	0.8184
0.7	0.8195 <-- selected
0.8	0.8126

Best: alpha = 0.7 (face 70%, eye 30%)
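
As a sketch of how the selected weight might be applied, assuming the geometric score is a plain convex combination of the two sub-scores (the report states the weights but not the exact formula). The names `geometric_score` and `MEAN_LOPO_F1` are hypothetical; the F1 values are copied from the table above.

```python
# Mean LOPO F1 per face weight, copied from the grid-search table.
MEAN_LOPO_F1 = {
    0.2: 0.7926, 0.3: 0.8002, 0.4: 0.7719, 0.5: 0.7868,
    0.6: 0.8184, 0.7: 0.8195, 0.8: 0.8126,
}


def geometric_score(s_face, s_eye, alpha=0.7):
    """Assumed fusion rule: alpha * s_face + (1 - alpha) * s_eye."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s_face + (1.0 - alpha) * s_eye


# Pick the face weight with the highest mean LOPO F1.
best_alpha = max(MEAN_LOPO_F1, key=MEAN_LOPO_F1.get)
print(best_alpha)
print(geometric_score(s_face=0.8, s_eye=0.4, alpha=best_alpha))
```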

3. Hybrid Pipeline Weights (MLP vs Geometric)

Grid search over w_mlp from 0.3 to 0.8 in steps of 0.1, with w_geo = 1 - w_mlp. The geometric sub-score uses the same weights as the geometric pipeline (face = 0.7, eye = 0.3). If you change the geometric weights, re-run this script, because the optimal w_mlp can shift.

MLP Weight (w_mlp)	Mean LOPO F1
0.3	0.8409 <-- selected
0.4	0.8246
0.5	0.8164
0.6	0.8106
0.7	0.8039
0.8	0.8016

Best: w_mlp = 0.3 (MLP 30%, geometric 70%)
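
The nesting of the two weightings can be illustrated as follows. This is a sketch under the assumption that both stages are simple weighted sums; `hybrid_score` and its argument names are hypothetical and not taken from the project code.

```python
def hybrid_score(p_mlp, s_face, s_eye, w_mlp=0.3, alpha=0.7):
    """Assumed two-stage fusion.

    Stage 1: geometric sub-score from face and eye scores (face weight alpha).
    Stage 2: blend of the MLP probability and the geometric sub-score.
    """
    s_geo = alpha * s_face + (1.0 - alpha) * s_eye
    return w_mlp * p_mlp + (1.0 - w_mlp) * s_geo


# Example: confident MLP, good head pose, partly closed eyes.
print(hybrid_score(p_mlp=0.9, s_face=0.8, s_eye=0.4))
```

Because `alpha` feeds into `s_geo`, any change to the geometric weights alters the input to the second stage, which is why the optimal w_mlp has to be re-estimated afterwards.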

4. Eye and Mouth Aspect Ratio Thresholds

EAR (Eye Aspect Ratio)

Reference: Soukupova & Cech, "Real-Time Eye Blink Detection Using Facial Landmarks" (2016) established EAR ~ 0.2 as a blink threshold.

Our thresholds define a linear interpolation zone around this established value:

Constant	Value	Justification
`ear_closed`	0.16	Below this, eyes are treated as fully shut. 16.3% of samples fall below this value.
`EAR_BLINK_THRESH`	0.21	Blink detection point; close to the 0.2 reference. 21.2% of samples fall below this value.
`ear_open`	0.30	Above this, eyes are treated as fully open. 70.4% of samples fall above this value.

Between 0.16 and 0.30, the `_ear_score` function linearly interpolates from 0 to 1, providing a smooth transition rather than a hard binary cutoff.
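
The interpolation described above amounts to a clamped linear ramp. The following is a minimal sketch of that behaviour, not the project's `_ear_score` itself; the name `ear_score_sketch` is hypothetical and the defaults are the constants from the table.

```python
def ear_score_sketch(ear, ear_closed=0.16, ear_open=0.30):
    """Map an eye aspect ratio to a score in [0, 1].

    0 at or below ear_closed, 1 at or above ear_open,
    linear in between.
    """
    if ear <= ear_closed:
        return 0.0
    if ear >= ear_open:
        return 1.0
    return (ear - ear_closed) / (ear_open - ear_closed)


for value in (0.10, 0.16, 0.21, 0.23, 0.30, 0.35):
    print(value, round(ear_score_sketch(value), 3))
```

Under these constants the blink threshold of 0.21 sits about a third of the way up the ramp, and the midpoint 0.23 maps to 0.5.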

MAR (Mouth Aspect Ratio)

Constant	Value	Justification
`MAR_YAWN_THRESHOLD`	0.55	Only 1.7% of samples exceed this, which is consistent with the threshold firing on rare events such as genuine yawns rather than on ordinary mouth movement.

5. Other Constants

Constant	Value	Rationale
`gaze_max_offset`	0.28	Max iris displacement (normalised) before gaze score drops to zero. Corresponds to ~56% of the eye width; beyond this the iris is at the extreme edge.
`max_angle`	22.0 deg	Head deviation beyond which face score = 0. Based on typical monitor-viewing cone: at 60 cm distance and a 24" monitor, the viewing angle is ~20-25 degrees.
`roll_weight`	0.5	Roll is less indicative of inattention than yaw/pitch (tilting head doesn't mean looking away), so it's down-weighted by 50%.
`EMA alpha`	0.3	Smoothing factor for focus score. Gives an effective window of ~3-4 frames (1/alpha ≈ 3.3); balances responsiveness vs flicker.
`grace_frames`	15	~0.5 s at 30 fps before penalising no-face. Allows brief occlusions (e.g. hand gesture) without dropping score.
`PERCLOS_WINDOW`	60 frames	2 s at 30 fps; standard PERCLOS measurement window (Dinges & Grace, 1998).
`BLINK_WINDOW_SEC`	30 s	Blink rate measured over 30 s; typical spontaneous blink rate is 15-20/min (Bentivoglio et al., 1997).
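
Several of the rationales in this table are simple arithmetic and can be checked directly. The sketch below covers the monitor viewing angle behind `max_angle`, the frame-to-seconds conversions, and the response of an exponential moving average with alpha = 0.3. The helper names are hypothetical, the 16:9 aspect ratio is an assumption (the report only gives the 24" diagonal), and seeding the EMA with the first sample is also an assumption about the implementation.

```python
import math

FPS = 30  # frame rate assumed throughout the table


def monitor_half_angle_deg(diagonal_in, distance_cm, aspect=(16, 9)):
    """Horizontal angle from screen centre to screen edge, in degrees."""
    w, h = aspect
    width_cm = diagonal_in * 2.54 * w / math.hypot(w, h)
    return math.degrees(math.atan((width_cm / 2.0) / distance_cm))


def frames_to_seconds(frames, fps=FPS):
    """Convert a frame count to seconds at the given frame rate."""
    return frames / fps


def ema(values, alpha=0.3):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first sample initialises the state (an assumption).
    """
    smoothed, state = [], None
    for x in values:
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


print(monitor_half_angle_deg(24, 60))   # falls inside the 20-25 degree range
print(frames_to_seconds(15))            # grace_frames in seconds
print(frames_to_seconds(60))            # PERCLOS_WINDOW in seconds
print(ema([1.0, 0.0, 0.0, 0.0]))        # decay after the input drops to 0
```

After three frames at zero input the smoothed value has fallen from 1.0 to about 0.34, so roughly two thirds of a step change is absorbed within three frames, which matches the ~3-4 frame window quoted for `EMA alpha`.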