Abdelrahman Almatrooshi

Deploy snapshot from main b7a59b11809483dfc959f196f1930240f2662c49

22a6915 about 1 month ago

# Threshold Justification Report

Auto-generated by `evaluation/justify_thresholds.py` using leave-one-participant-out (LOPO) cross-validation over 9 participants (~145k samples).
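
As a rough illustration of the LOPO protocol, each fold holds out every sample from one participant and trains on the rest. The sketch below is not taken from `justify_thresholds.py`; the function name `lopo_splits` and the per-sample list of participant IDs are assumptions made for the example.

```python
from collections import defaultdict


def lopo_splits(participant_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per participant.

    `participant_ids` is a hypothetical list with one participant label per sample.
    """
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)

    for held_out in sorted(by_participant):
        # All samples of the held-out participant form the test set.
        test_idx = by_participant[held_out]
        # Every other participant's samples form the training set.
        train_idx = [
            i
            for pid, idxs in by_participant.items()
            if pid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    ids = ["p1", "p1", "p2", "p3", "p3"]
    for held_out, train_idx, test_idx in lopo_splits(ids):
        print(held_out, train_idx, test_idx)
```

With 9 participants this yields 9 folds, and no participant ever contributes to both the training and test side of the same fold.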

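## 0. Latest random split checkpoints (15% test split)

From the latest training runs. These figures come from a random split, in which samples from the same participant can appear in both the training and test sets, so they are not directly comparable to the LOPO results in the sections below.

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost | 95.87% | 0.9585 | 0.9908 |
| MLP | 92.92% | 0.9287 | 0.9714 |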

## 1. ML Model Decision Thresholds

XGBoost config used for this report: `{'n_estimators': 600, 'max_depth': 8, 'learning_rate': 0.1489, 'subsample': 0.9625, 'colsample_bytree': 0.9013, 'reg_alpha': 1.1407, 'reg_lambda': 2.4181, 'eval_metric': 'logloss'}`.

Thresholds were selected by maximising Youden's J statistic (J = sensitivity + specificity - 1) on pooled LOPO held-out predictions.

| Model | LOPO AUC | Optimal Threshold (Youden's J) | F1 @ Optimal | F1 @ 0.50 |
|-------|----------|-------------------------------|--------------|-----------|
| MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 |
| XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 |
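
To make the selection rule concrete, here is a minimal sketch of picking the threshold that maximises Youden's J and of computing F1 at a given threshold. It is not the code in `justify_thresholds.py`; the function names and the convention that a sample is predicted positive when its score is at or above the threshold are assumptions.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    Assumes binary labels (0/1) and predicts positive when score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_thr, best_j = None, -1.0
    # Every distinct score is a candidate threshold.
    for thr in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < thr)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


def f1_at_threshold(y_true, y_score, thr):
    """F1 of the positive class when predicting positive for score >= thr."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= thr)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= thr)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < thr)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 1]
    scores = [0.1, 0.2, 0.5, 0.4, 0.7, 0.9, 0.8]
    thr, j = youden_threshold(labels, scores)
    print(thr, j, f1_at_threshold(labels, scores, thr))
```

This brute-force loop is quadratic in the number of samples and is meant only to show the rule. For the ~145k pooled predictions a vectorised ROC routine would be the practical choice.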

![MLP ROC](plots/roc_mlp.png)

![XGBoost ROC](plots/roc_xgboost.png)

## 2. Geometric Pipeline Weights (s_face vs s_eye)

Grid search over face weight alpha in {0.2, 0.3, ..., 0.8}; eye weight = 1 - alpha. The decision threshold is chosen per fold via Youden's J.

| Face Weight (alpha) | Mean LOPO F1 |
|--------------------:|-------------:|
| 0.2 | 0.7926 |
| 0.3 | 0.8002 |
| 0.4 | 0.7719 |
| 0.5 | 0.7868 |
| 0.6 | 0.8184 |
| 0.7 | 0.8195 <-- selected |
| 0.8 | 0.8126 |

Best: alpha = 0.7 (face 70%, eye 30%)
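
The weighting and the selection step can be sketched as follows. The blend `alpha * s_face + (1 - alpha) * s_eye` follows from the description above, while the function name `geometric_score` is an assumption. The F1 values are copied from the table rather than recomputed.

```python
def geometric_score(s_face, s_eye, alpha=0.7):
    """Blend face and eye sub-scores; the eye weight is 1 - alpha."""
    return alpha * s_face + (1.0 - alpha) * s_eye


# Mean LOPO F1 per face weight, copied from the table above.
MEAN_LOPO_F1 = {
    0.2: 0.7926,
    0.3: 0.8002,
    0.4: 0.7719,
    0.5: 0.7868,
    0.6: 0.8184,
    0.7: 0.8195,
    0.8: 0.8126,
}

# The selected weight is simply the grid point with the highest mean F1.
best_alpha = max(MEAN_LOPO_F1, key=MEAN_LOPO_F1.get)

if __name__ == "__main__":
    print(best_alpha, MEAN_LOPO_F1[best_alpha])
    print(geometric_score(s_face=0.9, s_eye=0.4, alpha=best_alpha))
```

The margin between alpha = 0.6, 0.7 and 0.8 is under 0.01 F1, so the choice of 0.7 within that range is not strongly determined by this search.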

![Geometric weight search](plots/geo_weight_search.png)

## 3. Hybrid Pipeline Weights (MLP vs Geometric)

Grid search over w_mlp in {0.3, 0.4, ..., 0.8}; w_geo = 1 - w_mlp. The geometric sub-score uses the same weights as the geometric pipeline (face = 0.7, eye = 0.3). If you change the geometric weights, re-run this script, because the optimal w_mlp can shift.

| MLP Weight (w_mlp) | Mean LOPO F1 |
|-------------------:|-------------:|
| 0.3 | 0.8409 <-- selected |
| 0.4 | 0.8246 |
| 0.5 | 0.8164 |
| 0.6 | 0.8106 |
| 0.7 | 0.8039 |
| 0.8 | 0.8016 |

Best: w_mlp = 0.3 (MLP 30%, geometric 70%)
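
The dependence of w_mlp on the geometric weights is easier to see when the two blends are written out together. This is a sketch under the assumption that the hybrid score is a convex combination of the MLP probability and the geometric sub-score; `hybrid_score` and its parameter names are hypothetical.

```python
def hybrid_score(p_mlp, s_face, s_eye, w_mlp=0.3, alpha=0.7):
    """Blend an MLP probability with the geometric sub-score.

    p_mlp  : MLP output, assumed to be a probability in [0, 1]
    s_face : face sub-score in [0, 1]
    s_eye  : eye sub-score in [0, 1]
    w_mlp  : MLP weight; the geometric weight is 1 - w_mlp
    alpha  : face weight inside the geometric sub-score
    """
    s_geo = alpha * s_face + (1.0 - alpha) * s_eye
    return w_mlp * p_mlp + (1.0 - w_mlp) * s_geo


if __name__ == "__main__":
    # Same inputs, two different geometric weightings.
    print(hybrid_score(0.9, 0.8, 0.4, alpha=0.7))
    print(hybrid_score(0.9, 0.8, 0.4, alpha=0.3))
```

Because `alpha` changes `s_geo` for every sample, a different `alpha` changes the score distribution that `w_mlp` was tuned against, which is why the grid search needs re-running.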

![Hybrid weight search](plots/hybrid_weight_search.png)

## 4. Eye and Mouth Aspect Ratio Thresholds

### EAR (Eye Aspect Ratio)

Reference: Soukupová & Čech, "Real-Time Eye Blink Detection Using Facial Landmarks" (2016) established EAR ~ 0.2 as a blink threshold.

Our thresholds define a linear interpolation zone around this established value:

| Constant | Value | Justification |
|----------|------:|---------------|
| `ear_closed` | 0.16 | Below this, eyes are treated as fully shut. 16.3% of samples fall below this value. |
| `EAR_BLINK_THRESH` | 0.21 | Blink detection point; close to the 0.2 reference. 21.2% of samples fall below this value. |
| `ear_open` | 0.30 | Above this, eyes are treated as fully open. 70.4% of samples fall above this value. |

Between 0.16 and 0.30, the `_ear_score` function linearly interpolates from 0 to 1, providing a smooth transition rather than a hard binary cutoff.
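
A minimal sketch of both pieces follows: the EAR formula from Soukupová & Čech over six eye landmarks, and a clamped linear ramp between `ear_closed` and `ear_open`. The function `ear_score` here is an illustration of the described behaviour, not the project's `_ear_score`, and the landmark ordering (p1 and p4 at the eye corners, p2/p3 on the upper lid, p6/p5 on the lower lid) is the paper's convention.

```python
import math


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|) for 2D landmarks."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def ear_score(ear, ear_closed=0.16, ear_open=0.30):
    """Map EAR to [0, 1]: 0 at or below ear_closed, 1 at or above ear_open."""
    t = (ear - ear_closed) / (ear_open - ear_closed)
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    # Synthetic landmarks for a wide-open eye, for illustration only.
    ear = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
    print(ear, ear_score(ear))
    for value in (0.10, 0.16, 0.21, 0.23, 0.30, 0.35):
        print(value, ear_score(value))
```

Under this ramp the blink threshold of 0.21 maps to a score of roughly 0.36, so a reading at the blink point already counts as mostly closed.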

![EAR distribution](plots/ear_distribution.png)

### MAR (Mouth Aspect Ratio)

| Constant | Value | Justification |
|----------|------:|---------------|
| `MAR_YAWN_THRESHOLD` | 0.55 | Only 1.7% of samples exceed this, which is consistent with the threshold firing only on rare, wide mouth openings such as yawns. The exceedance rate alone does not measure false positives. |

![MAR distribution](plots/mar_distribution.png)

## 5. Other Constants

| Constant | Value | Rationale |
|----------|------:|-----------|
| `gaze_max_offset` | 0.28 | Max iris displacement (normalised) before gaze score drops to zero. Corresponds to ~56% of the eye width; beyond this the iris is at the extreme edge. |
| `max_angle` | 22.0 deg | Head deviation beyond which face score = 0. Based on a typical monitor-viewing cone: at 60 cm distance from a 24" monitor, the half-angle from screen centre to a side edge is ~20-25 degrees. |
| `roll_weight` | 0.5 | Roll is less indicative of inattention than yaw/pitch (tilting the head doesn't mean looking away), so it's down-weighted by 50%. |
| `EMA alpha` | 0.3 | Smoothing factor for focus score. Gives an effective window of ~3-4 frames (about 1/alpha = 3.3); balances responsiveness against flicker. |
| `grace_frames` | 15 | ~0.5 s at 30 fps before penalising no-face. Allows brief occlusions (e.g. hand gesture) without dropping score. |
| `PERCLOS_WINDOW` | 60 frames | 2 s at 30 fps. PERCLOS follows Dinges & Grace (1998), who assessed it over minute-scale intervals; the 2 s window used here is a much shorter rolling window. |
| `BLINK_WINDOW_SEC` | 30 s | Blink rate measured over 30 s; typical spontaneous blink rate is 15-20/min (Bentivoglio et al., 1997). |
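
The arithmetic behind several of these constants can be checked with a short sketch. The helper names are hypothetical, the monitor is assumed to be 16:9 (the table does not state an aspect ratio), and the EMA is assumed to take the usual form `s_t = alpha * x_t + (1 - alpha) * s_(t-1)`.

```python
import math


def half_viewing_angle_deg(diagonal_in, distance_cm, aspect=(16, 9)):
    """Angle from screen centre to a side edge, seen from distance_cm.

    The 16:9 default aspect ratio is an assumption.
    """
    w, h = aspect
    width_cm = diagonal_in * 2.54 * w / math.hypot(w, h)
    return math.degrees(math.atan((width_cm / 2.0) / distance_cm))


def frames_to_seconds(frames, fps=30.0):
    """Convert a frame count to seconds at the given frame rate."""
    return frames / fps


def ema(values, alpha=0.3):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    print(half_viewing_angle_deg(24, 60))   # max_angle rationale
    print(frames_to_seconds(15))            # grace_frames
    print(frames_to_seconds(60))            # PERCLOS_WINDOW
    print(ema([0.0, 1.0, 1.0, 1.0]))        # step response with alpha = 0.3
```

With alpha = 0.3 the smoothed score covers about two thirds of a step change after three frames, which matches the 3-4 frame window quoted in the table.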
