k22056537 committed
Commit 8b47064 · 1 parent: 6114098

chore: MLP pipeline, evaluation updates, feature importance, confusion matrices
.gitignore CHANGED
@@ -41,3 +41,4 @@ test_focus_guard.db
  static/
  __pycache__/
  docs/
+ docs
FOCUS_SCORE_EQUATIONS.md DELETED
@@ -1,147 +0,0 @@
- # How the focused/unfocused score is computed
-
- The system outputs a **focus score** in `[0, 1]` and a binary **focused/unfocused** label. The label is derived from the score and a threshold; the exact equation depends on which pipeline (model) you use.
-
- ---
-
- ## 1. Final output (all pipelines)
-
- - **`raw_score`** (or **`focus_score`** in Hybrid): value in `[0, 1]` after optional smoothing.
- - **`is_focused`**: binary label.
-
- **Equation:**
-
- ```text
- is_focused = (smoothed_score >= threshold)
- ```
-
- - **Smoothed score:** the pipeline may apply an exponential moving average (EMA) to the raw score; that smoothed value is what you see as `raw_score` / `focus_score` in the API.
- - **Threshold:** set in the UI (sensitivity) or in pipeline config; typical default **0.5** or **0.55**.
-
- So: the **focus score** is the continuous value; **focused vs unfocused** is **score ≥ threshold** vs **score < threshold**.
-
- ---
-
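The EMA-then-threshold step the deleted doc describes can be sketched as follows (an illustrative snippet, not the repo's actual code; `alpha = 0.3` is the example EMA coefficient from the doc's summary table):

```python
def ema(prev, raw, alpha=0.3):
    """Exponential moving average: alpha weights the newest raw score."""
    return raw if prev is None else alpha * raw + (1 - alpha) * prev

def classify(scores, threshold=0.55, alpha=0.3):
    """Smooth a stream of raw scores, then apply is_focused = (smoothed >= threshold)."""
    smoothed, labels = None, []
    for raw in scores:
        smoothed = ema(smoothed, raw, alpha)
        labels.append(smoothed >= threshold)
    return labels
```

Note how the EMA makes the label sticky: a single raw dip from 0.9 to 0.1 only pulls the smoothed score to 0.66, so the frame stays "focused".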
- ## 2. Geometric pipeline (rule-based, no ML)
-
- **Raw score (before smoothing):**
-
- ```text
- raw = α · s_face + β · s_eye
- ```
-
- - Default: **α = 0.4**, **β = 0.6** (face weight 40%, eye weight 60%).
- - If **yawning** (MAR > 0.55): **raw = 0**.
-
- **Face score `s_face`** (head pose, from `HeadPoseEstimator`):
-
- - **deviation** = √( yaw² + pitch² + (0.5·roll)² )
- - **t** = min( deviation / max_angle , 1 ), with **max_angle = 22°** (default).
- - **s_face** = 0.5 · (1 + cos(π · t)) → 1 when the head is straight, 0 when deviation ≥ max_angle.
-
- **Eye score `s_eye`** (from `EyeBehaviourScorer`):
-
- - **EAR** = Eye Aspect Ratio (from landmarks); use **min(left_ear, right_ear)**.
- - **ear_s** = linear map of EAR to [0, 1] between `ear_closed = 0.16` and `ear_open = 0.30`.
- - **Gaze:** horizontal/vertical gaze ratios from iris position; **offset** = distance from the centre (0.5, 0.5).
- - **gaze_s** = 0.5 · (1 + cos(π · t)), with **t** = min( offset / gaze_max_offset , 1 ), **gaze_max_offset = 0.28**.
- - **s_eye** = ear_s · gaze_s (or just ear_s if ear_s < 0.3).
-
- Then:
-
- ```text
- smoothed_score = EMA(raw)
- is_focused = (smoothed_score >= threshold)
- ```
-
- ---
-
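The head-pose and raw-score equations above can be written out as a minimal sketch (hypothetical helper names; the repo's `HeadPoseEstimator` and `EyeBehaviourScorer` are not reproduced here):

```python
import math

def s_face(yaw, pitch, roll, max_angle=22.0):
    """Cosine-shaped head-pose score: 1 when straight, 0 at/beyond max_angle degrees."""
    deviation = math.sqrt(yaw**2 + pitch**2 + (0.5 * roll)**2)
    t = min(deviation / max_angle, 1.0)
    return 0.5 * (1 + math.cos(math.pi * t))

def geometric_raw(s_face_val, s_eye_val, mar, alpha=0.4, beta=0.6, mar_yawn=0.55):
    """raw = α·s_face + β·s_eye, vetoed to 0 while yawning (MAR > mar_yawn)."""
    if mar > mar_yawn:
        return 0.0
    return alpha * s_face_val + beta * s_eye_val
```

With yaw = 22° and no pitch/roll, `s_face` hits exactly 0; with a perfectly straight head it is 1, and the yawn veto overrides everything else.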
- ## 3. MLP pipeline
-
- - Features are extracted (the same 17-d feature vector as in training), clipped, then optionally extended (magnitudes, velocities, variances) and scaled with the **training-time scaler**.
- - The MLP outputs either:
-   - **Probability of class 1 (focused):** `mlp_prob = predict_proba(X_sc)[0, 1]`, or
-   - if there is no `predict_proba`: **mlp_prob = 1 if predict(X_sc) == 1 else 0**.
-
- **Equations:**
-
- ```text
- raw_score = mlp_prob (clipped to [0, 1])
- smoothed_score = EMA(raw_score)
- is_focused = (smoothed_score >= threshold)
- ```
-
- So the **focus score** is the **MLP's estimated probability of being focused** (after optional smoothing).
-
- ---
-
- ## 4. XGBoost pipeline
-
- - Same feature extraction and clipping; uses the **same feature subset** as in XGBoost training (no runtime magnitude/velocity extension).
- - **prob** = `predict_proba(X)[0]` → **[P(unfocused), P(focused)]**.
-
- **Equations:**
-
- ```text
- raw_score = prob[1] (probability of the focused class)
- smoothed_score = EMA(raw_score)
- is_focused = (smoothed_score >= threshold)
- ```
-
- So the **focus score** is the **XGBoost probability of the focused class**.
-
- ---
-
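The probability-with-fallback rule used by both model pipelines amounts to the following (illustrative only; the stub classes stand in for any scikit-learn-style classifier):

```python
import numpy as np

def model_prob(model, X_sc):
    """P(focused): predict_proba column 1 when available, else a hard 0/1 from predict."""
    if hasattr(model, "predict_proba"):
        p = model.predict_proba(X_sc)[0, 1]  # column 1 = P(class 1, "focused")
    else:
        p = 1.0 if model.predict(X_sc)[0] == 1 else 0.0
    return float(np.clip(p, 0.0, 1.0))

class SoftStub:
    """Stand-in for a classifier exposing predict_proba."""
    def predict_proba(self, X):
        return np.array([[0.2, 0.8]])

class HardStub:
    """Stand-in for a classifier exposing only predict."""
    def predict(self, X):
        return np.array([1])
```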
- ## 5. Hybrid pipeline (MLP + geometric)
-
- Combines the MLP's probability with a geometric score, then applies a single threshold.
-
- **Geometric part:**
-
- ```text
- geo_score = geo_face_weight · s_face + geo_eye_weight · s_eye
- ```
-
- - Default: **geo_face_weight = 0.4**, **geo_eye_weight = 0.6**.
- - **s_face** and **s_eye** as in the Geometric pipeline (with optional yawn veto: if yawning, **geo_score = 0**).
- - **geo_score** is clipped to [0, 1].
-
- **MLP part:** same as the MLP pipeline → **mlp_prob** in [0, 1].
-
- **Combined focus score (default weights):**
-
- ```text
- focus_score = w_mlp · mlp_prob + w_geo · geo_score
- ```
-
- - Default: **w_mlp = 0.7**, **w_geo = 0.3** (after normalising so the weights sum to 1).
- - **focus_score** is clipped to [0, 1], then smoothed.
-
- **Equations:**
-
- ```text
- focus_score = clip( w_mlp · mlp_prob + w_geo · geo_score , 0 , 1 )
- smoothed_score = EMA(focus_score)
- is_focused = (smoothed_score >= threshold)
- ```
-
- The default **threshold** in the hybrid config is **0.55**.
-
- ---
-
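The weighted blend above can be sketched as (illustrative; weight normalisation and clipping as described in the equations):

```python
def hybrid_score(mlp_prob, geo_score, w_mlp=0.7, w_geo=0.3):
    """Weighted blend, weights normalised to sum to 1, result clipped to [0, 1]."""
    total = w_mlp + w_geo
    w_mlp, w_geo = w_mlp / total, w_geo / total
    score = w_mlp * mlp_prob + w_geo * geo_score
    return max(0.0, min(1.0, score))
```

For example, with the defaults a certain MLP (`mlp_prob = 1.0`) and a zero geometric score still only yield 0.7, which is above the 0.55 hybrid threshold.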
- ## 6. Summary table
-
- | Pipeline  | Raw score formula                | Focused condition    |
- |-----------|----------------------------------|----------------------|
- | Geometric | α·s_face + β·s_eye (0 if yawn)   | smoothed ≥ threshold |
- | MLP       | MLP P(focused)                   | smoothed ≥ threshold |
- | XGBoost   | XGB P(focused)                   | smoothed ≥ threshold |
- | Hybrid    | w_mlp·mlp_prob + w_geo·geo_score | smoothed ≥ threshold |
-
- **s_face** = head-pose score (cosine of normalised deviation).
- **s_eye** = eye score (EAR × gaze score, or blend with CNN).
- **geo_score** = geo_face_weight·s_face + geo_eye_weight·s_eye (with optional yawn veto).
- **EMA** = exponential moving average (e.g. α = 0.3) for temporal smoothing.
-
- So: the **focus score** is always a number in [0, 1]; **focused vs unfocused** is **score ≥ threshold** in all pipelines.
checkpoints/{scaler_best.joblib → hybrid_combiner.joblib} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:02ed6b4c0d99e0254c6a740a949da2384db58ec7d3e6df6432b9bfcd3a296c71
- size 783
+ oid sha256:7e460c6ca8d2cadf37727456401a0d63028ba23cb6401f0835d869abfa2e053c
+ size 965
checkpoints/hybrid_focus_config.json CHANGED
@@ -1,10 +1,14 @@
  {
+ "use_xgb": true,
  "w_mlp": 0.3,
+ "w_xgb": 0.3,
  "w_geo": 0.7,
- "threshold": 0.35,
+ "threshold": 0.46117913373775393,
  "use_yawn_veto": true,
  "geo_face_weight": 0.7,
  "geo_eye_weight": 0.3,
  "mar_yawn_threshold": 0.55,
- "metric": "f1"
- }
+ "metric": "f1",
+ "combiner": "logistic",
+ "combiner_path": "/Users/mohammedalketbi22/GAP/Final/checkpoints/hybrid_combiner.joblib"
+ }
checkpoints/{model_best.joblib → meta_mlp.npz} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:183f2d4419e0eb1e58704e5a7312eb61e331523566d4dc551054a07b3aac7557
- size 5775881
+ oid sha256:4771c61cdf0711aa640b4d600a0851d344414cd16c1c2f75afc90e3c6135d14b
+ size 840
checkpoints/scaler_mlp.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2038d5b051d4de303c5688b1b861a0b53b1307a52b9447bfa48e8c7ace749329
+ size 823
evaluation/README.md CHANGED
@@ -8,7 +8,9 @@ Training logs, threshold analysis, and performance metrics.
  logs/                      # training run logs (JSON)
  plots/                     # threshold justification figures (ROC, weight search, EAR/MAR)
  justify_thresholds.py      # LOPO analysis script
- THRESHOLD_JUSTIFICATION.md # report (auto-generated by script)
+ feature_importance.py      # XGBoost importance + leave-one-out ablation
+ THRESHOLD_JUSTIFICATION.md # report (auto-generated by justify_thresholds)
+ feature_selection_justification.md # report (auto-generated by feature_importance)
  ```
 
  **Logs (when present):**
@@ -64,9 +66,14 @@ From repo root, with venv active. The script runs LOPO over 9 participants (~145
 
  Takes ~10–15 minutes. Re-run after changing data or pipeline weights (e.g. geometric face/eye); hybrid optimal w_mlp depends on the geometric sub-score weights.
 
- ## 4. Generated by
+ ## 4. Feature selection justification
+
+ Run `python -m evaluation.feature_importance` to compute XGBoost gain-based importance for the 10 face_orientation features and a leave-one-feature-out LOPO ablation. Writes **feature_selection_justification.md** with tables. Use this to justify the 10-of-17 feature set (ablation + importance; see PAPER_AUDIT §2.7).
+
+ ## 5. Generated by
 
  - `python -m models.mlp.train` → MLP log in `logs/`
  - `python -m models.xgboost.train` → XGBoost log in `logs/`
  - `python -m evaluation.justify_thresholds` → plots + THRESHOLD_JUSTIFICATION.md
+ - `python -m evaluation.feature_importance` → feature_selection_justification.md
  - Notebooks in `notebooks/` can also write logs here
evaluation/THRESHOLD_JUSTIFICATION.md CHANGED
@@ -15,7 +15,92 @@ Thresholds selected via **Youden's J statistic** (J = sensitivity + specificity
 
  ![XGBoost ROC](plots/roc_xgboost.png)
 
- ## 2. Geometric Pipeline Weights (s_face vs s_eye)
 
  Grid search over face weight alpha in {0.2 ... 0.8}. Eye weight = 1 - alpha. Threshold per fold via Youden's J.
 
@@ -33,9 +118,9 @@
 
  ![Geometric weight search](plots/geo_weight_search.png)
 
- ## 3. Hybrid Pipeline Weights (MLP vs Geometric)
 
- Grid search over w_mlp in {0.3 ... 0.8}. w_geo = 1 - w_mlp. Geometric sub-score uses same weights as geometric pipeline (face=0.7, eye=0.3). If you change geometric weights, re-run this script — optimal w_mlp can shift.
 
  | MLP Weight (w_mlp) | Mean LOPO F1 |
  |-------------------:|-------------:|
@@ -46,11 +131,43 @@ Grid search over w_mlp in {0.3 ... 0.8}. w_geo = 1 - w_mlp. Geometric sub-score
  | 0.7 | 0.8039 |
  | 0.8 | 0.8016 |
 
- **Best:** w_mlp = 0.3 (MLP 30%, geometric 70%)
 
- ![Hybrid weight search](plots/hybrid_weight_search.png)
 
- ## 4. Eye and Mouth Aspect Ratio Thresholds
 
  ### EAR (Eye Aspect Ratio)
 
@@ -76,7 +193,7 @@ Between 0.16 and 0.30 the `_ear_score` function linearly interpolates from 0 to
 
  ![MAR distribution](plots/mar_distribution.png)
 
- ## 5. Other Constants
 
  | Constant | Value | Rationale |
  |----------|------:|-----------|
 
  ![XGBoost ROC](plots/roc_xgboost.png)
 
+ ## 2. Precision, Recall and Tradeoff
+
+ At the optimal threshold (Youden's J), pooled over all LOPO held-out predictions:
+
+ | Model | Threshold | Precision | Recall | F1 | Accuracy |
+ |-------|----------:|----------:|-------:|---:|---------:|
+ | MLP | 0.228 | 0.8187 | 0.9008 | 0.8578 | 0.8164 |
+ | XGBoost | 0.377 | 0.8426 | 0.8750 | 0.8585 | 0.8228 |
+
+ Higher threshold → fewer positive predictions → higher precision, lower recall. Youden's J picks the threshold that balances sensitivity (recall for the positive class) and specificity (true negative rate).
+
+ ## 3. Confusion Matrix (Pooled LOPO)
+
+ At the optimal threshold. Rows = true label, columns = predicted label (0 = unfocused, 1 = focused).
+
+ ### MLP
+
+ | | Pred 0 | Pred 1 |
+ |--|-------:|-------:|
+ | **True 0** | 38065 (TN) | 17750 (FP) |
+ | **True 1** | 8831 (FN) | 80147 (TP) |
+
+ TN=38065, FP=17750, FN=8831, TP=80147.
+
+ ### XGBoost
+
+ | | Pred 0 | Pred 1 |
+ |--|-------:|-------:|
+ | **True 0** | 41271 (TN) | 14544 (FP) |
+ | **True 1** | 11118 (FN) | 77860 (TP) |
+
+ TN=41271, FP=14544, FN=11118, TP=77860.
+
+ ![Confusion MLP](plots/confusion_matrix_mlp.png)
+
+ ![Confusion XGBoost](plots/confusion_matrix_xgb.png)
+
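As a sanity check, the pooled metrics in §2 follow directly from these confusion counts; for the MLP (illustration, not part of the repo):

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Precision, recall, F1 and accuracy from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, f1, accuracy

p, r, f1, acc = metrics_from_confusion(tn=38065, fp=17750, fn=8831, tp=80147)
# matches the MLP row in §2: precision ≈ 0.8187, recall ≈ 0.9008, F1 ≈ 0.8578, accuracy ≈ 0.8164
```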
+ ## 4. Per-Person Performance Variance (LOPO)
+
+ One fold per left-out person; metrics at the optimal threshold.
+
+ ### MLP — per held-out person
+
+ | Person | Accuracy | F1 | Precision | Recall |
+ |--------|---------:|---:|----------:|-------:|
+ | Abdelrahman | 0.8628 | 0.9029 | 0.8760 | 0.9314 |
+ | Jarek | 0.8400 | 0.8770 | 0.8909 | 0.8635 |
+ | Junhao | 0.8872 | 0.8986 | 0.8354 | 0.9723 |
+ | Kexin | 0.7941 | 0.8123 | 0.7965 | 0.8288 |
+ | Langyuan | 0.5877 | 0.6169 | 0.4972 | 0.8126 |
+ | Mohamed | 0.8432 | 0.8653 | 0.7931 | 0.9519 |
+ | Yingtao | 0.8794 | 0.9263 | 0.9217 | 0.9309 |
+ | ayten | 0.8307 | 0.8986 | 0.8558 | 0.9459 |
+ | saba | 0.9192 | 0.9243 | 0.9260 | 0.9226 |
+
+ ### XGBoost — per held-out person
+
+ | Person | Accuracy | F1 | Precision | Recall |
+ |--------|---------:|---:|----------:|-------:|
+ | Abdelrahman | 0.8601 | 0.8959 | 0.9129 | 0.8795 |
+ | Jarek | 0.8680 | 0.8993 | 0.9070 | 0.8917 |
+ | Junhao | 0.9099 | 0.9180 | 0.8627 | 0.9810 |
+ | Kexin | 0.7363 | 0.7385 | 0.7906 | 0.6928 |
+ | Langyuan | 0.6738 | 0.6945 | 0.5625 | 0.9074 |
+ | Mohamed | 0.8868 | 0.8988 | 0.8529 | 0.9498 |
+ | Yingtao | 0.8711 | 0.9195 | 0.9347 | 0.9048 |
+ | ayten | 0.8451 | 0.9070 | 0.8654 | 0.9528 |
+ | saba | 0.9393 | 0.9421 | 0.9615 | 0.9235 |
+
+ ### Summary across persons
+
+ | Model | Accuracy mean ± std | F1 mean ± std | Precision mean ± std | Recall mean ± std |
+ |-------|---------------------|---------------|----------------------|-------------------|
+ | MLP | 0.8271 ± 0.0968 | 0.8580 ± 0.0968 | 0.8214 ± 0.1307 | 0.9067 ± 0.0572 |
+ | XGBoost | 0.8434 ± 0.0847 | 0.8682 ± 0.0879 | 0.8500 ± 0.1191 | 0.8981 ± 0.0836 |
+
+ ## 5. Confidence Intervals (95%, LOPO over 9 persons)
+
+ Mean ± half-width of the 95% t-interval (df = 8) for each metric across the 9 left-out persons.
+
+ | Model | F1 | Accuracy | Precision | Recall |
+ |-------|---:|--------:|----------:|-------:|
+ | MLP | 0.8580 [0.7835, 0.9326] | 0.8271 [0.7526, 0.9017] | 0.8214 [0.7207, 0.9221] | 0.9067 [0.8626, 0.9507] |
+ | XGBoost | 0.8682 [0.8005, 0.9358] | 0.8434 [0.7781, 0.9086] | 0.8500 [0.7583, 0.9417] | 0.8981 [0.8338, 0.9625] |
+
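These intervals can be reproduced from the per-person summary rows (a sketch assuming the std column is the sample standard deviation over the 9 folds, ddof = 1):

```python
import math
from scipy.stats import t

def t_interval(mean, std, n=9, conf=0.95):
    """conf-level t-interval for a mean of n per-fold metrics (df = n - 1)."""
    half = t.ppf(0.5 + conf / 2, df=n - 1) * std / math.sqrt(n)
    return mean - half, mean + half

lo, hi = t_interval(0.8580, 0.0968)  # MLP F1 row of the summary table
# close to the reported [0.7835, 0.9326], up to rounding of mean and std
```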
+ ## 6. Geometric Pipeline Weights (s_face vs s_eye)
 
  Grid search over face weight alpha in {0.2 ... 0.8}. Eye weight = 1 - alpha. Threshold per fold via Youden's J.
 
  ![Geometric weight search](plots/geo_weight_search.png)
 
+ ## 7. Hybrid Pipeline: MLP vs Geometric
+
+ Grid search over w_mlp in {0.3 ... 0.8}. w_geo = 1 - w_mlp. The geometric sub-score uses the same weights as the geometric pipeline (face=0.7, eye=0.3).
 
  | MLP Weight (w_mlp) | Mean LOPO F1 |
  |-------------------:|-------------:|
  | 0.7 | 0.8039 |
  | 0.8 | 0.8016 |
 
+ **Best:** w_mlp = 0.3 (MLP 30%, geometric 70%) → mean LOPO F1 = 0.8409
+
+ ![Hybrid MLP weight search](plots/hybrid_weight_search.png)
+
+ ## 8. Hybrid Pipeline: XGBoost vs Geometric
+
+ Same grid over w_xgb in {0.3 ... 0.8}. w_geo = 1 - w_xgb.
+
+ | XGBoost Weight (w_xgb) | Mean LOPO F1 |
+ |-----------------------:|-------------:|
+ | 0.3 | 0.8639 **<-- selected** |
+ | 0.4 | 0.8552 |
+ | 0.5 | 0.8451 |
+ | 0.6 | 0.8419 |
+ | 0.7 | 0.8382 |
+ | 0.8 | 0.8353 |
+
+ **Best:** w_xgb = 0.3 → mean LOPO F1 = 0.8639
+
+ ![Hybrid XGBoost weight search](plots/hybrid_xgb_weight_search.png)
+
+ ### Which hybrid is used in the app?
+
+ The **XGBoost hybrid is better** (F1 = 0.8639 vs MLP hybrid F1 = 0.8409).
+
+ ### Logistic regression combiner (replaces heuristic weights)
+
+ Instead of a fixed linear blend (e.g. 0.3·ML + 0.7·geo), a **logistic regression** combines the model probability and the geometric score: meta-features = [model_prob, geo_score], trained on the same LOPO splits. Threshold from Youden's J on the combiner output.
+
+ | Method | Mean LOPO F1 |
+ |--------|-------------:|
+ | Heuristic weight grid (best w) | 0.8639 |
+ | **LR combiner** | **0.8241** |
+
+ The app uses the saved LR combiner when `combiner_path` is set in `hybrid_focus_config.json`.
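A minimal sketch of such a combiner on toy data (illustrative only; the real meta-features come from the LOPO folds, and the fitted model is what gets saved as `hybrid_combiner.joblib`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Meta-features: [model_prob, geo_score] per frame; labels: 0 = unfocused, 1 = focused.
# Toy data stands in for the real LOPO training folds.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X_meta = np.column_stack([
    np.clip(y + rng.normal(0, 0.4, 200), 0, 1),  # model_prob, correlated with label
    np.clip(y + rng.normal(0, 0.5, 200), 0, 1),  # geo_score, noisier
])

combiner = LogisticRegression().fit(X_meta, y)
focus_score = combiner.predict_proba(X_meta)[:, 1]  # combiner output in [0, 1]
```

The combiner learns its own weighting of the two meta-features instead of a hand-tuned grid, at the cost of needing held-out training data.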
 
+ ## 9. Eye and Mouth Aspect Ratio Thresholds
 
  ### EAR (Eye Aspect Ratio)
 
  ![MAR distribution](plots/mar_distribution.png)
 
+ ## 10. Other Constants
 
  | Constant | Value | Rationale |
  |----------|------:|-----------|
evaluation/feature_importance.py ADDED
@@ -0,0 +1,187 @@
+ """
+ Feature importance and leave-one-feature-out ablation for the 10 face_orientation features.
+ Run: python -m evaluation.feature_importance
+
+ Outputs:
+ - XGBoost gain-based importance (from trained checkpoint)
+ - Leave-one-feature-out LOPO F1 (ablation): drop each feature in turn, report mean LOPO F1.
+ - Writes evaluation/feature_selection_justification.md
+ """
+
+ import os
+ import sys
+
+ import numpy as np
+ from sklearn.preprocessing import StandardScaler
+ from sklearn.metrics import f1_score
+ from xgboost import XGBClassifier
+
+ _PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
+ if _PROJECT_ROOT not in sys.path:
+     sys.path.insert(0, _PROJECT_ROOT)
+
+ from data_preparation.prepare_dataset import load_per_person, SELECTED_FEATURES
+
+ SEED = 42
+ FEATURES = SELECTED_FEATURES["face_orientation"]
+
+
+ def _resolve_xgb_path():
+     p = os.path.join(_PROJECT_ROOT, "models", "xgboost", "checkpoints", "face_orientation_best.json")
+     if os.path.isfile(p):
+         return p
+     return os.path.join(_PROJECT_ROOT, "checkpoints", "xgboost_face_orientation_best.json")
+
+
+ def xgb_feature_importance():
+     """Load the trained XGBoost model and return gain-based importance for the 10 features."""
+     path = _resolve_xgb_path()
+     if not os.path.isfile(path):
+         print(f"[WARN] No XGBoost checkpoint at {path}; skipping importance.")
+         return None
+     model = XGBClassifier()
+     model.load_model(path)
+     imp = model.get_booster().get_score(importance_type="gain")
+     # The booster names features f0, f1, ...; same order as FEATURES (training order).
+     by_idx = {int(k.replace("f", "")): v for k, v in imp.items() if k.startswith("f")}
+     order = [by_idx.get(i, 0.0) for i in range(len(FEATURES))]
+     return dict(zip(FEATURES, order))
+
+
+ def run_ablation_lopo():
+     """Leave-one-feature-out: for each feature, train XGBoost on the other 9 with LOPO and report mean F1."""
+     by_person, _, _ = load_per_person("face_orientation")
+     persons = sorted(by_person.keys())
+     n_folds = len(persons)
+
+     results = {}
+     for drop_feat in FEATURES:
+         idx_keep = [i for i, f in enumerate(FEATURES) if f != drop_feat]
+         f1s = []
+         for held_out in persons:
+             train_X = np.concatenate([by_person[p][0] for p in persons if p != held_out])
+             train_y = np.concatenate([by_person[p][1] for p in persons if p != held_out])
+             X_test, y_test = by_person[held_out]
+
+             X_tr = train_X[:, idx_keep]
+             X_te = X_test[:, idx_keep]
+             scaler = StandardScaler().fit(X_tr)
+             X_tr_sc = scaler.transform(X_tr)
+             X_te_sc = scaler.transform(X_te)
+
+             xgb = XGBClassifier(
+                 n_estimators=600, max_depth=8, learning_rate=0.05,
+                 subsample=0.8, colsample_bytree=0.8,
+                 reg_alpha=0.1, reg_lambda=1.0,
+                 use_label_encoder=False, eval_metric="logloss",
+                 random_state=SEED, verbosity=0,
+             )
+             xgb.fit(X_tr_sc, train_y)
+             pred = xgb.predict(X_te_sc)
+             f1s.append(f1_score(y_test, pred, average="weighted"))
+         results[drop_feat] = np.mean(f1s)
+     return results
+
+
+ def run_baseline_lopo_f1():
+     """Full 10-feature LOPO mean F1 for reference."""
+     by_person, _, _ = load_per_person("face_orientation")
+     persons = sorted(by_person.keys())
+     f1s = []
+     for held_out in persons:
+         train_X = np.concatenate([by_person[p][0] for p in persons if p != held_out])
+         train_y = np.concatenate([by_person[p][1] for p in persons if p != held_out])
+         X_test, y_test = by_person[held_out]
+         scaler = StandardScaler().fit(train_X)
+         X_tr_sc = scaler.transform(train_X)
+         X_te_sc = scaler.transform(X_test)
+         xgb = XGBClassifier(
+             n_estimators=600, max_depth=8, learning_rate=0.05,
+             subsample=0.8, colsample_bytree=0.8,
+             reg_alpha=0.1, reg_lambda=1.0,
+             use_label_encoder=False, eval_metric="logloss",
+             random_state=SEED, verbosity=0,
+         )
+         xgb.fit(X_tr_sc, train_y)
+         pred = xgb.predict(X_te_sc)
+         f1s.append(f1_score(y_test, pred, average="weighted"))
+     return np.mean(f1s)
+
+
+ def main():
+     print("=== Feature importance (XGBoost gain) ===")
+     imp = xgb_feature_importance()
+     if imp:
+         for name in FEATURES:
+             print(f"  {name}: {imp.get(name, 0):.2f}")
+         order = sorted(imp.items(), key=lambda x: -x[1])
+         print("  Top-5 by gain:", [x[0] for x in order[:5]])
+
+     print("\n=== Leave-one-feature-out ablation (LOPO mean F1) ===")
+     baseline = run_baseline_lopo_f1()
+     print(f"  Baseline (all 10 features) mean LOPO F1: {baseline:.4f}")
+     ablation = run_ablation_lopo()
+     for feat in FEATURES:
+         delta = baseline - ablation[feat]
+         print(f"  drop {feat}: F1={ablation[feat]:.4f} (Δ={delta:+.4f})")
+     worst_drop = min(ablation.items(), key=lambda x: x[1])
+     print(f"  Largest F1 drop when dropping: {worst_drop[0]} (F1={worst_drop[1]:.4f})")
+
+     out_dir = os.path.join(_PROJECT_ROOT, "evaluation")
+     out_path = os.path.join(out_dir, "feature_selection_justification.md")
+     lines = [
+         "# Feature selection justification",
+         "",
+         "The face_orientation model uses 10 of 17 extracted features. This document summarises empirical support.",
+         "",
+         "## 1. Domain rationale",
+         "",
+         "The 10 features were chosen to cover three channels:",
+         "- **Head pose:** head_deviation, s_face, pitch",
+         "- **Eye state:** ear_left, ear_right, ear_avg, perclos",
+         "- **Gaze:** h_gaze, gaze_offset, s_eye",
+         "",
+         "Excluded: v_gaze (noisy), mar (rare events), yaw/roll (redundant with head_deviation/s_face), blink_rate/closure_duration/yawn_duration (temporal overlap with perclos).",
+         "",
+         "## 2. XGBoost feature importance (gain)",
+         "",
+         "From the trained XGBoost checkpoint (gain on the 10 features):",
+         "",
+         "| Feature | Gain |",
+         "|---------|------|",
+     ]
+     if imp:
+         for name in FEATURES:
+             lines.append(f"| {name} | {imp.get(name, 0):.2f} |")
+         order = sorted(imp.items(), key=lambda x: -x[1])
+         lines.append("")
+         lines.append(f"**Top 5 by gain:** {', '.join(x[0] for x in order[:5])}.")
+     else:
+         lines.append("(Run with XGBoost checkpoint to populate.)")
+     lines.extend([
+         "",
+         "## 3. Leave-one-feature-out ablation (LOPO)",
+         "",
+         f"Baseline (all 10 features) mean LOPO F1: **{baseline:.4f}**.",
+         "",
+         "| Feature dropped | Mean LOPO F1 | Δ vs baseline |",
+         "|------------------|--------------|---------------|",
+     ])
+     for feat in FEATURES:
+         delta = baseline - ablation[feat]
+         lines.append(f"| {feat} | {ablation[feat]:.4f} | {delta:+.4f} |")
+     worst_drop = min(ablation.items(), key=lambda x: x[1])
+     lines.append("")
+     lines.append(f"Dropping **{worst_drop[0]}** hurts most (F1={worst_drop[1]:.4f}), consistent with it being important.")
+     lines.append("")
+     lines.append("## 4. Conclusion")
+     lines.append("")
+     lines.append("Selection is supported by (1) domain rationale (three attention channels), (2) XGBoost gain importance, and (3) leave-one-out ablation. SHAP or correlation-based pruning can be added in future work.")
+     lines.append("")
+     with open(out_path, "w", encoding="utf-8") as f:
+         f.write("\n".join(lines))
+     print(f"\nReport written to {out_path}")
+
+
+ if __name__ == "__main__":
+     main()
evaluation/feature_selection_justification.md ADDED
@@ -0,0 +1,54 @@
+ # Feature selection justification
+
+ The face_orientation model uses 10 of 17 extracted features. This document summarises empirical support.
+
+ ## 1. Domain rationale
+
+ The 10 features were chosen to cover three channels:
+ - **Head pose:** head_deviation, s_face, pitch
+ - **Eye state:** ear_left, ear_right, ear_avg, perclos
+ - **Gaze:** h_gaze, gaze_offset, s_eye
+
+ Excluded: v_gaze (noisy), mar (rare events), yaw/roll (redundant with head_deviation/s_face), blink_rate/closure_duration/yawn_duration (temporal overlap with perclos).
+
+ ## 2. XGBoost feature importance (gain)
+
+ From the trained XGBoost checkpoint (gain on the 10 features):
+
+ | Feature | Gain |
+ |---------|------|
+ | head_deviation | 8.83 |
+ | s_face | 10.27 |
+ | s_eye | 2.18 |
+ | h_gaze | 4.99 |
+ | pitch | 4.64 |
+ | ear_left | 3.57 |
+ | ear_avg | 6.96 |
+ | ear_right | 9.54 |
+ | gaze_offset | 1.80 |
+ | perclos | 5.68 |
+
+ **Top 5 by gain:** s_face, ear_right, head_deviation, ear_avg, perclos.
+
+ ## 3. Leave-one-feature-out ablation (LOPO)
+
+ Baseline (all 10 features) mean LOPO F1: **0.8327**.
+
+ | Feature dropped | Mean LOPO F1 | Δ vs baseline |
+ |------------------|--------------|---------------|
+ | head_deviation | 0.8395 | -0.0068 |
+ | s_face | 0.8390 | -0.0063 |
+ | s_eye | 0.8342 | -0.0015 |
+ | h_gaze | 0.8244 | +0.0083 |
+ | pitch | 0.8250 | +0.0077 |
+ | ear_left | 0.8326 | +0.0001 |
+ | ear_avg | 0.8350 | -0.0023 |
+ | ear_right | 0.8344 | -0.0017 |
+ | gaze_offset | 0.8351 | -0.0024 |
+ | perclos | 0.8258 | +0.0069 |
+
+ Dropping **h_gaze** hurts most (F1=0.8244), consistent with it being important.
+
+ ## 4. Conclusion
+
+ Selection is supported by (1) domain rationale (three attention channels), (2) XGBoost gain importance, and (3) leave-one-out ablation. SHAP or correlation-based pruning can be added in future work.
evaluation/justify_thresholds.py CHANGED
@@ -8,9 +8,19 @@ import numpy as np
  import matplotlib
  matplotlib.use("Agg")
  import matplotlib.pyplot as plt
  from sklearn.neural_network import MLPClassifier
  from sklearn.preprocessing import StandardScaler
- from sklearn.metrics import roc_curve, roc_auc_score, f1_score
  from xgboost import XGBClassifier
 
  _PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
@@ -56,7 +66,8 @@ def run_lopo_models():
      by_person, _, _ = load_per_person("face_orientation")
      persons = sorted(by_person.keys())
 
-     results = {"mlp": {"y": [], "p": []}, "xgb": {"y": [], "p": []}}
 
      for i, held_out in enumerate(persons):
          X_test, y_test = by_person[held_out]
@@ -77,6 +88,8 @@ def run_lopo_models():
          mlp_prob = mlp.predict_proba(X_te_sc)[:, 1]
          results["mlp"]["y"].append(y_test)
          results["mlp"]["p"].append(mlp_prob)
 
          xgb = XGBClassifier(
              n_estimators=600, max_depth=8, learning_rate=0.05,
@@ -89,11 +102,14 @@ def run_lopo_models():
          xgb_prob = xgb.predict_proba(X_te_sc)[:, 1]
          results["xgb"]["y"].append(y_test)
          results["xgb"]["p"].append(xgb_prob)
 
          print(f"  fold {i+1}/{len(persons)}: held out {held_out} "
                f"({X_test.shape[0]} samples)")
 
-     for key in results:
          results[key]["y"] = np.concatenate(results[key]["y"])
          results[key]["p"] = np.concatenate(results[key]["p"])
 
@@ -126,6 +142,129 @@ def analyse_model_thresholds(results):
      return model_stats
 
 
  def run_geo_weight_search():
      print("\n=== Geometric weight grid search ===")
 
@@ -252,6 +391,191 @@ def run_hybrid_weight_search(lopo_results):
      return dict(mean_f1), best_w
 
 
  def plot_distributions():
      print("\n=== EAR / MAR distributions ===")
      npz_files = sorted(glob.glob(os.path.join(_PROJECT_ROOT, "data", "collected_*", "*.npz")))
@@ -326,7 +650,11 @@ def plot_distributions():
      return stats
 
 
- def write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats):
      lines = []
      lines.append("# Threshold Justification Report")
      lines.append("")
@@ -351,7 +679,91 @@ def write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats)
      lines.append("![XGBoost ROC](plots/roc_xgboost.png)")
      lines.append("")
 
-     lines.append("## 2. Geometric Pipeline Weights (s_face vs s_eye)")
      lines.append("")
      lines.append("Grid search over face weight alpha in {0.2 ... 0.8}. "
                   "Eye weight = 1 - alpha. Threshold per fold via Youden's J.")
@@ -368,25 +780,68 @@ def write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats)
      lines.append("![Geometric weight search](plots/geo_weight_search.png)")
      lines.append("")
 
-     lines.append("## 3. Hybrid Pipeline Weights (MLP vs Geometric)")
      lines.append("")
      lines.append("Grid search over w_mlp in {0.3 ... 0.8}. w_geo = 1 - w_mlp. "
-                  "Geometric sub-score uses same weights as geometric pipeline (face=0.7, eye=0.3). "
-                  "If you change geometric weights, re-run this script — optimal w_mlp can shift.")
      lines.append("")
      lines.append("| MLP Weight (w_mlp) | Mean LOPO F1 |")
      lines.append("|-------------------:|-------------:|")
-     for w in sorted(hybrid_f1.keys()):
-         marker = " **<-- selected**" if w == best_w else ""
-         lines.append(f"| {w:.1f} | {hybrid_f1[w]:.4f}{marker} |")
      lines.append("")
-     lines.append(f"**Best:** w_mlp = {best_w:.1f} (MLP {best_w*100:.0f}%, "
-                  f"geometric {(1-best_w)*100:.0f}%)")
      lines.append("")
-     lines.append("![Hybrid weight search](plots/hybrid_weight_search.png)")
      lines.append("")
 
-     lines.append("## 4. Eye and Mouth Aspect Ratio Thresholds")
      lines.append("")
      lines.append("### EAR (Eye Aspect Ratio)")
      lines.append("")
@@ -419,7 +874,7 @@ def write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats)
      lines.append("![MAR distribution](plots/mar_distribution.png)")
      lines.append("")
 
-     lines.append("## 5. Other Constants")
      lines.append("")
      lines.append("| Constant | Value | Rationale |")
      lines.append("|----------|------:|-----------|")
@@ -446,16 +901,71 @@ def write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats)
      print(f"\nReport written to {REPORT_PATH}")
 
 
  def main():
      os.makedirs(PLOTS_DIR, exist_ok=True)
 
      lopo_results = run_lopo_models()
      model_stats = analyse_model_thresholds(lopo_results)
      geo_f1, best_alpha = run_geo_weight_search()
-     hybrid_f1, best_w = run_hybrid_weight_search(lopo_results)
      dist_stats = plot_distributions()
 
-     write_report(model_stats, geo_f1, best_alpha, hybrid_f1, best_w, dist_stats)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
459
  print("\nDone.")
460
 
461
 
 
  import matplotlib
  matplotlib.use("Agg")
  import matplotlib.pyplot as plt
+ import joblib
+ from sklearn.linear_model import LogisticRegression
  from sklearn.neural_network import MLPClassifier
  from sklearn.preprocessing import StandardScaler
+ from sklearn.metrics import (
+     roc_curve,
+     roc_auc_score,
+     f1_score,
+     precision_score,
+     recall_score,
+     accuracy_score,
+     confusion_matrix,
+ )
  from xgboost import XGBClassifier

  _PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

      by_person, _, _ = load_per_person("face_orientation")
      persons = sorted(by_person.keys())

+     results = {"mlp": {"y": [], "p": [], "y_folds": [], "p_folds": []},
+                "xgb": {"y": [], "p": [], "y_folds": [], "p_folds": []}}

      for i, held_out in enumerate(persons):
          X_test, y_test = by_person[held_out]

          mlp_prob = mlp.predict_proba(X_te_sc)[:, 1]
          results["mlp"]["y"].append(y_test)
          results["mlp"]["p"].append(mlp_prob)
+         results["mlp"]["y_folds"].append(y_test)
+         results["mlp"]["p_folds"].append(mlp_prob)

          xgb = XGBClassifier(
              n_estimators=600, max_depth=8, learning_rate=0.05,

          xgb_prob = xgb.predict_proba(X_te_sc)[:, 1]
          results["xgb"]["y"].append(y_test)
          results["xgb"]["p"].append(xgb_prob)
+         results["xgb"]["y_folds"].append(y_test)
+         results["xgb"]["p_folds"].append(xgb_prob)

          print(f" fold {i+1}/{len(persons)}: held out {held_out} "
                f"({X_test.shape[0]} samples)")

+     results["persons"] = persons
+     for key in ("mlp", "xgb"):
          results[key]["y"] = np.concatenate(results[key]["y"])
          results[key]["p"] = np.concatenate(results[key]["p"])

      return model_stats


+ def _ci_95_t(n):
+     """95% CI half-width multiplier (t-distribution, df=n-1). Approximate for small n."""
+     if n <= 1:
+         return 0.0
+     df = n - 1
+     t_975 = [0, 12.71, 4.30, 3.18, 2.78, 2.57, 2.45, 2.37, 2.31]
+     if df < len(t_975):
+         return float(t_975[df])
+     if df <= 30:
+         return 2.0 + (30 - df) / 100
+     return 1.96
+
+
+ def analyse_precision_recall_confusion(results, model_stats):
+     """Precision/recall at optimal threshold, pooled confusion matrix, per-fold metrics, 95% CIs."""
+     print("\n=== Precision, recall, confusion matrix, per-person variance ===")
+     from sklearn.metrics import precision_recall_curve, average_precision_score
+
+     extended = {}
+     persons = results["persons"]
+     n_folds = len(persons)
+
+     for name, label in [("mlp", "MLP"), ("xgb", "XGBoost")]:
+         y_all = results[name]["y"]
+         p_all = results[name]["p"]
+         y_folds = results[name]["y_folds"]
+         p_folds = results[name]["p_folds"]
+         opt_t = model_stats[name]["opt_threshold"]
+
+         y_pred = (p_all >= opt_t).astype(int)
+         prec_pooled = precision_score(y_all, y_pred, zero_division=0)
+         rec_pooled = recall_score(y_all, y_pred, zero_division=0)
+         acc_pooled = accuracy_score(y_all, y_pred)
+         cm = confusion_matrix(y_all, y_pred)
+         if cm.shape == (2, 2):
+             tn, fp, fn, tp = cm.ravel()
+         else:
+             tn = fp = fn = tp = 0
+
+         prec_folds = []
+         rec_folds = []
+         acc_folds = []
+         f1_folds = []
+         per_person = []
+         for k, (y_f, p_f) in enumerate(zip(y_folds, p_folds)):
+             pred_f = (p_f >= opt_t).astype(int)
+             prec_f = precision_score(y_f, pred_f, zero_division=0)
+             rec_f = recall_score(y_f, pred_f, zero_division=0)
+             acc_f = accuracy_score(y_f, pred_f)
+             f1_f = f1_score(y_f, pred_f, zero_division=0)
+             prec_folds.append(prec_f)
+             rec_folds.append(rec_f)
+             acc_folds.append(acc_f)
+             f1_folds.append(f1_f)
+             per_person.append({
+                 "person": persons[k],
+                 "accuracy": acc_f,
+                 "f1": f1_f,
+                 "precision": prec_f,
+                 "recall": rec_f,
+             })
+
+         t_mult = _ci_95_t(n_folds)
+         mean_acc = np.mean(acc_folds)
+         std_acc = np.std(acc_folds, ddof=1) if n_folds > 1 else 0.0
+         mean_f1 = np.mean(f1_folds)
+         std_f1 = np.std(f1_folds, ddof=1) if n_folds > 1 else 0.0
+         mean_prec = np.mean(prec_folds)
+         std_prec = np.std(prec_folds, ddof=1) if n_folds > 1 else 0.0
+         mean_rec = np.mean(rec_folds)
+         std_rec = np.std(rec_folds, ddof=1) if n_folds > 1 else 0.0
+
+         extended[name] = {
+             "label": label,
+             "opt_threshold": opt_t,
+             "precision_pooled": prec_pooled,
+             "recall_pooled": rec_pooled,
+             "accuracy_pooled": acc_pooled,
+             "confusion_matrix": cm,
+             "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
+             "per_person": per_person,
+             "accuracy_mean": mean_acc, "accuracy_std": std_acc,
+             "accuracy_ci_half": t_mult * (std_acc / np.sqrt(n_folds)) if n_folds > 1 else 0.0,
+             "f1_mean": mean_f1, "f1_std": std_f1,
+             "f1_ci_half": t_mult * (std_f1 / np.sqrt(n_folds)) if n_folds > 1 else 0.0,
+             "precision_mean": mean_prec, "precision_std": std_prec,
+             "precision_ci_half": t_mult * (std_prec / np.sqrt(n_folds)) if n_folds > 1 else 0.0,
+             "recall_mean": mean_rec, "recall_std": std_rec,
+             "recall_ci_half": t_mult * (std_rec / np.sqrt(n_folds)) if n_folds > 1 else 0.0,
+             "n_folds": n_folds,
+         }
+
+         print(f" {label}: precision={prec_pooled:.4f}, recall={rec_pooled:.4f} | "
+               f"per-fold F1 mean={mean_f1:.4f} ± {std_f1:.4f} "
+               f"(95% CI [{mean_f1 - extended[name]['f1_ci_half']:.4f}, {mean_f1 + extended[name]['f1_ci_half']:.4f}])")
+
+     return extended
+
+
+ def plot_confusion_matrices(extended_stats):
+     """Save confusion matrix heatmaps for MLP and XGBoost."""
+     for name in ("mlp", "xgb"):
+         s = extended_stats[name]
+         cm = s["confusion_matrix"]
+         fig, ax = plt.subplots(figsize=(4, 3))
+         im = ax.imshow(cm, cmap="Blues")
+         ax.set_xticks([0, 1])
+         ax.set_yticks([0, 1])
+         ax.set_xticklabels(["Pred 0", "Pred 1"])
+         ax.set_yticklabels(["True 0", "True 1"])
+         ax.set_ylabel("True label")
+         ax.set_xlabel("Predicted label")
+         for i in range(2):
+             for j in range(2):
+                 ax.text(j, i, str(cm[i, j]), ha="center", va="center",
+                         color="white" if cm[i, j] > cm.max() / 2 else "black",
+                         fontweight="bold")
+         ax.set_title(f"LOPO {s['label']} @ t={s['opt_threshold']:.3f}")
+         fig.tight_layout()
+         path = os.path.join(PLOTS_DIR, f"confusion_matrix_{name}.png")
+         fig.savefig(path, dpi=150)
+         plt.close(fig)
+         print(f" saved {path}")


  def run_geo_weight_search():
      print("\n=== Geometric weight grid search ===")

      return dict(mean_f1), best_w


+ def run_hybrid_xgb_weight_search(lopo_results):
+     """Grid search: XGBoost prob + geometric. Same structure as MLP hybrid."""
+     print("\n=== Hybrid XGBoost weight grid search ===")
+
+     by_person, _, _ = load_per_person("face_orientation")
+     persons = sorted(by_person.keys())
+     features = SELECTED_FEATURES["face_orientation"]
+     sf_idx = features.index("s_face")
+     se_idx = features.index("s_eye")
+
+     GEO_FACE_W = 0.7
+     GEO_EYE_W = 0.3
+
+     w_xgbs = np.arange(0.3, 0.85, 0.1).round(1)
+     wmf1 = {w: [] for w in w_xgbs}
+     xgb_p = lopo_results["xgb"]["p"]
+     offset = 0
+     for held_out in persons:
+         X_test, y_test = by_person[held_out]
+         n = X_test.shape[0]
+         xgb_prob_fold = xgb_p[offset : offset + n]
+         offset += n
+
+         sf = X_test[:, sf_idx]
+         se = X_test[:, se_idx]
+         geo_score = np.clip(GEO_FACE_W * sf + GEO_EYE_W * se, 0, 1)
+
+         train_X = np.concatenate([by_person[p][0] for p in persons if p != held_out])
+         train_y = np.concatenate([by_person[p][1] for p in persons if p != held_out])
+         sf_tr = train_X[:, sf_idx]
+         se_tr = train_X[:, se_idx]
+         geo_tr = np.clip(GEO_FACE_W * sf_tr + GEO_EYE_W * se_tr, 0, 1)
+
+         scaler = StandardScaler().fit(train_X)
+         X_tr_sc = scaler.transform(train_X)
+         xgb_tr = XGBClassifier(
+             n_estimators=600, max_depth=8, learning_rate=0.05,
+             subsample=0.8, colsample_bytree=0.8,
+             reg_alpha=0.1, reg_lambda=1.0,
+             use_label_encoder=False, eval_metric="logloss",
+             random_state=SEED, verbosity=0,
+         )
+         xgb_tr.fit(X_tr_sc, train_y)
+         xgb_prob_tr = xgb_tr.predict_proba(X_tr_sc)[:, 1]
+
+         for w in w_xgbs:
+             combo_tr = w * xgb_prob_tr + (1.0 - w) * geo_tr
+             opt_t, *_ = _youdens_j(train_y, combo_tr)
+
+             combo_te = w * xgb_prob_fold + (1.0 - w) * geo_score
+             f1 = _f1_at_threshold(y_test, combo_te, opt_t)
+             wmf1[w].append(f1)
+
+     mean_f1 = {w: np.mean(f1s) for w, f1s in wmf1.items()}
+     best_w = max(mean_f1, key=mean_f1.get)
+
+     fig, ax = plt.subplots(figsize=(7, 4))
+     ax.bar([f"{w:.1f}" for w in w_xgbs],
+            [mean_f1[w] for w in w_xgbs], color="steelblue")
+     ax.set_xlabel("XGBoost weight (w_xgb); geo weight = 1 - w_xgb")
+     ax.set_ylabel("Mean LOPO F1")
+     ax.set_title("Hybrid Pipeline: XGBoost vs Geometric Weight Search")
+     ax.set_ylim(bottom=max(0, min(mean_f1.values()) - 0.05))
+     for i, w in enumerate(w_xgbs):
+         ax.text(i, mean_f1[w] + 0.003, f"{mean_f1[w]:.3f}",
+                 ha="center", va="bottom", fontsize=8)
+     fig.tight_layout()
+     path = os.path.join(PLOTS_DIR, "hybrid_xgb_weight_search.png")
+     fig.savefig(path, dpi=150)
+     plt.close(fig)
+     print(f" saved {path}")
+
+     print(f" Best w_xgb = {best_w:.1f}, mean LOPO F1 = {mean_f1[best_w]:.4f}")
+     return dict(mean_f1), best_w
+
+
+ def run_hybrid_lr_combiner(lopo_results, use_xgb=True):
+     """LR combiner: meta-features = [model_prob, geo_score], learned weights instead of grid search."""
+     print("\n=== Hybrid LR combiner (LOPO) ===")
+     by_person, _, _ = load_per_person("face_orientation")
+     persons = sorted(by_person.keys())
+     features = SELECTED_FEATURES["face_orientation"]
+     sf_idx = features.index("s_face")
+     se_idx = features.index("s_eye")
+     GEO_FACE_W = 0.7
+     GEO_EYE_W = 0.3
+
+     key = "xgb" if use_xgb else "mlp"
+     model_p = lopo_results[key]["p"]
+     offset = 0
+     fold_f1s = []
+     for held_out in persons:
+         X_test, y_test = by_person[held_out]
+         n = X_test.shape[0]
+         prob_fold = model_p[offset : offset + n]
+         offset += n
+         sf = X_test[:, sf_idx]
+         se = X_test[:, se_idx]
+         geo_score = np.clip(GEO_FACE_W * sf + GEO_EYE_W * se, 0, 1)
+         meta_te = np.column_stack([prob_fold, geo_score])
+
+         train_X = np.concatenate([by_person[p][0] for p in persons if p != held_out])
+         train_y = np.concatenate([by_person[p][1] for p in persons if p != held_out])
+         sf_tr = train_X[:, sf_idx]
+         se_tr = train_X[:, se_idx]
+         geo_tr = np.clip(GEO_FACE_W * sf_tr + GEO_EYE_W * se_tr, 0, 1)
+         scaler = StandardScaler().fit(train_X)
+         X_tr_sc = scaler.transform(train_X)
+         if use_xgb:
+             xgb_tr = XGBClassifier(
+                 n_estimators=600, max_depth=8, learning_rate=0.05,
+                 subsample=0.8, colsample_bytree=0.8,
+                 reg_alpha=0.1, reg_lambda=1.0,
+                 use_label_encoder=False, eval_metric="logloss",
+                 random_state=SEED, verbosity=0,
+             )
+             xgb_tr.fit(X_tr_sc, train_y)
+             prob_tr = xgb_tr.predict_proba(X_tr_sc)[:, 1]
+         else:
+             mlp_tr = MLPClassifier(
+                 hidden_layer_sizes=(64, 32), activation="relu",
+                 max_iter=200, early_stopping=True, validation_fraction=0.15,
+                 random_state=SEED, verbose=False,
+             )
+             mlp_tr.fit(X_tr_sc, train_y)
+             prob_tr = mlp_tr.predict_proba(X_tr_sc)[:, 1]
+         meta_tr = np.column_stack([prob_tr, geo_tr])
+
+         lr = LogisticRegression(C=1.0, max_iter=500, random_state=SEED)
+         lr.fit(meta_tr, train_y)
+         p_tr = lr.predict_proba(meta_tr)[:, 1]
+         opt_t, *_ = _youdens_j(train_y, p_tr)
+         p_te = lr.predict_proba(meta_te)[:, 1]
+         f1 = _f1_at_threshold(y_test, p_te, opt_t)
+         fold_f1s.append(f1)
+         print(f" fold {held_out}: F1 = {f1:.4f} (threshold = {opt_t:.3f})")
+
+     mean_f1 = float(np.mean(fold_f1s))
+     print(f" LR combiner mean LOPO F1 = {mean_f1:.4f}")
+     return mean_f1
+
+
+ def train_and_save_hybrid_combiner(lopo_results, use_xgb, geo_face_weight=0.7, geo_eye_weight=0.3,
+                                    combiner_path=None):
+     """Build OOS meta-dataset from LOPO predictions, train one LR, save joblib + optimal threshold."""
+     by_person, _, _ = load_per_person("face_orientation")
+     persons = sorted(by_person.keys())
+     features = SELECTED_FEATURES["face_orientation"]
+     sf_idx = features.index("s_face")
+     se_idx = features.index("s_eye")
+
+     key = "xgb" if use_xgb else "mlp"
+     model_p = lopo_results[key]["p"]
+     meta_y = lopo_results[key]["y"]
+     geo_list = []
+     offset = 0
+     for p in persons:
+         X, _ = by_person[p]
+         n = X.shape[0]
+         sf = X[:, sf_idx]
+         se = X[:, se_idx]
+         geo_list.append(np.clip(geo_face_weight * sf + geo_eye_weight * se, 0, 1))
+         offset += n
+     geo_all = np.concatenate(geo_list)
+     meta_X = np.column_stack([model_p, geo_all])
+
+     lr = LogisticRegression(C=1.0, max_iter=500, random_state=SEED)
+     lr.fit(meta_X, meta_y)
+     p = lr.predict_proba(meta_X)[:, 1]
+     opt_threshold, *_ = _youdens_j(meta_y, p)
+
+     if combiner_path is None:
+         combiner_path = os.path.join(_PROJECT_ROOT, "checkpoints", "hybrid_combiner.joblib")
+     os.makedirs(os.path.dirname(combiner_path), exist_ok=True)
+     joblib.dump({
+         "combiner": lr,
+         "threshold": float(opt_threshold),
+         "use_xgb": bool(use_xgb),
+         "geo_face_weight": geo_face_weight,
+         "geo_eye_weight": geo_eye_weight,
+     }, combiner_path)
+     print(f" Saved combiner to {combiner_path} (threshold={opt_threshold:.3f})")
+     return opt_threshold, combiner_path


  def plot_distributions():
      print("\n=== EAR / MAR distributions ===")
      npz_files = sorted(glob.glob(os.path.join(_PROJECT_ROOT, "data", "collected_*", "*.npz")))

      return stats


+ def write_report(model_stats, extended_stats, geo_f1, best_alpha,
+                  hybrid_mlp_f1, best_w_mlp,
+                  hybrid_xgb_f1, best_w_xgb,
+                  use_xgb_for_hybrid, dist_stats,
+                  lr_combiner_f1=None):
      lines = []
      lines.append("# Threshold Justification Report")
      lines.append("")

      lines.append("![XGBoost ROC](plots/roc_xgboost.png)")
      lines.append("")

+     lines.append("## 2. Precision, Recall and Tradeoff")
+     lines.append("")
+     lines.append("At the optimal threshold (Youden's J), pooled over all LOPO held-out predictions:")
+     lines.append("")
+     lines.append("| Model | Threshold | Precision | Recall | F1 | Accuracy |")
+     lines.append("|-------|----------:|----------:|-------:|---:|---------:|")
+     for key in ("mlp", "xgb"):
+         s = extended_stats[key]
+         lines.append(f"| {s['label']} | {s['opt_threshold']:.3f} | {s['precision_pooled']:.4f} | "
+                      f"{s['recall_pooled']:.4f} | {model_stats[key]['f1_opt']:.4f} | {s['accuracy_pooled']:.4f} |")
+     lines.append("")
+     lines.append("Higher threshold → fewer positive predictions → higher precision, lower recall. "
+                  "Youden's J picks the threshold that balances sensitivity and specificity "
+                  "(recall for the positive class and true negative rate).")
+     lines.append("")
+
+     lines.append("## 3. Confusion Matrix (Pooled LOPO)")
+     lines.append("")
+     lines.append("At optimal threshold. Rows = true label, columns = predicted label (0 = unfocused, 1 = focused).")
+     lines.append("")
+     for key in ("mlp", "xgb"):
+         s = extended_stats[key]
+         lines.append(f"### {s['label']}")
+         lines.append("")
+         lines.append("| | Pred 0 | Pred 1 |")
+         lines.append("|--|-------:|-------:|")
+         cm = s["confusion_matrix"]
+         if cm.shape == (2, 2):
+             lines.append(f"| **True 0** | {cm[0,0]} (TN) | {cm[0,1]} (FP) |")
+             lines.append(f"| **True 1** | {cm[1,0]} (FN) | {cm[1,1]} (TP) |")
+         lines.append("")
+         lines.append(f"TN={s['tn']}, FP={s['fp']}, FN={s['fn']}, TP={s['tp']}.")
+         lines.append("")
+     lines.append("![Confusion MLP](plots/confusion_matrix_mlp.png)")
+     lines.append("")
+     lines.append("![Confusion XGBoost](plots/confusion_matrix_xgb.png)")
+     lines.append("")
+
+     lines.append("## 4. Per-Person Performance Variance (LOPO)")
+     lines.append("")
+     lines.append("One fold per left-out person; metrics at optimal threshold.")
+     lines.append("")
+     for key in ("mlp", "xgb"):
+         s = extended_stats[key]
+         lines.append(f"### {s['label']} — per held-out person")
+         lines.append("")
+         lines.append("| Person | Accuracy | F1 | Precision | Recall |")
+         lines.append("|--------|---------:|---:|----------:|-------:|")
+         for row in s["per_person"]:
+             lines.append(f"| {row['person']} | {row['accuracy']:.4f} | {row['f1']:.4f} | "
+                          f"{row['precision']:.4f} | {row['recall']:.4f} |")
+         lines.append("")
+     lines.append("### Summary across persons")
+     lines.append("")
+     lines.append("| Model | Accuracy mean ± std | F1 mean ± std | Precision mean ± std | Recall mean ± std |")
+     lines.append("|-------|---------------------|---------------|----------------------|-------------------|")
+     for key in ("mlp", "xgb"):
+         s = extended_stats[key]
+         lines.append(f"| {s['label']} | {s['accuracy_mean']:.4f} ± {s['accuracy_std']:.4f} | "
+                      f"{s['f1_mean']:.4f} ± {s['f1_std']:.4f} | "
+                      f"{s['precision_mean']:.4f} ± {s['precision_std']:.4f} | "
+                      f"{s['recall_mean']:.4f} ± {s['recall_std']:.4f} |")
+     lines.append("")
+
+     lines.append("## 5. Confidence Intervals (95%, LOPO over 9 persons)")
+     lines.append("")
+     lines.append("Mean ± half-width of 95% t-interval (df=8) for each metric across the 9 left-out persons.")
+     lines.append("")
+     lines.append("| Model | F1 | Accuracy | Precision | Recall |")
+     lines.append("|-------|---:|--------:|----------:|-------:|")
+     for key in ("mlp", "xgb"):
+         s = extended_stats[key]
+         f1_lo = s["f1_mean"] - s["f1_ci_half"]
+         f1_hi = s["f1_mean"] + s["f1_ci_half"]
+         acc_lo = s["accuracy_mean"] - s["accuracy_ci_half"]
+         acc_hi = s["accuracy_mean"] + s["accuracy_ci_half"]
+         prec_lo = s["precision_mean"] - s["precision_ci_half"]
+         prec_hi = s["precision_mean"] + s["precision_ci_half"]
+         rec_lo = s["recall_mean"] - s["recall_ci_half"]
+         rec_hi = s["recall_mean"] + s["recall_ci_half"]
+         lines.append(f"| {s['label']} | {s['f1_mean']:.4f} [{f1_lo:.4f}, {f1_hi:.4f}] | "
+                      f"{s['accuracy_mean']:.4f} [{acc_lo:.4f}, {acc_hi:.4f}] | "
+                      f"{s['precision_mean']:.4f} [{prec_lo:.4f}, {prec_hi:.4f}] | "
+                      f"{s['recall_mean']:.4f} [{rec_lo:.4f}, {rec_hi:.4f}] |")
+     lines.append("")
+
+     lines.append("## 6. Geometric Pipeline Weights (s_face vs s_eye)")
      lines.append("")
      lines.append("Grid search over face weight alpha in {0.2 ... 0.8}. "
                   "Eye weight = 1 - alpha. Threshold per fold via Youden's J.")

      lines.append("![Geometric weight search](plots/geo_weight_search.png)")
      lines.append("")

+     lines.append("## 7. Hybrid Pipeline: MLP vs Geometric")
      lines.append("")
      lines.append("Grid search over w_mlp in {0.3 ... 0.8}. w_geo = 1 - w_mlp. "
+                  "Geometric sub-score uses same weights as geometric pipeline (face=0.7, eye=0.3).")
      lines.append("")
      lines.append("| MLP Weight (w_mlp) | Mean LOPO F1 |")
      lines.append("|-------------------:|-------------:|")
+     for w in sorted(hybrid_mlp_f1.keys()):
+         marker = " **<-- selected**" if w == best_w_mlp else ""
+         lines.append(f"| {w:.1f} | {hybrid_mlp_f1[w]:.4f}{marker} |")
+     lines.append("")
+     lines.append(f"**Best:** w_mlp = {best_w_mlp:.1f} (MLP {best_w_mlp*100:.0f}%, "
+                  f"geometric {(1-best_w_mlp)*100:.0f}%) → mean LOPO F1 = {hybrid_mlp_f1[best_w_mlp]:.4f}")
+     lines.append("")
+     lines.append("![Hybrid MLP weight search](plots/hybrid_weight_search.png)")
+     lines.append("")
+
+     lines.append("## 8. Hybrid Pipeline: XGBoost vs Geometric")
+     lines.append("")
+     lines.append("Same grid over w_xgb in {0.3 ... 0.8}. w_geo = 1 - w_xgb.")
      lines.append("")
+     lines.append("| XGBoost Weight (w_xgb) | Mean LOPO F1 |")
+     lines.append("|-----------------------:|-------------:|")
+     for w in sorted(hybrid_xgb_f1.keys()):
+         marker = " **<-- selected**" if w == best_w_xgb else ""
+         lines.append(f"| {w:.1f} | {hybrid_xgb_f1[w]:.4f}{marker} |")
      lines.append("")
+     lines.append(f"**Best:** w_xgb = {best_w_xgb:.1f} → mean LOPO F1 = {hybrid_xgb_f1[best_w_xgb]:.4f}")
+     lines.append("")
+     lines.append("![Hybrid XGBoost weight search](plots/hybrid_xgb_weight_search.png)")
      lines.append("")

+     f1_mlp = hybrid_mlp_f1[best_w_mlp]
+     f1_xgb = hybrid_xgb_f1[best_w_xgb]
+     lines.append("### Which hybrid is used in the app?")
+     lines.append("")
+     if use_xgb_for_hybrid:
+         lines.append(f"**XGBoost hybrid is better** (F1 = {f1_xgb:.4f} vs MLP hybrid F1 = {f1_mlp:.4f}).")
+     else:
+         lines.append(f"**MLP hybrid is better** (F1 = {f1_mlp:.4f} vs XGBoost hybrid F1 = {f1_xgb:.4f}).")
+     lines.append("")
+     if lr_combiner_f1 is not None:
+         lines.append("### Logistic regression combiner (replaces heuristic weights)")
+         lines.append("")
+         lines.append("Instead of a fixed linear blend (e.g. 0.3·ML + 0.7·geo), a **logistic regression** "
+                      "combines model probability and geometric score: meta-features = [model_prob, geo_score], "
+                      "trained on the same LOPO splits. Threshold from Youden's J on combiner output.")
+         lines.append("")
+         lines.append("| Method | Mean LOPO F1 |")
+         lines.append("|--------|-------------:|")
+         lines.append(f"| Heuristic weight grid (best w) | {(f1_xgb if use_xgb_for_hybrid else f1_mlp):.4f} |")
+         lines.append(f"| **LR combiner** | **{lr_combiner_f1:.4f}** |")
+         lines.append("")
+         lines.append("The app uses the saved LR combiner when `combiner_path` is set in `hybrid_focus_config.json`.")
+         lines.append("")
+     else:
+         if use_xgb_for_hybrid:
+             lines.append("The app uses **XGBoost + geometric** with the weights above.")
+         else:
+             lines.append("The app uses **MLP + geometric** with the weights above.")
+         lines.append("")
+     lines.append("## 9. Eye and Mouth Aspect Ratio Thresholds")
      lines.append("")
      lines.append("### EAR (Eye Aspect Ratio)")
      lines.append("")

      lines.append("![MAR distribution](plots/mar_distribution.png)")
      lines.append("")

+     lines.append("## 10. Other Constants")
      lines.append("")
      lines.append("| Constant | Value | Rationale |")
      lines.append("|----------|------:|-----------|")

      print(f"\nReport written to {REPORT_PATH}")


+ def write_hybrid_config(use_xgb, best_w_mlp, best_w_xgb, config_path,
+                         combiner_path=None, combiner_threshold=None):
+     """Write hybrid_focus_config.json. If combiner_path set, app uses LR combiner instead of heuristic weights."""
+     import json
+     if use_xgb:
+         w_xgb = round(float(best_w_xgb), 2)
+         w_geo = round(1.0 - best_w_xgb, 2)
+         w_mlp = 0.3
+     else:
+         w_mlp = round(float(best_w_mlp), 2)
+         w_geo = round(1.0 - best_w_mlp, 2)
+         w_xgb = 0.0
+     cfg = {
+         "use_xgb": bool(use_xgb),
+         "w_mlp": w_mlp,
+         "w_xgb": w_xgb,
+         "w_geo": w_geo,
+         "threshold": float(combiner_threshold) if combiner_threshold is not None else 0.35,
+         "use_yawn_veto": True,
+         "geo_face_weight": 0.7,
+         "geo_eye_weight": 0.3,
+         "mar_yawn_threshold": 0.55,
+         "metric": "f1",
+     }
+     if combiner_path:
+         cfg["combiner"] = "logistic"
+         cfg["combiner_path"] = os.path.normpath(combiner_path)
+     with open(config_path, "w", encoding="utf-8") as f:
+         json.dump(cfg, f, indent=2)
+     print(f" Written {config_path} (use_xgb={cfg['use_xgb']}, combiner={cfg.get('combiner', 'heuristic')})")


  def main():
      os.makedirs(PLOTS_DIR, exist_ok=True)

      lopo_results = run_lopo_models()
      model_stats = analyse_model_thresholds(lopo_results)
+     extended_stats = analyse_precision_recall_confusion(lopo_results, model_stats)
+     plot_confusion_matrices(extended_stats)
      geo_f1, best_alpha = run_geo_weight_search()
+     hybrid_mlp_f1, best_w_mlp = run_hybrid_weight_search(lopo_results)
+     hybrid_xgb_f1, best_w_xgb = run_hybrid_xgb_weight_search(lopo_results)
      dist_stats = plot_distributions()

+     f1_mlp = hybrid_mlp_f1[best_w_mlp]
+     f1_xgb = hybrid_xgb_f1[best_w_xgb]
+     use_xgb_for_hybrid = f1_xgb > f1_mlp
+     print(f"\n Hybrid comparison: MLP F1 = {f1_mlp:.4f}, XGBoost F1 = {f1_xgb:.4f} → "
+           f"use {'XGBoost' if use_xgb_for_hybrid else 'MLP'}")
+
+     lr_combiner_f1 = run_hybrid_lr_combiner(lopo_results, use_xgb=use_xgb_for_hybrid)
+     combiner_threshold, combiner_path = train_and_save_hybrid_combiner(
+         lopo_results, use_xgb_for_hybrid,
+         combiner_path=os.path.join(_PROJECT_ROOT, "checkpoints", "hybrid_combiner.joblib"),
+     )
+
+     config_path = os.path.join(_PROJECT_ROOT, "checkpoints", "hybrid_focus_config.json")
+     write_hybrid_config(use_xgb_for_hybrid, best_w_mlp, best_w_xgb, config_path,
+                         combiner_path=combiner_path, combiner_threshold=combiner_threshold)
+
+     write_report(model_stats, extended_stats, geo_f1, best_alpha,
+                  hybrid_mlp_f1, best_w_mlp,
+                  hybrid_xgb_f1, best_w_xgb,
+                  use_xgb_for_hybrid, dist_stats,
+                  lr_combiner_f1=lr_combiner_f1)
      print("\nDone.")
evaluation/plots/confusion_matrix_mlp.png ADDED
evaluation/plots/confusion_matrix_xgb.png ADDED
evaluation/plots/hybrid_xgb_weight_search.png ADDED
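The evaluation hunks above repeatedly call `_youdens_j(...)`, whose definition sits earlier in the file and is not part of this diff. A minimal sketch of what such a helper computes, assuming it returns the ROC threshold that maximizes Youden's J = TPR − FPR (the function name and return shape here are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve


def youdens_j_threshold(y_true, scores):
    """Return the score threshold maximizing Youden's J = TPR - FPR (sketch)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return float(thresholds[int(np.argmax(tpr - fpr))])


# Perfectly separable toy scores: J = 1 at the lowest positive score
y = np.array([0, 0, 0, 1, 1, 1])
p = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])
print(youdens_j_threshold(y, p))  # 0.65
```

The grid searches use this per fold on training data only, so the held-out person never influences its own operating point.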
models/mlp/train.py CHANGED
@@ -1,14 +1,15 @@
  import json
- import os, sys
+ import os
  import random

  import numpy as np
+ import joblib
  import torch
  import torch.nn as nn
  import torch.optim as optim
  from sklearn.metrics import f1_score, roc_auc_score

- from data_preparation.prepare_dataset import get_dataloaders
+ from data_preparation.prepare_dataset import get_dataloaders, SELECTED_FEATURES

  USE_CLEARML = False

@@ -227,6 +228,13 @@ def main():

      print(f"[LOG] Training history saved to: {log_path}")

+     # Save scaler and feature names for inference (ui/pipeline.py)
+     scaler_path = os.path.join(ckpt_dir, "scaler_mlp.joblib")
+     joblib.dump(scaler, scaler_path)
+     meta_path = os.path.join(ckpt_dir, "meta_mlp.npz")
+     np.savez(meta_path, feature_names=np.array(SELECTED_FEATURES["face_orientation"]))
+     print(f"[LOG] Scaler and meta saved to {ckpt_dir}")
+

  if __name__ == "__main__":
      main()
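The new block at the end of `main()` persists `scaler_mlp.joblib` and `meta_mlp.npz` so the inference side can reproduce the training-time preprocessing exactly. A sketch of the matching load path (the feature names below are stand-ins for `SELECTED_FEATURES["face_orientation"]`, and the directory is temporary):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

feature_names = ["s_face", "s_eye", "ear", "mar"]  # stand-in feature list
X_train = np.random.default_rng(1).normal(size=(50, len(feature_names)))
scaler = StandardScaler().fit(X_train)

with tempfile.TemporaryDirectory() as ckpt_dir:
    # Save side (mirrors the diff above)
    joblib.dump(scaler, os.path.join(ckpt_dir, "scaler_mlp.joblib"))
    np.savez(os.path.join(ckpt_dir, "meta_mlp.npz"),
             feature_names=np.array(feature_names))

    # Load side (what the inference pipeline would do before calling the MLP)
    loaded = joblib.load(os.path.join(ckpt_dir, "scaler_mlp.joblib"))
    meta = np.load(os.path.join(ckpt_dir, "meta_mlp.npz"))
    names = [str(n) for n in meta["feature_names"]]

x_scaled = loaded.transform(X_train[:1])  # one frame, scaled like training data
```

Saving the feature-name list alongside the scaler lets the consumer verify column order instead of assuming it.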
requirements.txt CHANGED
@@ -8,6 +8,7 @@ opencv-contrib-python>=4.8.0
  numpy>=1.24.0
  scikit-learn>=1.2.0
  joblib>=1.2.0
+ torch>=2.0.0
  fastapi>=0.104.0
  uvicorn[standard]>=0.24.0
  aiosqlite>=0.19.0
ui/README.md CHANGED
@@ -14,7 +14,7 @@ Live camera demo and real-time inference pipeline.
  | Pipeline | Features | Model | Source |
  |----------|----------|-------|--------|
  | `FaceMeshPipeline` | Head pose + eye geometry | Rule-based fusion | `models/head_pose.py`, `models/eye_scorer.py` |
- | `MLPPipeline` | 10 selected features | PyTorch MLP | `checkpoints/model_best.joblib` |
+ | `MLPPipeline` | 10 selected features | PyTorch MLP (10→64→32→2) | `checkpoints/mlp_best.pt` + `scaler_mlp.joblib` |
  | `XGBoostPipeline` | 10 selected features | XGBoost | `models/xgboost/checkpoints/face_orientation_best.json` |

  ## 3. Running
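The table row above summarizes the MLP as 10→64→32→2. A hedged sketch of a network with that shape (the real definition lives under `models/mlp`; the layer types and ReLU activations are assumptions, not taken from this diff):

```python
import torch
import torch.nn as nn

# 10 selected features in, 2 logits out (unfocused / focused)
mlp = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

x = torch.randn(4, 10)                # batch of 4 frames
probs = torch.softmax(mlp(x), dim=1)  # probs[:, 1] ~ focus probability
print(probs.shape)  # torch.Size([4, 2])
```

With two output logits, `probs[:, 1]` plays the same role as `predict_proba(...)[:, 1]` in the sklearn-based evaluation script.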
ui/live_demo.py CHANGED
```diff
@@ -13,7 +13,7 @@ if _PROJECT_ROOT not in sys.path:
 
 from ui.pipeline import (
     FaceMeshPipeline, MLPPipeline, HybridFocusPipeline,
-    XGBoostPipeline, _latest_model_artifacts,
+    XGBoostPipeline, _mlp_artifacts_available,
 )
 from models.face_mesh import FaceMeshDetector
 
@@ -149,16 +149,15 @@ def main():
     )
     available_modes.append(MODE_GEO)
 
-    # 2. MLP & Hybrid
-    mlp_path, _, _ = _latest_model_artifacts(model_dir)
-    if mlp_path is None and not args.mlp_dir:
-        # Fallback to MLP/models
+    # 2. MLP & Hybrid (PyTorch MLP from mlp_best.pt + scaler_mlp.joblib)
+    mlp_available = _mlp_artifacts_available(model_dir)
+    if not mlp_available and not args.mlp_dir:
         alt_dir = os.path.join(_PROJECT_ROOT, "MLP", "models")
-        mlp_path, _, _ = _latest_model_artifacts(alt_dir)
-        if mlp_path:
+        if _mlp_artifacts_available(alt_dir):
             model_dir = alt_dir
+            mlp_available = True
 
-    if mlp_path is not None:
+    if mlp_available:
         try:
             pipelines[MODE_MLP] = MLPPipeline(model_dir=model_dir, detector=detector)
             available_modes.append(MODE_MLP)
```
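The demo now gates pipeline construction on `_mlp_artifacts_available` instead of globbing for timestamped joblib files. The check can be exercised in isolation; this sketch re-implements it (artifact names taken from the diff) and probes it against a throwaway directory:

```python
import os
import tempfile

def mlp_artifacts_available(model_dir: str) -> bool:
    """Same check the demo performs before building MLPPipeline:
    both the checkpoint and the fitted scaler must exist."""
    return (os.path.isfile(os.path.join(model_dir, "mlp_best.pt"))
            and os.path.isfile(os.path.join(model_dir, "scaler_mlp.joblib")))

with tempfile.TemporaryDirectory() as d:
    before = mlp_artifacts_available(d)            # empty dir -> False
    for name in ("mlp_best.pt", "scaler_mlp.joblib"):
        open(os.path.join(d, name), "wb").close()  # create stand-in artifacts
    after = mlp_artifacts_available(d)             # both present -> True
```

Requiring both files up front means a missing scaler fails fast at startup rather than at the first `transform` call on a live frame.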
ui/pipeline.py CHANGED
```diff
@@ -7,6 +7,8 @@ import sys
 
 import numpy as np
 import joblib
+import torch
+import torch.nn as nn
 
 _PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 if _PROJECT_ROOT not in sys.path:
@@ -72,13 +74,17 @@ class _OutputSmoother:
 
 
 DEFAULT_HYBRID_CONFIG = {
+    "use_xgb": False,
     "w_mlp": 0.3,
+    "w_xgb": 0.0,
     "w_geo": 0.7,
     "threshold": 0.35,
     "use_yawn_veto": True,
     "geo_face_weight": 0.7,
     "geo_eye_weight": 0.3,
     "mar_yawn_threshold": float(MAR_YAWN_THRESHOLD),
+    "combiner": None,
+    "combiner_path": None,
 }
 
 
@@ -237,23 +243,45 @@ class FaceMeshPipeline:
         self.close()
 
 
-def _latest_model_artifacts(model_dir):
-    model_files = sorted(glob.glob(os.path.join(model_dir, "model_*.joblib")))
-    if not model_files:
-        model_files = sorted(glob.glob(os.path.join(model_dir, "mlp_*.joblib")))
-    if not model_files:
-        return None, None, None
-    basename = os.path.basename(model_files[-1])
-    tag = ""
-    for prefix in ("model_", "mlp_"):
-        if basename.startswith(prefix):
-            tag = basename[len(prefix) :].replace(".joblib", "")
-            break
-    scaler_path = os.path.join(model_dir, f"scaler_{tag}.joblib")
-    meta_path = os.path.join(model_dir, f"meta_{tag}.npz")
-    if not os.path.isfile(scaler_path) or not os.path.isfile(meta_path):
-        return None, None, None
-    return model_files[-1], scaler_path, meta_path
+# PyTorch MLP matching models/mlp/train.py BaseModel (10 -> 64 -> 32 -> 2)
+class _FocusMLP(nn.Module):
+    def __init__(self, num_features: int, num_classes: int = 2):
+        super().__init__()
+        self.network = nn.Sequential(
+            nn.Linear(num_features, 64),
+            nn.ReLU(),
+            nn.Linear(64, 32),
+            nn.ReLU(),
+            nn.Linear(32, num_classes),
+        )
+
+    def forward(self, x):
+        return self.network(x)
+
+
+def _mlp_artifacts_available(model_dir: str) -> bool:
+    pt_path = os.path.join(model_dir, "mlp_best.pt")
+    scaler_path = os.path.join(model_dir, "scaler_mlp.joblib")
+    return os.path.isfile(pt_path) and os.path.isfile(scaler_path)
+
+
+def _load_mlp_artifacts(model_dir: str):
+    """Load PyTorch MLP + scaler from checkpoints. Returns (model, scaler, feature_names)."""
+    pt_path = os.path.join(model_dir, "mlp_best.pt")
+    scaler_path = os.path.join(model_dir, "scaler_mlp.joblib")
+    if not os.path.isfile(pt_path):
+        raise FileNotFoundError(f"No MLP checkpoint at {pt_path}")
+    if not os.path.isfile(scaler_path):
+        raise FileNotFoundError(f"No scaler at {scaler_path}")
+
+    num_features = len(MLP_FEATURE_NAMES)
+    num_classes = 2
+    model = _FocusMLP(num_features, num_classes)
+    model.load_state_dict(torch.load(pt_path, map_location="cpu", weights_only=True))
+    model.eval()
+
+    scaler = joblib.load(scaler_path)
+    return model, scaler, list(MLP_FEATURE_NAMES)
 
 
 def _load_hybrid_config(model_dir: str, config_path: str | None = None):
@@ -270,18 +298,29 @@ def _load_hybrid_config(model_dir: str, config_path: str | None = None):
         if key in file_cfg:
             cfg[key] = file_cfg[key]
 
-    cfg["w_mlp"] = float(cfg["w_mlp"])
+    cfg["use_xgb"] = bool(cfg.get("use_xgb", False))
+    cfg["w_mlp"] = float(cfg.get("w_mlp", 0.3))
+    cfg["w_xgb"] = float(cfg.get("w_xgb", 0.0))
     cfg["w_geo"] = float(cfg["w_geo"])
-    weight_sum = cfg["w_mlp"] + cfg["w_geo"]
-    if weight_sum <= 0:
-        raise ValueError("[HYBRID] Invalid config: w_mlp + w_geo must be > 0")
-    cfg["w_mlp"] /= weight_sum
-    cfg["w_geo"] /= weight_sum
+    if cfg["use_xgb"]:
+        weight_sum = cfg["w_xgb"] + cfg["w_geo"]
+        if weight_sum <= 0:
+            raise ValueError("[HYBRID] Invalid config: w_xgb + w_geo must be > 0")
+        cfg["w_xgb"] /= weight_sum
+        cfg["w_geo"] /= weight_sum
+    else:
+        weight_sum = cfg["w_mlp"] + cfg["w_geo"]
+        if weight_sum <= 0:
+            raise ValueError("[HYBRID] Invalid config: w_mlp + w_geo must be > 0")
+        cfg["w_mlp"] /= weight_sum
+        cfg["w_geo"] /= weight_sum
     cfg["threshold"] = float(cfg["threshold"])
    cfg["use_yawn_veto"] = bool(cfg["use_yawn_veto"])
     cfg["geo_face_weight"] = float(cfg["geo_face_weight"])
     cfg["geo_eye_weight"] = float(cfg["geo_eye_weight"])
     cfg["mar_yawn_threshold"] = float(cfg["mar_yawn_threshold"])
+    cfg["combiner"] = cfg.get("combiner") or None
+    cfg["combiner_path"] = cfg.get("combiner_path") or None
 
     print(f"[HYBRID] Loaded config: {resolved}")
     return cfg, resolved
@@ -290,18 +329,11 @@ def _load_hybrid_config(model_dir: str, config_path: str | None = None):
 class MLPPipeline:
     def __init__(self, model_dir=None, detector=None, threshold=0.23):
         if model_dir is None:
-            # Check primary location
             model_dir = os.path.join(_PROJECT_ROOT, "MLP", "models")
             if not os.path.exists(model_dir):
                 model_dir = os.path.join(_PROJECT_ROOT, "checkpoints")
 
-        mlp_path, scaler_path, meta_path = _latest_model_artifacts(model_dir)
-        if mlp_path is None:
-            raise FileNotFoundError(f"No MLP artifacts in {model_dir}")
-        self._mlp = joblib.load(mlp_path)
-        self._scaler = joblib.load(scaler_path)
-        meta = np.load(meta_path, allow_pickle=True)
-        self._feature_names = list(meta["feature_names"])
+        self._mlp, self._scaler, self._feature_names = _load_mlp_artifacts(model_dir)
         self._indices = [FEATURE_NAMES.index(n) for n in self._feature_names]
 
         self._detector = detector or FaceMeshDetector()
@@ -312,7 +344,7 @@ class MLPPipeline:
         self._temporal = TemporalTracker()
         self._smoother = _OutputSmoother()
         self._threshold = threshold
-        print(f"[MLP] Loaded {mlp_path} | {len(self._feature_names)} features | threshold={threshold}")
+        print(f"[MLP] Loaded PyTorch MLP from {model_dir} | {len(self._feature_names)} features | threshold={threshold}")
 
     def process_frame(self, bgr_frame):
         landmarks = self._detector.process(bgr_frame)
@@ -344,12 +376,13 @@ class MLPPipeline:
         out["s_eye"] = float(vec[_FEAT_IDX["s_eye"]])
         out["mar"] = float(vec[_FEAT_IDX["mar"]])
 
-        X = vec[self._indices].reshape(1, -1).astype(np.float64)
+        X = vec[self._indices].reshape(1, -1).astype(np.float32)
         X_sc = self._scaler.transform(X)
-        if hasattr(self._mlp, "predict_proba"):
-            mlp_prob = float(self._mlp.predict_proba(X_sc)[0, 1])
-        else:
-            mlp_prob = float(self._mlp.predict(X_sc)[0] == 1)
+        with torch.no_grad():
+            x_t = torch.from_numpy(X_sc).float()
+            logits = self._mlp(x_t)
+            probs = torch.softmax(logits, dim=1)
+            mlp_prob = float(probs[0, 1])
         out["mlp_prob"] = float(np.clip(mlp_prob, 0.0, 1.0))
         out["raw_score"] = self._smoother.update(out["mlp_prob"], True)
         out["is_focused"] = out["raw_score"] >= self._threshold
@@ -370,6 +403,13 @@ class MLPPipeline:
         self.close()
 
 
+def _resolve_xgb_path():
+    p = os.path.join(_PROJECT_ROOT, "models", "xgboost", "checkpoints", "face_orientation_best.json")
+    if os.path.isfile(p):
+        return p
+    return os.path.join(_PROJECT_ROOT, "checkpoints", "xgboost_face_orientation_best.json")
+
+
 class HybridFocusPipeline:
     def __init__(
         self,
@@ -380,17 +420,8 @@ class HybridFocusPipeline:
     ):
         if model_dir is None:
             model_dir = os.path.join(_PROJECT_ROOT, "checkpoints")
-        mlp_path, scaler_path, meta_path = _latest_model_artifacts(model_dir)
-        if mlp_path is None:
-            raise FileNotFoundError(f"No MLP artifacts in {model_dir}")
-
-        self._mlp = joblib.load(mlp_path)
-        self._scaler = joblib.load(scaler_path)
-        meta = np.load(meta_path, allow_pickle=True)
-        self._feature_names = list(meta["feature_names"])
-        self._indices = [FEATURE_NAMES.index(n) for n in self._feature_names]
-
         self._cfg, self._cfg_path = _load_hybrid_config(model_dir=model_dir, config_path=config_path)
+        self._use_xgb = self._cfg["use_xgb"]
 
         self._detector = detector or FaceMeshDetector()
         self._owns_detector = detector is None
@@ -400,11 +431,41 @@ class HybridFocusPipeline:
         self.head_pose = self._head_pose
         self._smoother = _OutputSmoother()
 
-        print(
-            f"[HYBRID] Loaded {mlp_path} | {len(self._feature_names)} features | "
-            f"w_mlp={self._cfg['w_mlp']:.2f}, w_geo={self._cfg['w_geo']:.2f}, "
-            f"threshold={self._cfg['threshold']:.2f}"
-        )
+        self._combiner = None
+        combiner_path = self._cfg.get("combiner_path")
+        if combiner_path and self._cfg.get("combiner") == "logistic":
+            resolved_combiner = combiner_path if os.path.isabs(combiner_path) else os.path.join(model_dir, combiner_path)
+            if not os.path.isfile(resolved_combiner):
+                resolved_combiner = os.path.join(_PROJECT_ROOT, combiner_path)
+            if os.path.isfile(resolved_combiner):
+                blob = joblib.load(resolved_combiner)
+                self._combiner = blob.get("combiner")
+                if self._combiner is None:
+                    self._combiner = blob
+                print(f"[HYBRID] LR combiner loaded from {resolved_combiner}")
+            else:
+                print(f"[HYBRID] combiner_path not found: {resolved_combiner}, using heuristic weights")
+        if self._use_xgb:
+            from xgboost import XGBClassifier
+            xgb_path = _resolve_xgb_path()
+            if not os.path.isfile(xgb_path):
+                raise FileNotFoundError(f"No XGBoost checkpoint at {xgb_path}")
+            self._xgb_model = XGBClassifier()
+            self._xgb_model.load_model(xgb_path)
+            self._xgb_indices = [FEATURE_NAMES.index(n) for n in XGBoostPipeline.SELECTED]
+            self._mlp = None
+            self._scaler = None
+            self._indices = None
+            self._feature_names = list(XGBoostPipeline.SELECTED)
+            mode = "LR combiner" if self._combiner else f"w_xgb={self._cfg['w_xgb']:.2f}, w_geo={self._cfg['w_geo']:.2f}"
+            print(f"[HYBRID] XGBoost+geo | {xgb_path} | {mode}, threshold={self._cfg['threshold']:.2f}")
+        else:
+            self._mlp, self._scaler, self._feature_names = _load_mlp_artifacts(model_dir)
+            self._indices = [FEATURE_NAMES.index(n) for n in self._feature_names]
+            self._xgb_model = None
+            self._xgb_indices = None
+            mode = "LR combiner" if self._combiner else f"w_mlp={self._cfg['w_mlp']:.2f}, w_geo={self._cfg['w_geo']:.2f}"
+            print(f"[HYBRID] MLP+geo | {len(self._feature_names)} features | {mode}, threshold={self._cfg['threshold']:.2f}")
 
     @property
     def config(self) -> dict:
@@ -465,15 +526,32 @@ class HybridFocusPipeline:
         }
         vec = extract_features(landmarks, w, h, self._head_pose, self._eye_scorer, self._temporal, _pre=pre)
         vec = _clip_features(vec)
-        X = vec[self._indices].reshape(1, -1).astype(np.float64)
-        X_sc = self._scaler.transform(X)
-        if hasattr(self._mlp, "predict_proba"):
-            mlp_prob = float(self._mlp.predict_proba(X_sc)[0, 1])
+
+        if self._use_xgb:
+            X = vec[self._xgb_indices].reshape(1, -1).astype(np.float32)
+            prob = self._xgb_model.predict_proba(X)[0]
+            model_prob = float(np.clip(prob[1], 0.0, 1.0))
+            out["mlp_prob"] = model_prob
+            if self._combiner is not None:
+                meta = np.array([[model_prob, out["geo_score"]]], dtype=np.float32)
+                focus_score = float(self._combiner.predict_proba(meta)[0, 1])
+            else:
+                focus_score = self._cfg["w_xgb"] * model_prob + self._cfg["w_geo"] * out["geo_score"]
         else:
-            mlp_prob = float(self._mlp.predict(X_sc)[0] == 1)
-        out["mlp_prob"] = float(np.clip(mlp_prob, 0.0, 1.0))
+            X = vec[self._indices].reshape(1, -1).astype(np.float32)
+            X_sc = self._scaler.transform(X)
+            with torch.no_grad():
+                x_t = torch.from_numpy(X_sc).float()
+                logits = self._mlp(x_t)
+                probs = torch.softmax(logits, dim=1)
+                mlp_prob = float(probs[0, 1])
+            out["mlp_prob"] = float(np.clip(mlp_prob, 0.0, 1.0))
+            if self._combiner is not None:
+                meta = np.array([[out["mlp_prob"], out["geo_score"]]], dtype=np.float32)
+                focus_score = float(self._combiner.predict_proba(meta)[0, 1])
+            else:
+                focus_score = self._cfg["w_mlp"] * out["mlp_prob"] + self._cfg["w_geo"] * out["geo_score"]
 
-        focus_score = self._cfg["w_mlp"] * out["mlp_prob"] + self._cfg["w_geo"] * out["geo_score"]
         out["focus_score"] = self._smoother.update(float(np.clip(focus_score, 0.0, 1.0)), True)
         out["raw_score"] = out["focus_score"]
         out["is_focused"] = out["focus_score"] >= self._cfg["threshold"]
```