
Feature selection justification

The face_orientation model uses 10 of the 17 extracted features. This document summarises the domain rationale and the empirical support for that selection.

1. Domain rationale

The 10 features were chosen to cover three channels:

  • Head pose: head_deviation, s_face, pitch
  • Eye state: ear_left, ear_right, ear_avg, perclos
  • Gaze: h_gaze, gaze_offset, s_eye

Excluded: v_gaze (noisy), mar (rare events), yaw/roll (redundant with head_deviation/s_face), blink_rate/closure_duration/yawn_duration (temporal overlap with perclos).
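The selection above can be written down as a small lookup, which makes the 10-of-17 split easy to check. This is a minimal sketch: the feature names come from the lists above, while `CHANNELS`, `EXCLUDED`, `SELECTED`, and `select_features` are hypothetical names introduced for illustration and are not taken from the project code.

```python
# Feature names follow the lists above. The constant and function names
# are hypothetical and exist only for this sketch.
CHANNELS = {
    "head_pose": ["head_deviation", "s_face", "pitch"],
    "eye_state": ["ear_left", "ear_right", "ear_avg", "perclos"],
    "gaze": ["h_gaze", "gaze_offset", "s_eye"],
}
EXCLUDED = [
    "v_gaze", "mar", "yaw", "roll",
    "blink_rate", "closure_duration", "yawn_duration",
]

# Flatten the channels into the ordered list of selected features.
SELECTED = [name for names in CHANNELS.values() for name in names]
ALL_FEATURES = SELECTED + EXCLUDED


def select_features(row):
    """Return the selected feature values from a dict of extracted features."""
    missing = [name for name in SELECTED if name not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [row[name] for name in SELECTED]


# Sanity checks: 10 selected, 17 extracted, no duplicates.
assert len(SELECTED) == 10
assert len(ALL_FEATURES) == 17
assert len(set(ALL_FEATURES)) == 17
```

Keeping the channel grouping in one place also means a later ablation or importance script can iterate over the same names.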

2. XGBoost feature importance (gain)

From the trained XGBoost checkpoint (gain on the 10 features):

| Feature | Gain |
|---|---|
| head_deviation | 8.83 |
| s_face | 10.27 |
| s_eye | 2.18 |
| h_gaze | 4.99 |
| pitch | 4.64 |
| ear_left | 3.57 |
| ear_avg | 6.96 |
| ear_right | 9.54 |
| gaze_offset | 1.80 |
| perclos | 5.68 |

Top 5 by gain: s_face, ear_right, head_deviation, ear_avg, perclos.
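The ranking can be reproduced from the gain values listed above. The sketch below hard-codes those values in a dict named `GAIN` (a hypothetical name). With the xgboost Python package, a dict of this shape can typically be obtained from `Booster.get_score(importance_type="gain")`; whether the project extracted its numbers that way is an assumption.

```python
# Gain values copied from the table above. In practice a dict like this
# might come from booster.get_score(importance_type="gain") (assumption).
GAIN = {
    "head_deviation": 8.83,
    "s_face": 10.27,
    "s_eye": 2.18,
    "h_gaze": 4.99,
    "pitch": 4.64,
    "ear_left": 3.57,
    "ear_avg": 6.96,
    "ear_right": 9.54,
    "gaze_offset": 1.80,
    "perclos": 5.68,
}

# Sort feature names by gain, highest first, and keep the top five.
ranked = sorted(GAIN, key=GAIN.get, reverse=True)
top5 = ranked[:5]
print(top5)
```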

3. Leave-one-feature-out ablation (LOPO F1)

Baseline (all 10 features) mean LOPO F1: 0.8327.

| Feature dropped | Mean LOPO F1 | Δ (baseline − dropped) |
|---|---|---|
| head_deviation | 0.8395 | -0.0068 |
| s_face | 0.8390 | -0.0063 |
| s_eye | 0.8342 | -0.0015 |
| h_gaze | 0.8244 | +0.0083 |
| pitch | 0.8250 | +0.0077 |
| ear_left | 0.8326 | +0.0001 |
| ear_avg | 0.8350 | -0.0023 |
| ear_right | 0.8344 | -0.0017 |
| gaze_offset | 0.8351 | -0.0024 |
| perclos | 0.8258 | +0.0069 |

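Δ is the baseline F1 minus the F1 with the feature dropped, so a positive Δ means that dropping the feature hurts. Dropping h_gaze hurts most (F1 falls from 0.8327 to 0.8244), followed by pitch (0.8250) and perclos (0.8258), consistent with these features carrying useful signal. Dropping head_deviation or s_face raises F1 slightly (to 0.8395 and 0.8390), so the ablation alone does not argue for keeping them. All differences are below 0.01 in absolute value.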
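The ablation procedure can be sketched as a loop that drops one feature at a time and re-scores the rest. In the sketch below, `leave_one_feature_out` and `reported_f1` are hypothetical names. `reported_f1` is only a stand-in that looks up the F1 values reported above; a real evaluator would retrain the model on the kept features and compute the mean LOPO F1.

```python
FEATURES = [
    "head_deviation", "s_face", "s_eye", "h_gaze", "pitch",
    "ear_left", "ear_avg", "ear_right", "gaze_offset", "perclos",
]

# Mean LOPO F1 values copied from the table above, keyed by dropped feature.
BASELINE_F1 = 0.8327
REPORTED_F1 = {
    "head_deviation": 0.8395, "s_face": 0.8390, "s_eye": 0.8342,
    "h_gaze": 0.8244, "pitch": 0.8250, "ear_left": 0.8326,
    "ear_avg": 0.8350, "ear_right": 0.8344, "gaze_offset": 0.8351,
    "perclos": 0.8258,
}


def reported_f1(kept):
    """Stand-in evaluator: returns the reported F1 for a feature subset.

    A real evaluator would retrain and score the model here (assumption).
    """
    dropped = [f for f in FEATURES if f not in kept]
    if not dropped:
        return BASELINE_F1
    return REPORTED_F1[dropped[0]]


def leave_one_feature_out(features, evaluate):
    """Return (baseline, {feature: (score, delta)}), delta = baseline - score."""
    baseline = evaluate(list(features))
    results = {}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        score = evaluate(kept)
        results[dropped] = (score, baseline - score)
    return baseline, results


baseline, results = leave_one_feature_out(FEATURES, reported_f1)

# Rank features by how much F1 is lost when each one is dropped.
by_harm = sorted(results, key=lambda f: results[f][1], reverse=True)
for name in by_harm:
    score, delta = results[name]
    print(f"{name:15s} F1={score:.4f} delta={delta:+.4f}")
```

Passing the evaluator in as a callable keeps the loop independent of the model and of the cross-validation scheme, so the same function works for any scorer that takes a list of feature names.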

4. Conclusion

Selection is supported by (1) domain rationale (three attention channels), (2) XGBoost gain importance, and (3) leave-one-feature-out ablation. SHAP or correlation-based pruning can be added in future work.