Abdelrahman Almatrooshi
Deploy snapshot from main b7a59b11809483dfc959f196f1930240f2662c49
---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 per-frame features (EAR, gaze ratios, head pose, PERCLOS); 10 of these are selected and routed through an MLP or XGBoost classifier for binary focused/unfocused prediction. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.
![Real-time focus detection with face mesh and XGBoost classification](assets/focusguard-demo.gif)
---
## Team
**Team name:** FocusGuards (5CCSAGAP Large Group Project)
**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
---
## Links
### Project access
- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)
### Data and checkpoints
- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
---
## Trained models
Model checkpoints are **not included** in the submission archive. Download them before running inference.
### Option 1: Hugging Face Space
Pre-trained checkpoints are available in the Hugging Face Space files:
```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```
Download and place into `checkpoints/`:
| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |
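Instead of downloading files by hand, the folder can also be fetched programmatically. A minimal sketch, assuming the `huggingface_hub` package is available (it is not listed in the project's `requirements.txt`):

```python
from huggingface_hub import snapshot_download

def fetch_checkpoints(dest="."):
    """Download only the checkpoints/ folder of the Space into dest."""
    return snapshot_download(
        repo_id="FocusGuard/final_v2",
        repo_type="space",
        allow_patterns=["checkpoints/*"],
        local_dir=dest,
    )
```

`allow_patterns` restricts the download to the checkpoint files, so the 96 MB L2CS weights are the only large transfer.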
### Option 2: ClearML
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |
**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.
**Python:**
```python
from clearml import Model
mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy() # downloads .pt
xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy() # downloads .json
```
Copy the downloaded files into `checkpoints/`.
### Option 3: Google Drive (submission fallback)
If ClearML access is restricted, download checkpoints from:
https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under `checkpoints/`.
### Option 4: Retrain from scratch
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and scalers. Requires training data under `data/collected_*/`.
---
## Project layout
```
config/
default.yaml hyperparameters, thresholds, app settings
__init__.py config loader + ClearML flattener
clearml_enrich.py ClearML task enrichment + artifact upload
data_preparation/
prepare_dataset.py load/split/scale .npz files (pooled + LOPO)
data_exploration.ipynb EDA: distributions, class balance, correlations
models/
face_mesh.py MediaPipe 478-point face landmarks
head_pose.py yaw/pitch/roll via solvePnP, face-orientation score
eye_scorer.py EAR, MAR, gaze ratios, PERCLOS
collect_features.py real-time feature extraction + webcam labelling CLI
gaze_calibration.py 9-point polynomial gaze calibration
gaze_eye_fusion.py fuses calibrated gaze with eye openness
mlp/ MLP training, eval, Optuna sweep
xgboost/ XGBoost training, eval, ClearML + Optuna sweeps
L2CS-Net/ vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/ (excluded from archive; see download instructions above)
notebooks/
mlp.ipynb MLP training + LOPO in Jupyter
xgboost.ipynb XGBoost training + LOPO in Jupyter
evaluation/
justify_thresholds.py LOPO threshold + weight grid search
feature_importance.py XGBoost gain + leave-one-feature-out ablation
grouped_split_benchmark.py pooled vs LOPO comparison
plots/ ROC curves, confusion matrices, weight searches
logs/ JSON training logs
tests/
test_*.py unit + integration tests (pytest)
.coveragerc coverage config
ui/
pipeline.py all 5 pipeline classes + output smoothing
live_demo.py OpenCV webcam demo
src/ React (Vite) frontend source
static/ built frontend assets (after npm build)
main.py FastAPI application entry point
package.json frontend package manifest
requirements.txt
pytest.ini
```
---
## Setup
Recommended versions:
- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Then download checkpoints (see above).
If you need to rebuild frontend assets locally:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
---
## Run
### Local OpenCV demo
```bash
python ui/live_demo.py
python ui/live_demo.py --xgb # XGBoost
```
Controls: `m` cycle mesh overlay, `1-5` switch pipeline mode, `q` quit.
### Web app (without Docker)
```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```
Open http://localhost:7860
### Web app (Docker)
```bash
docker-compose up # serves on port 7860
```
---
## Data collection
```bash
python -m models.collect_features --name <participant>
```
Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total, 61.5% focused / 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link
The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
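For anyone granted dataset access, a recorded session file can be inspected without assuming its array names, since `.npz` archives are self-describing:

```python
import numpy as np

def summarise_session(path):
    """Map each array stored in one recorded .npz session file to its shape."""
    with np.load(path) as arrays:
        return {name: arrays[name].shape for name in arrays.files}
```

Printing the result for one file shows which arrays a session contains (e.g. per-frame feature vectors and their labels) and how many frames were recorded.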
---
## Pipeline
```
Webcam frame
--> MediaPipe Face Mesh (478 landmarks)
--> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
--> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
--> Gaze ratios: h_gaze, v_gaze, gaze_offset
--> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
--> 17 features --> select 10 --> clip to physiological bounds
--> ML model (MLP / XGBoost) or geometric scorer
--> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
--> FOCUSED / UNFOCUSED
```
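The geometric scorer and the smoothing stage above can be sketched as follows. This is illustrative only: the exact cosine-decay form and the EMA update convention are assumptions based on the parameters listed in this README, not the code in `ui/pipeline.py`:

```python
import math

def geometric_score(yaw, pitch, s_eye, w_face=0.7, w_eye=0.3, max_angle=22.0):
    """Deterministic focus score: cosine decay on head deviation, blended with eye openness."""
    deviation = math.hypot(yaw, pitch)  # head deviation from screen-facing, degrees
    s_face = math.cos(math.pi / 2 * min(deviation, max_angle) / max_angle)
    return w_face * s_face + w_eye * s_eye

def smooth(prev, raw, alpha_up=0.55, alpha_down=0.45):
    """Asymmetric EMA: a larger alpha when the score rises than when it falls."""
    alpha = alpha_up if raw > prev else alpha_down
    return alpha * raw + (1 - alpha) * prev
```

With these parameters, `s_face` reaches zero at 22 degrees of head deviation, and rising focus scores are tracked slightly faster than falling ones.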
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces near-zero L2CS score via cosine decay, acting as a soft veto.
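A minimal sketch of boost-mode fusion, assuming the L2CS score is derived from gaze deviation with the same cosine decay and 22-degree cutoff as the geometric scorer (the function name, angle handling, and cutoff are illustrative, not the actual implementation):

```python
import math

def l2cs_boost(base_score, gaze_yaw, gaze_pitch, max_angle=22.0,
               w_base=0.35, w_l2cs=0.65, threshold=0.52):
    """Score-level fusion of a base model with an L2CS gaze score.

    Off-screen gaze drives the L2CS term toward zero, so even a confident
    base score cannot clear the fused threshold: a soft veto.
    """
    angle = math.hypot(gaze_yaw, gaze_pitch)  # combined gaze deviation, degrees
    l2cs_score = math.cos(math.pi / 2 * min(angle, max_angle) / max_angle)
    fused = w_base * base_score + w_l2cs * l2cs_score
    return fused, fused >= threshold
```

For example, a base score of 0.9 with on-screen gaze fuses to 0.965 (focused), while the same base score with gaze 30 degrees off-screen fuses to 0.315 (unfocused).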
---
## Training
Both scripts read all hyperparameters from `config/default.yaml`.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).
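For orientation, an architecture matching the reported 10-64-32-2 shape (~2,850 parameters) can be sketched like this; the class name is hypothetical and the authoritative definition lives under `models/mlp/`:

```python
import torch
import torch.nn as nn

class FocusMLP(nn.Module):
    """10 -> 64 -> 32 -> 2 perceptron, matching the checkpoint's reported shape."""
    def __init__(self, n_features=10, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)
```

The layer sizes account for exactly 2,850 trainable parameters (704 + 2,080 + 66), consistent with the checkpoint table above.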
### ClearML experiment tracking
```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```
Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
Reference experiment IDs:
| Model | ClearML experiment ID |
|-------|------------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |
---
## Evaluation
```bash
python -m evaluation.justify_thresholds # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark # pooled vs LOPO comparison
python -m evaluation.feature_importance # XGBoost gain + LOFO ablation
```
### Results (pooled random split, 15% test)
| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
### Results (LOPO, 9 participants)
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820).
Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The drop of roughly 12 percentage points from pooled to LOPO reflects temporal data leakage in the pooled split, which is why LOPO is treated as the primary generalisation metric.
### Feature ablation
| Channel subset | Mean LOPO F1 |
|----------------|-------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).
---
## L2CS Gaze Tracking
L2CS-Net predicts where the eyes are looking rather than just where the head is pointed, catching the case where the head faces the screen but the eyes wander.
**Standalone mode:** Select L2CS as the model.
**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.
**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
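The calibration fit can be sketched with NumPy's polynomial helpers. This is a simplified sketch assuming independent per-axis degree-2 fits; the real pipeline in `models/gaze_calibration.py` additionally applies IQR outlier filtering and centre-point bias correction, which are omitted here:

```python
import numpy as np

def fit_calibration(gaze_angles, screen_points, deg=2):
    """Fit degree-2 polynomials mapping mean gaze yaw/pitch per target to screen x/y.

    gaze_angles: (N, 2) yaw/pitch in degrees; screen_points: (N, 2) pixel targets.
    """
    gaze = np.asarray(gaze_angles, dtype=float)
    screen = np.asarray(screen_points, dtype=float)
    coeff_x = np.polyfit(gaze[:, 0], screen[:, 0], deg)
    coeff_y = np.polyfit(gaze[:, 1], screen[:, 1], deg)
    return coeff_x, coeff_y

def apply_calibration(coeffs, yaw, pitch):
    """Map one yaw/pitch sample to estimated screen coordinates."""
    coeff_x, coeff_y = coeffs
    return float(np.polyval(coeff_x, yaw)), float(np.polyval(coeff_y, pitch))
```

With nine targets on a 3x3 grid, each axis has three distinct gaze values, which is exactly enough to determine a degree-2 polynomial per axis.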
L2CS weight lookup order in runtime:
1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
---
## Config
All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.
---
## Tests
Included checks:
- data prep helpers and real split consistency (`test_data_preparation.py`; split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)
```bash
pytest
```
Coverage is enabled by default via `pytest.ini` (`--cov` / term report). For HTML coverage: `pytest --cov-report=html`.
**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.