---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard

Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features (EAR, gaze ratios, head pose, PERCLOS) per frame, of which 10 are selected and routed through an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

---

## Team

**Team name:** FocusGuards (5CCSAGAP Large Group Project)

**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas

---
## Links

### Project access

- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)

### Data and checkpoints

- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)

The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
---

## Trained models

Model checkpoints are **not included** in the submission archive. Download them before running inference.

### Option 1: Hugging Face Space

Pre-trained checkpoints are available in the Hugging Face Space files:

```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```

Download and place into `checkpoints/`:

| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on the training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |
### Option 2: ClearML

Models are registered as ClearML OutputModels under the project "FocusGuards Large Group Project".

| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |

**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.

**Python:**

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads .json
```

Copy the downloaded files into `checkpoints/`.
### Option 3: Google Drive (submission fallback)

If ClearML access is restricted, download the checkpoints from:

https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link

Place all files under `checkpoints/`.

### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and the scalers. Requires training data under `data/collected_*/`.
---

## Project layout

```
config/
  default.yaml               hyperparameters, thresholds, app settings
  __init__.py                config loader + ClearML flattener
  clearml_enrich.py          ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py         load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb     EDA: distributions, class balance, correlations
models/
  face_mesh.py               MediaPipe 478-point face landmarks
  head_pose.py               yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py              EAR, MAR, gaze ratios, PERCLOS
  collect_features.py        real-time feature extraction + webcam labelling CLI
  gaze_calibration.py        9-point polynomial gaze calibration
  gaze_eye_fusion.py         fuses calibrated gaze with eye openness
  mlp/                       MLP training, eval, Optuna sweep
  xgboost/                   XGBoost training, eval, ClearML + Optuna sweeps
  L2CS-Net/                  vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                 (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                  MLP training + LOPO in Jupyter
  xgboost.ipynb              XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py      LOPO threshold + weight grid search
  feature_importance.py      XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py pooled vs LOPO comparison
  plots/                     ROC curves, confusion matrices, weight searches
  logs/                      JSON training logs
tests/
  test_*.py                  unit + integration tests (pytest)
  .coveragerc                coverage config
ui/
  pipeline.py                all 5 pipeline classes + output smoothing
  live_demo.py               OpenCV webcam demo
src/                         React (Vite) frontend source
static/                      built frontend assets (after npm build)
main.py                      FastAPI application entry point
package.json                 frontend package manifest
requirements.txt
pytest.ini
```
---

## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download the checkpoints (see above).

If you need to rebuild the frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
---

## Run

### Local OpenCV demo

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb  # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch the pipeline mode, `q` quits.

### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860

### Web app (Docker)

```bash
docker-compose up  # serves on port 7860
```
---

## Data collection

```bash
python -m models.collect_features --name <participant>
```

Records webcam sessions with real-time binary labelling (the spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
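The saved sessions can be read back with NumPy. A minimal sketch, assuming the arrays are stored under the keys `features` and `labels` (the actual key names are not confirmed here):

```python
import numpy as np

def load_session(path):
    """Load one collected session file.

    Assumption: features are saved under "features" (n_frames, 17)
    and binary labels under "labels" (n_frames,). Verify against
    models/collect_features.py before relying on these names.
    """
    data = np.load(path)
    X = data["features"]  # per-frame feature vectors
    y = data["labels"]    # 1 = focused, 0 = unfocused
    return X, y
```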
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total; 61.5% focused, 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.

Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
---

## Pipeline

```
Webcam frame
--> MediaPipe Face Mesh (478 landmarks)
--> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
--> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
--> Gaze ratios: h_gaze, v_gaze, gaze_offset
--> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
--> 17 features --> select 10 --> clip to physiological bounds
--> ML model (MLP / XGBoost) or geometric scorer
--> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
--> FOCUSED / UNFOCUSED
```
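The smoothing stage above can be sketched as follows. The exact update rule is an assumption; only the two alphas are fixed by the pipeline description:

```python
def smooth(prev, new, alpha_up=0.55, alpha_down=0.45):
    """Asymmetric exponential moving average (sketch).

    A rising score is tracked with alpha_up and a falling score with
    alpha_down, so the output closes 55% of an upward gap per frame
    but only 45% of a downward gap. (The update form itself is an
    assumption; the source only specifies the alpha values.)
    """
    alpha = alpha_up if new > prev else alpha_down
    return alpha * new + (1 - alpha) * prev
```

This makes the displayed focus score slightly quicker to recover than to drop, which damps flicker from single noisy frames.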
Five runtime modes share the same feature extraction backbone:

| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |

Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto.
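The boost-mode fusion and its soft veto can be sketched as below. The cosine-decay form of the L2CS score and the reuse of max_angle=22 deg are assumptions modelled on the geometric scorer's description:

```python
import math

def l2cs_score(gaze_deg, max_angle=22.0):
    """Cosine-decay gaze score (sketch): 1.0 on-axis, 0 beyond max_angle.

    Assumption: both the decay shape and the angle limit mirror the
    geometric scorer; the real L2CS scoring in ui/pipeline.py may differ.
    """
    if abs(gaze_deg) >= max_angle:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * abs(gaze_deg) / max_angle))

def l2cs_boost(base_score, gaze_deg, w_base=0.35, w_l2cs=0.65, threshold=0.52):
    """Score-level fusion: a near-zero L2CS score drags the fused score
    below the threshold even when the base model is confident."""
    fused = w_base * base_score + w_l2cs * l2cs_score(gaze_deg)
    return fused, "FOCUSED" if fused >= threshold else "UNFOCUSED"
```

With these weights a confident base score of 0.9 paired with gaze 30 deg off-screen fuses to 0.315, below the 0.52 threshold, which is the soft-veto behaviour described above.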
---

## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).

### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).

Reference experiment IDs:

| Model | ClearML experiment ID |
|-------|-----------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |
---

## Evaluation

```bash
python -m evaluation.justify_thresholds      # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark # pooled vs LOPO comparison
python -m evaluation.feature_importance     # XGBoost gain + LOFO ablation
```

### Results (pooled random split, 15% test)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |

### Results (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |

Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820).
Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).

The roughly 12-percentage-point drop from the pooled split to LOPO reflects temporal data leakage in the pooled split and confirms LOPO as the primary generalisation metric.
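Youden's J threshold selection reduces to a sweep over candidate thresholds, maximising J = TPR - FPR. A minimal standalone sketch (the actual `justify_thresholds` script presumably aggregates this across held-out participants):

```python
def youden_threshold(scores, labels):
    """Return the threshold maximising Youden's J = TPR - FPR.

    scores: predicted probabilities; labels: 0/1 ground truth.
    Candidates are the observed score values, as in a standard ROC sweep.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

On a toy set where positives score above negatives, the sweep lands on the lowest positive score, i.e. the perfect-separation cut.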
### Feature ablation

| Channel subset | Mean LOPO F1 |
|----------------|--------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |

Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).
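PERCLOS, one of the top-gain features above, is conventionally the fraction of recent frames in which the eyes are closed. A minimal sliding-window sketch; the window length and the 0.2 EAR closure threshold are assumptions, not values taken from the repository:

```python
from collections import deque

class Perclos:
    """Fraction of the last `window` frames with EAR below `closed_ear`.

    Assumptions: window length and closure threshold are illustrative;
    the real temporal tracker in models/eye_scorer.py may use other values.
    """
    def __init__(self, window=150, closed_ear=0.2):
        self.closed_ear = closed_ear
        self.frames = deque(maxlen=window)

    def update(self, ear):
        """Feed one frame's EAR; return the current PERCLOS estimate."""
        self.frames.append(ear < self.closed_ear)
        return sum(self.frames) / len(self.frames)
```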
---

## L2CS Gaze Tracking

L2CS-Net predicts where the eyes are looking rather than where the head is pointing, catching the case where the head faces the screen but the eyes wander.

**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots in a 3x3 grid. After all 9 points are collected, a degree-2 polynomial maps gaze angles to screen coordinates, with IQR outlier filtering and centre-point bias correction.
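The per-axis degree-2 mapping can be sketched as an ordinary least-squares fit over the 9 calibration samples. This is a hypothetical standalone version: the real calibration may include cross terms, and the IQR filtering and bias-correction steps are omitted here:

```python
import numpy as np

def fit_axis(gaze, screen):
    """Fit screen = a + b*g + c*g^2 for one axis by least squares.

    gaze, screen: 1-D arrays of gaze angles and target coordinates
    from the 9 calibration points (sketch; per-axis only, no cross terms).
    """
    A = np.stack([np.ones_like(gaze), gaze, gaze**2], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs  # (a, b, c)

def apply_axis(coeffs, g):
    """Map a gaze angle to a screen coordinate with fitted coefficients."""
    a, b, c = coeffs
    return a + b * g + c * g * g
```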
L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
| ## Config | |
| All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`. | |
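Resolution of the override can be sketched as below; `resolve_config_path` is a hypothetical helper, and the real loader in `config/__init__.py` may behave differently (e.g. merge rather than replace):

```python
import os

def resolve_config_path(default="config/default.yaml"):
    """Return the YAML path to load: FOCUSGUARD_CONFIG wins if set.

    Sketch only; assumes the env var names a complete alternative
    config file rather than a partial overlay.
    """
    return os.environ.get("FOCUSGUARD_CONFIG", default)
```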
---

## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; the split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, and sessions endpoints (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)

```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` with a terminal report). For an HTML report: `pytest --cov-report=html`.

**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.