---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard

Real-time webcam-based visual attention estimation. MediaPipe Face Mesh extracts 17 features (EAR, gaze ratios, head pose, PERCLOS) per frame, of which 10 are selected and routed through an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

---

## Team

**Team name:** FocusGuards (5CCSAGAP Large Group Project)

**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas

---
## Links

### Project access

- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)

### Data and checkpoints

- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)

The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
---

## Trained models

Model checkpoints are **not included** in the submission archive. Download them before running inference.

### Option 1: Hugging Face Space

Pre-trained checkpoints are available in the Hugging Face Space files:

```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```

Download and place into `checkpoints/`:

| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on the training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |
### Option 2: ClearML

Models are registered as ClearML OutputModels under the project "FocusGuards Large Group Project".

| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |

**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.

**Python:**

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()  # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()  # downloads .json
```

Copy the downloaded files into `checkpoints/`.
### Option 3: Google Drive (submission fallback)

If ClearML access is restricted, download the checkpoints from:

https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link

Place all files under `checkpoints/`.

### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and the scalers. Requires training data under `data/collected_*/`.
---

## Project layout

```
config/
  default.yaml               hyperparameters, thresholds, app settings
  __init__.py                config loader + ClearML flattener
  clearml_enrich.py          ClearML task enrichment + artifact upload
data_preparation/
  prepare_dataset.py         load/split/scale .npz files (pooled + LOPO)
  data_exploration.ipynb     EDA: distributions, class balance, correlations
models/
  face_mesh.py               MediaPipe 478-point face landmarks
  head_pose.py               yaw/pitch/roll via solvePnP, face-orientation score
  eye_scorer.py              EAR, MAR, gaze ratios, PERCLOS
  collect_features.py        real-time feature extraction + webcam labelling CLI
  gaze_calibration.py        9-point polynomial gaze calibration
  gaze_eye_fusion.py         fuses calibrated gaze with eye openness
  mlp/                       MLP training, eval, Optuna sweep
  xgboost/                   XGBoost training, eval, ClearML + Optuna sweeps
  L2CS-Net/                  vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                 (excluded from archive; see download instructions above)
notebooks/
  mlp.ipynb                  MLP training + LOPO in Jupyter
  xgboost.ipynb              XGBoost training + LOPO in Jupyter
evaluation/
  justify_thresholds.py      LOPO threshold + weight grid search
  feature_importance.py      XGBoost gain + leave-one-feature-out ablation
  grouped_split_benchmark.py pooled vs LOPO comparison
  plots/                     ROC curves, confusion matrices, weight searches
  logs/                      JSON training logs
tests/
  test_*.py                  unit + integration tests (pytest)
  .coveragerc                coverage config
ui/
  pipeline.py                all 5 pipeline classes + output smoothing
  live_demo.py               OpenCV webcam demo
src/                         React (Vite) frontend source
static/                      built frontend assets (after npm build)
main.py                      FastAPI application entry point
package.json                 frontend package manifest
requirements.txt
pytest.ini
```
---

## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download the checkpoints (see above).

If you need to rebuild the frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
---

## Run

### Local OpenCV demo

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb  # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch the pipeline mode, `q` quits.

### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860

### Web app (Docker)

```bash
docker-compose up  # serves on port 7860
```
---

## Data collection

```bash
python -m models.collect_features --name <participant>
```

Records webcam sessions with real-time binary labelling (the spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
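The saved sessions can be read back with NumPy. A minimal sketch, assuming the arrays are stored under the keys `features` and `labels` (the actual key names are not confirmed here):

```python
import numpy as np

def load_session(path):
    """Load one collected session file.

    Assumption: features are saved under "features" (n_frames, 17)
    and binary labels under "labels" (n_frames,). Verify against
    models/collect_features.py before relying on these names.
    """
    data = np.load(path)
    X = data["features"]  # per-frame feature vectors
    y = data["labels"]    # 1 = focused, 0 = unfocused
    return X, y
```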
9 participants recorded 5-10 min sessions across varied environments (144,793 frames total; 61.5% focused, 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.

Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link

The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
---

## Pipeline

```
Webcam frame
--> MediaPipe Face Mesh (478 landmarks)
--> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
--> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
--> Gaze ratios: h_gaze, v_gaze, gaze_offset
--> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
--> 17 features --> select 10 --> clip to physiological bounds
--> ML model (MLP / XGBoost) or geometric scorer
--> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
--> FOCUSED / UNFOCUSED
```
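The smoothing stage above can be sketched as follows. The exact update rule is an assumption; only the two alphas are fixed by the pipeline description:

```python
def smooth(prev, new, alpha_up=0.55, alpha_down=0.45):
    """Asymmetric exponential moving average (sketch).

    A rising score is tracked with alpha_up and a falling score with
    alpha_down, so the output closes 55% of an upward gap per frame
    but only 45% of a downward gap. (The update form itself is an
    assumption; the source only specifies the alpha values.)
    """
    alpha = alpha_up if new > prev else alpha_down
    return alpha * new + (1 - alpha) * prev
```

This makes the displayed focus score slightly quicker to recover than to drop, which damps flicker from single noisy frames.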
Five runtime modes share the same feature extraction backbone:

| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |

Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto.
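The boost-mode fusion and its soft veto can be sketched as below. The cosine-decay form of the L2CS score and the reuse of max_angle=22 deg are assumptions modelled on the geometric scorer's description:

```python
import math

def l2cs_score(gaze_deg, max_angle=22.0):
    """Cosine-decay gaze score (sketch): 1.0 on-axis, 0 beyond max_angle.

    Assumption: both the decay shape and the angle limit mirror the
    geometric scorer; the real L2CS scoring in ui/pipeline.py may differ.
    """
    if abs(gaze_deg) >= max_angle:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * abs(gaze_deg) / max_angle))

def l2cs_boost(base_score, gaze_deg, w_base=0.35, w_l2cs=0.65, threshold=0.52):
    """Score-level fusion: a near-zero L2CS score drags the fused score
    below the threshold even when the base model is confident."""
    fused = w_base * base_score + w_l2cs * l2cs_score(gaze_deg)
    return fused, "FOCUSED" if fused >= threshold else "UNFOCUSED"
```

With these weights a confident base score of 0.9 paired with gaze 30 deg off-screen fuses to 0.315, below the 0.52 threshold, which is the soft-veto behaviour described above.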
---

## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).

### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).

Reference experiment IDs:

| Model | ClearML experiment ID |
|-------|-----------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |
---

## Evaluation

```bash
python -m evaluation.justify_thresholds      # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark # pooled vs LOPO comparison
python -m evaluation.feature_importance     # XGBoost gain + LOFO ablation
```

### Results (pooled random split, 15% test)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |

### Results (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |

Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820).
Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).

The roughly 12-percentage-point drop from the pooled split to LOPO reflects temporal data leakage in the pooled split and confirms LOPO as the primary generalisation metric.
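Youden's J threshold selection reduces to a sweep over candidate thresholds, maximising J = TPR - FPR. A minimal standalone sketch (the actual `justify_thresholds` script presumably aggregates this across held-out participants):

```python
def youden_threshold(scores, labels):
    """Return the threshold maximising Youden's J = TPR - FPR.

    scores: predicted probabilities; labels: 0/1 ground truth.
    Candidates are the observed score values, as in a standard ROC sweep.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

On a toy set where positives score above negatives, the sweep lands on the lowest positive score, i.e. the perfect-separation cut.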
### Feature ablation

| Channel subset | Mean LOPO F1 |
|----------------|--------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |

Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).
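PERCLOS, one of the top-gain features above, is conventionally the fraction of recent frames in which the eyes are closed. A minimal sliding-window sketch; the window length and the 0.2 EAR closure threshold are assumptions, not values taken from the repository:

```python
from collections import deque

class Perclos:
    """Fraction of the last `window` frames with EAR below `closed_ear`.

    Assumptions: window length and closure threshold are illustrative;
    the real temporal tracker in models/eye_scorer.py may use other values.
    """
    def __init__(self, window=150, closed_ear=0.2):
        self.closed_ear = closed_ear
        self.frames = deque(maxlen=window)

    def update(self, ear):
        """Feed one frame's EAR; return the current PERCLOS estimate."""
        self.frames.append(ear < self.closed_ear)
        return sum(self.frames) / len(self.frames)
```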
---

## L2CS Gaze Tracking

L2CS-Net predicts where the eyes are looking rather than where the head is pointing, catching the case where the head faces the screen but the eyes wander.

**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots in a 3x3 grid. After all 9 points are collected, a degree-2 polynomial maps gaze angles to screen coordinates, with IQR outlier filtering and centre-point bias correction.
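The per-axis degree-2 mapping can be sketched as an ordinary least-squares fit over the 9 calibration samples. This is a hypothetical standalone version: the real calibration may include cross terms, and the IQR filtering and bias-correction steps are omitted here:

```python
import numpy as np

def fit_axis(gaze, screen):
    """Fit screen = a + b*g + c*g^2 for one axis by least squares.

    gaze, screen: 1-D arrays of gaze angles and target coordinates
    from the 9 calibration points (sketch; per-axis only, no cross terms).
    """
    A = np.stack([np.ones_like(gaze), gaze, gaze**2], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs  # (a, b, c)

def apply_axis(coeffs, g):
    """Map a gaze angle to a screen coordinate with fitted coefficients."""
    a, b, c = coeffs
    return a + b * g + c * g * g
```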
L2CS weight lookup order at runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
| ## Config | |
| All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`. | |
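Resolution of the override can be sketched as below; `resolve_config_path` is a hypothetical helper, and the real loader in `config/__init__.py` may behave differently (e.g. merge rather than replace):

```python
import os

def resolve_config_path(default="config/default.yaml"):
    """Return the YAML path to load: FOCUSGUARD_CONFIG wins if set.

    Sketch only; assumes the env var names a complete alternative
    config file rather than a partial overlay.
    """
    return os.environ.get("FOCUSGUARD_CONFIG", default)
```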
---

## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; the split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, and sessions endpoints (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)

```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` with a terminal report). For an HTML report: `pytest --cov-report=html`.

**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.