Abdelrahman Almatrooshi

fix: quote HF short_description for valid YAML

2e034be about 1 month ago

	---
	title: Focus Guard Final v2
	emoji: 🎯
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	pinned: false
	short_description: "Focus detection — MediaPipe, MLP/XGB, L2CS, FastAPI"
	---

	# FocusGuard

	Webcam-based focus detection: MediaPipe face mesh → 10 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost classifier for focused/unfocused. The app is a React frontend on a FastAPI backend, with video streamed over WebSocket.

	Repository: [KCL GAP project](https://github.kcl.ac.uk) (internal) — adjust link if you publish a public mirror.

	## Project layout

	```
	├── data/                    collected_<name>/*.npz
	├── data_preparation/        loaders, split, scale
	├── notebooks/               MLP/XGB training + LOPO
	├── models/                  face_mesh, head_pose, eye_scorer, train scripts
	│   ├── gaze_calibration.py  9-point polynomial gaze calibration
	│   ├── gaze_eye_fusion.py   Fuses calibrated gaze with eye openness
	│   └── L2CS-Net/            In-tree L2CS-Net repo with Gaze360 weights
	├── checkpoints/             mlp_best.pt, xgboost_*_best.json, scalers
	├── evaluation/              logs, plots, justify_thresholds
	├── ui/                      pipeline.py, live_demo.py
	├── src/                     React frontend
	│   ├── components/
	│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
	│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
	│   └── utils/
	│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
	├── static/                  built frontend (after npm run build)
	├── main.py, app.py          FastAPI backend
	├── requirements.txt
	└── package.json
	```

	## Config

	Hyperparameters and app settings live in `config/default.yaml` (learning rates, batch size, thresholds, L2CS weights, etc.). Override with env `FOCUSGUARD_CONFIG` pointing to another YAML.
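
The override rule can be pictured as a small path lookup. This is a minimal sketch, not the project's loader: `resolve_config_path` is a hypothetical helper, and only the default path and the `FOCUSGUARD_CONFIG` variable come from this README.

```python
import os
from pathlib import Path

# Default location named in this README; the helper below is hypothetical.
DEFAULT_CONFIG = Path("config/default.yaml")


def resolve_config_path(env=None):
    """Return the YAML path to load, preferring FOCUSGUARD_CONFIG when it is set."""
    env = os.environ if env is None else env
    override = env.get("FOCUSGUARD_CONFIG", "").strip()
    return Path(override) if override else DEFAULT_CONFIG


# With no override the default is used; with one, the override wins.
print(resolve_config_path({}))
print(resolve_config_path({"FOCUSGUARD_CONFIG": "exp/fast.yaml"}))
```

Passing the environment as an argument keeps the lookup easy to test; in normal use the function would read `os.environ` directly.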

	## Setup

	```bash
	python -m venv venv
	source venv/bin/activate
	pip install -r requirements.txt
	```

	To rebuild the frontend after changes:

	```bash
	npm install
	npm run build
	mkdir -p static && cp -r dist/* static/
	```

	## Run

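	Web app: activate the venv and launch uvicorn through Python (`python -m uvicorn`) so it uses the venv's dependencies. A `uvicorn` executable from outside the venv may fail with `ModuleNotFoundError: No module named 'aiosqlite'`: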

	```bash
	source venv/bin/activate
	python -m uvicorn main:app --host 0.0.0.0 --port 7860
	```

	Then open http://localhost:7860.

	Frontend dev server (optional, for React development):

	```bash
	npm run dev
	```

	OpenCV demo:

	```bash
	python ui/live_demo.py
	python ui/live_demo.py --xgb
	```

	Train:

	```bash
	python -m models.mlp.train
	python -m models.xgboost.train
	```

	### ClearML experiment tracking

	All training and evaluation config (from `config/default.yaml`) is exposed as ClearML task parameters. Enable logging with `USE_CLEARML=1`. To run on a remote GPU agent instead of locally, also set `CLEARML_QUEUE` to the agent's queue name:

	```bash
	USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.mlp.train
	USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
	USE_CLEARML=1 CLEARML_QUEUE=gpu python -m evaluation.justify_thresholds --clearml
	```

	With `CLEARML_QUEUE` set, the script enqueues the task and exits; a `clearml-agent` listening on that queue (e.g. `gpu`) then runs the same command with the same parameters. Start an agent with:

	```bash
	clearml-agent daemon --queue gpu
	```

	Logged to ClearML: parameters (full flattened config), scalars (loss, accuracy, F1, ROC-AUC, per-class precision/recall/F1, dataset sizes and class counts), artifacts (best checkpoint, training log JSON), and plots (confusion matrix, ROC curves in evaluation).

	## Data

	9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.

	Train/val/test split: All pooled training and evaluation use the same split for reproducibility. The test set is held out before any preprocessing; `StandardScaler` is fit on the training set only, then applied to val and test. Split ratios and random seed come from `config/default.yaml` (`data.split_ratios`, `mlp.seed`) via `data_preparation.prepare_dataset.get_default_split_config()`. MLP train, XGBoost train, eval_accuracy scripts, and benchmarks all use this single source so reported test accuracy is on the same held-out set.
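
The order of operations matters here: split first, then fit the scaler on the training rows only. The sketch below illustrates that order in plain Python. It is not the code in `data_preparation.prepare_dataset`; the 70/15/15 ratios, the seed, and all function names are assumptions for illustration, and the mean/std scaling stands in for scikit-learn's `StandardScaler`.

```python
import random
from statistics import fmean, pstdev


def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle indices once with a fixed seed and cut them into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_scaler(rows):
    """Per-feature mean and population std, computed from training rows only."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant feature: scale by 1
    return means, stds


def transform(rows, means, stds):
    """Apply training statistics to any split (train, val, or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 200 samples, 2 features.
rng = random.Random(0)
X = [[rng.gauss(5, 2), rng.gauss(-1, 0.5)] for _ in range(200)]
train, val, test = split_indices(len(X))
means, stds = fit_scaler([X[i] for i in train])   # test rows never seen here
X_test = transform([X[i] for i in test], means, stds)
```

Because the same seed and ratios produce the same index lists, every script that calls the split with the shared config evaluates on the same held-out rows.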

	## Models

	| Model | What it uses | Best for |
	|-------|-------------|----------|
	| Geometric | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
	| XGBoost | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
	| MLP | Neural network on same features (64→32) | Higher accuracy |
	| Hybrid | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
	| L2CS | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

	## Model numbers (15% test split)

	| Model | Accuracy | F1 | ROC-AUC |
	|-------|----------|-----|---------|
	| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
	| MLP (64→32) | 92.92% | 0.929 | 0.971 |

	## Model numbers (LOPO, 9 participants)

	| Model | LOPO AUC | Best threshold (Youden's J) | F1 @ best threshold | F1 @ 0.50 |
	|-------|----------|------------------------------|---------------------|------------|
	| MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 |
	| XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 |
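
The "best threshold" column picks the cutoff that maximises Youden's J, defined as J = TPR − FPR. A minimal sketch of that selection follows, assuming both classes are present and that a sample counts as positive when its score is at or above the threshold. The function name and toy data are illustrative and are not taken from `evaluation.justify_thresholds`.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        j = tp / pos - fp / neg
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j


# Toy scores where the best cutoff sits well below 0.5.
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.25, 0.35, 0.40, 0.70, 0.90]
print(youden_threshold(y_true, y_score))  # (0.25, 0.75)
```

As in the table, a threshold below 0.50 can give a better trade-off when the classifier's scores for the positive class are skewed low on unseen participants.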

	From the latest `python -m evaluation.justify_thresholds` run:

	- Best geometric face weight (`alpha`) = `0.7` (mean LOPO F1 = `0.8195`)
	- Best hybrid MLP weight (`w_mlp`) = `0.3` (mean LOPO F1 = `0.8409`)
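
One plausible reading of these two weights is a pair of convex combinations. The sketch below assumes that form; the README does not state the formulas, and `s_eye`, `geometric_score`, and `hybrid_score` are hypothetical names (only `alpha`, `w_mlp`, and `s_face` appear in this document).

```python
ALPHA = 0.7   # geometric face weight reported above
W_MLP = 0.3   # hybrid MLP weight reported above


def geometric_score(s_face, s_eye, alpha=ALPHA):
    """Assumed form: weighted mix of a face (head pose) score and an eye score."""
    return alpha * s_face + (1.0 - alpha) * s_eye


def hybrid_score(p_mlp, s_face, s_eye, w_mlp=W_MLP, alpha=ALPHA):
    """Assumed form: weighted mix of the MLP probability and the geometric score."""
    return w_mlp * p_mlp + (1.0 - w_mlp) * geometric_score(s_face, s_eye, alpha)


# Example: confident MLP, good head pose, slightly droopy eyes.
print(round(hybrid_score(p_mlp=0.9, s_face=0.8, s_eye=0.6), 3))  # 0.788
```

Under this reading, `w_mlp = 0.3` means the geometric score carries most of the hybrid decision, with the MLP acting as a correction.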

	## Grouped vs pooled benchmark

	Latest quick benchmark (`python -m evaluation.grouped_split_benchmark --quick`) shows the expected gap between pooled random split and person-held-out LOPO:

	| Protocol | Accuracy | F1 (weighted) | ROC-AUC |
	|----------|---------:|--------------:|--------:|
	| Pooled random split | 0.9510 | 0.9507 | 0.9869 |
	| Grouped LOPO (9 folds) | 0.8303 | 0.8304 | 0.8801 |

	This is why LOPO is the primary generalisation metric for reporting.
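
For readers unfamiliar with LOPO (leave-one-participant-out), each fold holds out every sample from one participant and trains on the rest, so no person appears on both sides of the split. A minimal sketch of the fold generation, with a hypothetical function name and toy participant labels:

```python
def lopo_folds(groups):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# One label per sample: which participant it came from.
groups = ["p1", "p1", "p2", "p3", "p3", "p3"]
for held_out, train_idx, test_idx in lopo_folds(groups):
    print(held_out, train_idx, test_idx)
```

scikit-learn's `LeaveOneGroupOut` produces the same kind of split; whether the benchmark script uses it is not stated here.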

	## Feature ablation snapshot

	Latest quick feature-selection run (`python -m evaluation.feature_importance --quick --skip-lofo`):

	| Subset | Mean LOPO F1 |
	|--------|--------------|
	| all_10 | 0.8286 |
	| eye_state | 0.8071 |
	| head_pose | 0.7480 |
	| gaze | 0.7260 |

	Top-5 XGBoost gain features: `s_face`, `ear_right`, `head_deviation`, `ear_avg`, `perclos`.
	For full leave-one-feature-out ablation, run `python -m evaluation.feature_importance` (slower).

	## L2CS Gaze Tracking

	L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

	### Standalone mode
	Select L2CS as the model — it handles everything.

	### Boost mode
	Select any other model, then click the GAZE toggle. L2CS runs alongside the base model:

	- Base model handles head pose and eye openness (35% weight)
	- L2CS handles gaze direction (65% weight)
	- If L2CS detects gaze is clearly off-screen, it vetoes the base model regardless of score
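
The three rules above can be sketched as a weighted sum with an override. Only the 35%/65% weights come from this README; the veto cutoff, the decision threshold, and the function name are hypothetical values chosen for illustration, and both scores are assumed to lie in [0, 1] with 1 meaning focused.

```python
BASE_WEIGHT = 0.35      # base model share, from the list above
GAZE_WEIGHT = 0.65      # L2CS share, from the list above
VETO_BELOW = 0.3        # hypothetical cutoff for "clearly off-screen"
FOCUS_THRESHOLD = 0.5   # hypothetical decision threshold


def boost_decision(base_score, gaze_score):
    """Fuse the two scores; return (fused_score, is_focused)."""
    fused = BASE_WEIGHT * base_score + GAZE_WEIGHT * gaze_score
    if gaze_score < VETO_BELOW:
        return fused, False  # veto: gaze is clearly off-screen
    return fused, fused >= FOCUS_THRESHOLD


print(boost_decision(0.9, 0.8))   # both agree: focused
print(boost_decision(1.0, 0.25))  # fused score passes 0.5, but the veto wins
```

The second call shows why the veto is a separate rule: a very confident base model can lift the fused score over the threshold even when the gaze estimate says the user is looking away.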

	### Calibration
	After enabling L2CS or Gaze Boost, click Calibrate while a session is running:

	1. A fullscreen overlay shows 9 target dots (3×3 grid)
	2. Look at each dot as the progress ring fills
	3. The first dot (centre) sets your baseline gaze offset
	4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
	5. A cyan tracking dot appears on the video showing where you're looking
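
Steps 3 and 4 amount to subtracting a baseline and fitting a polynomial by least squares. The sketch below is one way to do that in plain Python and is not the code in `models/gaze_calibration.py`: the second-order feature set, the normal-equation solver, and all function names are assumptions.

```python
def poly_features(yaw, pitch):
    """Second-order polynomial terms of the two (baseline-corrected) gaze angles."""
    return [1.0, yaw, pitch, yaw * pitch, yaw * yaw, pitch * pitch]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_axis(angles, targets):
    """Least-squares fit of one screen axis via the normal equations."""
    rows = [poly_features(y, p) for y, p in angles]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def calibrate(samples):
    """samples: [((yaw, pitch), (screen_x, screen_y)), ...], centre dot first."""
    (yaw0, pitch0), _ = samples[0]  # the first (centre) dot sets the baseline offset
    angles = [(y - yaw0, p - pitch0) for (y, p), _ in samples]
    coef_x = fit_axis(angles, [s[0] for _, s in samples])
    coef_y = fit_axis(angles, [s[1] for _, s in samples])

    def predict(yaw, pitch):
        f = poly_features(yaw - yaw0, pitch - pitch0)
        return (sum(c * v for c, v in zip(coef_x, f)),
                sum(c * v for c, v in zip(coef_y, f)))

    return predict
```

Nine calibration points give more equations than the six polynomial coefficients per axis, which is why a least-squares fit is used rather than exact interpolation. The returned `predict` function is what would drive the on-video tracking dot.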

	## Pipeline

	1. Face mesh (MediaPipe 478 pts)
	2. Head pose → yaw, pitch, roll, scores, gaze offset
	3. Eye scorer → EAR, gaze ratio, MAR
	4. Temporal → PERCLOS, blink rate, yawn
	5. 10-d vector → MLP or XGBoost → focused / unfocused
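
Two of the features in steps 3 and 4 have compact definitions. EAR uses the standard six-landmark formula (sum of the two vertical eye distances divided by twice the horizontal distance), and PERCLOS is the fraction of recent frames in which the eye is closed. The sketch below assumes a 60-frame window and an EAR cutoff of 0.2 for "closed"; both values and the class name are illustrative, not the project's settings.

```python
from collections import deque
from math import dist


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) landmarks: p1/p4 eye corners, p2/p3 upper lid, p6/p5 lower lid."""
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


class Perclos:
    """Rolling fraction of frames whose EAR is below a 'closed' cutoff."""

    def __init__(self, window=60, closed_below=0.2):
        self.closed_below = closed_below
        self.frames = deque(maxlen=window)  # old frames drop out automatically

    def update(self, ear):
        self.frames.append(ear < self.closed_below)
        return sum(self.frames) / len(self.frames)


# A wide-open toy eye: 3 units wide, 2 units tall.
ear = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
print(round(ear, 3))  # 0.667
```

Calling `Perclos.update` once per frame yields a slowly varying drowsiness signal, while raw EAR reacts to every blink.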

	Stack: FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.