| title: Focus Guard Final v2 | |
| emoji: 🎯 | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| short_description: "Focus detection: MediaPipe, MLP/XGB, L2CS, FastAPI" | |
| # FocusGuard | |
| Webcam-based focus detection: MediaPipe face mesh → 17 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost classifier that labels each frame focused or unfocused. The app is a React frontend on a FastAPI backend, with video streamed over WebSocket. | |
| **Repository:** [KCL GAP project](https://github.kcl.ac.uk) (internal); adjust the link if you publish a public mirror. | |
| ## Project layout | |
| ``` | |
| ├── data/                 collected_<name>/*.npz | |
| ├── data_preparation/     loaders, split, scale | |
| ├── notebooks/            MLP/XGB training + LOPO | |
| ├── models/               face_mesh, head_pose, eye_scorer, train scripts | |
| │   ├── gaze_calibration.py    9-point polynomial gaze calibration | |
| │   ├── gaze_eye_fusion.py     Fuses calibrated gaze with eye openness | |
| │   └── L2CS-Net/              In-tree L2CS-Net repo with Gaze360 weights | |
| ├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers | |
| ├── evaluation/           logs, plots, justify_thresholds | |
| ├── ui/                   pipeline.py, live_demo.py | |
| ├── src/                  React frontend | |
| │   ├── components/ | |
| │   │   ├── FocusPageLocal.jsx       Main focus page (camera, controls, model selector) | |
| │   │   └── CalibrationOverlay.jsx   Fullscreen calibration UI | |
| │   └── utils/ | |
| │       └── VideoManagerLocal.js     WebSocket client, frame capture, canvas rendering | |
| ├── static/               built frontend (after npm run build) | |
| ├── main.py, app.py       FastAPI backend | |
| ├── requirements.txt | |
| └── package.json | |
| ``` | |
| ## Config | |
| Hyperparameters and app settings live in `config/default.yaml` (learning rates, batch size, thresholds, L2CS weights, etc.). Override with env `FOCUSGUARD_CONFIG` pointing to another YAML. | |
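The override rule above can be sketched as a small path resolver. This is a minimal illustration, not the project's actual loader: the function name `resolve_config_path` is hypothetical, and only the environment variable name and the default path come from this README.

```python
import os
from pathlib import Path

# Default path named in this README.
DEFAULT_CONFIG = Path("config/default.yaml")


def resolve_config_path(env=None):
    """Return the YAML path to load: FOCUSGUARD_CONFIG if set, else the default.

    `env` defaults to os.environ; passing a dict makes the rule easy to test.
    """
    env = os.environ if env is None else env
    override = env.get("FOCUSGUARD_CONFIG", "").strip()
    # An empty or unset variable falls back to the default file.
    return Path(override) if override else DEFAULT_CONFIG


print(resolve_config_path({}))
print(resolve_config_path({"FOCUSGUARD_CONFIG": "experiments/fast.yaml"}))
```

In a shell, the same override would be applied by prefixing a command with `FOCUSGUARD_CONFIG=<path to your YAML>`.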
| ## Setup | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| To rebuild the frontend after changes: | |
| ```bash | |
| npm install | |
| npm run build | |
| mkdir -p static && cp -r dist/* static/ | |
| ``` | |
| ## Run | |
| **Web app:** Activate the venv and launch uvicorn with `python -m` so it runs against the venv's packages; a globally installed `uvicorn` can otherwise fail with `ModuleNotFoundError: No module named 'aiosqlite'`: | |
| ```bash | |
| source venv/bin/activate | |
| python -m uvicorn main:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| Then open http://localhost:7860. | |
| **Frontend dev server (optional, for React development):** | |
| ```bash | |
| npm run dev | |
| ``` | |
| **OpenCV demo:** | |
| ```bash | |
| python ui/live_demo.py | |
| python ui/live_demo.py --xgb | |
| ``` | |
| **Train:** | |
| ```bash | |
| python -m models.mlp.train | |
| python -m models.xgboost.train | |
| ``` | |
| ### ClearML experiment tracking | |
| All training and evaluation config (from `config/default.yaml`) is exposed as ClearML task parameters. Enable logging with `USE_CLEARML=1`; optionally run on a **remote GPU agent** instead of locally: | |
| ```bash | |
| USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.mlp.train | |
| USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train | |
| USE_CLEARML=1 CLEARML_QUEUE=gpu python -m evaluation.justify_thresholds --clearml | |
| ``` | |
| When `CLEARML_QUEUE` is set, the script enqueues the task and exits; a `clearml-agent` listening on the named queue (e.g. `gpu`) then runs the same command with the same parameters. Start an agent with: | |
| ```bash | |
| clearml-agent daemon --queue gpu | |
| ``` | |
| Logged to ClearML: **parameters** (full flattened config), **scalars** (loss, accuracy, F1, ROC-AUC, per-class precision/recall/F1, dataset sizes and class counts), **artifacts** (best checkpoint, training log JSON), and **plots** (confusion matrix, ROC curves in evaluation). | |
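To illustrate what "full flattened config" means, here is a minimal sketch that turns a nested config dictionary into flat `section/key` parameters. The helper name `flatten_config`, the `/` separator, and the example values are assumptions for illustration; the project's own flattening may differ.

```python
def flatten_config(config, parent_key="", sep="/"):
    """Flatten a nested dict into {"section/key": value} pairs."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-sections, carrying the path so far.
            flat.update(flatten_config(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical values; only the key names data.split_ratios and mlp.seed
# are mentioned in this README.
example = {
    "mlp": {"seed": 42, "lr": 0.001},
    "data": {"split_ratios": [0.70, 0.15, 0.15]},
}
for name, value in flatten_config(example).items():
    print(name, "=", value)
```

A flat mapping like this is the kind of dictionary that can be attached to an experiment tracker as task parameters.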
| ## Data | |
| 9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`. | |
| **Train/val/test split:** All pooled training and evaluation use the same split for reproducibility. The test set is held out before any preprocessing; `StandardScaler` is fit on the training set only, then applied to val and test. Split ratios and random seed come from `config/default.yaml` (`data.split_ratios`, `mlp.seed`) via `data_preparation.prepare_dataset.get_default_split_config()`. MLP train, XGBoost train, eval_accuracy scripts, and benchmarks all use this single source so reported test accuracy is on the same held-out set. | |
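The order of operations described above (split first, then fit the scaler on the training rows only) can be sketched in plain Python. This is an illustrative stand-in, not the code in `data_preparation`: the function names, the 70/15/15 ratios, and the seed value are assumptions, and the real pipeline uses `StandardScaler`.

```python
import random
import statistics


def split_indices(n_samples, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle indices once with a fixed seed, then cut into train/val/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def fit_scaler(train_rows):
    """Per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Constant features get std 1.0 so the transform never divides by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply the training statistics to any split."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Synthetic two-feature data, for illustration only.
data = [[float(i), float(i % 7)] for i in range(100)]
train_idx, val_idx, test_idx = split_indices(len(data))
means, stds = fit_scaler([data[i] for i in train_idx])
test_scaled = transform([data[i] for i in test_idx], means, stds)
```

Because the scaler never sees validation or test rows, the held-out statistics cannot leak into training.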
| ## Models | |
| | Model | What it uses | Best for | | |
| |-------|-------------|----------| | |
| | **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed | | |
| | **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed | | |
| | **MLP** | Neural network on same features (64→32) | Higher accuracy | |
| | **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy | | |
| | **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts | | |
| ## Model numbers (15% test split) | |
| | Model | Accuracy | F1 | ROC-AUC | | |
| |-------|----------|-----|---------| | |
| | XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 | | |
| | MLP (64→32) | 92.92% | 0.929 | 0.971 | |
| ## Model numbers (LOPO, 9 participants) | |
| | Model | LOPO AUC | Best threshold (Youden's J) | F1 @ best threshold | F1 @ 0.50 | | |
| |-------|----------|------------------------------|---------------------|------------| | |
| | MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 | | |
| | XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 | | |
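For readers unfamiliar with Youden's J, the statistic is J = TPR − FPR, and the "best threshold" is the score cut-off that maximises it. The sketch below is a minimal pure-Python version with a hypothetical function name and toy data; it is not the code in `evaluation.justify_thresholds`.

```python
def youden_best_threshold(labels, scores):
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both classes to compute Youden's J")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data: the best cut-off lands well below 0.5.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.40, 0.25, 0.35, 0.60, 0.90]
print(youden_best_threshold(labels, scores))
```

As in the table above, the J-optimal threshold need not be 0.50; when positives tend to receive modest scores, a lower cut-off trades a few false positives for many recovered true positives.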
| From the latest `python -m evaluation.justify_thresholds` run: | |
| - Best geometric face weight (`alpha`) = `0.7` (mean LOPO F1 = `0.8195`) | |
| - Best hybrid MLP weight (`w_mlp`) = `0.3` (mean LOPO F1 = `0.8409`) | |
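One plausible reading of these two weights is a pair of convex combinations, sketched below. The formulas are an assumption: this README names `alpha` and `w_mlp` but does not spell out the equations, and the eye-score input `s_eye` and both function names are hypothetical.

```python
def geometric_score(s_face, s_eye, alpha=0.7):
    """Assumed form: weighted mix of a face (head-pose) score and an eye score."""
    return alpha * s_face + (1.0 - alpha) * s_eye


def hybrid_score(p_mlp, s_geo, w_mlp=0.3):
    """Assumed form: weighted mix of the MLP probability and the geometric score."""
    return w_mlp * p_mlp + (1.0 - w_mlp) * s_geo


# Example: head is well aligned, eyes half closed, MLP fairly confident.
s_geo = geometric_score(s_face=0.9, s_eye=0.5)
print(round(s_geo, 3), round(hybrid_score(p_mlp=0.8, s_geo=s_geo), 3))
```

Under this reading, `w_mlp = 0.3` means the geometric score still carries most of the hybrid decision.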
| ## Grouped vs pooled benchmark | |
| Latest quick benchmark (`python -m evaluation.grouped_split_benchmark --quick`) shows the expected gap between pooled random split and person-held-out LOPO: | |
| | Protocol | Accuracy | F1 (weighted) | ROC-AUC | | |
| |----------|---------:|--------------:|--------:| | |
| | Pooled random split | 0.9510 | 0.9507 | 0.9869 | | |
| | Grouped LOPO (9 folds) | 0.8303 | 0.8304 | 0.8801 | | |
| This is why LOPO is the primary generalisation metric for reporting. | |
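The difference between the two protocols comes down to how the folds are built. A minimal sketch of leave-one-participant-out fold generation follows; the function name and the toy participant labels are illustrative, not taken from the project code.

```python
def lopo_folds(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    for held_out in sorted(set(participant_ids)):
        train_idx = [i for i, p in enumerate(participant_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Toy example: six samples from three participants.
ids = ["a", "a", "b", "c", "c", "c"]
for held_out, train_idx, test_idx in lopo_folds(ids):
    print(held_out, train_idx, test_idx)
```

No sample from the held-out person ever appears in training, so the score reflects performance on an unseen user. A pooled random split mixes frames from the same person across train and test, which inflates the numbers.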
| ## Feature ablation snapshot | |
| Latest quick feature-selection run (`python -m evaluation.feature_importance --quick --skip-lofo`): | |
| | Subset | Mean LOPO F1 | | |
| |--------|--------------| | |
| | all_10 | 0.8286 | | |
| | eye_state | 0.8071 | | |
| | head_pose | 0.7480 | | |
| | gaze | 0.7260 | | |
| Top-5 XGBoost gain features: `s_face`, `ear_right`, `head_deviation`, `ear_avg`, `perclos`. | |
| For full leave-one-feature-out ablation, run `python -m evaluation.feature_importance` (slower). | |
| ## L2CS Gaze Tracking | |
| L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander. | |
| ### Standalone mode | |
| Select **L2CS** as the model; it handles everything. | |
| ### Boost mode | |
| Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model: | |
| - Base model handles head pose and eye openness (35% weight) | |
| - L2CS handles gaze direction (65% weight) | |
| - If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score | |
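The boost rule above can be sketched as a weighted sum with a veto. This is an illustration under assumptions: the function name is hypothetical, scores are taken to lie in [0, 1], and how "clearly off-screen" is decided is not specified here, so it is passed in as a boolean.

```python
def boost_score(base_score, gaze_score, gaze_off_screen,
                w_base=0.35, w_gaze=0.65):
    """Fuse the base model and L2CS gaze scores; a gaze veto overrides both."""
    if gaze_off_screen:
        # Veto: gaze is clearly off-screen, so the base score is ignored.
        return 0.0
    return w_base * base_score + w_gaze * gaze_score


print(round(boost_score(0.9, 0.8, gaze_off_screen=False), 3))
print(boost_score(0.9, 0.8, gaze_off_screen=True))
```

The veto is what lets boost mode catch the case where head pose and eye openness look fine but the eyes are directed elsewhere.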
| ### Calibration | |
| After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running: | |
| 1. A fullscreen overlay shows 9 target dots (3×3 grid) | |
| 2. Look at each dot as the progress ring fills | |
| 3. The first dot (centre) sets your baseline gaze offset | |
| 4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates | |
| 5. A cyan tracking dot appears on the video showing where you're looking | |
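Steps 3 and 4 can be sketched as a baseline subtraction followed by a least-squares polynomial fit. Everything specific here is an assumption made for illustration: the second-order feature set, the function names, and the synthetic gaze data. The actual model lives in `models/gaze_calibration.py` and may differ.

```python
def poly_features(yaw, pitch):
    """Assumed second-order terms in the two gaze angles."""
    return [1.0, yaw, pitch, yaw * pitch, yaw * yaw, pitch * pitch]


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("calibration points are degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_axis(gaze_samples, targets):
    """Least-squares fit for one screen axis via the normal equations."""
    rows = [poly_features(yaw, pitch) for yaw, pitch in gaze_samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(ata, atb)


def predict(coeffs, yaw, pitch):
    return sum(c * f for c, f in zip(coeffs, poly_features(yaw, pitch)))


# Synthetic 3x3 session: centre dot first, gaze angles in radians with a
# constant per-user bias that the baseline step removes.
grid = [(0.0, 0.0)] + [(yaw, pitch)
                       for pitch in (-0.1, 0.0, 0.1)
                       for yaw in (-0.2, 0.0, 0.2)
                       if (yaw, pitch) != (0.0, 0.0)]
raw = [(yaw + 0.03, pitch - 0.02) for yaw, pitch in grid]
baseline = raw[0]  # step 3: the centre dot sets the gaze offset
centred = [(yaw - baseline[0], pitch - baseline[1]) for yaw, pitch in raw]
# Normalised screen coordinates of the dots (synthetic linear mapping).
targets_x = [0.5 + 2.0 * yaw for yaw, _ in grid]
targets_y = [0.5 + 4.0 * pitch for _, pitch in grid]
coeffs_x = fit_axis(centred, targets_x)  # step 4: one model per screen axis
coeffs_y = fit_axis(centred, targets_y)
```

With nine points and six coefficients per axis the system is overdetermined, which gives the fit some tolerance to noisy fixations.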
| ## Pipeline | |
| 1. Face mesh (MediaPipe 478 pts) | |
| 2. Head pose → yaw, pitch, roll, scores, gaze offset | |
| 3. Eye scorer → EAR, gaze ratio, MAR | |
| 4. Temporal → PERCLOS, blink rate, yawn | |
| 5. 10-d vector → MLP or XGBoost → focused / unfocused | |
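Two of the features in steps 3 and 4 are easy to show concretely. The sketch below uses the standard six-landmark eye aspect ratio, EAR = (|p2 − p6| + |p3 − p5|) / (2 |p1 − p4|), and a sliding-window PERCLOS (the fraction of recent frames with the eye closed). The class name, window length, and closed-eye threshold are assumptions, not values from this project.

```python
import math
from collections import deque


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six eye landmarks: p1/p4 are the corners,
    p2/p3 lie on the upper lid, p6/p5 on the lower lid."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class Perclos:
    """Fraction of the last `window_frames` frames in which the eye was closed."""

    def __init__(self, window_frames=900, closed_threshold=0.2):
        # Both defaults are illustrative, not the project's settings.
        self.closed = deque(maxlen=window_frames)
        self.closed_threshold = closed_threshold

    def update(self, ear):
        self.closed.append(ear < self.closed_threshold)
        return sum(self.closed) / len(self.closed)


# Toy landmarks for an open eye, then a short stream of EAR values.
ear_open = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
tracker = Perclos(window_frames=4)
history = [tracker.update(ear) for ear in (0.3, 0.1, 0.1, 0.3, 0.3)]
```

EAR drops towards zero as the lids close, so thresholding it per frame and averaging over a window turns a noisy instantaneous signal into a steadier drowsiness measure.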
| **Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net. | |