title: FocusGuard
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
FocusGuard
Webcam-based focus detection: MediaPipe face mesh -> 17 extracted features (EAR, gaze, head pose, PERCLOS, etc.) -> MLP or XGBoost on a 10-feature vector -> focused/unfocused. The app is a React frontend and FastAPI backend that stream webcam video over a WebSocket.
Project layout
├── data/                  collected_<name>/*.npz
├── data_preparation/      loaders, split, scale
├── notebooks/             MLP/XGB training + LOPO
├── models/                face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py    9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py     Fuses calibrated gaze with eye openness
│   └── L2CS-Net/              In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/           mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/            logs, plots, justify_thresholds
├── ui/                    pipeline.py, live_demo.py
├── src/                   React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx        Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx    Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js      WebSocket client, frame capture, canvas rendering
├── static/                built frontend (after npm run build)
├── main.py, app.py        FastAPI backend
├── requirements.txt
└── package.json
Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
To rebuild the frontend after changes:
npm install
npm run build
mkdir -p static && cp -r dist/* static/
Run
Web app: activate the venv and launch uvicorn via python -m so it runs with the venv's installed dependencies (otherwise you get ModuleNotFoundError: aiosqlite):
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
Then open http://localhost:7860.
Frontend dev server (optional, for React development):
npm run dev
OpenCV demo:
python ui/live_demo.py
python ui/live_demo.py --xgb
Train:
python -m models.mlp.train
python -m models.xgboost.train
Data
9 participants, 144,793 samples, 10 features, binary labels. Collect with python -m models.collect_features --name <name>. Data lives in data/collected_<name>/.
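The notebooks evaluate with LOPO (leave-one-participant-out): each participant is held out once as the test set while the model trains on everyone else, which measures how well it generalises to a new person. A minimal sketch of generating those splits, assuming each sample is tagged with the participant name from its data/collected_<name>/ folder; lopo_splits is a hypothetical helper, not a function from this repo, and loading the .npz files is not shown.

```python
from collections.abc import Iterator, Sequence


def lopo_splits(participants: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out, train_idx, test_idx) for leave-one-participant-out CV.

    participants[i] is the participant name of sample i (hypothetical layout:
    one name per data/collected_<name>/ folder).
    """
    for held_out in sorted(set(participants)):
        train_idx = [i for i, p in enumerate(participants) if p != held_out]
        test_idx = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train_idx, test_idx


# Toy example: 3 participants, 6 samples.
names = ["alice", "alice", "bob", "carol", "bob", "carol"]
for held_out, train_idx, test_idx in lopo_splits(names):
    print(held_out, train_idx, test_idx)
```

Splitting by participant rather than by random sample keeps frames from the same person out of both train and test, so near-duplicate consecutive frames cannot inflate the score.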
Models
| Model | What it uses | Best for |
|---|---|---|
| Geometric | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| XGBoost | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| MLP | Neural network on same features (64->32) | Higher accuracy |
| Hybrid | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| L2CS | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |
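The Geometric and Hybrid rows can be illustrated with a small rule-based score and a weighted blend. This is only a sketch: the thresholds (25° yaw, 20° pitch, EAR 0.20), the linear fall-off, and the 0.6 MLP weight are hypothetical placeholders, not values taken from this project.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    yaw_deg: float    # head yaw in degrees (0 = facing the camera)
    pitch_deg: float  # head pitch in degrees
    ear: float        # eye aspect ratio (lower = more closed)


# Hypothetical thresholds, for illustration only.
MAX_YAW_DEG = 25.0
MAX_PITCH_DEG = 20.0
EAR_CLOSED = 0.20


def geometric_score(f: FrameFeatures) -> float:
    """Rule-based focus score in [0, 1]; no trained model involved."""
    if f.ear < EAR_CLOSED:
        return 0.0  # eyes closed -> not focused
    # Each term falls linearly from 1 (facing the screen) to 0 at its limit.
    yaw_term = max(0.0, 1.0 - abs(f.yaw_deg) / MAX_YAW_DEG)
    pitch_term = max(0.0, 1.0 - abs(f.pitch_deg) / MAX_PITCH_DEG)
    return yaw_term * pitch_term


def hybrid_score(mlp_prob: float, geo: float, w_mlp: float = 0.6) -> float:
    """Weighted ensemble of the MLP's focused probability and the geometric score."""
    return w_mlp * mlp_prob + (1.0 - w_mlp) * geo


frame = FrameFeatures(yaw_deg=5.0, pitch_deg=4.0, ear=0.28)
geo = geometric_score(frame)  # approximately 0.8 * 0.8 = 0.64
print(hybrid_score(0.9, geo))  # approximately 0.6 * 0.9 + 0.4 * 0.64 = 0.796
```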
Test-set results (15% held-out split)
| Model | Accuracy | F1 | ROC-AUC |
|---|---|---|---|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64->32) | 92.92% | 0.929 | 0.971 |
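For reference, the three metric columns can be computed from test-set labels and predicted probabilities as below. This pure-Python sketch assumes a 0.5 decision threshold and binary F1 on the focused class (the table does not say whether F1 is binary, macro, or weighted). ROC-AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (the Mann-Whitney formulation), with ties counted as half. scikit-learn's accuracy_score, f1_score and roc_auc_score compute the same quantities.

```python
from collections.abc import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for class 1: 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
y_pred = [int(p >= 0.5) for p in probs]  # assumed 0.5 decision threshold
print(accuracy(y_true, y_pred), f1(y_true, y_pred), roc_auc(y_true, probs))
```

The pairwise AUC loop is quadratic, which is fine for a toy example; for the full test split a rank-based or library implementation is the practical choice.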
L2CS Gaze Tracking
L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.
Standalone mode
Select L2CS as the model; it estimates gaze and makes the focus decision on its own, with no base model.
Boost mode
Select any other model, then click the GAZE toggle. L2CS runs alongside the base model:
- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects that gaze is clearly off-screen, it vetoes the base model and marks you unfocused regardless of the fused score
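The Boost-mode rule above can be sketched as a weighted sum plus a veto. The 35%/65% weights come from this README; the off-screen angle limits, the 0.5 decision threshold, and the function name boosted_focus are hypothetical, since the README does not say how "clearly off-screen" is decided.

```python
BASE_WEIGHT = 0.35  # base model: head pose + eye openness (from the README)
GAZE_WEIGHT = 0.65  # L2CS gaze direction (from the README)

# Hypothetical "clearly off-screen" gaze limits in degrees; not the project's values.
OFFSCREEN_YAW_DEG = 30.0
OFFSCREEN_PITCH_DEG = 25.0


def boosted_focus(base_score: float, gaze_score: float,
                  gaze_yaw_deg: float, gaze_pitch_deg: float,
                  threshold: float = 0.5) -> tuple[bool, float]:
    """Fuse base-model and L2CS scores (both in [0, 1]); return (focused, fused)."""
    fused = BASE_WEIGHT * base_score + GAZE_WEIGHT * gaze_score
    clearly_off_screen = (abs(gaze_yaw_deg) > OFFSCREEN_YAW_DEG
                          or abs(gaze_pitch_deg) > OFFSCREEN_PITCH_DEG)
    if clearly_off_screen:
        return False, fused  # veto: gaze overrides the base model
    return fused >= threshold, fused


# Head and eyes both on screen: fused = 0.35*0.9 + 0.65*0.8 = 0.835 -> focused.
print(boosted_focus(0.9, 0.8, gaze_yaw_deg=5.0, gaze_pitch_deg=3.0))
# Head faces the screen but eyes look far aside: fused = 0.805, yet vetoed.
print(boosted_focus(1.0, 0.7, gaze_yaw_deg=35.0, gaze_pitch_deg=0.0))
```

The second call is the case L2CS exists for: the fused score alone would pass the threshold, but the veto still reports the frame as unfocused.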
Calibration
After enabling L2CS or Gaze Boost, click Calibrate while a session is running:
- A fullscreen overlay shows 9 target dots (3x3 grid)
- Look at each dot as the progress ring fills
- The first dot (centre) sets your baseline gaze offset
- After all 9 points, a polynomial model maps your gaze angles to screen coordinates
- A cyan tracking dot appears on the video showing where you're looking
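The calibration steps above can be sketched as a least-squares polynomial fit. This assumes a second-order polynomial in gaze yaw and pitch, fitted with numpy.linalg.lstsq, and screen coordinates normalised to [0, 1]; the project's gaze_calibration.py may use a different degree, feature set, or solver, and the function names here are hypothetical.

```python
import numpy as np


def design_matrix(yaw: np.ndarray, pitch: np.ndarray) -> np.ndarray:
    """Second-order polynomial terms: 1, yaw, pitch, yaw*pitch, yaw^2, pitch^2."""
    return np.column_stack([np.ones_like(yaw), yaw, pitch,
                            yaw * pitch, yaw ** 2, pitch ** 2])


def fit_calibration(gaze: np.ndarray, targets: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """gaze: (9, 2) mean yaw/pitch per dot, row 0 = centre dot.
    targets: (9, 2) normalised screen x/y of each dot."""
    offset = gaze[0].copy()  # centre dot sets the baseline gaze offset
    g = gaze - offset        # gaze relative to the baseline
    A = design_matrix(g[:, 0], g[:, 1])
    coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)  # shape (6, 2)
    return offset, coeffs


def gaze_to_screen(offset: np.ndarray, coeffs: np.ndarray,
                   yaw: float, pitch: float) -> np.ndarray:
    """Map one gaze sample to (x, y) screen coordinates."""
    g = np.array([[yaw, pitch]]) - offset
    return (design_matrix(g[:, 0], g[:, 1]) @ coeffs)[0]


# Synthetic 3x3 grid with the centre dot first, and a fake linear gaze response.
dots = [(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)]
dots.insert(0, dots.pop(4))
targets = np.array(dots)
gaze = np.column_stack([(targets[:, 0] - 0.5) * 40 + 2,
                        (targets[:, 1] - 0.5) * 30 - 1])

offset, coeffs = fit_calibration(gaze, targets)
print(gaze_to_screen(offset, coeffs, 10.0, 2.0))  # approximately x=0.7, y=0.6
```

Nine dots give nine equations for six coefficients per axis, so the fit is overdetermined and averages out some per-dot noise; a higher-degree polynomial would need more dots.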
Pipeline
- Face mesh (MediaPipe 478 pts)
- Head pose -> yaw, pitch, roll, scores, gaze offset
- Eye scorer -> EAR, gaze ratio, MAR
- Temporal -> PERCLOS, blink rate, yawn
- 10-dimensional feature vector -> MLP or XGBoost -> focused / unfocused
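Two of the pipeline features can be sketched directly: EAR with the common six-landmark formula EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners, and PERCLOS as the fraction of recent frames in which the eyes are closed. The 30-frame window and the 0.20 closed-eye threshold are assumptions, and the mapping from MediaPipe's 478 landmarks to p1..p6 is not shown.

```python
import math
from collections import deque

Point = tuple[float, float]


def eye_aspect_ratio(p1: Point, p2: Point, p3: Point,
                     p4: Point, p5: Point, p6: Point) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); p1/p4 are the eye corners."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class Perclos:
    """Fraction of the last `window` frames with EAR below `closed_thresh`."""

    def __init__(self, window: int = 30, closed_thresh: float = 0.2):
        self.closed_thresh = closed_thresh
        self.history: deque[bool] = deque(maxlen=window)  # oldest frames drop off

    def update(self, ear: float) -> float:
        self.history.append(ear < self.closed_thresh)
        return sum(self.history) / len(self.history)


open_eye = eye_aspect_ratio((0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1))  # 4 / 8 = 0.5
tracker = Perclos(window=4)
for ear in [0.3, 0.1, 0.1, 0.3, 0.3]:
    print(round(tracker.update(ear), 2))  # prints 0.0, 0.5, 0.67, 0.5, 0.5
```

EAR is a per-frame measurement, while PERCLOS, blink rate and yawn detection need a short history, which is why they sit in the temporal stage of the pipeline.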
Stack: FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.