---
title: FocusGuard
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
---

# FocusGuard

Webcam-based focus detection: MediaPipe face mesh -> 17 features (EAR, gaze, head pose, PERCLOS, etc.) -> MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.

## Project layout

```
├── data/                 collected_/*.npz
├── data_preparation/     loaders, split, scale
├── notebooks/            MLP/XGB training + LOPO
├── models/               face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py      9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py       Fuses calibrated gaze with eye openness
│   └── L2CS-Net/                In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/           logs, plots, justify_thresholds
├── ui/                   pipeline.py, live_demo.py
├── src/                  React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx       Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx   Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js     WebSocket client, frame capture, canvas rendering
├── static/               built frontend (after npm run build)
├── main.py, app.py       FastAPI backend
├── requirements.txt
└── package.json
```

## Setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To rebuild the frontend after changes:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

**Web app:** Use the venv and run uvicorn via Python so it picks up your deps (otherwise you get `ModuleNotFoundError: aiosqlite`):

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Then open http://localhost:7860.

**Frontend dev server (optional, for React development):**

```bash
npm run dev
```

**OpenCV demo:**

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```

**Train:**

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

## Data

9 participants, 144,793 samples, 10 features, binary labels.
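
The project layout lists LOPO evaluation under `notebooks/`, which presumably stands for leave-one-participant-out: with 9 participants that gives 9 folds, each holding out every sample from one person. The sketch below shows one way such folds could be generated. The `lopo_folds` helper and the per-sample participant ID list are hypothetical and are not taken from the repository's notebooks.

```python
from collections import defaultdict


def lopo_folds(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold.

    participant_ids: one ID per sample, in sample order (assumed input format).
    """
    by_person = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_person[pid].append(idx)

    for held_out in sorted(by_person):
        test_idx = by_person[held_out]
        # Training set: every sample that belongs to any other participant.
        train_idx = sorted(
            i for pid, idxs in by_person.items() if pid != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example: 3 participants, 6 samples.
ids = ["p1", "p1", "p2", "p3", "p3", "p2"]
folds = list(lopo_folds(ids))
```

Splitting by participant rather than by sample keeps frames from the same person out of both train and test, which is the usual reason to prefer this scheme over a random split.
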
Collect with `python -m models.collect_features --name `. Data lives in `data/collected_/`.

## Models

| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64->32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

## Model numbers (15% test split)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64->32) | 92.92% | 0.929 | 0.971 |

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

### Standalone mode

Select **L2CS** as the model - it handles everything.

### Boost mode

Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:

- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score

### Calibration

After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:

1. A fullscreen overlay shows 9 target dots (3x3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
5. A cyan tracking dot appears on the video showing where you're looking

## Pipeline

1. Face mesh (MediaPipe 478 pts)
2. Head pose -> yaw, pitch, roll, scores, gaze offset
3. Eye scorer -> EAR, gaze ratio, MAR
4. Temporal -> PERCLOS, blink rate, yawn
5. 10-d vector -> MLP or XGBoost -> focused / unfocused

**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.
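
Steps 3 and 4 of the pipeline rely on the eye aspect ratio (EAR) and PERCLOS. As a rough illustration, the sketch below computes EAR from six eye landmarks with the commonly cited formula `(|p2-p6| + |p3-p5|) / (2 * |p1-p4|)` and treats PERCLOS as the fraction of recent frames whose EAR falls below a threshold. The 0.2 threshold, the window length, and all names here are assumptions for illustration; the project's `eye_scorer` may compute these differently.

```python
import math
from collections import deque


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) landmarks: p1/p4 eye corners, p2/p3 upper lid, p6/p5 lower lid."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class Perclos:
    """Fraction of the last `window` frames in which the eye counted as closed."""

    def __init__(self, window=60, closed_threshold=0.2):  # both defaults are assumed
        self.closed = deque(maxlen=window)
        self.closed_threshold = closed_threshold

    def update(self, ear):
        # Record whether this frame is "closed", then report the running fraction.
        self.closed.append(ear < self.closed_threshold)
        return sum(self.closed) / len(self.closed)


# Synthetic landmarks for a wide-open eye, then a short EAR sequence with a blink.
open_eye = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
perclos = Perclos(window=4)
history = [perclos.update(ear) for ear in (0.30, 0.10, 0.10, 0.30, 0.30)]
```

Because the `deque` has a fixed `maxlen`, old frames drop out automatically, so the value always reflects only the most recent window.
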
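
The Boost mode rule from the L2CS Gaze Tracking section (35% base model, 65% L2CS, plus a veto when gaze is clearly off-screen) can be sketched as a small fusion function. This is only an illustration: the [0, 1] score range, the 0.5 decision threshold, the `gaze_off_screen` flag, and the function name are assumptions, and the repository's actual implementation may differ.

```python
BASE_WEIGHT = 0.35  # base model share, as stated for Boost mode
GAZE_WEIGHT = 0.65  # L2CS share, as stated for Boost mode


def boost_fuse(base_score, gaze_score, gaze_off_screen, threshold=0.5):
    """Return (fused_score, focused) for one frame.

    base_score, gaze_score: focus scores assumed to lie in [0, 1].
    gaze_off_screen: True when L2CS judges the gaze clearly off-screen (assumed flag).
    threshold: assumed cut-off above which the fused score counts as focused.
    """
    fused = BASE_WEIGHT * base_score + GAZE_WEIGHT * gaze_score
    if gaze_off_screen:
        # Veto: the frame is unfocused regardless of the fused score.
        return fused, False
    return fused, fused >= threshold


agree = boost_fuse(0.9, 0.8, gaze_off_screen=False)
vetoed = boost_fuse(0.95, 0.6, gaze_off_screen=True)
```

In the second call the weighted score is well above the threshold, yet the veto still marks the frame as unfocused, which matches the "regardless of score" wording.
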