---
title: FocusGuard
sdk: docker
app_port: 7860
---
# FocusGuard
Webcam-based focus detection: MediaPipe face mesh -> 17 features (EAR, gaze, head pose, PERCLOS, etc.) -> MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.
## Project layout
```
├── data/                      collected_<name>/*.npz
├── data_preparation/          loaders, split, scale
├── notebooks/                 MLP/XGB training + LOPO
├── models/                    face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py    9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py     Fuses calibrated gaze with eye openness
│   └── L2CS-Net/              In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/               mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/                logs, plots, justify_thresholds
├── ui/                        pipeline.py, live_demo.py
├── src/                       React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx       Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx   Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js     WebSocket client, frame capture, canvas rendering
├── static/                    built frontend (after npm run build)
├── main.py, app.py            FastAPI backend
├── requirements.txt
└── package.json
```
## Setup
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To rebuild the frontend after changes:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
## Run
**Web app:** Activate the venv and start uvicorn through `python -m` so it runs with the venv's interpreter and installed packages (a globally installed `uvicorn` can fail with `ModuleNotFoundError: No module named 'aiosqlite'`):
```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```
Then open http://localhost:7860.
**Frontend dev server (optional, for React development):**
```bash
npm run dev
```
**OpenCV demo:**
```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```
**Train:**
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
## Data
9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.
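The notebooks also run LOPO (leave-one-person-out) evaluation: each participant is held out once while the model trains on the other eight. A minimal sketch of forming those folds, assuming each participant's samples are already loaded into memory as `(features, labels)` pairs; the `lopo_folds` helper and the dict layout are hypothetical, and the keys inside the `.npz` files are not described here:

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical in-memory layout: participant name -> (feature rows, labels).
Dataset = Dict[str, Tuple[Sequence[Sequence[float]], Sequence[int]]]
Fold = Tuple[str, List[Sequence[float]], List[int], List[Sequence[float]], List[int]]


def lopo_folds(data: Dataset) -> Iterator[Fold]:
    """Yield one fold per participant: train on everyone else, test on them."""
    for held_out in sorted(data):
        X_train: List[Sequence[float]] = []
        y_train: List[int] = []
        for name, (X, y) in data.items():
            if name != held_out:
                X_train.extend(X)
                y_train.extend(y)
        X_test, y_test = data[held_out]
        yield held_out, X_train, y_train, list(X_test), list(y_test)


# Toy data with 10-feature rows, standing in for data/collected_<name>/.
toy: Dataset = {
    "alice": ([[0.1] * 10, [0.2] * 10], [1, 0]),
    "bob": ([[0.3] * 10], [1]),
    "cara": ([[0.4] * 10, [0.5] * 10, [0.6] * 10], [0, 0, 1]),
}
for name, X_tr, y_tr, X_te, y_te in lopo_folds(toy):
    print(f"held out {name}: {len(X_tr)} train / {len(X_te)} test")
```

Splitting by participant rather than by sample keeps frames from the same person out of both train and test, so the score reflects generalisation to unseen people.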
## Models
| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64->32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |
## Model results (15% held-out test split)
| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64->32) | 92.92% | 0.929 | 0.971 |
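For reference, a plain-Python sketch of how the three reported metrics are defined. It assumes focused is encoded as label 1; the README does not say whether its F1 is for the focused class or averaged over both classes, so this computes the positive-class F1 only:

```python
from typing import Sequence, Tuple


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and positive-class F1 for binary labels (1 = focused, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1]
print(accuracy_f1(labels, [1, 0, 0, 1, 1]))
print(roc_auc(labels, [0.9, 0.4, 0.2, 0.6, 0.8]))
```

The pairwise `roc_auc` is O(n_pos × n_neg), fine for illustration; for the full 144,793-sample set, a sort-based implementation such as scikit-learn's `roc_auc_score` is the practical choice. ROC-AUC uses the raw probabilities, so it can stay high (0.991 for XGBoost) even where a fixed 0.5 threshold misclassifies some frames.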
## L2CS Gaze Tracking
L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.
### Standalone mode
Select **L2CS** as the model. It then makes the focus decision on its own, without a base model.
### Boost mode
Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:
- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects that the gaze is clearly off-screen, it **vetoes** the base model regardless of the base model's score
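The Boost-mode rules above can be sketched as a weighted sum plus a veto. The 35%/65% weights come from this README; the off-screen angle limits, the 0.5 decision threshold and the `on_screen_score` field are hypothetical stand-ins, and the real logic in `models/gaze_eye_fusion.py` may be structured differently:

```python
from dataclasses import dataclass
from typing import Tuple

# Weights stated in this README for Boost mode.
BASE_WEIGHT = 0.35  # base model: head pose + eye openness
GAZE_WEIGHT = 0.65  # L2CS: gaze direction

# Hypothetical values for illustration only; not taken from the project.
OFFSCREEN_YAW_DEG = 35.0    # |yaw| beyond this counts as clearly off-screen
OFFSCREEN_PITCH_DEG = 30.0  # |pitch| beyond this counts as clearly off-screen
FOCUS_THRESHOLD = 0.5       # fused score needed to call a frame focused


@dataclass
class GazeEstimate:
    yaw_deg: float          # L2CS gaze yaw, 0 = straight at the camera
    pitch_deg: float        # L2CS gaze pitch
    on_screen_score: float  # hypothetical 0..1 score for "gaze lands on screen"


def boosted_focus(base_score: float, gaze: GazeEstimate) -> Tuple[float, bool]:
    """Fuse the base-model score with L2CS; a clearly off-screen gaze vetoes focus."""
    if abs(gaze.yaw_deg) > OFFSCREEN_YAW_DEG or abs(gaze.pitch_deg) > OFFSCREEN_PITCH_DEG:
        # Veto: the base model's score is ignored entirely.
        return 0.0, False
    fused = BASE_WEIGHT * base_score + GAZE_WEIGHT * gaze.on_screen_score
    return fused, fused >= FOCUS_THRESHOLD


print(boosted_focus(0.9, GazeEstimate(5.0, 0.0, 0.8)))    # head and eyes on screen
print(boosted_focus(0.95, GazeEstimate(50.0, 0.0, 0.9)))  # eyes far off to the side
```

Because gaze carries 65% of the weight, even a perfect base score of 1.0 combined with a weak gaze score of 0.2 fuses to 0.48, which falls below the hypothetical 0.5 threshold.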
### Calibration
After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
1. A fullscreen overlay shows 9 target dots (3x3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
5. A cyan tracking dot appears on the video showing where you're looking
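The calibration steps above amount to a small regression problem. A sketch under stated assumptions: the centre target is recorded first and its gaze is subtracted as the baseline offset, and a second-order polynomial in (yaw, pitch) is fitted to the 9 screen targets by least squares. The `GazeCalibration` class and the exact polynomial terms are hypothetical, not necessarily what `models/gaze_calibration.py` does:

```python
import numpy as np


def design_matrix(yaw: np.ndarray, pitch: np.ndarray) -> np.ndarray:
    """Second-order polynomial terms: 1, yaw, pitch, yaw*pitch, yaw^2, pitch^2."""
    return np.column_stack([np.ones_like(yaw), yaw, pitch, yaw * pitch, yaw**2, pitch**2])


class GazeCalibration:
    """Hypothetical 9-point calibration: gaze angles (degrees) -> screen pixels."""

    def fit(self, gaze_deg: np.ndarray, screen_xy: np.ndarray) -> "GazeCalibration":
        """gaze_deg: (9, 2) yaw/pitch per target; row 0 is the centre dot."""
        # Baseline offset: the gaze recorded while looking at the centre target.
        self.offset = gaze_deg[0].copy()
        g = gaze_deg - self.offset
        A = design_matrix(g[:, 0], g[:, 1])
        # Least-squares fit, one coefficient column per screen axis (x, y).
        self.coef, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)
        return self

    def predict(self, gaze_deg: np.ndarray) -> np.ndarray:
        """Map one (2,) or many (n, 2) gaze readings to (n, 2) screen points."""
        g = np.atleast_2d(gaze_deg) - self.offset
        return design_matrix(g[:, 0], g[:, 1]) @ self.coef
```

A 3x3 grid gives 9 equations for 6 unknowns per axis, so the fit is overdetermined and tolerates some noise in individual fixations; subtracting the centre reading first keeps the polynomial centred on the user's own neutral gaze.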
## Pipeline
1. Face mesh (MediaPipe 478 pts)
2. Head pose -> yaw, pitch, roll, scores, gaze offset
3. Eye scorer -> EAR, gaze ratio, MAR
4. Temporal -> PERCLOS, blink rate, yawn
5. 10-dimensional feature vector -> MLP or XGBoost -> focused / unfocused
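Steps 3 and 4 can be illustrated with the standard eye aspect ratio (two lid-to-lid distances over twice the corner-to-corner width) and a rolling PERCLOS window. This is a sketch, not the code in `models/eye_scorer`: the landmark ordering follows the usual six-point EAR convention rather than specific MediaPipe indices, and the 90-frame window and 0.2 closed-eye threshold are hypothetical:

```python
import math
from collections import deque
from typing import Deque, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(p: Sequence[Point]) -> float:
    """EAR from six landmarks p1..p6: corners p1/p4, upper lid p2,p3, lower lid p5,p6."""
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])  # |p2-p6| + |p3-p5|
    horizontal = math.dist(p[0], p[3])                        # |p1-p4|
    return vertical / (2.0 * horizontal)


class TemporalEyeFeatures:
    """Rolling PERCLOS over the last `window` frames plus a cumulative blink count."""

    def __init__(self, window: int = 90, closed_thresh: float = 0.2):
        self.closed_thresh = closed_thresh  # hypothetical EAR cut-off for "closed"
        self.history: Deque[bool] = deque(maxlen=window)
        self.blinks = 0

    def update(self, ear: float) -> float:
        closed = ear < self.closed_thresh
        # Count a blink on each closed -> open transition.
        if self.history and self.history[-1] and not closed:
            self.blinks += 1
        self.history.append(closed)
        return self.perclos

    @property
    def perclos(self) -> float:
        """Fraction of frames in the window with the eye closed."""
        return sum(self.history) / len(self.history) if self.history else 0.0


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(eye_aspect_ratio(open_eye))
tracker = TemporalEyeFeatures(window=4)
for ear in [0.3, 0.1, 0.1, 0.3, 0.3]:
    tracker.update(ear)
print(tracker.perclos, tracker.blinks)
```

Dividing the blink count by elapsed time gives a blink rate; a yawn detector can be built the same way by thresholding MAR over consecutive frames.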
**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.