Abdelrahman Almatrooshi committed
Commit bb2a2db · 1 Parent(s): 6d9eb2d

Integrate L2CS-Net gaze estimation

- Add L2CS-Net in-tree (models/L2CS-Net/) with Gaze360 weights via Git LFS
- L2CSPipeline: ResNet50 gaze + MediaPipe head pose, roll de-rotation, cosine scoring
- 9-point polynomial gaze calibration with bias correction and IQR outlier filtering
- Gaze-eye fusion: calibrated screen coords + EAR for focus detection
- L2CS Boost mode: runs gaze alongside any base model (35/65 weight, veto at 0.38)
- Calibration UI: fullscreen overlay, auto-advance, progress ring
- Frontend: GAZE toggle, Calibrate button, gaze pointer dot on canvas
- Bumped capture resolution to 640x480 at JPEG quality 0.75
- Dockerfile: added git, CPU-only torch for HF Space deployment
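
The Boost-mode weighting in the notes above (35/65 split, veto at 0.38) reduces to a small scoring rule. A minimal sketch using the constants from `_process_frame_with_l2cs_boost` in this commit's `main.py` (the `fuse_scores` name and the standalone form are illustrative, not the actual function):

```python
# Sketch of the L2CS Boost fusion rule (constants from this commit's main.py:
# _BOOST_BASE_W = 0.35, _BOOST_L2CS_W = 0.65, _BOOST_VETO = 0.38).
def fuse_scores(base_score: float, l2cs_score: float) -> tuple[float, bool]:
    """Return (fused_score, is_focused) for one frame; scores are in [0, 1]."""
    if l2cs_score < 0.38:
        # Veto: gaze clearly off-screen forces not-focused regardless of the base model.
        return l2cs_score * 0.8, False
    fused = 0.35 * base_score + 0.65 * l2cs_score
    return fused, fused >= 0.52

# Head facing the screen (base 0.9) but eyes far away (L2CS 0.3): vetoed.
print(fuse_scores(0.9, 0.3))
# Moderate base score lifted by strong on-screen gaze.
print(fuse_scores(0.6, 0.8))
```

Because L2CS carries 65% of the weight and holds the veto, an off-screen gaze always wins over a confident base model, which is the point of the mode.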

Dockerfile CHANGED
@@ -7,7 +7,14 @@ ENV PYTHONUNBUFFERED=1
 
 WORKDIR /app
 
-RUN apt-get update && apt-get install -y --no-install-recommends libglib2.0-0 libsm6 libxrender1 libxext6 libxcb1 libgl1 libgomp1 ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libavdevice-dev libopus-dev libvpx-dev libsrtp2-dev build-essential nodejs npm && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    libglib2.0-0 libsm6 libxrender1 libxext6 libxcb1 libgl1 libgomp1 \
+    ffmpeg libavcodec-dev libavformat-dev libavutil-dev libswscale-dev \
+    libavdevice-dev libopus-dev libvpx-dev libsrtp2-dev \
+    build-essential nodejs npm git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cpu
 
 COPY requirements.txt ./
 RUN pip install --no-cache-dir -r requirements.txt
README.md CHANGED
@@ -1,6 +1,6 @@
 # FocusGuard
 
-Webcam-based focus detection: MediaPipe face mesh → 17 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.
+Webcam-based focus detection: MediaPipe face mesh -> 17 features (EAR, gaze, head pose, PERCLOS, etc.) -> MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.
 
 ## Project layout
 
@@ -9,10 +9,18 @@ Webcam-based focus detection: MediaPipe face mesh → 17 features (EAR, gaze, he
 ├── data_preparation/  loaders, split, scale
 ├── notebooks/         MLP/XGB training + LOPO
 ├── models/            face_mesh, head_pose, eye_scorer, train scripts
+│   ├── gaze_calibration.py  9-point polynomial gaze calibration
+│   ├── gaze_eye_fusion.py   Fuses calibrated gaze with eye openness
+│   └── L2CS-Net/            In-tree L2CS-Net repo with Gaze360 weights
 ├── checkpoints/       mlp_best.pt, xgboost_*_best.json, scalers
 ├── evaluation/        logs, plots, justify_thresholds
 ├── ui/                pipeline.py, live_demo.py
 ├── src/               React frontend
+│   ├── components/
+│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
+│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
+│   └── utils/
+│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
 ├── static/            built frontend (after npm run build)
 ├── main.py, app.py    FastAPI backend
 ├── requirements.txt
@@ -70,19 +78,50 @@ python -m models.xgboost.train
 
 9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.
 
+## Models
+
+| Model | What it uses | Best for |
+|-------|--------------|----------|
+| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
+| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
+| **MLP** | Neural network on same features (64->32) | Higher accuracy |
+| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
+| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |
+
 ## Model numbers (15% test split)
 
 | Model | Accuracy | F1 | ROC-AUC |
 |-------|----------|-----|---------|
 | XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
-| MLP (64→32) | 92.92% | 0.929 | 0.971 |
+| MLP (64->32) | 92.92% | 0.929 | 0.971 |
+
+## L2CS Gaze Tracking
+
+L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.
+
+### Standalone mode
+Select **L2CS** as the model - it handles everything.
+
+### Boost mode
+Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:
+- Base model handles head pose and eye openness (35% weight)
+- L2CS handles gaze direction (65% weight)
+- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score
+
+### Calibration
+After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
+1. A fullscreen overlay shows 9 target dots (3x3 grid)
+2. Look at each dot as the progress ring fills
+3. The first dot (centre) sets your baseline gaze offset
+4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
+5. A cyan tracking dot appears on the video showing where you're looking
 
 ## Pipeline
 
 1. Face mesh (MediaPipe 478 pts)
-2. Head pose → yaw, pitch, roll, scores, gaze offset
-3. Eye scorer → EAR, gaze ratio, MAR
-4. Temporal → PERCLOS, blink rate, yawn
-5. 10-d vector → MLP or XGBoost → focused / unfocused
+2. Head pose -> yaw, pitch, roll, scores, gaze offset
+3. Eye scorer -> EAR, gaze ratio, MAR
+4. Temporal -> PERCLOS, blink rate, yawn
+5. 10-d vector -> MLP or XGBoost -> focused / unfocused
 
-**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV.
+**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.
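
The calibration step described above ("a polynomial model maps your gaze angles to screen coordinates") plus the commit's IQR outlier filtering can be illustrated with a generic least-squares sketch. `models/gaze_calibration.py` is not shown in this diff, so the function names, the quadratic feature set, and the 1.5*IQR fence below are assumptions about the general technique, not the actual implementation:

```python
import numpy as np

def iqr_filter(samples: np.ndarray) -> np.ndarray:
    """Drop samples outside a 1.5*IQR fence on each axis (fence width assumed)."""
    q1, q3 = np.percentile(samples, [25, 75], axis=0)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    keep = np.all((samples >= lo) & (samples <= hi), axis=1)
    return samples[keep]

def fit_gaze_map(angles: np.ndarray, screen_xy: np.ndarray) -> np.ndarray:
    """Least-squares quadratic map from (yaw, pitch) to screen (x, y).

    angles:    (N, 2) gaze angles collected while looking at the 9 targets
    screen_xy: (N, 2) normalized screen coordinates of those targets
    """
    yaw, pitch = angles[:, 0], angles[:, 1]
    # Design matrix: [1, yaw, pitch, yaw*pitch, yaw^2, pitch^2]
    A = np.column_stack([np.ones_like(yaw), yaw, pitch, yaw * pitch, yaw**2, pitch**2])
    coeffs, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)
    return coeffs  # shape (6, 2)

def predict(coeffs: np.ndarray, yaw: float, pitch: float) -> np.ndarray:
    feats = np.array([1.0, yaw, pitch, yaw * pitch, yaw**2, pitch**2])
    return feats @ coeffs  # -> [x, y]
```

With a synthetic linear mapping (x = 0.5 + 0.01*yaw, y = 0.5 - 0.01*pitch) over a 3x3 grid of angles, the fit recovers the map exactly, and `predict(coeffs, 5, 5)` lands at (0.55, 0.45). Nine targets is the minimum comfortable sample count for six quadratic coefficients, which is why a 3x3 grid is the standard layout for this kind of calibration.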
checkpoints/L2CSNet_gaze360.pkl ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8a7f3480d868dd48261e1d59f915b0ef0bb33ea12ea00938fb2168f212080665
+size 95849977
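
The three `+` lines above are the entire committed file: a Git LFS pointer (spec v1) standing in for the 95 MB weights, which is just key/value lines. A minimal parser sketch (the `parse_lfs_pointer` helper is illustrative, not part of the repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:8a7f3480d868dd48261e1d59f915b0ef0bb33ea12ea00938fb2168f212080665\n"
    "size 95849977\n"
)
info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1024 / 1024)  # the weights are roughly 91 MiB
```

Cloning without `git lfs install` leaves you with this 3-line pointer instead of the checkpoint, which is why the repo also ships `download_l2cs_weights.py` as a fallback.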
download_l2cs_weights.py ADDED
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+# Downloads L2CS-Net Gaze360 weights into checkpoints/
+
+import os
+import sys
+
+CHECKPOINTS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "checkpoints")
+DEST = os.path.join(CHECKPOINTS_DIR, "L2CSNet_gaze360.pkl")
+GDRIVE_ID = "1dL2Jokb19_SBSHAhKHOxJsmYs5-GoyLo"
+
+
+def main():
+    if os.path.isfile(DEST):
+        print(f"[OK] Weights already at {DEST}")
+        return
+
+    try:
+        import gdown
+    except ImportError:
+        print("gdown not installed. Run: pip install gdown")
+        sys.exit(1)
+
+    os.makedirs(CHECKPOINTS_DIR, exist_ok=True)
+    print(f"Downloading L2CS-Net weights to {DEST} ...")
+    gdown.download(f"https://drive.google.com/uc?id={GDRIVE_ID}", DEST, quiet=False)
+
+    if os.path.isfile(DEST):
+        print(f"[OK] Downloaded ({os.path.getsize(DEST) / 1024 / 1024:.1f} MB)")
+    else:
+        print("[ERR] Download failed. Manual download:")
+        print("      https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd")
+        print(f"      Place L2CSNet_gaze360.pkl in {CHECKPOINTS_DIR}/")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
main.py CHANGED
@@ -25,7 +25,10 @@ from aiortc import RTCPeerConnection, RTCSessionDescription, VideoStreamTrack
25
  from av import VideoFrame
26
 
27
  from mediapipe.tasks.python.vision import FaceLandmarksConnections
28
- from ui.pipeline import FaceMeshPipeline, MLPPipeline, HybridFocusPipeline, XGBoostPipeline
 
 
 
29
  from models.face_mesh import FaceMeshDetector
30
 
31
  # ================ FACE MESH DRAWING (server-side, for WebRTC) ================
@@ -212,17 +215,7 @@ app.add_middleware(
212
  db_path = "focus_guard.db"
213
  pcs = set()
214
  _cached_model_name = "mlp"
215
- pipelines = {
216
- "geometric": None,
217
- "mlp": None,
218
- "hybrid": None,
219
- "xgboost": None,
220
- }
221
- _inference_executor = concurrent.futures.ThreadPoolExecutor(
222
- max_workers=4,
223
- thread_name_prefix="inference",
224
- )
225
- _pipeline_locks = {name: threading.Lock() for name in ("geometric", "mlp", "hybrid", "xgboost")}
226
 
227
  async def _wait_for_ice_gathering(pc: RTCPeerConnection):
228
  if pc.iceGatheringState == "complete":
@@ -302,6 +295,7 @@ class SettingsUpdate(BaseModel):
302
  notification_threshold: Optional[int] = None
303
  frame_rate: Optional[int] = None
304
  model_name: Optional[str] = None
 
305
 
306
  class VideoTransformTrack(VideoStreamTrack):
307
  def __init__(self, track, session_id: int, get_channel: Callable[[], Any]):
@@ -329,6 +323,8 @@ class VideoTransformTrack(VideoStreamTrack):
329
  self.last_inference_time = now
330
 
331
  model_name = _cached_model_name
 
 
332
  if model_name not in pipelines or pipelines.get(model_name) is None:
333
  model_name = 'mlp'
334
  active_pipeline = pipelines.get(model_name)
@@ -513,10 +509,56 @@ class _EventBuffer:
513
  except Exception as e:
514
  print(f"[DB] Flush error: {e}")
515
 
516
- def _process_frame_safe(pipeline, frame, model_name: str):
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
517
  with _pipeline_locks[model_name]:
518
  return pipeline.process_frame(frame)
519
 
 
520
  def _first_available_pipeline_name(preferred: str | None = None) -> str | None:
521
  if preferred and preferred in pipelines and pipelines.get(preferred) is not None:
522
  return preferred
@@ -525,6 +567,96 @@ def _first_available_pipeline_name(preferred: str | None = None) -> str | None:
525
  return name
526
  return None
527
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
528
  # ================ WEBRTC SIGNALING ================
529
 
530
  @app.post("/api/webrtc/offer")
@@ -590,14 +722,19 @@ async def webrtc_offer(offer: dict):
590
 
591
  @app.websocket("/ws/video")
592
  async def websocket_endpoint(websocket: WebSocket):
 
 
 
593
  await websocket.accept()
594
  session_id = None
595
  frame_count = 0
596
  running = True
597
  event_buffer = _EventBuffer(flush_interval=2.0)
598
 
599
- # Latest frame slot: keep only the newest frame and drop stale ones.
600
- # Using a dict so nested functions can mutate without nonlocal issues.
 
 
601
  _slot = {"frame": None}
602
  _frame_ready = asyncio.Event()
603
 
@@ -628,7 +765,6 @@ async def websocket_endpoint(websocket: WebSocket):
628
  data = json.loads(text)
629
 
630
  if data["type"] == "frame":
631
- # Legacy base64 path (fallback)
632
  _slot["frame"] = base64.b64decode(data["image"])
633
  _frame_ready.set()
634
 
@@ -647,6 +783,47 @@ async def websocket_endpoint(websocket: WebSocket):
647
  if summary:
648
  await websocket.send_json({"type": "session_ended", "summary": summary})
649
  session_id = None
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
650
  except WebSocketDisconnect:
651
  running = False
652
  _frame_ready.set()
@@ -665,7 +842,6 @@ async def websocket_endpoint(websocket: WebSocket):
665
  if not running:
666
  return
667
 
668
- # Grab latest frame and clear slot
669
  raw = _slot["frame"]
670
  _slot["frame"] = None
671
  if raw is None:
@@ -678,36 +854,87 @@ async def websocket_endpoint(websocket: WebSocket):
678
  continue
679
  frame = cv2.resize(frame, (640, 480))
680
 
681
- model_name = _first_available_pipeline_name(_cached_model_name)
682
- active_pipeline = pipelines.get(model_name) if model_name is not None else None
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
683
 
684
  landmarks_list = None
 
685
  if active_pipeline is not None:
686
- out = await loop.run_in_executor(
687
- _inference_executor,
688
- _process_frame_safe,
689
- active_pipeline,
690
- frame,
691
- model_name,
692
- )
 
 
 
 
 
 
 
 
 
693
  is_focused = out["is_focused"]
694
  confidence = out.get("mlp_prob", out.get("raw_score", 0.0))
695
 
696
  lm = out.get("landmarks")
697
  if lm is not None:
698
- # Send all 478 landmarks as flat array for tessellation drawing
699
  landmarks_list = [
700
  [round(float(lm[i, 0]), 3), round(float(lm[i, 1]), 3)]
701
  for i in range(lm.shape[0])
702
  ]
703
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
704
  if session_id:
705
- event_buffer.add(session_id, is_focused, confidence, {
706
  "s_face": out.get("s_face", 0.0),
707
  "s_eye": out.get("s_eye", 0.0),
708
  "mar": out.get("mar", 0.0),
709
  "model": model_name,
710
- })
 
711
  else:
712
  is_focused = False
713
  confidence = 0.0
@@ -721,8 +948,7 @@ async def websocket_endpoint(websocket: WebSocket):
721
  "fc": frame_count,
722
  "frame_count": frame_count,
723
  }
724
- if active_pipeline is not None:
725
- # Send detailed metrics for HUD
726
  if out.get("yaw") is not None:
727
  resp["yaw"] = round(out["yaw"], 1)
728
  resp["pitch"] = round(out["pitch"], 1)
@@ -731,6 +957,24 @@ async def websocket_endpoint(websocket: WebSocket):
731
  resp["mar"] = round(out["mar"], 3)
732
  resp["sf"] = round(out.get("s_face", 0), 3)
733
  resp["se"] = round(out.get("s_eye", 0), 3)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
734
  if landmarks_list is not None:
735
  resp["lm"] = landmarks_list
736
  await websocket.send_json(resp)
@@ -863,8 +1107,9 @@ async def get_settings():
863
  db.row_factory = aiosqlite.Row
864
  cursor = await db.execute("SELECT * FROM user_settings WHERE id = 1")
865
  row = await cursor.fetchone()
866
- if row: return dict(row)
867
- else: return {'sensitivity': 6, 'notification_enabled': True, 'notification_threshold': 30, 'frame_rate': 30, 'model_name': 'mlp'}
 
868
 
869
  @app.put("/api/settings")
870
  async def update_settings(settings: SettingsUpdate):
@@ -889,12 +1134,28 @@ async def update_settings(settings: SettingsUpdate):
889
  if settings.frame_rate is not None:
890
  updates.append("frame_rate = ?")
891
  params.append(max(5, min(60, settings.frame_rate)))
892
- if settings.model_name is not None and settings.model_name in pipelines and pipelines[settings.model_name] is not None:
 
 
 
 
 
 
 
893
  updates.append("model_name = ?")
894
  params.append(settings.model_name)
895
  global _cached_model_name
896
  _cached_model_name = settings.model_name
897
 
 
 
 
 
 
 
 
 
 
898
  if updates:
899
  query = f"UPDATE user_settings SET {', '.join(updates)} WHERE id = 1"
900
  await db.execute(query, params)
@@ -946,15 +1207,55 @@ async def get_stats_summary():
946
 
947
  @app.get("/api/models")
948
  async def get_available_models():
949
- """Return list of loaded model names and which is currently active."""
950
- available = [name for name, p in pipelines.items() if p is not None]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
951
  async with aiosqlite.connect(db_path) as db:
952
  cursor = await db.execute("SELECT model_name FROM user_settings WHERE id = 1")
953
  row = await cursor.fetchone()
954
  current = row[0] if row else "mlp"
955
  if current not in available and available:
956
  current = available[0]
957
- return {"available": available, "current": current}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
958
 
959
  @app.get("/api/mesh-topology")
960
  async def get_mesh_topology():
 
25
  from av import VideoFrame
26
 
27
  from mediapipe.tasks.python.vision import FaceLandmarksConnections
28
+ from ui.pipeline import (
29
+ FaceMeshPipeline, MLPPipeline, HybridFocusPipeline, XGBoostPipeline,
30
+ L2CSPipeline, is_l2cs_weights_available,
31
+ )
32
  from models.face_mesh import FaceMeshDetector
33
 
34
  # ================ FACE MESH DRAWING (server-side, for WebRTC) ================
 
215
  db_path = "focus_guard.db"
216
  pcs = set()
217
  _cached_model_name = "mlp"
218
+ _l2cs_boost_enabled = False
 
 
 
 
 
 
 
 
 
 
219
 
220
  async def _wait_for_ice_gathering(pc: RTCPeerConnection):
221
  if pc.iceGatheringState == "complete":
 
295
  notification_threshold: Optional[int] = None
296
  frame_rate: Optional[int] = None
297
  model_name: Optional[str] = None
298
+ l2cs_boost: Optional[bool] = None
299
 
300
  class VideoTransformTrack(VideoStreamTrack):
301
  def __init__(self, track, session_id: int, get_channel: Callable[[], Any]):
 
323
  self.last_inference_time = now
324
 
325
  model_name = _cached_model_name
326
+ if model_name == "l2cs" and pipelines.get("l2cs") is None:
327
+ _ensure_l2cs()
328
  if model_name not in pipelines or pipelines.get(model_name) is None:
329
  model_name = 'mlp'
330
  active_pipeline = pipelines.get(model_name)
 
509
  except Exception as e:
510
  print(f"[DB] Flush error: {e}")
511
 
512
+ # ================ STARTUP/SHUTDOWN ================
513
+
514
+ pipelines = {
515
+ "geometric": None,
516
+ "mlp": None,
517
+ "hybrid": None,
518
+ "xgboost": None,
519
+ "l2cs": None,
520
+ }
521
+
522
+ # Thread pool for CPU-bound inference so the event loop stays responsive.
523
+ _inference_executor = concurrent.futures.ThreadPoolExecutor(
524
+ max_workers=4,
525
+ thread_name_prefix="inference",
526
+ )
527
+ # One lock per pipeline so shared state (TemporalTracker, etc.) is not corrupted when
528
+ # multiple frames are processed in parallel by the thread pool.
529
+ _pipeline_locks = {name: threading.Lock() for name in ("geometric", "mlp", "hybrid", "xgboost", "l2cs")}
530
+
531
+ _l2cs_load_lock = threading.Lock()
532
+ _l2cs_error: str | None = None
533
+
534
+
535
+ def _ensure_l2cs():
536
+ # lazy-load L2CS on first use, double-checked locking
537
+ global _l2cs_error
538
+ if pipelines["l2cs"] is not None:
539
+ return True
540
+ with _l2cs_load_lock:
541
+ if pipelines["l2cs"] is not None:
542
+ return True
543
+ if not is_l2cs_weights_available():
544
+ _l2cs_error = "Weights not found"
545
+ return False
546
+ try:
547
+ pipelines["l2cs"] = L2CSPipeline()
548
+ _l2cs_error = None
549
+ print("[OK] L2CSPipeline lazy-loaded")
550
+ return True
551
+ except Exception as e:
552
+ _l2cs_error = str(e)
553
+ print(f"[ERR] L2CS lazy-load failed: {e}")
554
+ return False
555
+
556
+
557
+ def _process_frame_safe(pipeline, frame, model_name):
558
  with _pipeline_locks[model_name]:
559
  return pipeline.process_frame(frame)
560
 
561
+
562
  def _first_available_pipeline_name(preferred: str | None = None) -> str | None:
563
  if preferred and preferred in pipelines and pipelines.get(preferred) is not None:
564
  return preferred
 
567
  return name
568
  return None
569
 
570
+
571
+ _BOOST_BASE_W = 0.35
572
+ _BOOST_L2CS_W = 0.65
573
+ _BOOST_VETO = 0.38 # L2CS below this -> forced not-focused
574
+
575
+
576
+ def _process_frame_with_l2cs_boost(base_pipeline, frame, base_model_name):
577
+ # run base model
578
+ with _pipeline_locks[base_model_name]:
579
+ base_out = base_pipeline.process_frame(frame)
580
+
581
+ l2cs_pipe = pipelines.get("l2cs")
582
+ if l2cs_pipe is None:
583
+ base_out["boost_active"] = False
584
+ return base_out
585
+
586
+ # run L2CS
587
+ with _pipeline_locks["l2cs"]:
588
+ l2cs_out = l2cs_pipe.process_frame(frame)
589
+
590
+ base_score = base_out.get("mlp_prob", base_out.get("raw_score", 0.0))
591
+ l2cs_score = l2cs_out.get("raw_score", 0.0)
592
+
593
+ # veto: gaze clearly off-screen overrides base model
594
+ if l2cs_score < _BOOST_VETO:
595
+ fused_score = l2cs_score * 0.8
596
+ is_focused = False
597
+ else:
598
+ fused_score = _BOOST_BASE_W * base_score + _BOOST_L2CS_W * l2cs_score
599
+ is_focused = fused_score >= 0.52
600
+
601
+ base_out["raw_score"] = fused_score
602
+ base_out["is_focused"] = is_focused
603
+ base_out["boost_active"] = True
604
+ base_out["base_score"] = round(base_score, 3)
605
+ base_out["l2cs_score"] = round(l2cs_score, 3)
606
+
607
+ if l2cs_out.get("gaze_yaw") is not None:
608
+ base_out["gaze_yaw"] = l2cs_out["gaze_yaw"]
609
+ base_out["gaze_pitch"] = l2cs_out["gaze_pitch"]
610
+
611
+ return base_out
612
+
613
+ @app.on_event("startup")
614
+ async def startup_event():
615
+ global pipelines, _cached_model_name
616
+ print(" Starting Focus Guard API...")
617
+ await init_database()
618
+ # Load cached model name from DB
619
+ async with aiosqlite.connect(db_path) as db:
620
+ cursor = await db.execute("SELECT model_name FROM user_settings WHERE id = 1")
621
+ row = await cursor.fetchone()
622
+ if row:
623
+ _cached_model_name = row[0]
624
+ print("[OK] Database initialized")
625
+
626
+ try:
627
+ pipelines["geometric"] = FaceMeshPipeline()
628
+ print("[OK] FaceMeshPipeline (geometric) loaded")
629
+ except Exception as e:
630
+ print(f"[WARN] FaceMeshPipeline unavailable: {e}")
631
+
632
+ try:
633
+ pipelines["mlp"] = MLPPipeline()
634
+ print("[OK] MLPPipeline loaded")
635
+ except Exception as e:
636
+ print(f"[ERR] Failed to load MLPPipeline: {e}")
637
+
638
+ try:
639
+ pipelines["hybrid"] = HybridFocusPipeline()
640
+ print("[OK] HybridFocusPipeline loaded")
641
+ except Exception as e:
642
+ print(f"[WARN] HybridFocusPipeline unavailable: {e}")
643
+
644
+ try:
645
+ pipelines["xgboost"] = XGBoostPipeline()
646
+ print("[OK] XGBoostPipeline loaded")
647
+ except Exception as e:
648
+ print(f"[ERR] Failed to load XGBoostPipeline: {e}")
649
+
650
+ if is_l2cs_weights_available():
651
+ print("[OK] L2CS weights found — pipeline will be lazy-loaded on first use")
652
+ else:
653
+ print("[WARN] L2CS weights not found — l2cs model unavailable")
654
+
655
+ @app.on_event("shutdown")
656
+ async def shutdown_event():
657
+ _inference_executor.shutdown(wait=False)
658
+ print(" Shutting down Focus Guard API...")
659
+
660
  # ================ WEBRTC SIGNALING ================
661
 
662
  @app.post("/api/webrtc/offer")
 
722
 
723
  @app.websocket("/ws/video")
724
  async def websocket_endpoint(websocket: WebSocket):
725
+ from models.gaze_calibration import GazeCalibration
726
+ from models.gaze_eye_fusion import GazeEyeFusion
727
+
728
  await websocket.accept()
729
  session_id = None
730
  frame_count = 0
731
  running = True
732
  event_buffer = _EventBuffer(flush_interval=2.0)
733
 
734
+ # Calibration state (per-connection)
735
+ _cal: dict = {"cal": None, "collecting": False, "fusion": None}
736
+
737
+ # Latest frame slot — only the most recent frame is kept, older ones are dropped.
738
  _slot = {"frame": None}
739
  _frame_ready = asyncio.Event()
740
 
 
765
  data = json.loads(text)
766
 
767
  if data["type"] == "frame":
 
768
  _slot["frame"] = base64.b64decode(data["image"])
769
  _frame_ready.set()
770
 
 
783
  if summary:
784
  await websocket.send_json({"type": "session_ended", "summary": summary})
785
  session_id = None
786
+
787
+ # ---- Calibration commands ----
788
+ elif data["type"] == "calibration_start":
789
+ loop = asyncio.get_event_loop()
790
+ await loop.run_in_executor(_inference_executor, _ensure_l2cs)
791
+ _cal["cal"] = GazeCalibration()
792
+ _cal["collecting"] = True
793
+ _cal["fusion"] = None
794
+ cal = _cal["cal"]
795
+ await websocket.send_json({
796
+ "type": "calibration_started",
797
+ "num_points": cal.num_points,
798
+ "target": list(cal.current_target),
799
+ "index": cal.current_index,
800
+ })
801
+
802
+ elif data["type"] == "calibration_next":
803
+ cal = _cal.get("cal")
804
+ if cal is not None:
805
+ more = cal.advance()
806
+ if more:
807
+ await websocket.send_json({
808
+ "type": "calibration_point",
809
+ "target": list(cal.current_target),
810
+ "index": cal.current_index,
811
+ })
812
+ else:
813
+ _cal["collecting"] = False
814
+ ok = cal.fit()
815
+ if ok:
816
+ _cal["fusion"] = GazeEyeFusion(cal)
817
+ await websocket.send_json({"type": "calibration_done", "success": True})
818
+ else:
819
+ await websocket.send_json({"type": "calibration_done", "success": False, "error": "Not enough samples"})
820
+
821
+ elif data["type"] == "calibration_cancel":
822
+ _cal["cal"] = None
823
+ _cal["collecting"] = False
824
+ _cal["fusion"] = None
825
+ await websocket.send_json({"type": "calibration_cancelled"})
826
+
827
  except WebSocketDisconnect:
828
  running = False
829
  _frame_ready.set()
 
842
  if not running:
843
  return
844
 
 
845
  raw = _slot["frame"]
846
  _slot["frame"] = None
847
  if raw is None:
 
854
  continue
855
  frame = cv2.resize(frame, (640, 480))
856
 
857
+ # During calibration collection, always use L2CS
858
+ collecting = _cal.get("collecting", False)
859
+ if collecting:
860
+ if pipelines.get("l2cs") is None:
861
+ await loop.run_in_executor(_inference_executor, _ensure_l2cs)
862
+ use_model = "l2cs" if pipelines.get("l2cs") is not None else _cached_model_name
863
+ else:
864
+ use_model = _cached_model_name
865
+
866
+ model_name = use_model
867
+ if model_name == "l2cs" and pipelines.get("l2cs") is None:
868
+ await loop.run_in_executor(_inference_executor, _ensure_l2cs)
869
+ if model_name not in pipelines or pipelines.get(model_name) is None:
870
+ model_name = "mlp"
871
+ active_pipeline = pipelines.get(model_name)
872
+
873
+ # L2CS boost: run L2CS alongside base model
874
+ use_boost = (
875
+ _l2cs_boost_enabled
876
+ and model_name != "l2cs"
877
+ and pipelines.get("l2cs") is not None
878
+ and not collecting
879
+ )
880
 
881
  landmarks_list = None
882
+ out = None
883
  if active_pipeline is not None:
884
+ if use_boost:
885
+ out = await loop.run_in_executor(
886
+ _inference_executor,
887
+ _process_frame_with_l2cs_boost,
888
+ active_pipeline,
889
+ frame,
890
+ model_name,
891
+ )
892
+ else:
893
+ out = await loop.run_in_executor(
894
+ _inference_executor,
895
+ _process_frame_safe,
896
+ active_pipeline,
897
+ frame,
898
+ model_name,
899
+ )
900
  is_focused = out["is_focused"]
901
  confidence = out.get("mlp_prob", out.get("raw_score", 0.0))
902
 
903
  lm = out.get("landmarks")
904
  if lm is not None:
 
905
  landmarks_list = [
906
  [round(float(lm[i, 0]), 3), round(float(lm[i, 1]), 3)]
907
  for i in range(lm.shape[0])
908
  ]
909
 
910
+ # Calibration sample collection (L2CS gaze angles)
911
+ if collecting and _cal.get("cal") is not None:
912
+ pipe_yaw = out.get("gaze_yaw")
913
+ pipe_pitch = out.get("gaze_pitch")
914
+ if pipe_yaw is not None and pipe_pitch is not None:
915
+ _cal["cal"].collect_sample(pipe_yaw, pipe_pitch)
916
+
917
+ # Gaze fusion (when L2CS active + calibration fitted)
918
+ fusion = _cal.get("fusion")
919
+ if (
920
+ fusion is not None
921
+ and model_name == "l2cs"
922
+ and out.get("gaze_yaw") is not None
923
+ ):
924
+ fuse = fusion.update(
925
+ out["gaze_yaw"], out["gaze_pitch"], lm
926
+ )
927
+ is_focused = fuse["focused"]
928
+ confidence = fuse["focus_score"]
929
+
930
  if session_id:
931
+ metadata = {
932
  "s_face": out.get("s_face", 0.0),
933
  "s_eye": out.get("s_eye", 0.0),
934
  "mar": out.get("mar", 0.0),
935
  "model": model_name,
936
+ }
937
+ event_buffer.add(session_id, is_focused, confidence, metadata)
938
  else:
939
  is_focused = False
940
  confidence = 0.0
 
948
  "fc": frame_count,
949
  "frame_count": frame_count,
950
  }
951
+ if out is not None:
 
952
  if out.get("yaw") is not None:
953
  resp["yaw"] = round(out["yaw"], 1)
954
  resp["pitch"] = round(out["pitch"], 1)
 
957
  resp["mar"] = round(out["mar"], 3)
958
  resp["sf"] = round(out.get("s_face", 0), 3)
959
  resp["se"] = round(out.get("s_eye", 0), 3)
960
+
961
+ # Gaze fusion fields (L2CS standalone or boost mode)
962
+ fusion = _cal.get("fusion")
963
+ has_gaze = out.get("gaze_yaw") is not None
964
+ if fusion is not None and has_gaze and (model_name == "l2cs" or use_boost):
965
+ fuse = fusion.update(out["gaze_yaw"], out["gaze_pitch"], out.get("landmarks"))
966
+ resp["gaze_x"] = fuse["gaze_x"]
967
+ resp["gaze_y"] = fuse["gaze_y"]
968
+ resp["on_screen"] = fuse["on_screen"]
969
+ if model_name == "l2cs":
970
+ resp["focused"] = fuse["focused"]
971
+ resp["confidence"] = round(fuse["focus_score"], 3)
972
+
973
+ if out.get("boost_active"):
974
+ resp["boost"] = True
975
+ resp["base_score"] = out.get("base_score", 0)
976
+ resp["l2cs_score"] = out.get("l2cs_score", 0)
977
+
978
  if landmarks_list is not None:
979
  resp["lm"] = landmarks_list
980
  await websocket.send_json(resp)
 
1107
  db.row_factory = aiosqlite.Row
1108
  cursor = await db.execute("SELECT * FROM user_settings WHERE id = 1")
1109
  row = await cursor.fetchone()
1110
+ result = dict(row) if row else {'sensitivity': 6, 'notification_enabled': True, 'notification_threshold': 30, 'frame_rate': 30, 'model_name': 'mlp'}
1111
+ result['l2cs_boost'] = _l2cs_boost_enabled
1112
+ return result
1113
 
1114
  @app.put("/api/settings")
1115
  async def update_settings(settings: SettingsUpdate):
 
1134
  if settings.frame_rate is not None:
1135
  updates.append("frame_rate = ?")
1136
  params.append(max(5, min(60, settings.frame_rate)))
1137
+ if settings.model_name is not None and settings.model_name in pipelines:
1138
+ if settings.model_name == "l2cs":
1139
+ loop = asyncio.get_event_loop()
1140
+ loaded = await loop.run_in_executor(_inference_executor, _ensure_l2cs)
1141
+ if not loaded:
1142
+ raise HTTPException(status_code=400, detail=f"L2CS model unavailable: {_l2cs_error}")
1143
+ elif pipelines[settings.model_name] is None:
1144
+ raise HTTPException(status_code=400, detail=f"Model '{settings.model_name}' not loaded")
1145
  updates.append("model_name = ?")
1146
  params.append(settings.model_name)
1147
  global _cached_model_name
1148
  _cached_model_name = settings.model_name
1149
 
1150
+ if settings.l2cs_boost is not None:
1151
+ global _l2cs_boost_enabled
1152
+ if settings.l2cs_boost:
1153
+ loop = asyncio.get_event_loop()
1154
+ loaded = await loop.run_in_executor(_inference_executor, _ensure_l2cs)
1155
+ if not loaded:
1156
+ raise HTTPException(status_code=400, detail=f"L2CS boost unavailable: {_l2cs_error}")
1157
+ _l2cs_boost_enabled = settings.l2cs_boost
1158
+
1159
  if updates:
1160
  query = f"UPDATE user_settings SET {', '.join(updates)} WHERE id = 1"
1161
  await db.execute(query, params)
 
 
  @app.get("/api/models")
  async def get_available_models():
+     """Return model names, statuses, and which is currently active."""
+     statuses = {}
+     errors = {}
+     available = []
+     for name, p in pipelines.items():
+         if name == "l2cs":
+             if p is not None:
+                 statuses[name] = "ready"
+                 available.append(name)
+             elif is_l2cs_weights_available():
+                 statuses[name] = "lazy"
+                 available.append(name)
+             elif _l2cs_error:
+                 statuses[name] = "error"
+                 errors[name] = _l2cs_error
+             else:
+                 statuses[name] = "unavailable"
+         elif p is not None:
+             statuses[name] = "ready"
+             available.append(name)
+         else:
+             statuses[name] = "unavailable"
      async with aiosqlite.connect(db_path) as db:
          cursor = await db.execute("SELECT model_name FROM user_settings WHERE id = 1")
          row = await cursor.fetchone()
          current = row[0] if row else "mlp"
          if current not in available and available:
              current = available[0]
+     l2cs_boost_available = (
+         statuses.get("l2cs") in ("ready", "lazy") and current != "l2cs"
+     )
+     return {
+         "available": available,
+         "current": current,
+         "statuses": statuses,
+         "errors": errors,
+         "l2cs_boost": _l2cs_boost_enabled,
+         "l2cs_boost_available": l2cs_boost_available,
+     }
+
+ @app.get("/api/l2cs/status")
+ async def l2cs_status():
+     """L2CS-specific status: weights available, loaded, and calibration info."""
+     loaded = pipelines.get("l2cs") is not None
+     return {
+         "weights_available": is_l2cs_weights_available(),
+         "loaded": loaded,
+         "error": _l2cs_error,
+     }
 
  @app.get("/api/mesh-topology")
  async def get_mesh_topology():
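The status resolution in `/api/models` is pure logic over the `pipelines` map and can be exercised standalone. A minimal sketch follows; the inputs are hypothetical stand-ins for the real app state (`pipelines`, `is_l2cs_weights_available`, `_l2cs_error`), not the actual globals:

```python
# Sketch of the /api/models status resolution. The function mirrors the
# endpoint's branching; the inputs here are illustrative stand-ins.
def resolve_statuses(pipelines, l2cs_weights_available, l2cs_error):
    statuses, errors, available = {}, {}, []
    for name, p in pipelines.items():
        if name == "l2cs":
            if p is not None:
                statuses[name] = "ready"
                available.append(name)
            elif l2cs_weights_available:
                statuses[name] = "lazy"  # weights on disk, loads on first use
                available.append(name)
            elif l2cs_error:
                statuses[name] = "error"
                errors[name] = l2cs_error
            else:
                statuses[name] = "unavailable"
        elif p is not None:
            statuses[name] = "ready"
            available.append(name)
        else:
            statuses[name] = "unavailable"
    return statuses, errors, available

# L2CS not yet loaded but weights present -> advertised as "lazy"
statuses, errors, available = resolve_statuses(
    {"mlp": object(), "xgb": None, "l2cs": None},
    l2cs_weights_available=True, l2cs_error=None)
print(statuses)  # {'mlp': 'ready', 'xgb': 'unavailable', 'l2cs': 'lazy'}
```

This is why `l2cs_boost_available` checks for either `"ready"` or `"lazy"`: boost can trigger the same lazy load that selecting the model does.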
models/L2CS-Net/.gitignore ADDED
@@ -0,0 +1,140 @@
+ # Ignore the test data - sensitive
+ datasets/
+ evaluation/
+ output/
+
+ # Ignore debugging configurations
+ /.vscode
+
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ pip-wheel-metadata/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # Ignore other files
+ my.secrets
models/L2CS-Net/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2022 Ahmed Abdelrahman
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
models/L2CS-Net/README.md ADDED
@@ -0,0 +1,148 @@
+
+
+
+ <p align="center">
+   <img src="https://github.com/Ahmednull/Storage/blob/main/gaze.gif" alt="animated" />
+ </p>
+
+
+ ___
+
+ # L2CS-Net
+
+ The official PyTorch implementation of L2CS-Net for gaze estimation and tracking.
+
+ ## Installation
+ <img src="https://img.shields.io/badge/python%20-%2314354C.svg?&style=for-the-badge&logo=python&logoColor=white"/> <img src="https://img.shields.io/badge/PyTorch%20-%23EE4C2C.svg?&style=for-the-badge&logo=PyTorch&logoColor=white" />
+
+ Install package with the following:
+
+ ```
+ pip install git+https://github.com/Ahmednull/L2CS-Net.git@main
+ ```
+
+ Or, you can git clone the repo and install with the following:
+
+ ```
+ pip install [-e] .
+ ```
+
+ Now you should be able to import the package with the following command:
+
+ ```
+ $ python
+ >>> import l2cs
+ ```
+
+ ## Usage
+
+ Detect face and predict gaze from webcam
+
+ ```python
+ from l2cs import Pipeline, render
+ import cv2
+
+ gaze_pipeline = Pipeline(
+     weights=CWD / 'models' / 'L2CSNet_gaze360.pkl',
+     arch='ResNet50',
+     device=torch.device('cpu')  # or 'gpu'
+ )
+
+ cap = cv2.VideoCapture(cam)
+ _, frame = cap.read()
+
+ # Process frame and visualize
+ results = gaze_pipeline.step(frame)
+ frame = render(frame, results)
+ ```
+
+ ## Demo
+ * Download the pre-trained models from [here](https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd?usp=sharing) and Store it to *models/*.
+ * Run:
+ ```
+ python demo.py \
+  --snapshot models/L2CSNet_gaze360.pkl \
+  --gpu 0 \
+  --cam 0 \
+ ```
+ This means the demo will run using *L2CSNet_gaze360.pkl* pretrained model
+
+ ## Community Contributions
+
+ - [Gaze Detection and Eye Tracking: A How-To Guide](https://blog.roboflow.com/gaze-direction-position/): Use L2CS-Net through a HTTP interface with the open source Roboflow Inference project.
+
+ ## MPIIGaze
+ We provide the code for train and test MPIIGaze dataset with leave-one-person-out evaluation.
+
+ ### Prepare datasets
+ * Download **MPIIFaceGaze dataset** from [here](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/gaze-based-human-computer-interaction/its-written-all-over-your-face-full-face-appearance-based-gaze-estimation).
+ * Apply data preprocessing from [here](http://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/).
+ * Store the dataset to *datasets/MPIIFaceGaze*.
+
+ ### Train
+ ```
+ python train.py \
+  --dataset mpiigaze \
+  --snapshot output/snapshots \
+  --gpu 0 \
+  --num_epochs 50 \
+  --batch_size 16 \
+  --lr 0.00001 \
+  --alpha 1 \
+
+ ```
+ This means the code will perform leave-one-person-out training automatically and store the models to *output/snapshots*.
+
+ ### Test
+ ```
+ python test.py \
+  --dataset mpiigaze \
+  --snapshot output/snapshots/snapshot_folder \
+  --evalpath evaluation/L2CS-mpiigaze \
+  --gpu 0 \
+ ```
+ This means the code will perform leave-one-person-out testing automatically and store the results to *evaluation/L2CS-mpiigaze*.
+
+ To get the average leave-one-person-out accuracy use:
+ ```
+ python leave_one_out_eval.py \
+  --evalpath evaluation/L2CS-mpiigaze \
+  --respath evaluation/L2CS-mpiigaze \
+ ```
+ This means the code will take the evaluation path and outputs the leave-one-out gaze accuracy to the *evaluation/L2CS-mpiigaze*.
+
+ ## Gaze360
+ We provide the code for train and test Gaze360 dataset with train-val-test evaluation.
+
+ ### Prepare datasets
+ * Download **Gaze360 dataset** from [here](http://gaze360.csail.mit.edu/download.php).
+
+ * Apply data preprocessing from [here](http://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/).
+
+ * Store the dataset to *datasets/Gaze360*.
+
+
+ ### Train
+ ```
+ python train.py \
+  --dataset gaze360 \
+  --snapshot output/snapshots \
+  --gpu 0 \
+  --num_epochs 50 \
+  --batch_size 16 \
+  --lr 0.00001 \
+  --alpha 1 \
+
+ ```
+ This means the code will perform training and store the models to *output/snapshots*.
+
+ ### Test
+ ```
+ python test.py \
+  --dataset gaze360 \
+  --snapshot output/snapshots/snapshot_folder \
+  --evalpath evaluation/L2CS-gaze360 \
+  --gpu 0 \
+ ```
+ This means the code will perform testing on snapshot_folder and store the results to *evaluation/L2CS-gaze360*.
+
models/L2CS-Net/demo.py ADDED
@@ -0,0 +1,87 @@
+ import argparse
+ import pathlib
+ import numpy as np
+ import cv2
+ import time
+
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ from torchvision import transforms
+ import torch.backends.cudnn as cudnn
+ import torchvision
+
+ from PIL import Image
+ from PIL import Image, ImageOps
+
+ from face_detection import RetinaFace
+
+ from l2cs import select_device, draw_gaze, getArch, Pipeline, render
+
+ CWD = pathlib.Path.cwd()
+
+ def parse_args():
+     """Parse input arguments."""
+     parser = argparse.ArgumentParser(
+         description='Gaze evalution using model pretrained with L2CS-Net on Gaze360.')
+     parser.add_argument(
+         '--device', dest='device', help='Device to run model: cpu or gpu:0',
+         default="cpu", type=str)
+     parser.add_argument(
+         '--snapshot', dest='snapshot', help='Path of model snapshot.',
+         default='output/snapshots/L2CS-gaze360-_loader-180-4/_epoch_55.pkl', type=str)
+     parser.add_argument(
+         '--cam', dest='cam_id', help='Camera device id to use [0]',
+         default=0, type=int)
+     parser.add_argument(
+         '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, ResNet50, ResNet101, ResNet152',
+         default='ResNet50', type=str)
+
+     args = parser.parse_args()
+     return args
+
+ if __name__ == '__main__':
+     args = parse_args()
+
+     cudnn.enabled = True
+     arch = args.arch
+     cam = args.cam_id
+     # snapshot_path = args.snapshot
+
+     gaze_pipeline = Pipeline(
+         weights=CWD / 'models' / 'L2CSNet_gaze360.pkl',
+         arch='ResNet50',
+         device=select_device(args.device, batch_size=1)
+     )
+
+     cap = cv2.VideoCapture(cam)
+
+     # Check if the webcam is opened correctly
+     if not cap.isOpened():
+         raise IOError("Cannot open webcam")
+
+     with torch.no_grad():
+         while True:
+
+             # Get frame
+             success, frame = cap.read()
+             start_fps = time.time()
+
+             if not success:
+                 print("Failed to obtain frame")
+                 time.sleep(0.1)
+
+             # Process frame
+             results = gaze_pipeline.step(frame)
+
+             # Visualize output
+             frame = render(frame, results)
+
+             myFPS = 1.0 / (time.time() - start_fps)
+             cv2.putText(frame, 'FPS: {:.1f}'.format(myFPS), (10, 20), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 255, 0), 1, cv2.LINE_AA)
+
+             cv2.imshow("Demo", frame)
+             if cv2.waitKey(1) & 0xFF == ord('q'):
+                 break
+             success, frame = cap.read()
+
models/L2CS-Net/l2cs/__init__.py ADDED
@@ -0,0 +1,21 @@
+ from .utils import select_device, natural_keys, gazeto3d, angular, getArch
+ from .vis import draw_gaze, render
+ from .model import L2CS
+ from .pipeline import Pipeline
+ from .datasets import Gaze360, Mpiigaze
+
+ __all__ = [
+     # Classes
+     'L2CS',
+     'Pipeline',
+     'Gaze360',
+     'Mpiigaze',
+     # Utils
+     'render',
+     'select_device',
+     'draw_gaze',
+     'natural_keys',
+     'gazeto3d',
+     'angular',
+     'getArch'
+ ]
models/L2CS-Net/l2cs/datasets.py ADDED
@@ -0,0 +1,157 @@
+ import os
+ import numpy as np
+ import cv2
+
+
+ import torch
+ from torch.utils.data.dataset import Dataset
+ from torchvision import transforms
+ from PIL import Image, ImageFilter
+
+
+ class Gaze360(Dataset):
+     def __init__(self, path, root, transform, angle, binwidth, train=True):
+         self.transform = transform
+         self.root = root
+         self.orig_list_len = 0
+         self.angle = angle
+         if train==False:
+             angle = 90
+         self.binwidth = binwidth
+         self.lines = []
+         if isinstance(path, list):
+             for i in path:
+                 with open(i) as f:
+                     print("here")
+                     line = f.readlines()
+                     line.pop(0)
+                     self.lines.extend(line)
+         else:
+             with open(path) as f:
+                 lines = f.readlines()
+                 lines.pop(0)
+                 self.orig_list_len = len(lines)
+                 for line in lines:
+                     gaze2d = line.strip().split(" ")[5]
+                     label = np.array(gaze2d.split(",")).astype("float")
+                     if abs((label[0]*180/np.pi)) <= angle and abs((label[1]*180/np.pi)) <= angle:
+                         self.lines.append(line)
+
+
+         print("{} items removed from dataset that have an angle > {}".format(self.orig_list_len-len(self.lines), angle))
+
+     def __len__(self):
+         return len(self.lines)
+
+     def __getitem__(self, idx):
+         line = self.lines[idx]
+         line = line.strip().split(" ")
+
+         face = line[0]
+         lefteye = line[1]
+         righteye = line[2]
+         name = line[3]
+         gaze2d = line[5]
+         label = np.array(gaze2d.split(",")).astype("float")
+         label = torch.from_numpy(label).type(torch.FloatTensor)
+
+         pitch = label[0]* 180 / np.pi
+         yaw = label[1]* 180 / np.pi
+
+         img = Image.open(os.path.join(self.root, face))
+
+         # fimg = cv2.imread(os.path.join(self.root, face))
+         # fimg = cv2.resize(fimg, (448, 448))/255.0
+         # fimg = fimg.transpose(2, 0, 1)
+         # img=torch.from_numpy(fimg).type(torch.FloatTensor)
+
+         if self.transform:
+             img = self.transform(img)
+
+         # Bin values
+         bins = np.array(range(-1*self.angle, self.angle, self.binwidth))
+         binned_pose = np.digitize([pitch, yaw], bins) - 1
+
+         labels = binned_pose
+         cont_labels = torch.FloatTensor([pitch, yaw])
+
+
+         return img, labels, cont_labels, name
+
+ class Mpiigaze(Dataset):
+     def __init__(self, pathorg, root, transform, train, angle, fold=0):
+         self.transform = transform
+         self.root = root
+         self.orig_list_len = 0
+         self.lines = []
+         path = pathorg.copy()
+         if train==True:
+             path.pop(fold)
+         else:
+             path = path[fold]
+         if isinstance(path, list):
+             for i in path:
+                 with open(i) as f:
+                     lines = f.readlines()
+                     lines.pop(0)
+                     self.orig_list_len += len(lines)
+                     for line in lines:
+                         gaze2d = line.strip().split(" ")[7]
+                         label = np.array(gaze2d.split(",")).astype("float")
+                         if abs((label[0]*180/np.pi)) <= angle and abs((label[1]*180/np.pi)) <= angle:
+                             self.lines.append(line)
+         else:
+             with open(path) as f:
+                 lines = f.readlines()
+                 lines.pop(0)
+                 self.orig_list_len += len(lines)
+                 for line in lines:
+                     gaze2d = line.strip().split(" ")[7]
+                     label = np.array(gaze2d.split(",")).astype("float")
+                     if abs((label[0]*180/np.pi)) <= 42 and abs((label[1]*180/np.pi)) <= 42:
+                         self.lines.append(line)
+
+         print("{} items removed from dataset that have an angle > {}".format(self.orig_list_len-len(self.lines),angle))
+
+     def __len__(self):
+         return len(self.lines)
+
+     def __getitem__(self, idx):
+         line = self.lines[idx]
+         line = line.strip().split(" ")
+
+         name = line[3]
+         gaze2d = line[7]
+         head2d = line[8]
+         lefteye = line[1]
+         righteye = line[2]
+         face = line[0]
+
+         label = np.array(gaze2d.split(",")).astype("float")
+         label = torch.from_numpy(label).type(torch.FloatTensor)
+
+
+         pitch = label[0]* 180 / np.pi
+         yaw = label[1]* 180 / np.pi
+
+         img = Image.open(os.path.join(self.root, face))
+
+         # fimg = cv2.imread(os.path.join(self.root, face))
+         # fimg = cv2.resize(fimg, (448, 448))/255.0
+         # fimg = fimg.transpose(2, 0, 1)
+         # img=torch.from_numpy(fimg).type(torch.FloatTensor)
+
+         if self.transform:
+             img = self.transform(img)
+
+         # Bin values
+         bins = np.array(range(-42, 42, 3))
+         binned_pose = np.digitize([pitch, yaw], bins) - 1
+
+         labels = binned_pose
+         cont_labels = torch.FloatTensor([pitch, yaw])
+
+
+         return img, labels, cont_labels, name
+
+
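The binning step shared by both datasets turns continuous pitch/yaw in degrees into classification targets via `np.digitize(...) - 1`. A minimal sketch, assuming the Gaze360 settings `angle=180`, `binwidth=4` (which yields the 90 bins the model's heads expect; the variable names here are illustrative):

```python
import numpy as np

# Mirrors the Dataset's binning: edges every `binwidth` degrees from
# -angle to angle, then np.digitize(...) - 1 maps an angle to its bin.
angle, binwidth = 180, 4
bins = np.array(range(-angle, angle, binwidth))  # 90 edges: -180, -176, ..., 176

pitch_deg, yaw_deg = 10.0, -30.0
binned = np.digitize([pitch_deg, yaw_deg], bins) - 1
print(len(bins), binned)  # 90 [47 37]
```

Bin 45 is centered on 0 degrees, so an angle of 10 degrees lands two bins to its right (47), and -30 degrees eight bins to its left (37).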
models/L2CS-Net/l2cs/model.py ADDED
@@ -0,0 +1,73 @@
+ import torch
+ import torch.nn as nn
+ from torch.autograd import Variable
+ import math
+ import torch.nn.functional as F
+
+
+ class L2CS(nn.Module):
+     def __init__(self, block, layers, num_bins):
+         self.inplanes = 64
+         super(L2CS, self).__init__()
+         self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
+         self.bn1 = nn.BatchNorm2d(64)
+         self.relu = nn.ReLU(inplace=True)
+         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
+         self.layer1 = self._make_layer(block, 64, layers[0])
+         self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
+         self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
+         self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
+         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
+
+         self.fc_yaw_gaze = nn.Linear(512 * block.expansion, num_bins)
+         self.fc_pitch_gaze = nn.Linear(512 * block.expansion, num_bins)
+
+         # Vestigial layer from previous experiments
+         self.fc_finetune = nn.Linear(512 * block.expansion + 3, 3)
+
+         for m in self.modules():
+             if isinstance(m, nn.Conv2d):
+                 n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                 m.weight.data.normal_(0, math.sqrt(2. / n))
+             elif isinstance(m, nn.BatchNorm2d):
+                 m.weight.data.fill_(1)
+                 m.bias.data.zero_()
+
+     def _make_layer(self, block, planes, blocks, stride=1):
+         downsample = None
+         if stride != 1 or self.inplanes != planes * block.expansion:
+             downsample = nn.Sequential(
+                 nn.Conv2d(self.inplanes, planes * block.expansion,
+                           kernel_size=1, stride=stride, bias=False),
+                 nn.BatchNorm2d(planes * block.expansion),
+             )
+
+         layers = []
+         layers.append(block(self.inplanes, planes, stride, downsample))
+         self.inplanes = planes * block.expansion
+         for i in range(1, blocks):
+             layers.append(block(self.inplanes, planes))
+
+         return nn.Sequential(*layers)
+
+     def forward(self, x):
+         x = self.conv1(x)
+         x = self.bn1(x)
+         x = self.relu(x)
+         x = self.maxpool(x)
+
+         x = self.layer1(x)
+         x = self.layer2(x)
+         x = self.layer3(x)
+         x = self.layer4(x)
+         x = self.avgpool(x)
+         x = x.view(x.size(0), -1)
+
+
+         # gaze
+         pre_yaw_gaze = self.fc_yaw_gaze(x)
+         pre_pitch_gaze = self.fc_pitch_gaze(x)
+         return pre_yaw_gaze, pre_pitch_gaze
+
+
models/L2CS-Net/l2cs/pipeline.py ADDED
@@ -0,0 +1,133 @@
+ import pathlib
+ from typing import Union
+
+ import cv2
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ from dataclasses import dataclass
+ from face_detection import RetinaFace
+
+ from .utils import prep_input_numpy, getArch
+ from .results import GazeResultContainer
+
+
+ class Pipeline:
+
+     def __init__(
+         self,
+         weights: pathlib.Path,
+         arch: str,
+         device: str = 'cpu',
+         include_detector: bool = True,
+         confidence_threshold: float = 0.5
+     ):
+
+         # Save input parameters
+         self.weights = weights
+         self.include_detector = include_detector
+         self.device = device
+         self.confidence_threshold = confidence_threshold
+
+         # Create L2CS model
+         self.model = getArch(arch, 90)
+         self.model.load_state_dict(torch.load(self.weights, map_location=device))
+         self.model.to(self.device)
+         self.model.eval()
+
+         # Create RetinaFace if requested
+         if self.include_detector:
+
+             if device.type == 'cpu':
+                 self.detector = RetinaFace()
+             else:
+                 self.detector = RetinaFace(gpu_id=device.index)
+
+         self.softmax = nn.Softmax(dim=1)
+         self.idx_tensor = [idx for idx in range(90)]
+         self.idx_tensor = torch.FloatTensor(self.idx_tensor).to(self.device)
+
+     def step(self, frame: np.ndarray) -> GazeResultContainer:
+
+         # Creating containers
+         face_imgs = []
+         bboxes = []
+         landmarks = []
+         scores = []
+
+         if self.include_detector:
+             faces = self.detector(frame)
+
+             if faces is not None:
+                 for box, landmark, score in faces:
+
+                     # Apply threshold
+                     if score < self.confidence_threshold:
+                         continue
+
+                     # Extract safe min and max of x,y
+                     x_min = int(box[0])
+                     if x_min < 0:
+                         x_min = 0
+                     y_min = int(box[1])
+                     if y_min < 0:
+                         y_min = 0
+                     x_max = int(box[2])
+                     y_max = int(box[3])
+
+                     # Crop image
+                     img = frame[y_min:y_max, x_min:x_max]
+                     img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+                     img = cv2.resize(img, (224, 224))
+                     face_imgs.append(img)
+
+                     # Save data
+                     bboxes.append(box)
+                     landmarks.append(landmark)
+                     scores.append(score)
+
+                 # Predict gaze
+                 pitch, yaw = self.predict_gaze(np.stack(face_imgs))
+
+             else:
+
+                 pitch = np.empty((0, 1))
+                 yaw = np.empty((0, 1))
+
+         else:
+             pitch, yaw = self.predict_gaze(frame)
+
+         # Save data
+         results = GazeResultContainer(
+             pitch=pitch,
+             yaw=yaw,
+             bboxes=np.stack(bboxes),
+             landmarks=np.stack(landmarks),
+             scores=np.stack(scores)
+         )
+
+         return results
+
+     def predict_gaze(self, frame: Union[np.ndarray, torch.Tensor]):
+
+         # Prepare input
+         if isinstance(frame, np.ndarray):
+             img = prep_input_numpy(frame, self.device)
+         elif isinstance(frame, torch.Tensor):
+             img = frame
+         else:
+             raise RuntimeError("Invalid dtype for input")
+
+         # Predict
+         gaze_pitch, gaze_yaw = self.model(img)
+         pitch_predicted = self.softmax(gaze_pitch)
+         yaw_predicted = self.softmax(gaze_yaw)
+
+         # Get continuous predictions in degrees.
+         pitch_predicted = torch.sum(pitch_predicted.data * self.idx_tensor, dim=1) * 4 - 180
+         yaw_predicted = torch.sum(yaw_predicted.data * self.idx_tensor, dim=1) * 4 - 180
+
+         pitch_predicted = pitch_predicted.cpu().detach().numpy() * np.pi / 180.0
+         yaw_predicted = yaw_predicted.cpu().detach().numpy() * np.pi / 180.0
+
+         return pitch_predicted, yaw_predicted
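`predict_gaze` decodes each 90-way classification head into a continuous angle by taking the softmax expectation over bin indices, scaling by the 4-degree bin width, shifting by -180, and converting to radians. A numpy-only sketch of that decoding (the `decode` helper is illustrative, not part of the package):

```python
import numpy as np

# Decoding sketch matching predict_gaze: softmax over 90 bins, then
# expectation * 4 - 180 gives degrees, finally degrees -> radians.
def decode(logits):  # logits: (batch, 90)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    deg = (probs * np.arange(90)).sum(axis=1) * 4 - 180
    return deg * np.pi / 180.0

# A logit vector sharply peaked at bin 45 decodes to ~0 radians,
# since bin 45 maps to 45*4 - 180 = 0 degrees.
logits = np.full((1, 90), -10.0)
logits[0, 45] = 10.0
rad = decode(logits)
```

The expectation (rather than argmax) is what makes the output continuous: probability mass spread over neighboring bins interpolates between bin centers.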
models/L2CS-Net/l2cs/results.py ADDED
@@ -0,0 +1,11 @@
+ from dataclasses import dataclass
+ import numpy as np
+
+ @dataclass
+ class GazeResultContainer:
+
+     pitch: np.ndarray
+     yaw: np.ndarray
+     bboxes: np.ndarray
+     landmarks: np.ndarray
+     scores: np.ndarray
models/L2CS-Net/l2cs/utils.py ADDED
@@ -0,0 +1,145 @@
+ import sys
+ import os
+ import math
+ from math import cos, sin
+ from pathlib import Path
+ import subprocess
+ import re
+
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ import scipy.io as sio
+ import cv2
+ import torchvision
+ from torchvision import transforms
+
+ from .model import L2CS
+
+ transformations = transforms.Compose([
+     transforms.ToPILImage(),
+     transforms.Resize(448),
+     transforms.ToTensor(),
+     transforms.Normalize(
+         mean=[0.485, 0.456, 0.406],
+         std=[0.229, 0.224, 0.225]
+     )
+ ])
+
+ def atoi(text):
+     return int(text) if text.isdigit() else text
+
+ def natural_keys(text):
+     '''
+     alist.sort(key=natural_keys) sorts in human order
+     http://nedbatchelder.com/blog/200712/human_sorting.html
+     (See Toothy's implementation in the comments)
+     '''
+     return [ atoi(c) for c in re.split(r'(\d+)', text) ]
+
+ def prep_input_numpy(img: np.ndarray, device: str):
+     """Preparing a Numpy Array as input to L2CS-Net."""
+
+     if len(img.shape) == 4:
+         imgs = []
+         for im in img:
+             imgs.append(transformations(im))
+         img = torch.stack(imgs)
+     else:
+         img = transformations(img)
+
+     img = img.to(device)
+
+     if len(img.shape) == 3:
+         img = img.unsqueeze(0)
+
+     return img
+
+ def gazeto3d(gaze):
+     gaze_gt = np.zeros([3])
+     gaze_gt[0] = -np.cos(gaze[1]) * np.sin(gaze[0])
+     gaze_gt[1] = -np.sin(gaze[1])
+     gaze_gt[2] = -np.cos(gaze[1]) * np.cos(gaze[0])
+     return gaze_gt
+
+ def angular(gaze, label):
+     total = np.sum(gaze * label)
+     return np.arccos(min(total/(np.linalg.norm(gaze)* np.linalg.norm(label)), 0.9999999))*180/np.pi
+
+ def select_device(device='', batch_size=None):
+     # device = 'cpu' or '0' or '0,1,2,3'
+     s = f'YOLOv3 🚀 {git_describe() or date_modified()} torch {torch.__version__} '  # string
+     cpu = device.lower() == 'cpu'
+     if cpu:
+         os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # force torch.cuda.is_available() = False
+     elif device:  # non-cpu device requested
+         os.environ['CUDA_VISIBLE_DEVICES'] = device  # set environment variable
+         # assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'  # check availability
+
+     cuda = not cpu and torch.cuda.is_available()
+     if cuda:
+         devices = device.split(',') if device else range(torch.cuda.device_count())  # i.e. 0,1,6,7
+         n = len(devices)  # device count
+         if n > 1 and batch_size:  # check batch_size is divisible by device_count
+             assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'
+         space = ' ' * len(s)
+         for i, d in enumerate(devices):
+             p = torch.cuda.get_device_properties(i)
+             s += f"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / 1024 ** 2}MB)\n"  # bytes to MB
+     else:
+         s += 'CPU\n'
+
+     return torch.device('cuda:0' if cuda else 'cpu')
+
+ def spherical2cartesial(x):
+
+     output = torch.zeros(x.size(0), 3)
+     output[:,2] = -torch.cos(x[:,1])*torch.cos(x[:,0])
+     output[:,0] = torch.cos(x[:,1])*torch.sin(x[:,0])
+     output[:,1] = torch.sin(x[:,1])
+
+     return output
+
+ def compute_angular_error(input, target):
+
+     input = spherical2cartesial(input)
+     target = spherical2cartesial(target)
+
+     input = input.view(-1,3,1)
+     target = target.view(-1,1,3)
+     output_dot = torch.bmm(target,input)
+     output_dot = output_dot.view(-1)
+     output_dot = torch.acos(output_dot)
+     output_dot = output_dot.data
+     output_dot = 180*torch.mean(output_dot)/math.pi
+     return output_dot
+
+ def softmax_temperature(tensor, temperature):
+     result = torch.exp(tensor / temperature)
+     result = torch.div(result, torch.sum(result, 1).unsqueeze(1).expand_as(result))
+     return result
+
+ def git_describe(path=Path(__file__).parent):  # path must be a directory
+     # return human-readable git description, i.e. v5.0-5-g3e25f1e https://git-scm.com/docs/git-describe
+     s = f'git -C {path} describe --tags --long --always'
+     try:
+         return subprocess.check_output(s, shell=True, stderr=subprocess.STDOUT).decode()[:-1]
+     except subprocess.CalledProcessError as e:
+         return ''  # not a git repository
+
+ def getArch(arch, bins):
+     # Base network structure
+     if arch == 'ResNet18':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], bins)
+     elif arch == 'ResNet34':
+         model = L2CS(torchvision.models.resnet.BasicBlock, [3, 4, 6, 3], bins)
+     elif arch == 'ResNet101':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 23, 3], bins)
+     elif arch == 'ResNet152':
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 8, 36, 3], bins)
+     else:
+         if arch != 'ResNet50':
+             print('Invalid value for architecture is passed! '
+                   'The default value of ResNet50 will be used instead!')
+         model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
+     return model
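`gazeto3d` and `angular` together define the angular-error metric used in evaluation: project the (pitch, yaw) pair to a 3D unit vector, then take the arccos of the normalized dot product. Restated with numpy for a quick sanity check (note the `0.9999999` clamp keeps `arccos` finite, which also makes the self-error slightly nonzero rather than exactly 0):

```python
import numpy as np

# Same math as utils.py: spherical gaze -> 3D vector -> angular error in degrees.
def gazeto3d(gaze):
    return np.array([
        -np.cos(gaze[1]) * np.sin(gaze[0]),
        -np.sin(gaze[1]),
        -np.cos(gaze[1]) * np.cos(gaze[0]),
    ])

def angular(gaze, label):
    total = np.sum(gaze * label)
    return np.arccos(min(total / (np.linalg.norm(gaze) * np.linalg.norm(label)),
                         0.9999999)) * 180 / np.pi

g = gazeto3d([0.2, -0.1])
self_err = angular(g, g)            # ~0.026 degrees, from the clamp
ortho_err = angular(np.array([1.0, 0.0, 0.0]),
                    np.array([0.0, 1.0, 0.0]))  # exactly 90 degrees
```

The clamp exists because floating-point round-off can push the cosine fractionally above 1, where `arccos` would return NaN.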
models/L2CS-Net/l2cs/vis.py ADDED
@@ -0,0 +1,64 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import cv2
2
+ import numpy as np
3
+ from .results import GazeResultContainer
4
+
5
+ def draw_gaze(a,b,c,d,image_in, pitchyaw, thickness=2, color=(255, 255, 0),sclae=2.0):
6
+ """Draw gaze angle on given image with a given eye positions."""
7
+ image_out = image_in
8
+ (h, w) = image_in.shape[:2]
9
+ length = c
10
+ pos = (int(a+c / 2.0), int(b+d / 2.0))
11
+ if len(image_out.shape) == 2 or image_out.shape[2] == 1:
12
+ image_out = cv2.cvtColor(image_out, cv2.COLOR_GRAY2BGR)
13
+ dx = -length * np.sin(pitchyaw[0]) * np.cos(pitchyaw[1])
14
+ dy = -length * np.sin(pitchyaw[1])
15
+ cv2.arrowedLine(image_out, tuple(np.round(pos).astype(np.int32)),
16
+ tuple(np.round([pos[0] + dx, pos[1] + dy]).astype(int)), color,
17
+ thickness, cv2.LINE_AA, tipLength=0.18)
18
+ return image_out
19
+
20
+ def draw_bbox(frame: np.ndarray, bbox: np.ndarray):
21
+
22
+ x_min=int(bbox[0])
23
+ if x_min < 0:
24
+ x_min = 0
25
+ y_min=int(bbox[1])
26
+ if y_min < 0:
27
+ y_min = 0
28
+ x_max=int(bbox[2])
29
+ y_max=int(bbox[3])
30
+
31
+ cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0,255,0), 1)
32
+
33
+ return frame
34
+
35
+ def render(frame: np.ndarray, results: GazeResultContainer):
36
+
37
+ # Draw bounding boxes
38
+ for bbox in results.bboxes:
39
+ frame = draw_bbox(frame, bbox)
40
+
41
+ # Draw Gaze
42
+ for i in range(results.pitch.shape[0]):
43
+
44
+ bbox = results.bboxes[i]
45
+ pitch = results.pitch[i]
46
+ yaw = results.yaw[i]
47
+
48
+ # Extract safe min and max of x,y
49
+ x_min=int(bbox[0])
50
+ if x_min < 0:
51
+ x_min = 0
52
+ y_min=int(bbox[1])
53
+ if y_min < 0:
54
+ y_min = 0
55
+ x_max=int(bbox[2])
56
+ y_max=int(bbox[3])
57
+
58
+ # Compute sizes
59
+ bbox_width = x_max - x_min
60
+ bbox_height = y_max - y_min
61
+
62
+ draw_gaze(x_min,y_min,bbox_width, bbox_height,frame,(pitch,yaw),color=(0,0,255))
63
+
64
+ return frame
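`draw_gaze` projects the (pitch, yaw) pair onto the image plane before drawing the arrow. The projection on its own (a sketch; `gaze_arrow` is a hypothetical helper with the arrow length fixed instead of taken from the box width):

```python
import numpy as np

def gaze_arrow(pitch, yaw, length=100.0):
    # Screen-plane displacement of the gaze arrow, as in draw_gaze:
    # dx from pitch scaled by cos(yaw), dy from yaw alone.
    dx = -length * np.sin(pitch) * np.cos(yaw)
    dy = -length * np.sin(yaw)
    return dx, dy
```

Looking straight at the camera (pitch = yaw = 0) yields a zero-length arrow.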
models/L2CS-Net/leave_one_out_eval.py ADDED
@@ -0,0 +1,54 @@
1
+ import os
2
+ import argparse
3
+
4
+
5
+
6
+ def parse_args():
7
+ """Parse input arguments."""
8
+ parser = argparse.ArgumentParser(
9
+ description='gaze estimation using binned loss function.')
10
+ parser.add_argument(
11
+ '--evalpath', dest='evalpath', help='path for evaluating gaze test.',
12
+ default="evaluation\L2CS-gaze360-_standard-10", type=str)
13
+ parser.add_argument(
14
+ '--respath', dest='respath', help='path for saving result.',
15
+ default="evaluation\L2CS-gaze360-_standard-10", type=str)
16
+
17
+ if __name__ == '__main__':
18
+
19
+ args = parse_args()
20
+ evalpath =args.evalpath
21
+ respath=args.respath
22
+ if not os.path.exists(respath):
23
+ os.makedirs(respath)
24
+ outfile = open(os.path.join(respath, "avg.log"), 'w')
25
+ outfile.write("Average equal\n")
26
+
27
+ min=10.0
28
+ dirlist = os.listdir(evalpath)
29
+ dirlist.sort()
30
+ l=0.0
31
+ for j in range(50):
32
+ j=20
33
+ avg=0.0
34
+ h=j+3
35
+ for i in dirlist:
36
+ with open(evalpath+"/"+i+"/mpiigaze_binned.log") as myfile:
37
+
38
+ x=list(myfile)[h]
39
+ str1 = ""
40
+
41
+ # traverse in the string
42
+ for ele in x:
43
+ str1 += ele
44
+ split_string = str1.split("MAE:",1)[1]
45
+ avg+=float(split_string)
46
+
47
+ avg=avg/15.0
48
+ if avg<min:
49
+ min=avg
50
+ l=j+1
51
+ outfile.write("epoch"+str(j+1)+"= "+str(avg)+"\n")
52
+
53
+ outfile.write("min angular error equal= "+str(min)+"at epoch= "+str(l)+"\n")
54
+ print(min)
models/L2CS-Net/models/L2CSNet_gaze360.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a7f3480d868dd48261e1d59f915b0ef0bb33ea12ea00938fb2168f212080665
3
+ size 95849977
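The `.pkl` above is committed as a Git LFS pointer; the real ~96 MB weight file is fetched by `git lfs pull` at clone/build time. The pointer format itself is three `key value` lines, which a small parser can read (a sketch; `parse_lfs_pointer` is a hypothetical helper, not a git-lfs API):

```python
def parse_lfs_pointer(text):
    # Each pointer line is "key value"; size is an integer byte count.
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(' ')
        fields[key] = value
    fields['size'] = int(fields['size'])
    return fields
```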
models/L2CS-Net/models/README.md ADDED
@@ -0,0 +1 @@
1
+ # Path to pre-trained models
models/L2CS-Net/pyproject.toml ADDED
@@ -0,0 +1,44 @@
1
+ [project]
2
+ name = "l2cs"
3
+ version = "0.0.1"
4
+ description = "The official PyTorch implementation of L2CS-Net for gaze estimation and tracking"
5
+ authors = [
6
+ {name = "Ahmed Abderlrahman"},
7
+ {name = "Thorsten Hempel"}
8
+ ]
9
+ license = {file = "LICENSE.txt"}
10
+ readme = "README.md"
11
+ requires-python = ">3.6"
12
+
13
+ keywords = ["gaze", "estimation", "eye-tracking", "deep-learning", "pytorch"]
14
+
15
+ classifiers = [
16
+ "Programming Language :: Python :: 3"
17
+ ]
18
+
19
+ dependencies = [
20
+ 'matplotlib>=3.3.4',
21
+ 'numpy>=1.19.5',
22
+ 'opencv-python>=4.5.5',
23
+ 'pandas>=1.1.5',
24
+ 'Pillow>=8.4.0',
25
+ 'scipy>=1.5.4',
26
+ 'torch>=1.10.1',
27
+ 'torchvision>=0.11.2',
28
+ 'face_detection@git+https://github.com/elliottzheng/face-detection'
29
+ ]
30
+
31
+ [project.urls]
32
+ homepath = "https://github.com/Ahmednull/L2CS-Net"
33
+ repository = "https://github.com/Ahmednull/L2CS-Net"
34
+
35
+ [build-system]
36
+ requires = ["setuptools", "wheel"]
37
+ build-backend = "setuptools.build_meta"
38
+
39
+ # https://setuptools.pypa.io/en/stable/userguide/datafiles.html
40
+ [tool.setuptools]
41
+ include-package-data = true
42
+
43
+ [tool.setuptools.packages.find]
44
+ where = ["."]
models/L2CS-Net/test.py ADDED
@@ -0,0 +1,284 @@
1
+ import os, argparse
2
+ import numpy as np
3
+ import matplotlib.pyplot as plt
4
+ import torch
5
+ import torch.nn as nn
6
+ from torch.autograd import Variable
7
+ from torch.utils.data import DataLoader
8
+ from torchvision import transforms
9
+ import torch.backends.cudnn as cudnn
10
+ import torchvision
11
+
12
+ from l2cs import select_device, natural_keys, gazeto3d, angular, getArch, L2CS, Gaze360, Mpiigaze
13
+
14
+
15
+ def parse_args():
16
+ """Parse input arguments."""
17
+ parser = argparse.ArgumentParser(
18
+ description='Gaze estimation using L2CSNet.')
19
+ # Gaze360
20
+ parser.add_argument(
21
+ '--gaze360image_dir', dest='gaze360image_dir', help='Directory path for gaze images.',
22
+ default='datasets/Gaze360/Image', type=str)
23
+ parser.add_argument(
24
+ '--gaze360label_dir', dest='gaze360label_dir', help='Directory path for gaze labels.',
25
+ default='datasets/Gaze360/Label/test.label', type=str)
26
+ # mpiigaze
27
+ parser.add_argument(
28
+ '--gazeMpiimage_dir', dest='gazeMpiimage_dir', help='Directory path for gaze images.',
29
+ default='datasets/MPIIFaceGaze/Image', type=str)
30
+ parser.add_argument(
31
+ '--gazeMpiilabel_dir', dest='gazeMpiilabel_dir', help='Directory path for gaze labels.',
32
+ default='datasets/MPIIFaceGaze/Label', type=str)
33
+ # Important args -------------------------------------------------------------------------------------------------------
34
+ # ----------------------------------------------------------------------------------------------------------------------
35
+ parser.add_argument(
36
+ '--dataset', dest='dataset', help='gaze360, mpiigaze',
37
+ default= "gaze360", type=str)
38
+ parser.add_argument(
39
+ '--snapshot', dest='snapshot', help='Path to the folder contains models.',
40
+ default='output/snapshots/L2CS-gaze360-_loader-180-4-lr', type=str)
41
+ parser.add_argument(
42
+ '--evalpath', dest='evalpath', help='path for the output evaluating gaze test.',
43
+ default="evaluation/L2CS-gaze360-_loader-180-4-lr", type=str)
44
+ parser.add_argument(
45
+ '--gpu',dest='gpu_id', help='GPU device id to use [0]',
46
+ default="0", type=str)
47
+ parser.add_argument(
48
+ '--batch_size', dest='batch_size', help='Batch size.',
49
+ default=100, type=int)
50
+ parser.add_argument(
51
+ '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, [ResNet50], ''ResNet101, ResNet152, Squeezenet_1_0, Squeezenet_1_1, MobileNetV2',
52
+ default='ResNet50', type=str)
+ parser.add_argument(
+ '--bins', dest='bins', help='Number of angle bins (90 for gaze360, 28 for mpiigaze).',
+ default=28, type=int)
+ parser.add_argument(
+ '--angle', dest='angle', help='Maximum gaze angle in degrees kept when loading MPIIGaze.',
+ default=180, type=int)
+ parser.add_argument(
+ '--bin_width', dest='bin_width', help='Bin width in degrees (4 for gaze360, 3 for mpiigaze).',
+ default=3, type=int)
53
+ # ---------------------------------------------------------------------------------------------------------------------
54
+ # Important args ------------------------------------------------------------------------------------------------------
55
+ args = parser.parse_args()
56
+ return args
57
+
58
+
59
+ def getArch(arch,bins):
60
+ # Base network structure
61
+ if arch == 'ResNet18':
62
+ model = L2CS( torchvision.models.resnet.BasicBlock,[2, 2, 2, 2], bins)
63
+ elif arch == 'ResNet34':
64
+ model = L2CS( torchvision.models.resnet.BasicBlock,[3, 4, 6, 3], bins)
65
+ elif arch == 'ResNet101':
66
+ model = L2CS( torchvision.models.resnet.Bottleneck,[3, 4, 23, 3], bins)
67
+ elif arch == 'ResNet152':
68
+ model = L2CS( torchvision.models.resnet.Bottleneck,[3, 8, 36, 3], bins)
69
+ else:
70
+ if arch != 'ResNet50':
71
+ print('Invalid value for architecture is passed! '
72
+ 'The default value of ResNet50 will be used instead!')
73
+ model = L2CS( torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
74
+ return model
75
+
76
+ if __name__ == '__main__':
77
+ args = parse_args()
78
+ cudnn.enabled = True
79
+ gpu = select_device(args.gpu_id, batch_size=args.batch_size)
80
+ batch_size=args.batch_size
81
+ arch=args.arch
82
+ data_set=args.dataset
83
+ evalpath =args.evalpath
84
+ snapshot_path = args.snapshot
85
+ bins=args.bins
86
+ angle=args.angle
87
+ bin_width=args.bin_width
88
+
89
+ transformations = transforms.Compose([
90
+ transforms.Resize(448),
91
+ transforms.ToTensor(),
92
+ transforms.Normalize(
93
+ mean=[0.485, 0.456, 0.406],
94
+ std=[0.229, 0.224, 0.225]
95
+ )
96
+ ])
97
+
98
+
99
+
100
+ if data_set=="gaze360":
101
+
102
+ gaze_dataset=Gaze360(args.gaze360label_dir,args.gaze360image_dir, transformations, 180, 4, train=False)
103
+ test_loader = torch.utils.data.DataLoader(
104
+ dataset=gaze_dataset,
105
+ batch_size=batch_size,
106
+ shuffle=False,
107
+ num_workers=4,
108
+ pin_memory=True)
109
+
110
+
111
+
112
+ if not os.path.exists(evalpath):
113
+ os.makedirs(evalpath)
114
+
115
+
116
+ # list all epochs for testing
117
+ folder = os.listdir(snapshot_path)
118
+ folder.sort(key=natural_keys)
119
+ softmax = nn.Softmax(dim=1)
120
+ with open(os.path.join(evalpath,data_set+".log"), 'w') as outfile:
121
+ configuration = f"\ntest configuration = gpu_id={gpu}, batch_size={batch_size}, model_arch={arch}\nStart testing dataset={data_set}----------------------------------------\n"
122
+ print(configuration)
123
+ outfile.write(configuration)
124
+ epoch_list=[]
125
+ avg_yaw=[]
126
+ avg_pitch=[]
127
+ avg_MAE=[]
128
+ for epochs in folder:
129
+ # Base network structure
130
+ model=getArch(arch, 90)
131
+ saved_state_dict = torch.load(os.path.join(snapshot_path, epochs))
132
+ model.load_state_dict(saved_state_dict)
133
+ model.cuda(gpu)
134
+ model.eval()
135
+ total = 0
136
+ idx_tensor = [idx for idx in range(90)]
137
+ idx_tensor = torch.FloatTensor(idx_tensor).cuda(gpu)
138
+ avg_error = .0
139
+
140
+
141
+ with torch.no_grad():
142
+ for j, (images, labels, cont_labels, name) in enumerate(test_loader):
143
+ images = Variable(images).cuda(gpu)
144
+ total += cont_labels.size(0)
145
+
146
+ label_pitch = cont_labels[:,0].float()*np.pi/180
147
+ label_yaw = cont_labels[:,1].float()*np.pi/180
148
+
149
+
150
+ gaze_pitch, gaze_yaw = model(images)
151
+
152
+ # Binned predictions
153
+ _, pitch_bpred = torch.max(gaze_pitch.data, 1)
154
+ _, yaw_bpred = torch.max(gaze_yaw.data, 1)
155
+
156
+
157
+ # Continuous predictions
158
+ pitch_predicted = softmax(gaze_pitch)
159
+ yaw_predicted = softmax(gaze_yaw)
160
+
161
+ # mapping from binned indices (0 to 89) to angles (-180 to 180)
162
+ pitch_predicted = torch.sum(pitch_predicted * idx_tensor, 1).cpu() * 4 - 180
163
+ yaw_predicted = torch.sum(yaw_predicted * idx_tensor, 1).cpu() * 4 - 180
164
+
165
+ pitch_predicted = pitch_predicted*np.pi/180
166
+ yaw_predicted = yaw_predicted*np.pi/180
167
+
168
+ for p,y,pl,yl in zip(pitch_predicted,yaw_predicted,label_pitch,label_yaw):
169
+ avg_error += angular(gazeto3d([p,y]), gazeto3d([pl,yl]))
170
+
171
+
172
+
173
+ x = ''.join(filter(lambda i: i.isdigit(), epochs))
174
+ epoch_list.append(x)
175
+ avg_MAE.append(avg_error/total)
176
+ loger = f"[{epochs}---{args.dataset}] Total Num:{total},MAE:{avg_error/total}\n"
177
+ outfile.write(loger)
178
+ print(loger)
179
+
180
+ fig = plt.figure(figsize=(14, 8))
181
+ plt.xlabel('epoch')
182
+ plt.ylabel('avg')
183
+ plt.title('Gaze angular error')
184
+ plt.plot(epoch_list, avg_MAE, color='k', label='mae')
185
+ plt.legend()
186
+ fig.savefig(os.path.join(evalpath,data_set+".png"), format='png')
187
+ plt.show()
188
+
189
+
190
+
191
+ elif data_set=="mpiigaze":
192
+ model_used=getArch(arch, bins)
193
+
194
+ for fold in range(15):
195
+ folder = os.listdir(args.gazeMpiilabel_dir)
196
+ folder.sort()
197
+ testlabelpathombined = [os.path.join(args.gazeMpiilabel_dir, j) for j in folder]
198
+ gaze_dataset=Mpiigaze(testlabelpathombined,args.gazeMpiimage_dir, transformations, False, angle, fold)
199
+
200
+ test_loader = torch.utils.data.DataLoader(
201
+ dataset=gaze_dataset,
202
+ batch_size=batch_size,
203
+ shuffle=True,
204
+ num_workers=4,
205
+ pin_memory=True)
206
+
207
+
208
+ if not os.path.exists(os.path.join(evalpath, f"fold"+str(fold))):
209
+ os.makedirs(os.path.join(evalpath, f"fold"+str(fold)))
210
+
211
+ # list all epochs for testing
212
+ folder = os.listdir(os.path.join(snapshot_path,"fold"+str(fold)))
213
+ folder.sort(key=natural_keys)
214
+
215
+ softmax = nn.Softmax(dim=1)
216
+ with open(os.path.join(evalpath, os.path.join("fold"+str(fold), data_set+".log")), 'w') as outfile:
217
+ configuration = f"\ntest configuration equal gpu_id={gpu}, batch_size={batch_size}, model_arch={arch}\nStart testing dataset={data_set}, fold={fold}---------------------------------------\n"
218
+ print(configuration)
219
+ outfile.write(configuration)
220
+ epoch_list=[]
221
+ avg_MAE=[]
222
+ for epochs in folder:
223
+ model=model_used
224
+ saved_state_dict = torch.load(os.path.join(snapshot_path+"/fold"+str(fold),epochs))
225
+ model= nn.DataParallel(model,device_ids=[0])
226
+ model.load_state_dict(saved_state_dict)
227
+ model.cuda(gpu)
228
+ model.eval()
229
+ total = 0
230
+ idx_tensor = [idx for idx in range(28)]
231
+ idx_tensor = torch.FloatTensor(idx_tensor).cuda(gpu)
232
+ avg_error = .0
233
+ with torch.no_grad():
234
+ for j, (images, labels, cont_labels, name) in enumerate(test_loader):
235
+ images = Variable(images).cuda(gpu)
236
+ total += cont_labels.size(0)
237
+
238
+ label_pitch = cont_labels[:,0].float()*np.pi/180
239
+ label_yaw = cont_labels[:,1].float()*np.pi/180
240
+
241
+
242
+ gaze_pitch, gaze_yaw = model(images)
243
+
244
+ # Binned predictions
245
+ _, pitch_bpred = torch.max(gaze_pitch.data, 1)
246
+ _, yaw_bpred = torch.max(gaze_yaw.data, 1)
247
+
248
+
249
+ # Continuous predictions
250
+ pitch_predicted = softmax(gaze_pitch)
251
+ yaw_predicted = softmax(gaze_yaw)
252
+
253
+ # mapping from binned indices (0 to 27) to angles (-42 to 42)
254
+ pitch_predicted = \
255
+ torch.sum(pitch_predicted * idx_tensor, 1).cpu() * 3 - 42
256
+ yaw_predicted = \
257
+ torch.sum(yaw_predicted * idx_tensor, 1).cpu() * 3 - 42
258
+
259
+
260
+ pitch_predicted = pitch_predicted*np.pi/180
261
+ yaw_predicted = yaw_predicted*np.pi/180
262
+
263
+ for p,y,pl,yl in zip(pitch_predicted, yaw_predicted, label_pitch, label_yaw):
264
+ avg_error += angular(gazeto3d([p,y]), gazeto3d([pl,yl]))
265
+
266
+
267
+ x = ''.join(filter(lambda i: i.isdigit(), epochs))
268
+ epoch_list.append(x)
269
+ avg_MAE.append(avg_error/ total)
270
+ loger = f"[{epochs}---{args.dataset}] Total Num:{total},MAE:{avg_error/total} \n"
271
+ outfile.write(loger)
272
+ print(loger)
273
+
274
+ fig = plt.figure(figsize=(14, 8))
275
+ plt.xlabel('epoch')
276
+ plt.ylabel('avg')
277
+ plt.title('Gaze angular error')
278
+ plt.plot(epoch_list, avg_MAE, color='k', label='mae')
279
+ plt.legend()
280
+ fig.savefig(os.path.join(evalpath, os.path.join("fold"+str(fold), data_set+".png")), format='png')
281
+ # plt.show()
282
+
283
+
284
+
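Both evaluation branches decode angles the same way: softmax over the bin logits, expectation against the bin indices, then a linear map to degrees (90 bins × 4° − 180° for Gaze360, 28 bins × 3° − 42° for MPIIGaze). A NumPy sketch of that soft-argmax decoding (`decode_bins` is a hypothetical name):

```python
import numpy as np

def decode_bins(logits, bin_width, offset):
    # Soft-argmax: expectation of the bin index under softmax, mapped to degrees.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float((p * np.arange(len(logits))).sum() * bin_width + offset)
```

With uniform logits the expectation sits at the middle bin, e.g. `decode_bins(np.zeros(90), 4.0, -180.0)` gives −2.0 (index 44.5 × 4 − 180).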
models/L2CS-Net/train.py ADDED
@@ -0,0 +1,384 @@
1
+ import os
2
+ import argparse
3
+ import time
4
+
5
+ import torch.utils.model_zoo as model_zoo
6
+ import torch
7
+ import torch.nn as nn
8
+ from torch.autograd import Variable
9
+ from torch.utils.data import DataLoader
10
+ from torchvision import transforms
11
+ import torch.backends.cudnn as cudnn
12
+ import torchvision
13
+
14
+ from l2cs import L2CS, select_device, Gaze360, Mpiigaze
15
+
16
+
17
+ def parse_args():
18
+ """Parse input arguments."""
19
+ parser = argparse.ArgumentParser(description='Gaze estimation using L2CSNet.')
20
+ # Gaze360
21
+ parser.add_argument(
22
+ '--gaze360image_dir', dest='gaze360image_dir', help='Directory path for gaze images.',
23
+ default='datasets/Gaze360/Image', type=str)
24
+ parser.add_argument(
25
+ '--gaze360label_dir', dest='gaze360label_dir', help='Directory path for gaze labels.',
26
+ default='datasets/Gaze360/Label/train.label', type=str)
27
+ # mpiigaze
28
+ parser.add_argument(
29
+ '--gazeMpiimage_dir', dest='gazeMpiimage_dir', help='Directory path for gaze images.',
30
+ default='datasets/MPIIFaceGaze/Image', type=str)
31
+ parser.add_argument(
32
+ '--gazeMpiilabel_dir', dest='gazeMpiilabel_dir', help='Directory path for gaze labels.',
33
+ default='datasets/MPIIFaceGaze/Label', type=str)
34
+
35
+ # Important args -------------------------------------------------------------------------------------------------------
36
+ # ----------------------------------------------------------------------------------------------------------------------
37
+ parser.add_argument(
38
+ '--dataset', dest='dataset', help='mpiigaze, rtgene, gaze360, ethgaze',
39
+ default= "gaze360", type=str)
40
+ parser.add_argument(
41
+ '--output', dest='output', help='Path of output models.',
42
+ default='output/snapshots/', type=str)
43
+ parser.add_argument(
44
+ '--snapshot', dest='snapshot', help='Path of model snapshot.',
45
+ default='', type=str)
46
+ parser.add_argument(
47
+ '--gpu', dest='gpu_id', help='GPU device id to use [0] or multiple 0,1,2,3',
48
+ default='0', type=str)
49
+ parser.add_argument(
50
+ '--num_epochs', dest='num_epochs', help='Maximum number of training epochs.',
51
+ default=60, type=int)
52
+ parser.add_argument(
53
+ '--batch_size', dest='batch_size', help='Batch size.',
54
+ default=1, type=int)
55
+ parser.add_argument(
56
+ '--arch', dest='arch', help='Network architecture, can be: ResNet18, ResNet34, [ResNet50], ''ResNet101, ResNet152, Squeezenet_1_0, Squeezenet_1_1, MobileNetV2',
57
+ default='ResNet50', type=str)
58
+ parser.add_argument(
59
+ '--alpha', dest='alpha', help='Regression loss coefficient.',
60
+ default=1, type=float)
61
+ parser.add_argument(
62
+ '--lr', dest='lr', help='Base learning rate.',
63
+ default=0.00001, type=float)
64
+ # ---------------------------------------------------------------------------------------------------------------------
65
+ # Important args ------------------------------------------------------------------------------------------------------
66
+ args = parser.parse_args()
67
+ return args
68
+
69
+ def get_ignored_params(model):
70
+ # Generator function that yields ignored params.
71
+ b = [model.conv1, model.bn1, model.fc_finetune]
72
+ for i in range(len(b)):
73
+ for module_name, module in b[i].named_modules():
74
+ if 'bn' in module_name:
75
+ module.eval()
76
+ for name, param in module.named_parameters():
77
+ yield param
78
+
79
+ def get_non_ignored_params(model):
80
+ # Generator function that yields params that will be optimized.
81
+ b = [model.layer1, model.layer2, model.layer3, model.layer4]
82
+ for i in range(len(b)):
83
+ for module_name, module in b[i].named_modules():
84
+ if 'bn' in module_name:
85
+ module.eval()
86
+ for name, param in module.named_parameters():
87
+ yield param
88
+
89
+ def get_fc_params(model):
90
+ # Generator function that yields fc layer params.
91
+ b = [model.fc_yaw_gaze, model.fc_pitch_gaze]
92
+ for i in range(len(b)):
93
+ for module_name, module in b[i].named_modules():
94
+ for name, param in module.named_parameters():
95
+ yield param
96
+
97
+ def load_filtered_state_dict(model, snapshot):
98
+ # By user apaszke from discuss.pytorch.org
99
+ model_dict = model.state_dict()
100
+ snapshot = {k: v for k, v in snapshot.items() if k in model_dict}
101
+ model_dict.update(snapshot)
102
+ model.load_state_dict(model_dict)
103
+
104
+
105
+ def getArch_weights(arch, bins):
106
+ if arch == 'ResNet18':
107
+ model = L2CS(torchvision.models.resnet.BasicBlock, [2, 2, 2, 2], bins)
108
+ pre_url = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
109
+ elif arch == 'ResNet34':
110
+ model = L2CS(torchvision.models.resnet.BasicBlock, [3, 4, 6, 3], bins)
111
+ pre_url = 'https://download.pytorch.org/models/resnet34-333f7ec4.pth'
112
+ elif arch == 'ResNet101':
113
+ model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 23, 3], bins)
114
+ pre_url = 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth'
115
+ elif arch == 'ResNet152':
116
+ model = L2CS(torchvision.models.resnet.Bottleneck,[3, 8, 36, 3], bins)
117
+ pre_url = 'https://download.pytorch.org/models/resnet152-b121ed2d.pth'
118
+ else:
119
+ if arch != 'ResNet50':
120
+ print('Invalid value for architecture is passed! '
121
+ 'The default value of ResNet50 will be used instead!')
122
+ model = L2CS(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], bins)
123
+ pre_url = 'https://download.pytorch.org/models/resnet50-19c8e357.pth'
124
+
125
+ return model, pre_url
126
+
127
+ if __name__ == '__main__':
128
+ args = parse_args()
129
+ cudnn.enabled = True
130
+ num_epochs = args.num_epochs
131
+ batch_size = args.batch_size
132
+ gpu = select_device(args.gpu_id, batch_size=args.batch_size)
133
+ data_set=args.dataset
134
+ alpha = args.alpha
135
+ output=args.output
136
+
137
+
138
+ transformations = transforms.Compose([
139
+ transforms.Resize(448),
140
+ transforms.ToTensor(),
141
+ transforms.Normalize(
142
+ mean=[0.485, 0.456, 0.406],
143
+ std=[0.229, 0.224, 0.225]
144
+ )
145
+ ])
146
+
147
+
148
+
149
+ if data_set=="gaze360":
150
+ model, pre_url = getArch_weights(args.arch, 90)
151
+ if args.snapshot == '':
152
+ load_filtered_state_dict(model, model_zoo.load_url(pre_url))
153
+ else:
154
+ saved_state_dict = torch.load(args.snapshot)
155
+ model.load_state_dict(saved_state_dict)
156
+
157
+
158
+ model.cuda(gpu)
159
+ dataset=Gaze360(args.gaze360label_dir, args.gaze360image_dir, transformations, 180, 4)
160
+ print('Loading data.')
161
+ train_loader_gaze = DataLoader(
162
+ dataset=dataset,
163
+ batch_size=int(batch_size),
164
+ shuffle=True,
165
+ num_workers=0,
166
+ pin_memory=True)
167
+ torch.backends.cudnn.benchmark = True
168
+
169
+ summary_name = '{}_{}'.format('L2CS-gaze360-', int(time.time()))
170
+ output=os.path.join(output, summary_name)
171
+ if not os.path.exists(output):
172
+ os.makedirs(output)
173
+
174
+
175
+ criterion = nn.CrossEntropyLoss().cuda(gpu)
176
+ reg_criterion = nn.MSELoss().cuda(gpu)
177
+ softmax = nn.Softmax(dim=1).cuda(gpu)
178
+ idx_tensor = [idx for idx in range(90)]
179
+ idx_tensor = Variable(torch.FloatTensor(idx_tensor)).cuda(gpu)
180
+
181
+
182
+ # Optimizer gaze
183
+ optimizer_gaze = torch.optim.Adam([
184
+ {'params': get_ignored_params(model), 'lr': 0},
185
+ {'params': get_non_ignored_params(model), 'lr': args.lr},
186
+ {'params': get_fc_params(model), 'lr': args.lr}
187
+ ], args.lr)
188
+
189
+
190
+ configuration = f"\ntrain configuration, gpu_id={args.gpu_id}, batch_size={batch_size}, model_arch={args.arch}\nStart testing dataset={data_set}, loader={len(train_loader_gaze)}------------------------- \n"
191
+ print(configuration)
192
+ for epoch in range(num_epochs):
193
+ sum_loss_pitch_gaze = sum_loss_yaw_gaze = iter_gaze = 0
194
+
195
+
196
+ for i, (images_gaze, labels_gaze, cont_labels_gaze,name) in enumerate(train_loader_gaze):
197
+ images_gaze = Variable(images_gaze).cuda(gpu)
198
+
199
+ # Binned labels
200
+ label_pitch_gaze = Variable(labels_gaze[:, 0]).cuda(gpu)
201
+ label_yaw_gaze = Variable(labels_gaze[:, 1]).cuda(gpu)
202
+
203
+ # Continuous labels
204
+ label_pitch_cont_gaze = Variable(cont_labels_gaze[:, 0]).cuda(gpu)
205
+ label_yaw_cont_gaze = Variable(cont_labels_gaze[:, 1]).cuda(gpu)
206
+
207
+ pitch, yaw = model(images_gaze)
208
+
209
+ # Cross entropy loss
210
+ loss_pitch_gaze = criterion(pitch, label_pitch_gaze)
211
+ loss_yaw_gaze = criterion(yaw, label_yaw_gaze)
212
+
213
+ # MSE loss
214
+ pitch_predicted = softmax(pitch)
215
+ yaw_predicted = softmax(yaw)
216
+
217
+ pitch_predicted = \
218
+ torch.sum(pitch_predicted * idx_tensor, 1) * 4 - 180
219
+ yaw_predicted = \
220
+ torch.sum(yaw_predicted * idx_tensor, 1) * 4 - 180
221
+
222
+ loss_reg_pitch = reg_criterion(
223
+ pitch_predicted, label_pitch_cont_gaze)
224
+ loss_reg_yaw = reg_criterion(
225
+ yaw_predicted, label_yaw_cont_gaze)
226
+
227
+ # Total loss
228
+ loss_pitch_gaze += alpha * loss_reg_pitch
229
+ loss_yaw_gaze += alpha * loss_reg_yaw
230
+
231
+ sum_loss_pitch_gaze += loss_pitch_gaze
232
+ sum_loss_yaw_gaze += loss_yaw_gaze
233
+
234
+ loss_seq = [loss_pitch_gaze, loss_yaw_gaze]
235
+ grad_seq = [torch.tensor(1.0).cuda(gpu) for _ in range(len(loss_seq))]
236
+ optimizer_gaze.zero_grad(set_to_none=True)
237
+ torch.autograd.backward(loss_seq, grad_seq)
238
+ optimizer_gaze.step()
239
+ # scheduler.step()
240
+
241
+ iter_gaze += 1
242
+
243
+ if (i+1) % 100 == 0:
244
+ print('Epoch [%d/%d], Iter [%d/%d] Losses: '
245
+ 'Gaze Yaw %.4f,Gaze Pitch %.4f' % (
246
+ epoch+1,
247
+ num_epochs,
248
+ i+1,
249
+ len(dataset)//batch_size,
250
+ sum_loss_pitch_gaze/iter_gaze,
251
+ sum_loss_yaw_gaze/iter_gaze
252
+ )
253
+ )
254
+
255
+
256
+ if epoch % 1 == 0 and epoch < num_epochs:
257
+ print('Taking snapshot...',
258
+ torch.save(model.state_dict(),
259
+ output +'/'+
260
+ '_epoch_' + str(epoch+1) + '.pkl')
261
+ )
262
+
263
+
264
+
265
+ elif data_set=="mpiigaze":
266
+ folder = os.listdir(args.gazeMpiilabel_dir)
267
+ folder.sort()
268
+ testlabelpathombined = [os.path.join(args.gazeMpiilabel_dir, j) for j in folder]
269
+ for fold in range(15):
270
+ model, pre_url = getArch_weights(args.arch, 28)
271
+ load_filtered_state_dict(model, model_zoo.load_url(pre_url))
272
+ model = nn.DataParallel(model)
273
+ model.to(gpu)
274
+ print('Loading data.')
275
+ dataset=Mpiigaze(testlabelpathombined,args.gazeMpiimage_dir, transformations, True, fold)
276
+ train_loader_gaze = DataLoader(
277
+ dataset=dataset,
278
+ batch_size=int(batch_size),
279
+ shuffle=True,
280
+ num_workers=4,
281
+ pin_memory=True)
282
+ torch.backends.cudnn.benchmark = True
283
+
284
+ summary_name = '{}_{}'.format('L2CS-mpiigaze', int(time.time()))
285
+
286
+
287
+ if not os.path.exists(os.path.join(output+'/{}'.format(summary_name),'fold' + str(fold))):
288
+ os.makedirs(os.path.join(output+'/{}'.format(summary_name),'fold' + str(fold)))
289
+
290
+
291
+ criterion = nn.CrossEntropyLoss().cuda(gpu)
292
+ reg_criterion = nn.MSELoss().cuda(gpu)
293
+ softmax = nn.Softmax(dim=1).cuda(gpu)
294
+ idx_tensor = [idx for idx in range(28)]
295
+ idx_tensor = Variable(torch.FloatTensor(idx_tensor)).cuda(gpu)
296
+
297
+ # Optimizer gaze
298
+ optimizer_gaze = torch.optim.Adam([
299
+ {'params': get_ignored_params(model.module), 'lr': 0},
300
+ {'params': get_non_ignored_params(model.module), 'lr': args.lr},
301
+ {'params': get_fc_params(model.module), 'lr': args.lr}
302
+ ], args.lr)
303
+
304
+
305
+
306
+ configuration = f"\ntrain configuration, gpu_id={args.gpu_id}, batch_size={batch_size}, model_arch={args.arch}\n Start training dataset={data_set}, loader={len(train_loader_gaze)}, fold={fold}--------------\n"
307
+ print(configuration)
308
+ for epoch in range(num_epochs):
309
+ sum_loss_pitch_gaze = sum_loss_yaw_gaze = iter_gaze = 0
310
+
311
+
312
+ for i, (images_gaze, labels_gaze, cont_labels_gaze,name) in enumerate(train_loader_gaze):
313
+ images_gaze = Variable(images_gaze).cuda(gpu)
314
+
315
+ # Binned labels
316
+ label_pitch_gaze = Variable(labels_gaze[:, 0]).cuda(gpu)
317
+ label_yaw_gaze = Variable(labels_gaze[:, 1]).cuda(gpu)
318
+
319
+ # Continuous labels
320
+ label_pitch_cont_gaze = Variable(cont_labels_gaze[:, 0]).cuda(gpu)
321
+ label_yaw_cont_gaze = Variable(cont_labels_gaze[:, 1]).cuda(gpu)
322
+
323
+ pitch, yaw = model(images_gaze)
324
+
325
+ # Cross entropy loss
326
+ loss_pitch_gaze = criterion(pitch, label_pitch_gaze)
327
+ loss_yaw_gaze = criterion(yaw, label_yaw_gaze)
328
+
329
+ # MSE loss
330
+ pitch_predicted = softmax(pitch)
331
+ yaw_predicted = softmax(yaw)
332
+
333
+ pitch_predicted = \
334
+ torch.sum(pitch_predicted * idx_tensor, 1) * 3 - 42
335
+ yaw_predicted = \
336
+ torch.sum(yaw_predicted * idx_tensor, 1) * 3 - 42
337
+
338
+ loss_reg_pitch = reg_criterion(
339
+ pitch_predicted, label_pitch_cont_gaze)
340
+ loss_reg_yaw = reg_criterion(
341
+ yaw_predicted, label_yaw_cont_gaze)
342
+
343
+ # Total loss
344
+ loss_pitch_gaze += alpha * loss_reg_pitch
345
+ loss_yaw_gaze += alpha * loss_reg_yaw
346
+
347
+ sum_loss_pitch_gaze += loss_pitch_gaze
348
+ sum_loss_yaw_gaze += loss_yaw_gaze
349
+
350
+ loss_seq = [loss_pitch_gaze, loss_yaw_gaze]
351
+ grad_seq = \
352
+ [torch.tensor(1.0).cuda(gpu) for _ in range(len(loss_seq))]
353
+
354
+ optimizer_gaze.zero_grad(set_to_none=True)
355
+ torch.autograd.backward(loss_seq, grad_seq)
356
+ optimizer_gaze.step()
357
+
358
+ iter_gaze += 1
359
+
360
+ if (i+1) % 100 == 0:
361
+ print('Epoch [%d/%d], Iter [%d/%d] Losses: '
362
+ 'Gaze Yaw %.4f,Gaze Pitch %.4f' % (
363
+ epoch+1,
364
+ num_epochs,
365
+ i+1,
366
+ len(dataset)//batch_size,
367
+ sum_loss_pitch_gaze/iter_gaze,
368
+ sum_loss_yaw_gaze/iter_gaze
369
+ )
370
+ )
371
+
372
+
373
+
374
+ # Save models at numbered epochs.
375
+ if epoch % 1 == 0 and epoch < num_epochs:
376
+ print('Taking snapshot...',
377
+ torch.save(model.state_dict(),
378
+ os.path.join(output, summary_name, 'fold' + str(fold),
379
+ '_epoch_' + str(epoch+1) + '.pkl'))
380
+ )
381
+
382
+
383
+
384
+
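The training objective pairs cross-entropy on the binned label with an MSE term on the soft-argmax angle (in degrees), weighted by `--alpha`. A NumPy sketch of the per-sample loss for the Gaze360 head, 90 bins × 4° − 180° (`l2cs_loss` is a hypothetical name):

```python
import numpy as np

def l2cs_loss(logits, bin_label, cont_label_deg, alpha=1.0,
              bin_width=4.0, offset=-180.0):
    # Softmax over bins.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Cross-entropy against the binned label.
    ce = -np.log(p[bin_label])
    # Soft-argmax angle in degrees, regressed to the continuous label.
    angle = (p * np.arange(len(logits))).sum() * bin_width + offset
    return float(ce + alpha * (angle - cont_label_deg) ** 2)
```

When the logits are sharply peaked at the correct bin, both terms approach zero.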
models/gaze_calibration.py ADDED
@@ -0,0 +1,146 @@
1
+ # 9-point gaze calibration for L2CS-Net
2
+ # Maps raw gaze angles -> normalised screen coords via polynomial least-squares.
3
+ # Centre point is the bias reference (subtracted from all readings).
4
+
5
+ import numpy as np
6
+ from dataclasses import dataclass, field
7
+
8
+ # 3x3 grid, centre first (bias ref), then row by row
9
+ DEFAULT_TARGETS = [
10
+ (0.5, 0.5),
11
+ (0.15, 0.15), (0.50, 0.15), (0.85, 0.15),
12
+ (0.15, 0.50), (0.85, 0.50),
13
+ (0.15, 0.85), (0.50, 0.85), (0.85, 0.85),
14
+ ]
15
+
16
+
17
+ @dataclass
18
+ class _PointSamples:
19
+ target_x: float
20
+ target_y: float
21
+ yaws: list = field(default_factory=list)
22
+ pitches: list = field(default_factory=list)
23
+
24
+
25
+ def _iqr_filter(values):
26
+ if len(values) < 4:
27
+ return values
28
+ arr = np.array(values)
29
+ q1, q3 = np.percentile(arr, [25, 75])
30
+ iqr = q3 - q1
31
+ lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
32
+ return arr[(arr >= lo) & (arr <= hi)].tolist()
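`_iqr_filter` applies a standard Tukey fence: samples outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] are dropped, and lists too short to estimate quartiles pass through untouched. A standalone version with the same logic:

```python
import numpy as np

def iqr_filter(values):
    # Tukey fence: keep samples within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    if len(values) < 4:
        return list(values)
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return arr[(arr >= lo) & (arr <= hi)].tolist()
```

A single wild reading is rejected: `iqr_filter([1, 2, 3, 100])` returns `[1.0, 2.0, 3.0]`.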
33
+
34
+
35
+ class GazeCalibration:
36
+
37
+ def __init__(self, targets=None):
38
+ self._targets = targets or list(DEFAULT_TARGETS)
39
+ self._points = [_PointSamples(tx, ty) for tx, ty in self._targets]
40
+ self._current_idx = 0
41
+ self._fitted = False
42
+ self._W = None # (6, 2) polynomial weights
43
+ self._yaw_bias = 0.0
44
+ self._pitch_bias = 0.0
45
+
46
+ @property
47
+ def num_points(self):
48
+ return len(self._targets)
49
+
50
+ @property
51
+ def current_index(self):
52
+ return self._current_idx
53
+
54
+ @property
55
+ def current_target(self):
56
+ if self._current_idx < len(self._targets):
57
+ return self._targets[self._current_idx]
58
+ return self._targets[-1]
59
+
60
+ @property
61
+ def is_complete(self):
62
+ return self._current_idx >= len(self._targets)
63
+
64
+ @property
65
+ def is_fitted(self):
66
+ return self._fitted
67
+
68
+ def collect_sample(self, yaw_rad, pitch_rad):
69
+ if self._current_idx >= len(self._points):
70
+ return
71
+ pt = self._points[self._current_idx]
72
+ pt.yaws.append(float(yaw_rad))
73
+ pt.pitches.append(float(pitch_rad))
74
+
75
+ def advance(self):
76
+ self._current_idx += 1
77
+ return self._current_idx < len(self._targets)
78
+
79
+ @staticmethod
80
+ def _poly_features(yaw, pitch):
81
+ # [yaw^2, pitch^2, yaw*pitch, yaw, pitch, 1]
82
+ return np.array([yaw**2, pitch**2, yaw * pitch, yaw, pitch, 1.0],
83
+ dtype=np.float64)
84
+
85
+ def fit(self):
86
+ # bias from centre point (index 0)
87
+ center = self._points[0]
88
+ center_yaws = _iqr_filter(center.yaws)
89
+ center_pitches = _iqr_filter(center.pitches)
90
+ if len(center_yaws) < 2 or len(center_pitches) < 2:
91
+ return False
92
+ self._yaw_bias = float(np.median(center_yaws))
93
+ self._pitch_bias = float(np.median(center_pitches))
94
+
95
+ rows_A, rows_B = [], []
96
+ for pt in self._points:
97
+ clean_yaws = _iqr_filter(pt.yaws)
98
+ clean_pitches = _iqr_filter(pt.pitches)
99
+ if len(clean_yaws) < 2 or len(clean_pitches) < 2:
100
+ continue
101
+ med_yaw = float(np.median(clean_yaws)) - self._yaw_bias
102
+ med_pitch = float(np.median(clean_pitches)) - self._pitch_bias
103
+ rows_A.append(self._poly_features(med_yaw, med_pitch))
104
+ rows_B.append([pt.target_x, pt.target_y])
105
+
106
+ if len(rows_A) < 5:
107
+ return False
108
+
109
+ A = np.array(rows_A, dtype=np.float64)
110
+ B = np.array(rows_B, dtype=np.float64)
111
+ try:
112
+ W, _, _, _ = np.linalg.lstsq(A, B, rcond=None)
113
+ self._W = W
114
+ self._fitted = True
115
+ return True
116
+ except np.linalg.LinAlgError:
117
+ return False
118
+
119
+ def predict(self, yaw_rad, pitch_rad):
120
+ if not self._fitted or self._W is None:
121
+ return 0.5, 0.5
122
+ feat = self._poly_features(yaw_rad - self._yaw_bias, pitch_rad - self._pitch_bias)
123
+ xy = feat @ self._W
124
+ return float(np.clip(xy[0], 0, 1)), float(np.clip(xy[1], 0, 1))
125
+
126
+ def to_dict(self):
127
+ return {
128
+ "targets": self._targets,
129
+ "fitted": self._fitted,
130
+ "current_index": self._current_idx,
131
+ "W": self._W.tolist() if self._W is not None else None,
132
+ "yaw_bias": self._yaw_bias,
133
+ "pitch_bias": self._pitch_bias,
134
+ }
135
+
136
+ @classmethod
137
+ def from_dict(cls, d):
138
+ cal = cls(targets=d.get("targets", DEFAULT_TARGETS))
139
+ cal._fitted = d.get("fitted", False)
140
+ cal._current_idx = d.get("current_index", 0)
141
+ cal._yaw_bias = d.get("yaw_bias", 0.0)
142
+ cal._pitch_bias = d.get("pitch_bias", 0.0)
143
+ w = d.get("W")
144
+ if w is not None:
145
+ cal._W = np.array(w, dtype=np.float64)
146
+ return cal
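The fitting math in gaze_calibration.py can be exercised standalone. The sketch below uses the same 6-term polynomial basis and 3x3 target grid as the module, but the angle-to-target mapping is synthetic (a made-up user whose yaw tracks x and pitch tracks y); real L2CS angles are noisier:

```python
import numpy as np

def poly_features(yaw, pitch):
    # same basis as GazeCalibration._poly_features
    return np.array([yaw**2, pitch**2, yaw * pitch, yaw, pitch, 1.0])

# 3x3 target grid, centre first (matches DEFAULT_TARGETS)
targets = [(0.5, 0.5),
           (0.15, 0.15), (0.50, 0.15), (0.85, 0.15),
           (0.15, 0.50), (0.85, 0.50),
           (0.15, 0.85), (0.50, 0.85), (0.85, 0.85)]

# hypothetical user: yaw (rad) is proportional to x, pitch to y
A = np.array([poly_features((tx - 0.5) * 0.6, (ty - 0.5) * 0.4)
              for tx, ty in targets])
B = np.array(targets, dtype=float)

# least-squares fit, exactly as fit() does it
W, *_ = np.linalg.lstsq(A, B, rcond=None)   # (6, 2) weights

# looking dead ahead maps back to the screen centre
x, y = poly_features(0.0, 0.0) @ W
print(round(x, 3), round(y, 3))
```

Because the synthetic mapping is linear, it lies inside the polynomial basis and the 9-point system is solved exactly; with real samples the IQR filter and median per point do the denoising before this step.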
models/gaze_eye_fusion.py ADDED
@@ -0,0 +1,66 @@
1
+ # Fuses calibrated gaze position with eye openness (EAR) for focus detection.
2
+ # Takes L2CS gaze angles + MediaPipe landmarks, outputs screen coords + focus decision.
3
+
4
+ import math
5
+ import numpy as np
6
+
7
+ from .gaze_calibration import GazeCalibration
8
+ from .eye_scorer import compute_avg_ear
9
+
10
+ _EAR_BLINK = 0.18
11
+ _ON_SCREEN_MARGIN = 0.08
12
+
13
+
14
+ class GazeEyeFusion:
15
+
16
+ def __init__(self, calibration, ear_weight=0.3, gaze_weight=0.7, focus_threshold=0.52):
17
+ if not calibration.is_fitted:
18
+ raise ValueError("Calibration must be fitted first")
19
+ self._cal = calibration
20
+ self._ear_w = ear_weight
21
+ self._gaze_w = gaze_weight
22
+ self._threshold = focus_threshold
23
+ self._smooth_x = 0.5
24
+ self._smooth_y = 0.5
25
+ self._alpha = 0.5
26
+
27
+ def update(self, yaw_rad, pitch_rad, landmarks):
28
+ gx, gy = self._cal.predict(yaw_rad, pitch_rad)
29
+
30
+ # EMA smooth the gaze position
31
+ self._smooth_x += self._alpha * (gx - self._smooth_x)
32
+ self._smooth_y += self._alpha * (gy - self._smooth_y)
33
+ gx, gy = self._smooth_x, self._smooth_y
34
+
35
+ on_screen = (
36
+ -_ON_SCREEN_MARGIN <= gx <= 1.0 + _ON_SCREEN_MARGIN and
37
+ -_ON_SCREEN_MARGIN <= gy <= 1.0 + _ON_SCREEN_MARGIN
38
+ )
39
+
40
+ ear = None
41
+ ear_score = 1.0
42
+ if landmarks is not None:
43
+ ear = compute_avg_ear(landmarks)
44
+ ear_score = 0.0 if ear < _EAR_BLINK else min(ear / 0.30, 1.0)
45
+
46
+ # penalise gaze near screen edges
47
+ gaze_score = 1.0 if on_screen else 0.0
48
+ if on_screen:
49
+ dx = max(0.0, abs(gx - 0.5) - 0.3)
50
+ dy = max(0.0, abs(gy - 0.5) - 0.3)
51
+ gaze_score = max(0.0, 1.0 - math.sqrt(dx**2 + dy**2) * 5.0)
52
+
53
+ score = float(np.clip(self._gaze_w * gaze_score + self._ear_w * ear_score, 0, 1))
54
+
55
+ return {
56
+ "gaze_x": round(float(gx), 4),
57
+ "gaze_y": round(float(gy), 4),
58
+ "on_screen": on_screen,
59
+ "ear": round(ear, 4) if ear is not None else None,
60
+ "focus_score": round(score, 4),
61
+ "focused": score >= self._threshold,
62
+ }
63
+
64
+ def reset(self):
65
+ self._smooth_x = 0.5
66
+ self._smooth_y = 0.5
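The edge penalty and weighted blend in `update()` above can be checked in isolation. Constants are copied from this file; the EAR inputs are invented for the demo:

```python
import math
import numpy as np

MARGIN = 0.08            # _ON_SCREEN_MARGIN
GAZE_W, EAR_W = 0.7, 0.3 # default weights

def gaze_score(gx, gy):
    # full credit inside the central 60% of the screen, linear falloff outside
    on = -MARGIN <= gx <= 1 + MARGIN and -MARGIN <= gy <= 1 + MARGIN
    if not on:
        return 0.0
    dx = max(0.0, abs(gx - 0.5) - 0.3)
    dy = max(0.0, abs(gy - 0.5) - 0.3)
    return max(0.0, 1.0 - math.sqrt(dx**2 + dy**2) * 5.0)

def fused(gx, gy, ear):
    ear_score = 0.0 if ear < 0.18 else min(ear / 0.30, 1.0)
    return float(np.clip(GAZE_W * gaze_score(gx, gy) + EAR_W * ear_score, 0, 1))

print(fused(0.5, 0.5, 0.30))   # centred, eyes open: full score
print(fused(0.95, 0.5, 0.30))  # near the right edge: gaze term penalised
print(fused(0.5, 0.5, 0.10))   # blink: EAR term zeroed, gaze term survives
```

With the default `focus_threshold=0.52`, the edge-of-screen case scores below threshold while a momentary blink at screen centre stays above it, which is the intended asymmetry.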
requirements.txt CHANGED
@@ -20,3 +20,5 @@ xgboost>=2.0.0
20
  clearml>=2.0.2
21
  pytest>=9.0.0
22
  pytest-cov>=5.0.0
 
 
 
20
  clearml>=2.0.2
21
  pytest>=9.0.0
22
  pytest-cov>=5.0.0
23
+ face_detection @ git+https://github.com/elliottzheng/face-detection
24
+ gdown>=5.0.0
src/components/CalibrationOverlay.jsx ADDED
@@ -0,0 +1,146 @@
1
+ import React, { useState, useEffect, useRef, useCallback } from 'react';
2
+
3
+ const COLLECT_MS = 2000;
4
+ const CENTER_MS = 3000; // centre point gets extra time (bias reference)
5
+
6
+ function CalibrationOverlay({ calibration, videoManager }) {
7
+ const [progress, setProgress] = useState(0);
8
+ const timerRef = useRef(null);
9
+ const startRef = useRef(null);
10
+ const overlayRef = useRef(null);
11
+
12
+ const enterFullscreen = useCallback(() => {
13
+ const el = overlayRef.current;
14
+ if (!el) return;
15
+ const req = el.requestFullscreen || el.webkitRequestFullscreen || el.msRequestFullscreen;
16
+ if (req) { const p = req.call(el); if (p && p.catch) p.catch(() => {}); } // prefixed APIs may not return a promise
17
+ }, []);
18
+
19
+ const exitFullscreen = useCallback(() => {
20
+ if (document.fullscreenElement || document.webkitFullscreenElement) {
21
+ const exit = document.exitFullscreen || document.webkitExitFullscreen || document.msExitFullscreen;
22
+ if (exit) { const p = exit.call(document); if (p && p.catch) p.catch(() => {}); }
23
+ }
24
+ }, []);
25
+
26
+ useEffect(() => {
27
+ if (calibration && calibration.active && !calibration.done) {
28
+ const t = setTimeout(enterFullscreen, 100);
29
+ return () => clearTimeout(t);
30
+ }
31
+ }, [calibration?.active]);
32
+
33
+ useEffect(() => {
34
+ if (!calibration || !calibration.active) exitFullscreen();
35
+ }, [calibration?.active]);
36
+
37
+ useEffect(() => {
38
+ if (!calibration || !calibration.collecting || calibration.done) {
39
+ setProgress(0);
40
+ if (timerRef.current) cancelAnimationFrame(timerRef.current);
41
+ return;
42
+ }
43
+
44
+ startRef.current = performance.now();
45
+ const duration = calibration.index === 0 ? CENTER_MS : COLLECT_MS;
46
+
47
+ const tick = () => {
48
+ const pct = Math.min((performance.now() - startRef.current) / duration, 1);
49
+ setProgress(pct);
50
+ if (pct >= 1) {
51
+ if (videoManager) videoManager.nextCalibrationPoint();
52
+ startRef.current = performance.now();
53
+ setProgress(0);
54
+ }
55
+ timerRef.current = requestAnimationFrame(tick);
56
+ };
57
+ timerRef.current = requestAnimationFrame(tick);
58
+
59
+ return () => { if (timerRef.current) cancelAnimationFrame(timerRef.current); };
60
+ }, [calibration?.index, calibration?.collecting, calibration?.done]);
61
+
62
+ const handleCancel = () => {
63
+ if (videoManager) videoManager.cancelCalibration();
64
+ exitFullscreen();
65
+ };
66
+
67
+ if (!calibration || !calibration.active) return null;
68
+
69
+ if (calibration.done) {
70
+ return (
71
+ <div ref={overlayRef} style={overlayStyle}>
72
+ <div style={messageBoxStyle}>
73
+ <h2 style={{ margin: '0 0 10px', color: calibration.success ? '#4ade80' : '#f87171' }}>
74
+ {calibration.success ? 'Calibration Complete' : 'Calibration Failed'}
75
+ </h2>
76
+ <p style={{ color: '#ccc', margin: 0 }}>
77
+ {calibration.success
78
+ ? 'Gaze tracking is now active.'
79
+ : 'Not enough samples collected. Try again.'}
80
+ </p>
81
+ </div>
82
+ </div>
83
+ );
84
+ }
85
+
86
+ const [tx, ty] = calibration.target || [0.5, 0.5];
87
+
88
+ return (
89
+ <div ref={overlayRef} style={overlayStyle}>
90
+ <div style={{
91
+ position: 'absolute', top: '30px', left: '50%', transform: 'translateX(-50%)',
92
+ color: '#fff', fontSize: '16px', textAlign: 'center',
93
+ textShadow: '0 0 8px rgba(0,0,0,0.8)', pointerEvents: 'none',
94
+ }}>
95
+ <div style={{ fontWeight: 'bold', fontSize: '20px' }}>
96
+ Look at the dot ({calibration.index + 1}/{calibration.numPoints})
97
+ </div>
98
+ <div style={{ fontSize: '14px', color: '#aaa', marginTop: '6px' }}>
99
+ {calibration.index === 0
100
+ ? 'Look at the center dot - this sets your baseline'
101
+ : 'Hold your gaze steady on the target'}
102
+ </div>
103
+ </div>
104
+
105
+ <div style={{
106
+ position: 'absolute', left: `${tx * 100}%`, top: `${ty * 100}%`,
107
+ transform: 'translate(-50%, -50%)',
108
+ }}>
109
+ <svg width="60" height="60" style={{ position: 'absolute', left: '-30px', top: '-30px' }}>
110
+ <circle cx="30" cy="30" r="24" fill="none" stroke="rgba(255,255,255,0.15)" strokeWidth="3" />
111
+ <circle cx="30" cy="30" r="24" fill="none" stroke="#4ade80" strokeWidth="3"
112
+ strokeDasharray={`${progress * 150.8} 150.8`} strokeLinecap="round"
113
+ transform="rotate(-90, 30, 30)" />
114
+ </svg>
115
+ <div style={{
116
+ width: '20px', height: '20px', borderRadius: '50%',
117
+ background: 'radial-gradient(circle, #fff 30%, #4ade80 100%)',
118
+ boxShadow: '0 0 20px rgba(74, 222, 128, 0.8)',
119
+ }} />
120
+ </div>
121
+
122
+ <button onClick={handleCancel} style={{
123
+ position: 'absolute', bottom: '40px', left: '50%', transform: 'translateX(-50%)',
124
+ padding: '10px 28px', background: 'rgba(255,255,255,0.1)',
125
+ border: '1px solid rgba(255,255,255,0.3)', color: '#fff',
126
+ borderRadius: '20px', cursor: 'pointer', fontSize: '14px',
127
+ }}>
128
+ Cancel Calibration
129
+ </button>
130
+ </div>
131
+ );
132
+ }
133
+
134
+ const overlayStyle = {
135
+ position: 'fixed', top: 0, left: 0, width: '100vw', height: '100vh',
136
+ background: 'rgba(0, 0, 0, 0.92)', zIndex: 10000,
137
+ display: 'flex', alignItems: 'center', justifyContent: 'center',
138
+ };
139
+
140
+ const messageBoxStyle = {
141
+ textAlign: 'center', padding: '30px 40px',
142
+ background: 'rgba(30, 30, 50, 0.9)', borderRadius: '16px',
143
+ border: '1px solid rgba(255,255,255,0.1)',
144
+ };
145
+
146
+ export default CalibrationOverlay;
src/components/FocusPageLocal.jsx CHANGED
@@ -1,4 +1,5 @@
1
  import React, { useState, useEffect, useRef } from 'react';
 
2
 
3
  const FLOW_STEPS = {
4
  intro: 'intro',
@@ -48,6 +49,9 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
48
  const [isStarting, setIsStarting] = useState(false);
49
  const [focusState, setFocusState] = useState(FOCUS_STATES.pending);
50
  const [cameraError, setCameraError] = useState('');
 
 
 
51
 
52
  const localVideoRef = useRef(null);
53
  const displayCanvasRef = useRef(null);
@@ -127,6 +131,8 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
127
  setFocusState(FOCUS_STATES.pending);
128
  setCameraReady(false);
129
  if (originalOnSessionEnd) originalOnSessionEnd(summary);
 
 
130
  };
131
 
132
  const statsInterval = setInterval(() => {
@@ -136,8 +142,10 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
136
  }, 1000);
137
 
138
  return () => {
139
- videoManager.callbacks.onStatusUpdate = originalOnStatusUpdate;
140
- videoManager.callbacks.onSessionEnd = originalOnSessionEnd;
 
 
141
  clearInterval(statsInterval);
142
  };
143
  }, [videoManager]);
@@ -149,6 +157,8 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
149
  .then((data) => {
150
  if (data.available) setAvailableModels(data.available);
151
  if (data.current) setCurrentModel(data.current);
 
 
152
  })
153
  .catch((err) => console.error('Failed to fetch models:', err));
154
  }, []);
@@ -204,6 +214,8 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
204
  const result = await res.json();
205
  if (result.updated) {
206
  setCurrentModel(modelName);
 
 
207
  }
208
  } catch (err) {
209
  console.error('Failed to switch model:', err);
@@ -225,6 +237,21 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
225
  console.error('Camera init error:', err);
226
  }
227
  };
228
  const handleStart = async () => {
229
  try {
230
  setIsStarting(true);
@@ -697,6 +724,65 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
697
  }}>
698
  <span title="Server CPU">CPU: <strong style={{ color: '#8f8' }}>{systemStats.cpu_percent}%</strong></span>
699
  <span title="Server memory">RAM: <strong style={{ color: '#8af' }}>{systemStats.memory_percent}%</strong> ({systemStats.memory_used_mb}/{systemStats.memory_total_mb} MB)</span>
700
  </section>
701
  )}
702
 
@@ -787,6 +873,58 @@ function FocusPageLocal({ videoManager, sessionResult, setSessionResult, isActiv
787
  </section>
788
  </>
789
  ) : null}
790
  </main>
791
  );
792
  }
 
1
  import React, { useState, useEffect, useRef } from 'react';
2
+ import CalibrationOverlay from './CalibrationOverlay';
3
 
4
  const FLOW_STEPS = {
5
  intro: 'intro',
 
49
  const [isStarting, setIsStarting] = useState(false);
50
  const [focusState, setFocusState] = useState(FOCUS_STATES.pending);
51
  const [cameraError, setCameraError] = useState('');
52
+ const [calibration, setCalibration] = useState(null);
53
+ const [l2csBoost, setL2csBoost] = useState(false);
54
+ const [l2csBoostAvailable, setL2csBoostAvailable] = useState(false);
55
 
56
  const localVideoRef = useRef(null);
57
  const displayCanvasRef = useRef(null);
 
131
  setFocusState(FOCUS_STATES.pending);
132
  setCameraReady(false);
133
  if (originalOnSessionEnd) originalOnSessionEnd(summary);
134
+ };
+ videoManager.callbacks.onCalibrationUpdate = (cal) => {
135
+ setCalibration(cal && cal.active ? { ...cal } : null);
136
  };
137
 
138
  const statsInterval = setInterval(() => {
 
142
  }, 1000);
143
 
144
  return () => {
145
+ if (videoManager) {
146
+ videoManager.callbacks.onStatusUpdate = originalOnStatusUpdate;
+ videoManager.callbacks.onSessionEnd = originalOnSessionEnd;
147
+ videoManager.callbacks.onCalibrationUpdate = null;
148
+ }
149
  clearInterval(statsInterval);
150
  };
151
  }, [videoManager]);
 
157
  .then((data) => {
158
  if (data.available) setAvailableModels(data.available);
159
  if (data.current) setCurrentModel(data.current);
160
+ if (data.l2cs_boost !== undefined) setL2csBoost(data.l2cs_boost);
161
+ if (data.l2cs_boost_available !== undefined) setL2csBoostAvailable(data.l2cs_boost_available);
162
  })
163
  .catch((err) => console.error('Failed to fetch models:', err));
164
  }, []);
 
214
  const result = await res.json();
215
  if (result.updated) {
216
  setCurrentModel(modelName);
217
+ setL2csBoostAvailable(modelName !== 'l2cs' && availableModels.includes('l2cs'));
218
+ if (modelName === 'l2cs') setL2csBoost(false);
219
  }
220
  } catch (err) {
221
  console.error('Failed to switch model:', err);
 
237
  console.error('Camera init error:', err);
238
  }
239
  };
240
+
241
+ const handleBoostToggle = async () => {
242
+ const next = !l2csBoost;
243
+ try {
244
+ const res = await fetch('/api/settings', {
245
+ method: 'PUT',
246
+ headers: { 'Content-Type': 'application/json' },
247
+ body: JSON.stringify({ l2cs_boost: next })
248
+ });
249
+ if (res.ok) setL2csBoost(next);
250
+ } catch (err) {
251
+ console.error('Failed to toggle L2CS boost:', err);
252
+ }
253
+ };
254
+
255
  const handleStart = async () => {
256
  try {
257
  setIsStarting(true);
 
724
  }}>
725
  <span title="Server CPU">CPU: <strong style={{ color: '#8f8' }}>{systemStats.cpu_percent}%</strong></span>
726
  <span title="Server memory">RAM: <strong style={{ color: '#8af' }}>{systemStats.memory_percent}%</strong> ({systemStats.memory_used_mb}/{systemStats.memory_total_mb} MB)</span>
727
+ <span style={{ color: '#aaa', fontSize: '13px', marginRight: '4px' }}>Model:</span>
728
+ {availableModels.map(name => (
729
+ <button
730
+ key={name}
731
+ onClick={() => handleModelChange(name)}
732
+ style={{
733
+ padding: '5px 14px',
734
+ borderRadius: '16px',
735
+ border: currentModel === name ? '2px solid #007BFF' : '1px solid #555',
736
+ background: currentModel === name ? '#007BFF' : 'transparent',
737
+ color: currentModel === name ? '#fff' : '#ccc',
738
+ fontSize: '12px',
739
+ fontWeight: currentModel === name ? 'bold' : 'normal',
740
+ cursor: 'pointer',
741
+ textTransform: 'uppercase',
742
+ transition: 'all 0.2s'
743
+ }}
744
+ >
745
+ {name}
746
+ </button>
747
+ ))}
748
+ {l2csBoostAvailable && currentModel !== 'l2cs' && (
749
+ <button
750
+ onClick={handleBoostToggle}
751
+ style={{
752
+ padding: '5px 14px',
753
+ borderRadius: '16px',
754
+ border: l2csBoost ? '2px solid #f59e0b' : '1px solid #555',
755
+ background: l2csBoost ? 'rgba(245, 158, 11, 0.15)' : 'transparent',
756
+ color: l2csBoost ? '#f59e0b' : '#888',
757
+ fontSize: '11px',
758
+ fontWeight: l2csBoost ? 'bold' : 'normal',
759
+ cursor: 'pointer',
760
+ transition: 'all 0.2s',
761
+ marginLeft: '4px',
762
+ }}
763
+ >
764
+ {l2csBoost ? 'GAZE ON' : 'GAZE'}
765
+ </button>
766
+ )}
767
+ {(currentModel === 'l2cs' || l2csBoost) && stats && stats.isStreaming && (
768
+ <button
769
+ onClick={() => videoManager && videoManager.startCalibration()}
770
+ style={{
771
+ padding: '5px 14px',
772
+ borderRadius: '16px',
773
+ border: '1px solid #4ade80',
774
+ background: 'transparent',
775
+ color: '#4ade80',
776
+ fontSize: '12px',
777
+ fontWeight: 'bold',
778
+ cursor: 'pointer',
779
+ transition: 'all 0.2s',
780
+ marginLeft: '4px',
781
+ }}
782
+ >
783
+ Calibrate
784
+ </button>
785
+ )}
786
  </section>
787
  )}
788
 
 
873
  </section>
874
  </>
875
  ) : null}
876
+ ))}
877
+ </div>
878
+ <div id="timeline-line"></div>
879
+ </section>
880
+
881
+ {/* 4. Control Buttons */}
882
+ <section id="control-panel">
883
+ <button id="btn-cam-start" className="action-btn green" onClick={handleStart}>
884
+ Start
885
+ </button>
886
+
887
+ <button id="btn-floating" className="action-btn yellow" onClick={handleFloatingWindow}>
888
+ Floating Window
889
+ </button>
890
+
891
+ <button
892
+ id="btn-preview"
893
+ className="action-btn"
894
+ style={{ backgroundColor: '#6c5ce7' }}
895
+ onClick={handlePreview}
896
+ >
897
+ Preview Result
898
+ </button>
899
+
900
+ <button id="btn-cam-stop" className="action-btn red" onClick={handleStop}>
901
+ Stop
902
+ </button>
903
+ </section>
904
+
905
+ {/* 5. Frame Control */}
906
+ <section id="frame-control">
907
+ <label htmlFor="frame-slider">Frame Rate (FPS)</label>
908
+ <input
909
+ type="range"
910
+ id="frame-slider"
911
+ min="10"
912
+ max="30"
913
+ value={currentFrame}
914
+ onChange={(e) => handleFrameChange(e.target.value)}
915
+ />
916
+ <input
917
+ type="number"
918
+ id="frame-input"
919
+ min="10"
920
+ max="30"
921
+ value={currentFrame}
922
+ onChange={(e) => handleFrameChange(e.target.value)}
923
+ />
924
+ </section>
925
+
926
+ {/* Calibration overlay (fixed fullscreen, must be outside overflow:hidden containers) */}
927
+ <CalibrationOverlay calibration={calibration} videoManager={videoManager} />
928
  </main>
929
  );
930
  }
src/utils/VideoManagerLocal.js CHANGED
@@ -40,6 +40,17 @@ export class VideoManagerLocal {
40
  this.lastNotificationTime = null;
41
  this.notificationCooldown = 60000;
42
 
43
  // Performance metrics
44
  this.stats = {
45
  framesSent: 0,
@@ -74,8 +85,8 @@ export class VideoManagerLocal {
74
 
75
  // Create a smaller capture canvas for faster encoding and transfer.
76
  this.canvas = document.createElement('canvas');
77
- this.canvas.width = 320;
78
- this.canvas.height = 240;
79
 
80
  console.log('Local camera initialized');
81
  return true;
@@ -247,7 +258,7 @@ export class VideoManagerLocal {
247
  this.ws.send(blob);
248
  this.stats.framesSent++;
249
  }
250
- }, 'image/jpeg', 0.5);
251
  } catch (error) {
252
  this._sendingBlob = false;
253
  console.error('Capture error:', error);
@@ -312,6 +323,19 @@ export class VideoManagerLocal {
312
  ctx.textAlign = 'left';
313
  }
314
  }
315
  // Performance stats
316
  ctx.fillStyle = 'rgba(0,0,0,0.5)';
317
  ctx.fillRect(0, h - 25, w, 25);
@@ -380,6 +404,9 @@ export class VideoManagerLocal {
380
  mar: data.mar,
381
  sf: data.sf,
382
  se: data.se,
 
 
 
383
  };
384
  this.drawDetectionResult(detectionData);
385
  break;
@@ -397,6 +424,51 @@ export class VideoManagerLocal {
397
  this.sessionStartTime = null;
398
  break;
399
 
400
  case 'error':
401
  console.error('Server error:', data.message);
402
  break;
@@ -406,6 +478,28 @@ export class VideoManagerLocal {
406
  }
407
  }
408
 
409
  // Face mesh landmark index groups (matches live_demo.py)
410
  static FACE_OVAL = [10,338,297,332,284,251,389,356,454,323,361,288,397,365,379,378,400,377,152,148,176,149,150,136,172,58,132,93,234,127,162,21,54,103,67,109,10];
411
  static LEFT_EYE = [33,7,163,144,145,153,154,155,133,173,157,158,159,160,161,246];
 
40
  this.lastNotificationTime = null;
41
  this.notificationCooldown = 60000;
42
 
43
+ // Calibration state
44
+ this.calibration = {
45
+ active: false,
46
+ collecting: false,
47
+ target: null,
48
+ index: 0,
49
+ numPoints: 0,
50
+ done: false,
51
+ success: false,
52
+ };
53
+
54
  // Performance metrics
55
  this.stats = {
56
  framesSent: 0,
 
85
 
86
  // Create a smaller capture canvas for faster encoding and transfer.
87
  this.canvas = document.createElement('canvas');
88
+ this.canvas.width = 640;
89
+ this.canvas.height = 480;
90
 
91
  console.log('Local camera initialized');
92
  return true;
 
258
  this.ws.send(blob);
259
  this.stats.framesSent++;
260
  }
261
+ }, 'image/jpeg', 0.75);
262
  } catch (error) {
263
  this._sendingBlob = false;
264
  console.error('Capture error:', error);
 
323
  ctx.textAlign = 'left';
324
  }
325
  }
326
+ // Gaze pointer (L2CS + calibration)
327
+ if (data && data.gaze_x !== undefined && data.gaze_y !== undefined) {
328
+ const gx = data.gaze_x * w;
329
+ const gy = data.gaze_y * h;
330
+ ctx.beginPath();
331
+ ctx.arc(gx, gy, 8, 0, 2 * Math.PI);
332
+ ctx.fillStyle = data.on_screen ? 'rgba(0, 200, 255, 0.7)' : 'rgba(255, 80, 80, 0.5)';
333
+ ctx.fill();
334
+ ctx.strokeStyle = '#FFFFFF';
335
+ ctx.lineWidth = 2;
336
+ ctx.stroke();
337
+ }
338
+
339
  // Performance stats
340
  ctx.fillStyle = 'rgba(0,0,0,0.5)';
341
  ctx.fillRect(0, h - 25, w, 25);
 
404
  mar: data.mar,
405
  sf: data.sf,
406
  se: data.se,
407
+ gaze_x: data.gaze_x,
408
+ gaze_y: data.gaze_y,
409
+ on_screen: data.on_screen,
410
  };
411
  this.drawDetectionResult(detectionData);
412
  break;
 
424
  this.sessionStartTime = null;
425
  break;
426
 
427
+ case 'calibration_started':
428
+ this.calibration = {
429
+ active: true,
430
+ collecting: true,
431
+ target: data.target,
432
+ index: data.index,
433
+ numPoints: data.num_points,
434
+ done: false,
435
+ success: false,
436
+ };
437
+ if (this.callbacks.onCalibrationUpdate) {
438
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
439
+ }
440
+ break;
441
+
442
+ case 'calibration_point':
443
+ this.calibration.target = data.target;
444
+ this.calibration.index = data.index;
445
+ if (this.callbacks.onCalibrationUpdate) {
446
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
447
+ }
448
+ break;
449
+
450
+ case 'calibration_done':
451
+ this.calibration.collecting = false;
452
+ this.calibration.done = true;
453
+ this.calibration.success = data.success;
454
+ if (this.callbacks.onCalibrationUpdate) {
455
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
456
+ }
457
+ setTimeout(() => {
458
+ this.calibration.active = false;
459
+ if (this.callbacks.onCalibrationUpdate) {
460
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
461
+ }
462
+ }, 2000);
463
+ break;
464
+
465
+ case 'calibration_cancelled':
466
+ this.calibration = { active: false, collecting: false, target: null, index: 0, numPoints: 0, done: false, success: false };
467
+ if (this.callbacks.onCalibrationUpdate) {
468
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
469
+ }
470
+ break;
471
+
472
  case 'error':
473
  console.error('Server error:', data.message);
474
  break;
 
478
  }
479
  }
480
 
481
+ startCalibration() {
482
+ if (this.ws && this.ws.readyState === WebSocket.OPEN) {
483
+ this.ws.send(JSON.stringify({ type: 'calibration_start' }));
484
+ }
485
+ }
486
+
487
+ nextCalibrationPoint() {
488
+ if (this.ws && this.ws.readyState === WebSocket.OPEN) {
489
+ this.ws.send(JSON.stringify({ type: 'calibration_next' }));
490
+ }
491
+ }
492
+
493
+ cancelCalibration() {
494
+ if (this.ws && this.ws.readyState === WebSocket.OPEN) {
495
+ this.ws.send(JSON.stringify({ type: 'calibration_cancel' }));
496
+ }
497
+ this.calibration = { active: false, collecting: false, target: null, index: 0, numPoints: 0, done: false, success: false };
498
+ if (this.callbacks.onCalibrationUpdate) {
499
+ this.callbacks.onCalibrationUpdate({ ...this.calibration });
500
+ }
501
+ }
502
+
503
  // Face mesh landmark index groups (matches live_demo.py)
504
  static FACE_OVAL = [10,338,297,332,284,251,389,356,454,323,361,288,397,365,379,378,400,377,152,148,176,149,150,136,172,58,132,93,234,127,162,21,54,103,67,109,10];
505
  static LEFT_EYE = [33,7,163,144,145,153,154,155,133,173,157,158,159,160,161,246];
ui/pipeline.py CHANGED
@@ -5,6 +5,7 @@ import glob
5
  import json
6
  import math
7
  import os
 
8
  import sys
9
 
10
  import numpy as np
@@ -54,8 +55,12 @@ def _clip_features(vec):
54
 
55
 
56
  class _OutputSmoother:
57
- def __init__(self, alpha: float = 0.3, grace_frames: int = 15):
58
- self._alpha = alpha
 
 
 
 
59
  self._grace = grace_frames
60
  self._score = 0.5
61
  self._no_face = 0
@@ -64,14 +69,15 @@ class _OutputSmoother:
64
  self._score = 0.5
65
  self._no_face = 0
66
 
67
- def update(self, raw_score: float, face_detected: bool) -> float:
68
  if face_detected:
69
  self._no_face = 0
70
- self._score += self._alpha * (raw_score - self._score)
 
71
  else:
72
  self._no_face += 1
73
  if self._no_face > self._grace:
74
- self._score *= 0.85
75
  return self._score
76
 
77
 
@@ -645,3 +651,141 @@ class XGBoostPipeline:
645
 
646
  def __exit__(self, *args):
647
  self.close()
 
5
  import json
6
  import math
7
  import os
8
+ import pathlib
9
  import sys
10
 
11
  import numpy as np
 
55
 
56
 
57
  class _OutputSmoother:
58
+ # Asymmetric EMA: rises fast (recognise focus), falls slower (avoid flicker).
59
+ # Grace period holds score steady for a few frames when face is lost.
60
+
61
+ def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
62
+ self._alpha_up = alpha_up
63
+ self._alpha_down = alpha_down
64
  self._grace = grace_frames
65
  self._score = 0.5
66
  self._no_face = 0
 
69
  self._score = 0.5
70
  self._no_face = 0
71
 
72
+ def update(self, raw_score, face_detected):
73
  if face_detected:
74
  self._no_face = 0
75
+ alpha = self._alpha_up if raw_score > self._score else self._alpha_down
76
+ self._score += alpha * (raw_score - self._score)
77
  else:
78
  self._no_face += 1
79
  if self._no_face > self._grace:
80
+ self._score *= 0.80
81
  return self._score
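A quick standalone check of the asymmetric behaviour described above (same constants; the class is copied here so the demo runs without the pipeline imports):

```python
class OutputSmoother:
    # mirrors _OutputSmoother above: fast rise, slower fall, grace on lost face
    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
        self._alpha_up, self._alpha_down = alpha_up, alpha_down
        self._grace = grace_frames
        self._score, self._no_face = 0.5, 0

    def update(self, raw_score, face_detected):
        if face_detected:
            self._no_face = 0
            alpha = self._alpha_up if raw_score > self._score else self._alpha_down
            self._score += alpha * (raw_score - self._score)
        else:
            self._no_face += 1
            if self._no_face > self._grace:
                self._score *= 0.80
        return self._score

s = OutputSmoother()
rise = s.update(1.0, True)   # 0.5 + 0.55 * 0.5 = 0.775 (fast rise)
fall = s.update(0.0, True)   # 0.775 - 0.45 * 0.775 = 0.42625 (slower fall)
held = [s.update(0.0, False) for _ in range(10)][-1]  # within grace: unchanged
```

The grace period means ten consecutive frames without a face leave the score untouched; only from the eleventh does the 0.80 decay kick in.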
82
 
83
 
 
651
 
652
  def __exit__(self, *args):
653
  self.close()
654
+
655
+
656
+ def _resolve_l2cs_weights():
657
+ for p in [
658
+ os.path.join(_PROJECT_ROOT, "models", "L2CS-Net", "models", "L2CSNet_gaze360.pkl"),
659
+ os.path.join(_PROJECT_ROOT, "models", "L2CSNet_gaze360.pkl"),
660
+ os.path.join(_PROJECT_ROOT, "checkpoints", "L2CSNet_gaze360.pkl"),
661
+ ]:
662
+ if os.path.isfile(p):
663
+ return p
664
+ return None
665
+
666
+
667
+ def is_l2cs_weights_available():
668
+ return _resolve_l2cs_weights() is not None
669
+
670
+
671
+ class L2CSPipeline:
672
+ # Uses in-tree l2cs.Pipeline (RetinaFace + ResNet50) for gaze estimation
673
+ # and MediaPipe for head pose, EAR, MAR, and roll de-rotation.
674
+
675
+ YAW_THRESHOLD = 22.0
676
+ PITCH_THRESHOLD = 20.0
677
+
678
+ def __init__(self, weights_path=None, arch="ResNet50", device="cpu",
679
+ threshold=0.52, detector=None):
680
+ resolved = weights_path or _resolve_l2cs_weights()
681
+ if resolved is None or not os.path.isfile(resolved):
682
+ raise FileNotFoundError(
683
+ "L2CS weights not found. Place L2CSNet_gaze360.pkl in "
684
+ "models/L2CS-Net/models/ or checkpoints/"
685
+ )
686
+
687
+ # add in-tree L2CS-Net to import path
688
+ l2cs_root = os.path.join(_PROJECT_ROOT, "models", "L2CS-Net")
689
+ if l2cs_root not in sys.path:
690
+ sys.path.insert(0, l2cs_root)
691
+ from l2cs import Pipeline as _L2CSPipeline
692
+
693
+ import torch
694
+ # bypass upstream select_device bug by constructing torch.device directly
695
+ self._pipeline = _L2CSPipeline(
696
+ weights=pathlib.Path(resolved), arch=arch, device=torch.device(device),
697
+ )
698
+
699
+ self._detector = detector or FaceMeshDetector()
700
+ self._owns_detector = detector is None
701
+ self._head_pose = HeadPoseEstimator()
702
+ self.head_pose = self._head_pose
703
+ self._eye_scorer = EyeBehaviourScorer()
704
+ self._threshold = threshold
705
+ self._smoother = _OutputSmoother()
706
+
707
+ print(
708
+ f"[L2CS] Loaded {resolved} | arch={arch} device={device} "
709
+ f"yaw_thresh={self.YAW_THRESHOLD} pitch_thresh={self.PITCH_THRESHOLD} "
710
+ f"threshold={threshold}"
711
+ )
712
+
713
+ @staticmethod
714
+ def _derotate_gaze(pitch_rad, yaw_rad, roll_deg):
715
+ # remove head roll so tilted-but-looking-at-screen reads as (0,0)
716
+ roll_rad = -math.radians(roll_deg)
717
+ cos_r, sin_r = math.cos(roll_rad), math.sin(roll_rad)
718
+ return (yaw_rad * sin_r + pitch_rad * cos_r,
719
+ yaw_rad * cos_r - pitch_rad * sin_r)
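The de-rotation above is a plain 2D rotation of the (yaw, pitch) vector by the negated head roll; a standalone sketch with the identical formula (no model or camera needed):

```python
import math

def derotate_gaze(pitch_rad, yaw_rad, roll_deg):
    # identical rotation to L2CSPipeline._derotate_gaze
    roll_rad = -math.radians(roll_deg)
    cos_r, sin_r = math.cos(roll_rad), math.sin(roll_rad)
    return (yaw_rad * sin_r + pitch_rad * cos_r,
            yaw_rad * cos_r - pitch_rad * sin_r)

# no roll: angles pass through unchanged
p0, y0 = derotate_gaze(0.1, 0.2, 0.0)

# a pure 90-degree head roll maps pitch onto yaw (axes swap)
p90, y90 = derotate_gaze(0.1, 0.0, 90.0)
```

This is why a user with a tilted head but eyes on the screen centre still comes out near (0, 0) after de-rotation.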
720
+
721
+    def process_frame(self, bgr_frame):
+        landmarks = self._detector.process(bgr_frame)
+        h, w = bgr_frame.shape[:2]
+
+        out = {
+            "landmarks": landmarks, "is_focused": False, "raw_score": 0.0,
+            "s_face": 0.0, "s_eye": 0.0, "gaze_pitch": None, "gaze_yaw": None,
+            "yaw": None, "pitch": None, "roll": None, "mar": None, "is_yawning": False,
+        }
+
+        # MediaPipe: head pose, eye/mouth scores
+        roll_deg = 0.0
+        if landmarks is not None:
+            angles = self._head_pose.estimate(landmarks, w, h)
+            if angles is not None:
+                out["yaw"], out["pitch"], out["roll"] = angles
+                roll_deg = angles[2]
+            out["s_face"] = self._head_pose.score(landmarks, w, h)
+            out["s_eye"] = self._eye_scorer.score(landmarks)
+            out["mar"] = compute_mar(landmarks)
+            out["is_yawning"] = out["mar"] > MAR_YAWN_THRESHOLD
+
743
+        # L2CS gaze (uses its own RetinaFace detector internally)
+        results = self._pipeline.step(bgr_frame)
+
+        if results is None or results.pitch.shape[0] == 0:
+            smoothed = self._smoother.update(0.0, landmarks is not None)
+            out["raw_score"] = smoothed
+            out["is_focused"] = smoothed >= self._threshold
+            return out
+
752
+        pitch_rad = float(results.pitch[0])
+        yaw_rad = float(results.yaw[0])
+
+        pitch_rad, yaw_rad = self._derotate_gaze(pitch_rad, yaw_rad, roll_deg)
+        out["gaze_pitch"] = pitch_rad
+        out["gaze_yaw"] = yaw_rad
+
+        yaw_deg = abs(math.degrees(yaw_rad))
+        pitch_deg = abs(math.degrees(pitch_rad))
+
+        # fall back to L2CS angles if MediaPipe didn't produce head pose
+        # (explicit None checks: `or` would discard a legitimate 0.0-degree reading)
+        if out["yaw"] is None:
+            out["yaw"] = math.degrees(yaw_rad)
+        if out["pitch"] is None:
+            out["pitch"] = math.degrees(pitch_rad)
+
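A subtlety in the fallback step: a head-pose yaw of exactly 0.0 degrees is a valid, perfectly centred reading, but it is falsy, so an `or`-based fallback would silently replace it with the L2CS angle; only an explicit `None` check keeps it. A minimal illustration (`fallback` is a hypothetical helper, not in the module):

```python
def fallback(primary, backup):
    # use the backup only when the primary estimate is truly absent
    return backup if primary is None else primary

# `or` conflates "missing" with "exactly zero"
via_or = 0.0 or 42.0              # the valid 0.0 reading is lost
via_none = fallback(0.0, 42.0)    # the 0.0 reading is kept
missing = fallback(None, 42.0)    # backup used only when truly absent
```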
766
+        # cosine scoring: 1.0 at centre, 0.0 at threshold
+        yaw_t = min(yaw_deg / self.YAW_THRESHOLD, 1.0)
+        pitch_t = min(pitch_deg / self.PITCH_THRESHOLD, 1.0)
+        yaw_score = 0.5 * (1.0 + math.cos(math.pi * yaw_t))
+        pitch_score = 0.5 * (1.0 + math.cos(math.pi * pitch_t))
+        gaze_score = 0.55 * yaw_score + 0.45 * pitch_score
+
+        if out["is_yawning"]:
+            gaze_score = 0.0
+
+        out["raw_score"] = self._smoother.update(float(gaze_score), True)
+        out["is_focused"] = out["raw_score"] >= self._threshold
+        return out
+
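The raised-cosine mapping above degrades smoothly from 1.0 for a centred gaze to 0.0 at or beyond the angular threshold, with yaw weighted slightly above pitch. A standalone sketch with hypothetical thresholds (the real `YAW_THRESHOLD`/`PITCH_THRESHOLD` class constants are defined elsewhere in the file):

```python
import math

YAW_THRESHOLD = 30.0    # hypothetical values; the class constants live elsewhere
PITCH_THRESHOLD = 25.0

def gaze_score(yaw_deg, pitch_deg):
    # raised cosine: 1.0 at centre, 0.0 at/beyond the threshold, smooth in between
    yaw_t = min(abs(yaw_deg) / YAW_THRESHOLD, 1.0)
    pitch_t = min(abs(pitch_deg) / PITCH_THRESHOLD, 1.0)
    yaw_score = 0.5 * (1.0 + math.cos(math.pi * yaw_t))
    pitch_score = 0.5 * (1.0 + math.cos(math.pi * pitch_t))
    # yaw weighted slightly higher than pitch
    return 0.55 * yaw_score + 0.45 * pitch_score

centre = gaze_score(0.0, 0.0)      # looking straight at the screen
far_off = gaze_score(45.0, 40.0)   # beyond both thresholds
halfway = gaze_score(15.0, 12.5)   # halfway to both thresholds
```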
780
+    def reset_session(self):
+        self._smoother.reset()
+
+    def close(self):
+        if self._owns_detector:
+            self._detector.close()
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *args):
+        self.close()