| --- |
| license: apache-2.0 |
| language: |
| - en |
| tags: |
| - scene-graph-generation |
| - object-detection |
| - visual-relationship-detection |
| - pytorch |
| - yolo |
| pipeline_tag: object-detection |
| library_name: sgg-benchmark |
| model-index: |
| - name: REACT++ yolo12l |
| results: |
| - task: |
| type: object-detection |
| name: Scene Graph Detection |
| dataset: |
| name: PSG |
| type: psg |
| metrics: |
| - type: mR@20 |
| value: 23.2 |
| name: mR@20 |
| - type: R@20 |
| value: 30.99 |
| name: R@20 |
| - type: F1@20 |
| value: 26.53 |
| name: F1@20 |
| - type: mR@50 |
| value: 25.49 |
| name: mR@50 |
| - type: R@50 |
| value: 35.3 |
| name: R@50 |
| - type: F1@50 |
| value: 29.6 |
| name: F1@50 |
| - type: mR@100 |
| value: 26.45 |
| name: mR@100 |
| - type: R@100 |
| value: 36.68 |
| name: R@100 |
| - type: F1@100 |
| value: 30.74 |
| name: F1@100 |
| - type: e2e_latency_ms |
| value: 19.6 |
| name: e2e_latency_ms |
| - name: REACT++ yolo12m |
| results: |
| - task: |
| type: object-detection |
| name: Scene Graph Detection |
| dataset: |
| name: PSG |
| type: psg |
| metrics: |
| - type: mR@20 |
| value: 22.74 |
| name: mR@20 |
| - type: R@20 |
| value: 32.69 |
| name: R@20 |
| - type: F1@20 |
| value: 26.82 |
| name: F1@20 |
| - type: mR@50 |
| value: 25.21 |
| name: mR@50 |
| - type: R@50 |
| value: 37.2 |
| name: R@50 |
| - type: F1@50 |
| value: 30.05 |
| name: F1@50 |
| - type: mR@100 |
| value: 26.08 |
| name: mR@100 |
| - type: R@100 |
| value: 38.58 |
| name: R@100 |
| - type: F1@100 |
| value: 31.12 |
| name: F1@100 |
| - type: e2e_latency_ms |
| value: 15.7 |
| name: e2e_latency_ms |
| - name: REACT++ yolo12s |
| results: |
| - task: |
| type: object-detection |
| name: Scene Graph Detection |
| dataset: |
| name: PSG |
| type: psg |
| metrics: |
| - type: mR@20 |
| value: 21.12 |
| name: mR@20 |
| - type: R@20 |
| value: 29.28 |
| name: R@20 |
| - type: F1@20 |
| value: 24.54 |
| name: F1@20 |
| - type: mR@50 |
| value: 23.21 |
| name: mR@50 |
| - type: R@50 |
| value: 33.48 |
| name: R@50 |
| - type: F1@50 |
| value: 27.41 |
| name: F1@50 |
| - type: mR@100 |
| value: 23.77 |
| name: mR@100 |
| - type: R@100 |
| value: 34.74 |
| name: R@100 |
| - type: F1@100 |
| value: 28.23 |
| name: F1@100 |
| - type: e2e_latency_ms |
| value: 12.2 |
| name: e2e_latency_ms |
| - name: REACT++ yolo12n |
| results: |
| - task: |
| type: object-detection |
| name: Scene Graph Detection |
| dataset: |
| name: PSG |
| type: psg |
| metrics: |
| - type: mR@20 |
| value: 16.88 |
| name: mR@20 |
| - type: R@20 |
| value: 26.88 |
| name: R@20 |
| - type: F1@20 |
| value: 20.74 |
| name: F1@20 |
| - type: mR@50 |
| value: 18.65 |
| name: mR@50 |
| - type: R@50 |
| value: 30.61 |
| name: R@50 |
| - type: F1@50 |
| value: 23.17 |
| name: F1@50 |
| - type: mR@100 |
| value: 19.5 |
| name: mR@100 |
| - type: R@100 |
| value: 31.8 |
| name: R@100 |
| - type: F1@100 |
| value: 24.17 |
| name: F1@100 |
| - type: e2e_latency_ms |
| value: 11.4 |
| name: e2e_latency_ms |
| - name: REACT++ yolov8m |
| results: |
| - task: |
| type: object-detection |
| name: Scene Graph Detection |
| dataset: |
| name: PSG |
| type: psg |
| metrics: |
| - type: mR@20 |
| value: 22.75 |
| name: mR@20 |
| - type: R@20 |
| value: 30.69 |
| name: R@20 |
| - type: F1@20 |
| value: 26.13 |
| name: F1@20 |
| - type: mR@50 |
| value: 25.46 |
| name: mR@50 |
| - type: R@50 |
| value: 35.68 |
| name: R@50 |
| - type: F1@50 |
| value: 29.72 |
| name: F1@50 |
| - type: mR@100 |
| value: 26.4 |
| name: mR@100 |
| - type: R@100 |
| value: 37.43 |
| name: R@100 |
| - type: F1@100 |
| value: 30.96 |
| name: F1@100 |
| - type: e2e_latency_ms |
| value: 15.3 |
| name: e2e_latency_ms |
| --- |
| |
| # REACT++ Scene Graph Generation — PSG (yolo12l, yolo12m, yolo12s, yolo12n, yolov8m) |
|
|
| This repository contains **REACT++** model checkpoints for scene graph generation (SGG) |
| on the **PSG** benchmark, for five backbone variants (four YOLO12 sizes and YOLOv8m). |
|
|
| REACT++ is a parameter-efficient, attention-augmented relation predictor built on top of |
| a YOLO backbone (YOLO12 or YOLOv8 in this repository). It uses: |
|
|
| - **DAMP** (Detection-Anchored Multi-Scale Pooling), a simple new pooling algorithm for one-stage object detectors such as YOLO |
| - **SwiGLU gated MLP** for all feed-forward blocks (½ the params of ReLU-MLP at equal capacity) |
| - **Visual x Semantic cross-attention** — visual tokens attend to GloVe prototype embeddings |
| - **Geometry RoPE** — box-position encoded as a rotary frequency bias on the Q matrix |
| - **Prototype Momentum Buffer** — per-class EMA prototype bank |
| - **P5 Scene Context** — AIFI-enhanced P5 tokens provide global context via cross-attention |
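To make the "per-class EMA prototype bank" idea concrete, here is a minimal sketch of an exponential-moving-average update in plain Python. It is an illustration only, not the REACT++ implementation: the class name `PrototypeBank`, the `momentum` default, and the choice to initialise a prototype from its first observation are all assumptions.

```python
from typing import Dict, List


class PrototypeBank:
    """Per-class exponential-moving-average (EMA) prototype store (illustrative only)."""

    def __init__(self, momentum: float = 0.9) -> None:
        # `momentum` is a hypothetical default, not a value taken from REACT++.
        self.momentum = momentum
        self.prototypes: Dict[int, List[float]] = {}

    def update(self, class_id: int, feature: List[float]) -> List[float]:
        """Blend a new feature vector into the running prototype of `class_id`."""
        old = self.prototypes.get(class_id)
        if old is None:
            # First observation of this class: use the feature as the prototype.
            new = list(feature)
        else:
            m = self.momentum
            # EMA rule: prototype <- m * prototype + (1 - m) * feature
            new = [m * o + (1.0 - m) * f for o, f in zip(old, feature)]
        self.prototypes[class_id] = new
        return new


bank = PrototypeBank(momentum=0.5)
bank.update(3, [1.0, 0.0])
print(bank.update(3, [0.0, 1.0]))  # [0.5, 0.5]
```

A higher momentum makes each prototype change more slowly, which trades responsiveness for stability when per-batch features are noisy.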
|
|
| The models were trained with the |
| [SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and described in the |
| [REACT++ paper (Neau et al., 2026)](https://arxiv.org/abs/2603.06386). |
|
|
| --- |
|
|
| ## Results — SGDet on PSG test split (ONNX, CUDA) |
|
|
| > Metrics from end-to-end ONNX evaluation (`tools/eval_onnx_psg.py`). E2E Latency = image load + pre-process + ONNX forward. |
|
|
| | Backbone | Params | R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | F1@20 | F1@50 | F1@100 | E2E Lat. (ms) | |
| |----------|:------:|-----:|-----:|------:|------:|------:|-------:|------:|------:|-------:|--------------:| |
| | yolo12l | ~26.5M | 30.99 | 35.3 | 36.68 | 23.2 | 25.49 | 26.45 | 26.53 | 29.6 | 30.74 | 19.6 | |
| | yolo12m | ~20.2M | 32.69 | 37.2 | 38.58 | 22.74 | 25.21 | 26.08 | 26.82 | 30.05 | 31.12 | 15.7 | |
| | yolo12s | ~9.2M | 29.28 | 33.48 | 34.74 | 21.12 | 23.21 | 23.77 | 24.54 | 27.41 | 28.23 | 12.2 | |
| | yolo12n | ~2.6M | 26.88 | 30.61 | 31.8 | 16.88 | 18.65 | 19.5 | 20.74 | 23.17 | 24.17 | 11.4 | |
| | yolov8m | ~25.9M | 30.69 | 35.68 | 37.43 | 22.75 | 25.46 | 26.4 | 26.13 | 29.72 | 30.96 | 15.3 | |
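The F1@K columns appear consistent with the harmonic mean of R@K and mR@K, which is the usual definition in SGG benchmarks. The short check below recomputes a few entries from the table; the helper name `f1` and the tolerance are choices made for this sketch, and small differences are expected because the inputs are already rounded to two decimals.

```python
def f1(recall: float, mean_recall: float) -> float:
    """Harmonic mean of R@K and mR@K."""
    return 2.0 * recall * mean_recall / (recall + mean_recall)


# (backbone, K, R@K, mR@K, reported F1@K), copied from the results table
rows = [
    ("yolo12l", 20, 30.99, 23.2, 26.53),
    ("yolo12m", 50, 37.2, 25.21, 30.05),
    ("yolo12n", 100, 31.8, 19.5, 24.17),
]

for backbone, k, r, mr, reported in rows:
    computed = f1(r, mr)
    # Inputs are rounded, so allow a small tolerance instead of exact equality.
    assert abs(computed - reported) < 0.02
    print(f"{backbone} F1@{k}: computed {computed:.2f}, reported {reported}")
```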
|
|
| --- |
|
|
| ## Checkpoints |
|
|
| | Variant | Sub-folder | Checkpoint files | |
| |---------|------------|-----------------| |
| | yolo12l | `yolo12l/` | `yolo12l/model.onnx` (ONNX) · `yolo12l/best_model_epoch_9.pth` (PyTorch) | |
| | yolo12m | `yolo12m/` | `yolo12m/model.onnx` (ONNX) · `yolo12m/best_model_epoch_9.pth` (PyTorch) | |
| | yolo12s | `yolo12s/` | `yolo12s/model.onnx` (ONNX) · `yolo12s/best_model_epoch_6.pth` (PyTorch) | |
| | yolo12n | `yolo12n/` | `yolo12n/model.onnx` (ONNX) · `yolo12n/best_model_epoch_5.pth` (PyTorch) | |
| | yolov8m | `yolov8m/` | `yolov8m/model.onnx` (ONNX) · `yolov8m/best_model_epoch_6.pth` (PyTorch) | |
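Because the PyTorch checkpoint name embeds a different epoch number for each variant, it can help to resolve file names from a small lookup instead of typing them by hand. The sketch below encodes the table as a dictionary; `BEST_EPOCH` and `checkpoint_files` are hypothetical names introduced for illustration and are not part of SGG-Benchmark.

```python
from typing import Dict

# Epoch of the saved PyTorch checkpoint per variant, copied from the table.
BEST_EPOCH: Dict[str, int] = {
    "yolo12l": 9,
    "yolo12m": 9,
    "yolo12s": 6,
    "yolo12n": 5,
    "yolov8m": 6,
}


def checkpoint_files(variant: str) -> Dict[str, str]:
    """Return the repo-relative ONNX and PyTorch file names for a variant."""
    if variant not in BEST_EPOCH:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(BEST_EPOCH)}")
    return {
        "onnx": f"{variant}/model.onnx",
        "pytorch": f"{variant}/best_model_epoch_{BEST_EPOCH[variant]}.pth",
    }


print(checkpoint_files("yolo12s")["pytorch"])  # yolo12s/best_model_epoch_6.pth
```

Either returned value can be passed as the `filename` argument of `hf_hub_download`.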
|
|
| --- |
|
|
| ## Usage |
|
|
| ### ONNX (recommended; inference needs only `onnxruntime`, plus `huggingface_hub` for the download) |
|
|
| ```python |
| from huggingface_hub import hf_hub_download |
| |
| onnx_path = hf_hub_download( |
| repo_id="maelic/REACTPlusPlus_PSG", |
| filename="yolo12l/model.onnx", |
| repo_type="model", |
| ) |
| # Run with tools/eval_onnx_psg.py or load directly via onnxruntime |
| ``` |
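This card does not document the input and output tensor names or shapes of the exported graph, so a reasonable first step when loading the model directly is to inspect them. The sketch below uses standard `onnxruntime` calls; `pick_providers` and `describe_model` are hypothetical helper names, and the CUDA-then-CPU preference is an assumption based on the results having been measured on CUDA.

```python
from typing import List


def pick_providers(available: List[str]) -> List[str]:
    """Prefer CUDA when onnxruntime reports it, and always keep a CPU fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def describe_model(onnx_path: str) -> None:
    """Open the ONNX file and print name, shape and dtype of each input and output."""
    # Imported lazily so the helper above works without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    session = ort.InferenceSession(onnx_path, providers=providers)
    for node in session.get_inputs():
        print("input ", node.name, node.shape, node.type)
    for node in session.get_outputs():
        print("output", node.name, node.shape, node.type)


# Example (needs the downloaded file and onnxruntime installed):
# describe_model(onnx_path)
```

The printed shapes indicate what pre-processing the graph expects; `tools/eval_onnx_psg.py` in SGG-Benchmark remains the reference for the full pipeline.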
|
|
| ### PyTorch |
|
|
| ```python |
| # 1. Clone the repository |
| # git clone https://github.com/Maelic/SGG-Benchmark |
| |
| # 2. Install dependencies |
| # pip install -e . |
| |
| # 3. Download checkpoint + config |
| from huggingface_hub import hf_hub_download |
| |
| ckpt_path = hf_hub_download( |
| repo_id="maelic/REACTPlusPlus_PSG", |
| filename="yolo12l/best_model_epoch_9.pth", |
| repo_type="model", |
| ) |
| cfg_path = hf_hub_download( |
| repo_id="maelic/REACTPlusPlus_PSG", |
| filename="yolo12l/config.yml", |
| repo_type="model", |
| ) |
| |
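| # 4. Run evaluation (from the root of the cloned SGG-Benchmark repository) |
| import subprocess |
| import sys |
| |
| subprocess.run( |
| [ |
| sys.executable, "tools/relation_eval_hydra.py",  # use the current interpreter |
| "--config-path", str(cfg_path), |
| "--task", "sgdet", |
| "--eval-only", |
| "--checkpoint", str(ckpt_path), |
| ], |
| check=True,  # raise if the evaluation script exits with an error |
| ) |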
| ``` |
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{neau2026reactpp, |
| title = {REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation}, |
| author = {Neau, Maëlic and Falomir, Zoe}, |
| year = {2026}, |
| url = {https://arxiv.org/abs/2603.06386}, |
| } |
| ``` |
|
|