garrying/GlassSemNet

+---
+license: cc-by-nc-4.0
+task_categories:
+  - image-segmentation
+tags:
+  - glass-surface-detection
+  - semantic-segmentation
+  - scene-understanding
+  - pytorch
+pretty_name: GlassSemNet (Glass Semantic Network)
+---
+# GlassSemNet — Glass Semantic Network
+Pre-trained weights for **GlassSemNet**, introduced in:
+> **Exploiting Semantic Relations for Glass Surface Detection**
+> Jiaying Lin, Yuen-Hei Yeung, Rynson W. H. Lau
+> NeurIPS 2022
+> [Paper](https://openreview.net/forum?id=WrIrYMCZgbb) · [Project Page](https://jiaying.link/neurips2022-gsds/) · [Dataset (GSD-S)](https://huggingface.co/datasets/garrying/GSD-S)
+## Model Summary
+GlassSemNet detects glass surfaces by exploiting semantic relations between a glass region and its surrounding scene context. It pairs two backbones with fusion, context-aggregation, and decoding modules:
+- **Spatial backbone (SegFormer)**: extracts multi-scale spatial features.
+- **Semantic backbone (ResNet-50 + DeepLabV3+)**: encodes 43-class semantic scene features into compact per-class encodings.
+- **Semantic-Aware Attention (SAA)**: fuses spatial and semantic features at three scales using the semantic encodings as guidance.
+- **Cross-modal Context Aggregation (CCA)**: aggregates cross-scale context at the deepest level.
+- **UPerNet decoder**: combines the fused multi-scale features into the final glass surface prediction.
+## Files
+| File | Description |
+|------|-------------|
+| `GlassSemNet.pth` | Best checkpoint (917 MB), saved as a raw `state_dict` |
+## Loading the Weights
+```python
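+import torch
+from model.GlassSemNet import GlassSemNet   # from the code release
+model = GlassSemNet()
+# The checkpoint is a raw state_dict, so it loads directly into the model.
+state_dict = torch.load("GlassSemNet.pth", map_location="cpu")
+model.load_state_dict(state_dict)
+model.eval()  # inference mode: disables dropout, uses running batch-norm statistics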
+```
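+Download the checkpoint first. The command below saves it to `./weights/GlassSemNet.pth`, so adjust the path passed to `torch.load` accordingly: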
+```bash
+huggingface-cli download garrying/GlassSemNet GlassSemNet.pth --local-dir ./weights
+```
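+## Inference
+Run `predict.py` with the checkpoint (`-c`), an input image directory (`-i`), and an output directory (`-o`):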
+```bash
+python predict.py -c GlassSemNet.pth -i /path/to/images/ -o /path/to/output/
+```
+Images are resized to **384 × 384** internally. Predictions are refined with a conditional random field (CRF) and then thresholded to produce binary glass surface masks.
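The final thresholding step can be illustrated with a minimal, dependency-free sketch. The function name `binarize_mask`, the 0.5 cut-off, and the 0/255 output convention are assumptions made for illustration; `predict.py` may use different values, and the CRF refinement is not reproduced here.

```python
from typing import List, Sequence


def binarize_mask(
    prob_map: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; the model card does not state one
) -> List[List[int]]:
    """Turn a per-pixel glass probability map into a 0/255 binary mask."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    # Pixels at or above the threshold are marked as glass (255), the rest as background (0).
    return [[255 if p >= threshold else 0 for p in row] for row in prob_map]


# Toy 2 x 3 probability map standing in for a refined network output.
probs = [
    [0.10, 0.49, 0.50],
    [0.93, 0.02, 0.75],
]
mask = binarize_mask(probs)
print(mask)  # [[0, 0, 255], [255, 0, 255]]
```

In practice the same comparison would be applied to a tensor or array rather than nested lists; plain lists are used here only to keep the sketch self-contained.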
+## Training Dataset
+This model was trained and evaluated on **GSD-S**, the first glass surface detection dataset with semantic annotations:
+- 4,519 images (3,511 train / 1,008 test) with binary glass masks, instance segmentation maps, and 43-class semantic labels
+- Available at [garrying/GSD-S](https://huggingface.co/datasets/garrying/GSD-S)
+## Citation
+```bibtex
+@inproceedings{neurips2022:gsds2022,
+  author    = {Lin, Jiaying and Yeung, Yuen-Hei and Lau, Rynson W. H.},
+  title     = {Exploiting Semantic Relations for Glass Surface Detection},
+  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
+  year      = {2022},
+}
+```
+## License
+Non-commercial use only — [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).