---
license: cc-by-nc-4.0
task_categories:
- image-segmentation
tags:
- glass-surface-detection
- semantic-segmentation
- scene-understanding
- pytorch
pretty_name: GlassSemNet (Glass Semantic Network)
---

# GlassSemNet (Glass Semantic Network)

Pre-trained weights for **GlassSemNet**, introduced in:

> **Exploiting Semantic Relations for Glass Surface Detection**
> Jiaying Lin, Yuen-Hei Yeung, Rynson W. H. Lau
> NeurIPS 2022
> [Paper](https://openreview.net/forum?id=WrIrYMCZgbb) · [Project Page](https://jiaying.link/neurips2022-gsds/) · [Dataset (GSD-S)](https://huggingface.co/datasets/garrying/GSD-S)

## Model Summary

GlassSemNet detects glass surfaces by exploiting semantic relations between the glass region and its surrounding scene context. It uses a dual-backbone design with the following components:

- **Spatial backbone (SegFormer)**: extracts multi-scale spatial features.
- **Semantic backbone (ResNet-50 + DeepLabV3+)**: encodes 43-class semantic scene features into compact per-class encodings.
- **Semantic-Aware Attention (SAA)**: fuses spatial and semantic features at three scales using the semantic encodings as guidance.
- **Cross-modal Context Aggregation (CCA)**: aggregates cross-scale context at the deepest level.
- **UPerNet decoder**: combines the fused multi-scale features into the final glass surface prediction.

| File | Description |
|------|-------------|
| `GlassSemNet.pth` | Best checkpoint (917 MB), saved as a raw `state_dict` |

## Loading the Weights

```python
import torch
from model.GlassSemNet import GlassSemNet  # from the code release

model = GlassSemNet()
# Load onto the CPU first so the checkpoint also opens on machines without a GPU.
state_dict = torch.load("GlassSemNet.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout, uses running batch-norm statistics
```

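If `load_state_dict` reports unexpected or missing keys, one common cause in PyTorch checkpoints is a `module.` prefix left behind by `torch.nn.DataParallel`. Whether this checkpoint carries such a prefix is not stated here, so the following is only a defensive sketch; `strip_prefix` is a hypothetical helper name and is not part of the code release.

```python
from typing import Any, Dict, Mapping


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Assumption: the prefix (if present at all) comes from a DataParallel wrapper.
    Keys without the prefix are copied unchanged, and key order is preserved.
    """
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Toy example: plain integers stand in for the real weight tensors.
example = {"module.encoder.weight": 1, "decoder.bias": 2}
cleaned_example = strip_prefix(example)
```

With the real checkpoint, this would be used as `model.load_state_dict(strip_prefix(state_dict))`. If the keys already match, the helper returns an equivalent dictionary and loading behaves as before.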
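
Download the checkpoint:

```bash
huggingface-cli download garrying/GlassSemNet GlassSemNet.pth --local-dir ./weights
```

With `--local-dir ./weights`, the file is saved as `./weights/GlassSemNet.pth`, so pass that path to `torch.load` or to `predict.py -c`.
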
## Inference

```bash
python predict.py -c GlassSemNet.pth -i /path/to/images/ -o /path/to/output/
```

Images are resized to **384 × 384** internally. Predictions are post-processed with CRF refinement and thresholded to produce binary glass surface masks.

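The final thresholding step can be sketched as follows. This is a minimal illustration and not the logic in `predict.py`: the 0.5 cutoff, the 0/255 mask encoding, and the function name `binarize` are all assumptions, and CRF refinement is omitted.

```python
import numpy as np


def binarize(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert an (H, W) glass-probability map in [0, 1] into a 0/255 uint8 mask.

    Assumption: pixels at or above `threshold` count as glass. The cutoff
    actually used by the released inference script may differ.
    """
    if prob_map.ndim != 2:
        raise ValueError("expected a 2-D (H, W) probability map")
    return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)


# Toy 2 x 2 probability map standing in for a 384 x 384 network output.
probs = np.array([[0.1, 0.7], [0.5, 0.49]])
mask = binarize(probs)
```

Storing the mask as 0/255 `uint8` lets it be written directly as a grayscale image; a boolean array would work equally well for downstream metrics.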

## Training Dataset

This model was trained and evaluated on **GSD-S**, the first glass surface detection dataset with semantic annotations:

- 4,519 images (3,511 train / 1,008 test) with binary glass masks, instance segmentation maps, and 43-class semantic labels
- Available at [garrying/GSD-S](https://huggingface.co/datasets/garrying/GSD-S)

## Citation

```bibtex
@article{neurips2022:gsds2022,
  author = {Lin, Jiaying and Yeung, Yuen-Hei and Lau, Rynson W.H.},
  title = {Exploiting Semantic Relations for Glass Surface Detection},
  journal = {NeurIPS},
  year = {2022},
}
```

## License

Non-commercial use only: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).