Longxiang-ai committed on
Commit 796f97a · verified · 1 parent: 59075bc

Update TransNormal model card usage

Files changed (1)
  1. README.md +133 -25
README.md CHANGED
@@ -1,55 +1,163 @@
  ---
  license: cc-by-nc-4.0
  tags:
  - normal-estimation
- - depth-estimation
- - diffusion
  - transparent-objects
- library_name: diffusers
- pipeline_tag: image-to-image
  ---

  # TransNormal

- Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.

- ## Usage

  ```python
- from transnormal import TransNormalPipeline, create_dino_encoder
  import torch

- # Load DINO encoder (download separately)
  dino_encoder = create_dino_encoder(
      model_name="dinov3_vith16plus",
-     weights_path="path/to/dinov3_vith16plus",
-     projector_path="path/to/cross_attention_projector.pt",
-     device="cuda",
-     dtype=torch.bfloat16,
  )

- # Load pipeline
  pipe = TransNormalPipeline.from_pretrained(
-     "longxiang-ai/transnormal-v1",
      dino_encoder=dino_encoder,
-     torch_dtype=torch.bfloat16,
  )
- pipe = pipe.to("cuda")

- # Inference
- normal_map = pipe("image.jpg", output_type="pil")
  ```

  ## Citation

  ```bibtex
- @article{transnormal2025,
-   title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-   author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-   year={2025}
  }
  ```

- ## License
-
- CC BY-NC 4.0
 
  ---
  license: cc-by-nc-4.0
+ library_name: diffusers
+ pipeline_tag: image-to-image
+ inference: false
+ base_model:
+ - stabilityai/stable-diffusion-2-base
+ datasets:
+ - Longxiang-ai/TransNormal-Synthetic
  tags:
  - normal-estimation
+ - surface-normal-estimation
  - transparent-objects
+ - diffusion
+ - dinov3
+ - image-to-image
+ - computer-vision
+ - robotics
  ---

  # TransNormal

+ Official model weights for **TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation** (ICML 2026).
+
+ TransNormal estimates camera-space surface normal maps from a single RGB image, with a focus on transparent objects such as laboratory glassware. The model adapts Stable Diffusion 2 as a single-step normal regressor and injects dense DINOv3 visual semantics through cross-attention.
+
+ **Links:** [Paper](https://arxiv.org/abs/2602.00839) | [Project page](https://longxiang-ai.github.io/TransNormal/) | [Code](https://github.com/longxiang-ai/TransNormal) | [Dataset](https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic)
+
+ > **Important:** The generic Hugging Face / Diffusers "Use this model" snippet is not sufficient for this repository. TransNormal uses a custom pipeline and requires a DINOv3 backbone in addition to the weights stored here. Please use the instructions below.
+
+ ## What This Repository Contains
+
+ This model repository contains:
+
+ - Fine-tuned TransNormal diffusion pipeline weights.
+ - `cross_attention_projector.pt`, the DINOv3-to-U-Net cross-attention projector.
+ - SD2-compatible VAE, U-Net, tokenizer, scheduler, and config files.
+
+ This repository does **not** contain the DINOv3 backbone weights. Download them separately as described below.

+ ## Installation
+
+ ```bash
+ git clone https://github.com/longxiang-ai/TransNormal.git
+ cd TransNormal
+
+ conda create -n TransNormal python=3.10 -y
+ conda activate TransNormal
+ pip install -r requirements.txt
+ ```
+
+ The code requires `transformers>=4.56.0` for Hugging Face DINOv3 support. BF16 is recommended for DINOv3 inference.
+
+ ## Download Weights
+
+ Download the TransNormal weights from this repository:
+
+ ```bash
+ pip install huggingface_hub
+
+ python -c "from huggingface_hub import snapshot_download; snapshot_download('Longxiang-ai/TransNormal', local_dir='./weights/transnormal')"
+ ```
+
+ Download the DINOv3 ViT-H+/16 backbone separately:
+
+ ```bash
+ python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/dinov3-vith16plus-pretrain-lvd1689m', local_dir='./weights/dinov3_vith16plus')"
+ ```
+
+ Access to DINOv3 may require approval from Meta / Hugging Face. See the [DINOv3 repository](https://github.com/facebookresearch/dinov3) and [Meta AI DINOv3 downloads](https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/) for details.
+
+ ## Python Usage

  ```python
  import torch
+ from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map
+
+ device = "cuda"
+ dtype = torch.bfloat16

  dino_encoder = create_dino_encoder(
      model_name="dinov3_vith16plus",
+     weights_path="./weights/dinov3_vith16plus",
+     projector_path="./weights/transnormal/cross_attention_projector.pt",
+     device=device,
+     dtype=dtype,
+     freeze_encoder=True,
  )

  pipe = TransNormalPipeline.from_pretrained(
+     "./weights/transnormal",
      dino_encoder=dino_encoder,
+     torch_dtype=dtype,
+     safety_checker=None,
  )
+ pipe = pipe.to(device)
+
+ normal_map = pipe(
+     image="path/to/image.jpg",
+     timestep=999,
+     output_type="np",
+ )
+
+ save_normal_map(normal_map, "output_normal.png")
+ ```
+
+ ## Command Line Usage
+
+ Single image:
+
+ ```bash
+ python inference.py \
+ --image path/to/image.jpg \
+ --output normal.png \
+ --model_path ./weights/transnormal \
+ --dino_path ./weights/dinov3_vith16plus \
+ --projector_path ./weights/transnormal/cross_attention_projector.pt \
+ --timestep 999
+ ```
+
+ Batch inference:

+ ```bash
+ python inference_batch.py \
+ --input_dir ./examples/input \
+ --output_dir ./examples/output \
+ --model_path ./weights/transnormal \
+ --dino_path ./weights/dinov3_vith16plus \
+ --timestep 999
  ```

+ ## Output Format
+
+ The output is a normal-map visualization in `[0, 1]`, where `0.5` represents zero for each normal component. See the [GitHub README](https://github.com/longxiang-ai/TransNormal#output-format) for the current camera-coordinate convention and saving utilities.
+
+ ## Dataset
+
+ The accompanying **TransNormal-Synthetic** dataset is available at:
+
+ https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic
+
+ It provides physics-based rendered transparent labware scenes with RGB images, surface normal maps, depth maps, masks, material variants, and camera metadata.
+
+ ## License
+
+ This model is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). For commercial licensing inquiries, please contact the authors.
+
  ## Citation

+ If you find this work useful, please cite:
+
  ```bibtex
+ @misc{li2026transnormal,
+   title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+   author={Mingwei Li and Hehe Fan and Yi Yang},
+   year={2026},
+   eprint={2602.00839},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2602.00839},
  }
  ```
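The Installation section added in this commit states that the code requires `transformers>=4.56.0` for Hugging Face DINOv3 support. A small pre-flight check can flag an outdated environment before inference is attempted. The sketch below is hypothetical: `parse_release`, `at_least`, and `transformers_is_recent_enough` are not part of the TransNormal code, and the comparison only looks at the numeric release segment and ignores pre-release tags. When the `packaging` library is available, `packaging.version.Version` is the more complete option.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '4.56.0.dev0' -> (4, 56, 0).

    Simplification (hypothetical helper): pre-release/dev/local suffixes are
    dropped, so '4.56.0rc1' compares equal to '4.56.0', whereas PEP 440
    would order it lower.
    """
    parts: list[int] = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. '0rc1': keep the 0, then stop
            break
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`, comparing zero-padded release tuples."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def transformers_is_recent_enough(minimum: str = "4.56.0") -> bool:
    """Check the installed `transformers` against the minimum stated in the card."""
    try:
        return at_least(version("transformers"), minimum)
    except PackageNotFoundError:
        # transformers is not installed at all
        return False


if __name__ == "__main__":
    if not transformers_is_recent_enough():
        print("transformers>=4.56.0 is required for DINOv3 support")
```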
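The Output Format section added in this commit describes the prediction as a visualization in `[0, 1]`, with `0.5` standing for a zero component. That description is consistent with the common linear encoding v = (n + 1) / 2 per channel, but the card does not spell out the formula, so treat it as an assumption. Under that assumption, the sketch below recovers unit normal vectors from such an array, for example the `output_type="np"` result or an 8-bit PNG read back from disk. `decode_normal_map` and `encode_normal_map` are hypothetical helpers, not part of the TransNormal package, and they leave the camera-axis convention untouched, since the card defers that to the GitHub README.

```python
import numpy as np


def decode_normal_map(vis: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Convert a [0, 1] normal visualization to unit normals in [-1, 1].

    ASSUMPTION: linear per-channel encoding v = (n + 1) / 2, i.e. 0.5 -> 0.
    Accepts any array whose last axis holds the 3 components, e.g. (H, W, 3)
    or (B, H, W, 3). uint8 input is first rescaled from [0, 255] to [0, 1].
    The camera-axis convention is not changed.
    """
    vis = np.asarray(vis)
    if vis.shape[-1] != 3:
        raise ValueError(f"expected last axis of size 3, got shape {vis.shape}")
    if vis.dtype == np.uint8:
        vis = vis.astype(np.float32) / 255.0
    else:
        vis = vis.astype(np.float32)
    n = vis * 2.0 - 1.0
    # Renormalize: 8-bit quantization (and predictions in general) may leave
    # vectors slightly off unit length. eps guards against division by zero.
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, eps)


def encode_normal_map(normals: np.ndarray) -> np.ndarray:
    """Inverse mapping: normals in [-1, 1] -> visualization in [0, 1]."""
    return (np.clip(normals, -1.0, 1.0) + 1.0) / 2.0
```

For example, `decode_normal_map(normal_map)` on the array from the Python Usage snippet would return unit vectors of the same shape, provided that array follows the `[0, 1]` convention described above.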