@@ -1,55 +1,163 @@
 ---
 license: cc-by-nc-4.0
 tags:
 - normal-estimation
-- depth-estimation
-- diffusion
 - transparent-objects
-library_name: diffusers
-pipeline_tag: image-to-image
 ---
 # TransNormal
-Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.
-## Usage
 ```python
-from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
-# Load DINO encoder (download separately)
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
-    weights_path="path/to/dinov3_vith16plus",
-    projector_path="path/to/cross_attention_projector.pt",
-    device="cuda",
-    dtype=torch.bfloat16,
 )
-# Load pipeline
 pipe = TransNormalPipeline.from_pretrained(
-    "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
-    torch_dtype=torch.bfloat16,
 )
-pipe = pipe.to("cuda")
-# Inference
-normal_map = pipe("image.jpg", output_type="pil")
 ```
 ## Citation
 ```bibtex
-@article{transnormal2025,
-  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-  author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-  year={2025}
 }
 ```
-## License
-CC BY-NC 4.0

 ---
 license: cc-by-nc-4.0
+library_name: diffusers
+pipeline_tag: image-to-image
+inference: false
+base_model:
+- stabilityai/stable-diffusion-2-base
+datasets:
+- Longxiang-ai/TransNormal-Synthetic
 tags:
 - normal-estimation
+- surface-normal-estimation
 - transparent-objects
+- diffusion
+- dinov3
+- image-to-image
+- computer-vision
+- robotics
 ---
 # TransNormal
+Official model weights for **TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation** (ICML 2026).
+TransNormal estimates camera-space surface normal maps from a single RGB image, with a focus on transparent objects such as laboratory glassware. The model adapts Stable Diffusion 2 as a single-step normal regressor and injects dense DINOv3 visual semantics through cross-attention.
+**Links:** [Paper](https://arxiv.org/abs/2602.00839) | [Project page](https://longxiang-ai.github.io/TransNormal/) | [Code](https://github.com/longxiang-ai/TransNormal) | [Dataset](https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic)
+> **Important:** The generic Hugging Face / Diffusers "Use this model" snippet is not sufficient for this repository. TransNormal uses a custom pipeline and requires a DINOv3 backbone in addition to the weights stored here. Please use the instructions below.
+## What This Repository Contains
+This model repository contains:
+- Fine-tuned TransNormal diffusion pipeline weights.
+- `cross_attention_projector.pt`, the DINOv3-to-U-Net cross-attention projector.
+- SD2-compatible VAE, U-Net, tokenizer, scheduler, and config files.
+This repository does **not** contain the DINOv3 backbone weights. Download them separately as described below.
+## Installation
+```bash
+git clone https://github.com/longxiang-ai/TransNormal.git
+cd TransNormal
+conda create -n TransNormal python=3.10 -y
+conda activate TransNormal
+pip install -r requirements.txt
+```
+The code requires `transformers>=4.56.0` for Hugging Face DINOv3 support. BF16 is recommended for DINOv3 inference.
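
As a quick sanity check of the `transformers>=4.56.0` requirement above, the sketch below compares dotted version strings numerically. The helpers `parse_version`, `meets_minimum`, and `installed_transformers_version` are illustrative names, not part of the TransNormal code. The comparison only looks at the leading numeric components, so suffixes such as `.dev0` or `rc1` are ignored.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.56.0"  # minimum version stated in this model card


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a dotted version string."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """True if `installed` is at least `minimum`, compared numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "4.56" and "4.56.0" compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_transformers_version():
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif meets_minimum(found):
        print(f"transformers {found} satisfies >= {MIN_TRANSFORMERS}")
    else:
        print(f"transformers {found} is older than {MIN_TRANSFORMERS}")
```

A plain string comparison would rank `4.9.2` above `4.56.0`, which is why the components are converted to integers first.
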
+## Download Weights
+Download the TransNormal weights from this repository:
+```bash
+pip install huggingface_hub
+python -c "from huggingface_hub import snapshot_download; snapshot_download('Longxiang-ai/TransNormal', local_dir='./weights/transnormal')"
+```
+Download the DINOv3 ViT-H+/16 backbone separately:
+```bash
+python -c "from huggingface_hub import snapshot_download; snapshot_download('facebook/dinov3-vith16plus-pretrain-lvd1689m', local_dir='./weights/dinov3_vith16plus')"
+```
+Access to DINOv3 may require approval from Meta / Hugging Face. See the [DINOv3 repository](https://github.com/facebookresearch/dinov3) and [Meta AI DINOv3 downloads](https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/) for details.
+## Python Usage
 ```python
 import torch
+from transnormal import TransNormalPipeline, create_dino_encoder, save_normal_map
+device = "cuda"
+dtype = torch.bfloat16
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
+    weights_path="./weights/dinov3_vith16plus",
+    projector_path="./weights/transnormal/cross_attention_projector.pt",
+    device=device,
+    dtype=dtype,
+    freeze_encoder=True,
 )
 pipe = TransNormalPipeline.from_pretrained(
+    "./weights/transnormal",
     dino_encoder=dino_encoder,
+    torch_dtype=dtype,
+    safety_checker=None,
 )
+pipe = pipe.to(device)
+normal_map = pipe(
+    image="path/to/image.jpg",
+    timestep=999,
+    output_type="np",
+)
+save_normal_map(normal_map, "output_normal.png")
+```
+## Command Line Usage
+Single image:
+```bash
+python inference.py \
+    --image path/to/image.jpg \
+    --output normal.png \
+    --model_path ./weights/transnormal \
+    --dino_path ./weights/dinov3_vith16plus \
+    --projector_path ./weights/transnormal/cross_attention_projector.pt \
+    --timestep 999
+```
+Batch inference:
+```bash
+python inference_batch.py \
+    --input_dir ./examples/input \
+    --output_dir ./examples/output \
+    --model_path ./weights/transnormal \
+    --dino_path ./weights/dinov3_vith16plus \
+    --timestep 999
 ```
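
When the single-image command is driven from a Python script, passing the arguments as a list avoids shell-quoting problems with paths that contain spaces. This is a minimal sketch, assuming it is run from the root of the cloned repository. `build_inference_command` and `run_inference` are hypothetical helpers, and they use only the flags shown in the single-image command above.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(
    image,
    output,
    model_path="./weights/transnormal",
    dino_path="./weights/dinov3_vith16plus",
    timestep=999,
):
    """Build the argument list for inference.py (hypothetical helper)."""
    model_path = Path(model_path)
    # The projector file ships inside the TransNormal weights directory.
    projector_path = model_path / "cross_attention_projector.pt"
    return [
        sys.executable, "inference.py",
        "--image", str(image),
        "--output", str(output),
        "--model_path", str(model_path),
        "--dino_path", str(Path(dino_path)),
        "--projector_path", str(projector_path),
        "--timestep", str(timestep),
    ]


def run_inference(image, output, **kwargs):
    """Run inference.py in a subprocess and raise if it exits with an error."""
    subprocess.run(build_inference_command(image, output, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command only; call run_inference(...) to actually execute it.
    print(build_inference_command("path/to/image.jpg", "normal.png"))
```
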
+## Output Format
+The output is a normal map encoded for visualization: every value lies in `[0, 1]`, and `0.5` corresponds to zero in each normal component. See the [GitHub README](https://github.com/longxiang-ai/TransNormal#output-format) for the current camera-coordinate convention and saving utilities.
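
To work with the output numerically, the visualization can be mapped back to signed components. The sketch below assumes the common linear encoding `v = (n + 1) / 2`, which is consistent with `0.5` representing zero but is not confirmed by this card. `decode_normal_map` and `encode_normal_map` are hypothetical helpers, not functions from the `transnormal` package, and the axis convention is left to the GitHub README.

```python
import numpy as np


def decode_normal_map(vis: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an (H, W, 3) visualization in [0, 1] to unit normals in [-1, 1].

    Assumes the linear encoding v = (n + 1) / 2 (an assumption, see above).
    """
    normals = vis.astype(np.float32) * 2.0 - 1.0
    # Re-normalize, since quantization or interpolation can break unit length.
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals / np.maximum(length, eps)


def encode_normal_map(normals: np.ndarray) -> np.ndarray:
    """Inverse mapping: normals in [-1, 1] to a [0, 1] visualization."""
    return np.clip((normals + 1.0) / 2.0, 0.0, 1.0)


if __name__ == "__main__":
    # A flat 2x2 patch in which every pixel encodes the vector (0, 0, 1).
    vis = np.tile(np.array([0.5, 0.5, 1.0], dtype=np.float32), (2, 2, 1))
    normals = decode_normal_map(vis)
    print(normals.shape, normals[0, 0])
```

If the map was saved as an 8-bit PNG, divide the pixel values by 255 before decoding.
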
+## Dataset
+The accompanying [**TransNormal-Synthetic**](https://huggingface.co/datasets/Longxiang-ai/TransNormal-Synthetic) dataset is available on Hugging Face.
+It provides physics-based rendered transparent labware scenes with RGB images, surface normal maps, depth maps, masks, material variants, and camera metadata.
+## License
+This model is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). For commercial licensing inquiries, please contact the authors.
 ## Citation
+If you find this work useful, please cite:
 ```bibtex
+@misc{li2026transnormal,
+      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+      author={Mingwei Li and Hehe Fan and Yi Yang},
+      year={2026},
+      eprint={2602.00839},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2602.00839},
 }
 ```