---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
language:
- en
tags:
- diffusers
- matfuse
- pbr
- material-generation
- svbrdf
- text-to-image
---

# MatFuse: Controllable Material Generation with Diffusion Models

MatFuse generates tileable PBR material maps (diffuse, normal, roughness,
and specular) from text, reference images, sketches, and/or color palettes.

> **Paper:** [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408) (CVPR 2024)
>
> **Project page:** <https://gvecchio.com/matfuse/>

## Quick Start

```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True is needed because MatFuse ships custom components
# (MatFuseVQModel, MultiConditionEncoder) alongside the model weights.
pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/MatFuse",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Text-only generation with a fixed seed for reproducible results
result = pipe(
    text="red brick wall",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
)

# Each map is returned as a list of images; index 0 is the first sample
result["diffuse"][0].save("diffuse.png")
result["normal"][0].save("normal.png")
result["roughness"][0].save("roughness.png")
result["specular"][0].save("specular.png")
```
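
Because the maps are meant to be tileable, a quick way to inspect seams is to repeat a map in a 2×2 grid and look for discontinuities at the tile borders. The helper below is a minimal sketch that uses only Pillow; `tile_preview` is a hypothetical name and is not part of the MatFuse pipeline.

```python
from PIL import Image


def tile_preview(tile: Image.Image, reps: int = 2) -> Image.Image:
    """Repeat `tile` reps x reps times so seams at its borders become visible.

    Hypothetical inspection helper; not part of the MatFuse pipeline.
    """
    w, h = tile.size
    canvas = Image.new(tile.mode, (w * reps, h * reps))
    for row in range(reps):
        for col in range(reps):
            # Paste an identical copy of the tile at each grid position
            canvas.paste(tile, (col * w, row * h))
    return canvas
```

For example, `tile_preview(result["diffuse"][0]).save("diffuse_tiled.png")` writes a 2×2 preview of the generated diffuse map.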

## Conditioning Inputs

All conditions are **optional** and can be combined freely in a single call:

| Input | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text description of the material |
| `image` | `PIL.Image` | Reference image for style/appearance |
| `sketch` | `PIL.Image` (grayscale) | Binary edge map for structure |
| `palette` | `list[tuple]` | Up to 5 RGB color tuples (0–255) |

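
The table describes `sketch` as a grayscale binary edge map. This card does not say how such sketches are expected to be produced, so the following is only one plausible way to derive one from an existing photo with Pillow's built-in edge filter. The white-edges-on-black polarity, the `threshold` default, and the `make_sketch` name are all assumptions, not documented MatFuse requirements.

```python
from PIL import Image, ImageFilter


def make_sketch(img: Image.Image, threshold: int = 64) -> Image.Image:
    """Derive a binary edge map ("L" mode, values 0 or 255) from any image.

    Hypothetical helper: polarity (white edges on black) and the default
    threshold are assumptions, not documented MatFuse requirements.
    """
    gray = img.convert("L")                      # single-channel grayscale
    edges = gray.filter(ImageFilter.FIND_EDGES)  # 3x3 Laplacian-style kernel
    # Binarize: responses at or above the threshold become white (255)
    return edges.point(lambda p: 255 if p >= threshold else 0)
```

The result can then be passed as `sketch=make_sketch(Image.open("photo.png"))`; if generations ignore the structure, inverting the polarity or adjusting `threshold` is a reasonable first thing to try.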
```python
from PIL import Image

# Combine an image prompt, a text prompt, and a 5-color palette in one call
# (`pipe` is the pipeline loaded in Quick Start).
result = pipe(
    image=Image.open("reference.png").convert("RGB"),  # drop any alpha channel
    text="rough stone texture",
    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
    num_inference_steps=50,
    guidance_scale=4.0,
)
```
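
The palette input is a plain list of up to five RGB tuples. When a reference image is at hand, a rough palette can be pulled from its most frequent colors. The sketch below is a deliberately simple stand-in for a real color quantizer: it snaps each channel into buckets and counts them with `collections.Counter`. The `dominant_colors` name and its `bucket` parameter are hypothetical, not part of MatFuse.

```python
from collections import Counter
from typing import Iterable

RGB = tuple[int, int, int]


def dominant_colors(pixels: Iterable[RGB], k: int = 5, bucket: int = 32) -> list[RGB]:
    """Return up to k RGB tuples (0-255), most frequent first.

    Hypothetical helper: each channel is snapped to a bucket of width `bucket`,
    and the centers of the k most common buckets are returned.
    """
    if not 1 <= k <= 5:
        raise ValueError("k must be between 1 and 5 (palettes hold up to 5 colors)")
    counts = Counter((r // bucket, g // bucket, b // bucket) for r, g, b in pixels)
    return [
        tuple(min(255, c * bucket + bucket // 2) for c in key)
        for key, _ in counts.most_common(k)
    ]
```

For example, `palette=dominant_colors(Image.open("reference.png").convert("RGB").getdata())` derives the palette from the same reference image used for the `image` condition.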
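
Both examples pass `guidance_scale=4.0`. In diffusers pipelines this parameter conventionally controls classifier-free guidance: the unconditional noise prediction is pushed toward the conditioned one by a factor `s`. Assuming MatFuse's custom pipeline follows that standard rule (this card does not spell it out), one guidance step reduces to the arithmetic below. `apply_cfg` is a hypothetical name, and plain lists stand in for tensors.

```python
def apply_cfg(eps_uncond: list[float], eps_cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond).

    Hypothetical element-wise sketch; a real pipeline applies this to tensors.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `scale=1.0` the result equals the conditional prediction; larger values make the output follow the conditions more strongly.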

## Architecture

| Component | Class | Key parameters |
|-----------|-------|----------------|
| **UNet** | `UNet2DConditionModel` | `in_channels=16`, `out_channels=12`, `block_out_channels=(256, 512, 1024)`, `cross_attention_dim=512` |
| **VAE** | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
| **Scheduler** | `DDIMScheduler` | `beta_start=0.0015`, `beta_end=0.0195`, `beta_schedule="scaled_linear"`, ε-prediction |
| **Conditioning** | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |

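
Read together, four VQ codebooks with 3-dimensional entries and a UNet with `out_channels=12` are consistent with a latent built by concatenating four 3-channel latents (one per map), each downsampled by f=8. That reading is an inference from the listed numbers, not a documented spec. The NumPy sketch below shows the resulting shape arithmetic and the nearest-neighbor codebook lookup a VQ layer performs. The 256×256 input size, the random codebook, and the names `latent_shape` and `vq_lookup` are illustrative placeholders.

```python
import numpy as np


def latent_shape(height: int, width: int, n_maps: int = 4,
                 embed_dim: int = 3, f: int = 8) -> tuple[int, int, int]:
    """(channels, h, w) of the assumed concatenated latent for one material."""
    if height % f or width % f:
        raise ValueError("height and width must be divisible by f")
    return (n_maps * embed_dim, height // f, width // f)


def vq_lookup(z: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each D-dim row of z (shape (N, D)) by its nearest codebook row.

    codebook has shape (K, D), e.g. (4096, 3) for one of the four codebooks.
    """
    # Squared distances via ||z||^2 - 2 z.c + ||c||^2, shape (N, K)
    d = (z ** 2).sum(axis=1, keepdims=True) - 2.0 * z @ codebook.T + (codebook ** 2).sum(axis=1)
    return codebook[d.argmin(axis=1)]
```

The expanded distance form avoids materializing an (N, K, D) array, which matters when every spatial position of a 32×32 latent is compared against 4096 codes.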
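
The scheduler row lists a `scaled_linear` beta schedule from 0.0015 to 0.0195 with ε-prediction. In diffusers, `scaled_linear` means the betas are spaced linearly in square-root space and then squared. The plain-Python sketch below reproduces that schedule and applies one deterministic DDIM update (eta = 0) from an ε prediction. It assumes `num_train_timesteps=1000`, the diffusers default, because the card does not state it; the function names are hypothetical.

```python
import math


def scaled_linear_betas(beta_start: float = 0.0015, beta_end: float = 0.0195,
                        T: int = 1000) -> list[float]:
    """Betas linear in sqrt-space, then squared (diffusers' scaled_linear)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (T - 1)) ** 2 for i in range(T)]


def alphas_cumprod(betas: list[float]) -> list[float]:
    """Running product of (1 - beta_t), i.e. alpha-bar_t."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


def ddim_step(x_t: float, eps: float, abar_t: float, abar_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) for an epsilon-predicting model."""
    x0 = (x_t - math.sqrt(1 - abar_t) * eps) / math.sqrt(abar_t)  # predicted clean sample
    return math.sqrt(abar_prev) * x0 + math.sqrt(1 - abar_prev) * eps
```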

## 📜 Citation

```bibtex
@inproceedings{vecchio2024matfuse,
  author    = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
  title     = {MatFuse: Controllable Material Generation with Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
  pages     = {4429--4438}
}
```

## License

This project is licensed under the MIT License.