---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
language:
- en
tags:
- diffusers
- matfuse
- pbr
- material-generation
- svbrdf
- text-to-image
---
# MatFuse — Controllable Material Generation with Diffusion Models
MatFuse generates tileable PBR material maps (diffuse, normal, roughness,
specular) from text, reference images, sketches, and/or color palettes.
> **Paper:** [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408) — CVPR 2024
>
> **Project page:** <https://gvecchio.com/matfuse/>
## Quick Start
```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True lets diffusers run the custom pipeline code shipped in the model repo.
pipe = DiffusionPipeline.from_pretrained(
    "gvecchio/MatFuse",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Seed the generator so the result is reproducible.
result = pipe(
    text="red brick wall",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(42),
)

# Save the first sample of each material map.
result["diffuse"][0].save("diffuse.png")
result["normal"][0].save("normal.png")
result["roughness"][0].save("roughness.png")
result["specular"][0].save("specular.png")
```
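When generating many materials, the four `save` calls above can be folded into a small helper. The following is a minimal sketch, not part of the MatFuse API: `save_maps` and `MAP_NAMES` are hypothetical names, and the code assumes only what the Quick Start shows, namely that the result can be indexed by map name and that each entry holds objects with a `.save(path)` method.

```python
from pathlib import Path

# Map names as used in the Quick Start example.
MAP_NAMES = ("diffuse", "normal", "roughness", "specular")


def save_maps(result, out_dir, index=0, names=MAP_NAMES):
    """Save one sample of every material map into out_dir (hypothetical helper).

    Assumes `result[name][index]` exposes a `.save(path)` method,
    as in the Quick Start example.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        path = out_dir / f"{name}.png"
        result[name][index].save(str(path))
        written.append(path)
    return written
```

With the pipeline output from above, `save_maps(result, "materials/red_brick")` would write `diffuse.png`, `normal.png`, `roughness.png`, and `specular.png` into that folder.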
## Conditioning Inputs
All conditions are **optional** and freely composable:
| Input | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text description of the material |
| `image` | `PIL.Image` | Reference image for style/appearance |
| `sketch` | `PIL.Image` (grayscale) | Binary edge map for structure |
| `palette` | `list[tuple]` | Up to 5 RGB color tuples (0–255) |
```python
from PIL import Image

result = pipe(
    image=Image.open("reference.png"),
    text="rough stone texture",
    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
    num_inference_steps=50,
    guidance_scale=4.0,
)
```
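Palettes often come from design tools as hex codes rather than RGB tuples. The sketch below converts hex strings into the `list[tuple]` format described in the table and enforces the five-color limit. `hex_to_palette` is a hypothetical helper written for illustration; it is not provided by the pipeline.

```python
def hex_to_palette(hex_colors, max_colors=5):
    """Convert '#RRGGBB' strings to a list of (R, G, B) tuples in 0-255.

    Hypothetical helper; the five-color limit follows the table above.
    """
    if len(hex_colors) > max_colors:
        raise ValueError(f"expected at most {max_colors} colors, got {len(hex_colors)}")
    palette = []
    for color in hex_colors:
        digits = color.lstrip("#")
        if len(digits) != 6:
            raise ValueError(f"expected a 6-digit hex color, got {color!r}")
        # Parse the red, green, and blue byte pairs.
        palette.append(tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4)))
    return palette


palette = hex_to_palette(["#78503C", "#5A3C28", "#966E50"])
print(palette)  # [(120, 80, 60), (90, 60, 40), (150, 110, 80)]
```

The resulting list could then be passed as `palette=palette` in the call shown above.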
## Architecture
| Component | Class | Key parameters |
|-----------|-------|----------------|
| **UNet** | `UNet2DConditionModel` | in=16, out=12, blocks=[256,512,1024], cross_attn=512 |
| **VAE** | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
| **Scheduler** | `DDIMScheduler` | β 0.0015–0.0195, scaled_linear, ε-prediction |
| **Conditioning** | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |
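
To make the scheduler row concrete, the sketch below reproduces a `scaled_linear` beta schedule in plain Python: the square roots of the betas are spaced linearly between the two endpoints and then squared, which is how diffusers defines that schedule name. The endpoints 0.0015 and 0.0195 come from the table; the number of training timesteps (1000) is an assumption, since it is not stated here, and the function names are illustrative.

```python
def scaled_linear_betas(beta_start=0.0015, beta_end=0.0195, num_train_timesteps=1000):
    """Betas whose square roots are linearly spaced between the endpoints.

    num_train_timesteps=1000 is an assumed value, not taken from this card.
    """
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]


def alphas_cumprod(betas):
    """Running product of (1 - beta_t), the signal level kept at each timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = scaled_linear_betas()
signal = alphas_cumprod(betas)
print(f"first beta: {betas[0]:.4f}, last beta: {betas[-1]:.4f}")
print(f"remaining signal at the final timestep: {signal[-1]:.6f}")
```

The cumulative product falls monotonically toward zero, which is what lets the ε-prediction model start sampling from near-pure noise.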
## Citation
```bibtex
@inproceedings{vecchio2024matfuse,
author = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
title = {MatFuse: Controllable Material Generation with Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {4429-4438}
}
```
## License
This project is licensed under the MIT License.