	---
	license: mit
	library_name: diffusers
	pipeline_tag: text-to-image
	language:
	- en
	tags:
	- diffusers
	- matfuse
	- pbr
	- material-generation
	- svbrdf
	- text-to-image
	---


	# MatFuse — Controllable Material Generation with Diffusion Models

	MatFuse generates tileable PBR material maps (diffuse, normal, roughness,
	specular) from any combination of text, reference images, sketches, and
	color palettes.

	> Paper: [MatFuse: Controllable Material Generation with Diffusion Models](https://arxiv.org/abs/2308.11408) — CVPR 2024
	> Project page: <https://gvecchio.com/matfuse/>

	## Quick Start

	```python
	import torch
	from diffusers import DiffusionPipeline

	# trust_remote_code=True is needed because the pipeline code ships with the model repo.
	pipe = DiffusionPipeline.from_pretrained(
	    "gvecchio/MatFuse",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	)
	pipe = pipe.to("cuda")

	# Text-only generation; the seeded generator makes the run reproducible.
	result = pipe(
	    text="red brick wall",
	    num_inference_steps=50,
	    guidance_scale=4.0,
	    generator=torch.Generator("cuda").manual_seed(42),
	)

	# Each entry holds a list of images, one per generated sample.
	result["diffuse"][0].save("diffuse.png")
	result["normal"][0].save("normal.png")
	result["roughness"][0].save("roughness.png")
	result["specular"][0].save("specular.png")
	```
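
The four `save` calls above can be folded into a loop. The sketch below assumes, as in the Quick Start, that the pipeline output is indexable by map name and that each entry is a list of objects with a PIL-style `save(path)` method. `save_material_maps`, `MAP_NAMES`, and the file-naming scheme are hypothetical helpers for illustration, not part of MatFuse.

```python
from pathlib import Path

# Map names taken from the Quick Start example.
MAP_NAMES = ("diffuse", "normal", "roughness", "specular")


def save_material_maps(result, out_dir, stem="material", index=0):
    """Save one generated material as <stem>_<map>.png files in out_dir.

    Hypothetical helper: assumes result[name][index] exposes a PIL-style
    save(path) method, as the Quick Start output does.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name in MAP_NAMES:
        path = out_dir / f"{stem}_{name}.png"
        result[name][index].save(path)
        paths[name] = path
    return paths
```

With the `result` from the Quick Start, `save_material_maps(result, "outputs", stem="red_brick")` would write `outputs/red_brick_diffuse.png` and the three other maps alongside it.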

	## Conditioning Inputs

	All conditions are optional and freely composable:

	| Input | Type | Description |
	|-------|------|-------------|
	| `text` | `str` | Text description of the material |
	| `image` | `PIL.Image` | Reference image for style/appearance |
	| `sketch` | `PIL.Image` (grayscale) | Binary edge map for structure |
	| `palette` | `list[tuple]` | Up to 5 RGB colour tuples (0–255) |

	```python
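	from PIL import Image

	# Combine a reference image, a text prompt, and a five-colour palette.
	# `pipe` is the pipeline loaded in the Quick Start.
	result = pipe(
	    image=Image.open("reference.png"),
	    text="rough stone texture",
	    palette=[(120, 80, 60), (90, 60, 40), (150, 110, 80), (70, 50, 30), (180, 140, 100)],
	    num_inference_steps=50,
	    guidance_scale=4.0,
	)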
	```
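
Because every condition is optional, it can help to assemble the keyword arguments in one place and check the palette before calling the pipeline. The following is a minimal sketch: `build_conditions` is a hypothetical helper, not part of MatFuse, and its checks only mirror the constraints listed in the table (at most 5 RGB tuples with integer values in 0–255). Rejecting an empty palette is an assumption made here.

```python
def build_conditions(text=None, image=None, sketch=None, palette=None):
    """Return a dict holding only the conditions that were provided.

    Hypothetical helper; the palette checks follow the table above.
    """
    if palette is not None:
        palette = [tuple(color) for color in palette]
        # Assumption: an empty palette is treated as an error, not as "no palette".
        if not 1 <= len(palette) <= 5:
            raise ValueError("palette must hold between 1 and 5 colors")
        for color in palette:
            is_rgb = len(color) == 3 and all(
                isinstance(c, int) and 0 <= c <= 255 for c in color
            )
            if not is_rgb:
                raise ValueError(f"not an RGB tuple in 0-255: {color!r}")
    candidates = {"text": text, "image": image, "sketch": sketch, "palette": palette}
    # Drop the conditions that were left unset.
    return {name: value for name, value in candidates.items() if value is not None}
```

The result can then be unpacked into the call, for example `pipe(**build_conditions(text="rough stone texture", palette=[(120, 80, 60)]), num_inference_steps=50, guidance_scale=4.0)`.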

	## Architecture

	| Component | Class | Key parameters |
	|-----------|-------|----------------|
	| UNet | `UNet2DConditionModel` | in=16, out=12, blocks=[256,512,1024], cross_attn=512 |
	| VAE | `MatFuseVQModel` (custom) | 4 encoders + 4 VQ codebooks (4096×3), shared decoder, f=8 |
	| Scheduler | `DDIMScheduler` | β 0.0015–0.0195, scaled_linear, ε-prediction |
	| Conditioning | `MultiConditionEncoder` (custom) | CLIP ViT-B/16 · sentence-transformers · palette MLP · sketch CNN |
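
Two of the numbers in the table can be worked through in plain Python. The sketch below is illustrative only and rests on stated assumptions: `scaled_linear_betas` follows the `scaled_linear` rule used in diffusers (linear spacing in √β, then squaring) and assumes 1000 training timesteps, which the table does not state. `latent_shape` reads "4096×3" as 4096 codebook entries of dimension 3 and "f=8" as an 8× spatial downsampling, so four maps would give 4 × 3 = 12 latent channels, consistent with the UNet's `out=12`. Both function names are hypothetical.

```python
def scaled_linear_betas(beta_start=0.0015, beta_end=0.0195, num_train_timesteps=1000):
    """Betas spaced linearly in sqrt(beta), then squared ("scaled_linear").

    num_train_timesteps=1000 is an assumption; the model card does not state it.
    """
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def latent_shape(height, width, num_maps=4, codebook_dim=3, factor=8):
    """Latent (channels, height, width) under the assumptions in the lead-in."""
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (num_maps * codebook_dim, height // factor, width // factor)


betas = scaled_linear_betas()
print(f"first beta: {betas[0]:.4f}, last beta: {betas[-1]:.4f}")
print("latent shape for 256x256 maps:", latent_shape(256, 256))
```

Under these assumptions a set of 256×256 maps corresponds to a 12×32×32 latent. The table does not say what the four additional UNet input channels (`in=16`) carry.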

	## Citation

	```bibtex
	@inproceedings{vecchio2024matfuse,
	    author    = {Vecchio, Giuseppe and Sortino, Renato and Palazzo, Simone and Spampinato, Concetto},
	    title     = {MatFuse: Controllable Material Generation with Diffusion Models},
	    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	    month     = {June},
	    year      = {2024},
	    pages     = {4429-4438}
	}
	```

	## License

	This project is licensed under the MIT License.