---
license: apache-2.0
datasets:
- opendiffusionai/laion2b-squareish-1536px
base_model:
- Tongyi-MAI/Z-Image
tags:
- z-image
- controlnet
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/assets/stacked_vertical.png
---

# Z-Image-SAM-ControlNet

## Fun Facts
- This ControlNet was trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- The base model is [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image)
- Takes SAM-style segmentation images as input and outputs photorealistic images
- Trained at 1024x1024 resolution; inference works best at 1.5k and up
- Trained on 220K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Trained with [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)

# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./assets/resized_kitten_seg.png"></td>
<td><img src="./assets/resized_kitten.png"></td>
</tr>
<tr>
<td><img src="./assets/resized_dread_girl_seg.png"></td>
<td><img src="./assets/resized_dread_girl.png"></td>
</tr>
<tr>
<td><img src="./assets/resized_house_seg.png"></td>
<td><img src="./assets/resized_house.png"></td>
</tr>
</table>

# ComfyUI Usage
1) Copy the weights from [comfy-ui-patch/z-image-sam-controlnet.safetensors](comfy-ui-patch/z-image-sam-controlnet.safetensors) to `ComfyUI/models/model_patches`
2) Use `ModelPatchLoader` to load the patch
3) Plug its `MODEL_PATCH` output into the `model_patch` input on `ZImageFunControlnet`
4) Plug the model, VAE, and control image into `ZImageFunControlnet`
5) Plug the `ZImageFunControlnet` output into a `KSampler`

## Add Auto Segmentation (optional)
1) Use the ComfyUI Manager to add [ComfyUI-segment-anything-2](https://github.com/kijai/ComfyUI-segment-anything-2)
2) Use the `Sam2AutoSegmentation` node to create the segmented input image

Here's an example workflow JSON: [comfy-ui-patch/z-image-control.json](https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/blob/main/comfy-ui-patch/z-image-control.json) (it includes an option that performs segmentation first).
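
Outside ComfyUI, a SAM-style control image can also be built directly with the [segment-anything](https://github.com/facebookresearch/segment-anything) Python API. Below is a minimal sketch of the flattening step; the helper name `masks_to_control_image` is my own, not part of SAM, and it assumes you pass the boolean `"segmentation"` arrays returned by `SamAutomaticMaskGenerator.generate`:

```python
import numpy as np

def masks_to_control_image(masks, height, width, seed=0):
    """Flatten boolean SAM masks into a single flat-color RGB control image."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    # Paint larger masks first so smaller segments stay visible on top.
    for mask in sorted(masks, key=lambda m: m.sum(), reverse=True):
        canvas[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas
```

With the real API you would call it roughly as `masks_to_control_image([m["segmentation"] for m in mask_generator.generate(img)], h, w)`.
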
# Hugging Face Usage

## Compatibility
```bash
pip install -U diffusers==0.37.0
```

## Download
```bash
sudo apt-get install git-lfs
git lfs install

git clone https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet

cd Z-Image-SAM-ControlNet
```

## Inference
```python
import torch
from diffusers.utils import load_image
from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel

# Load the ControlNet transformer from this repo (run from the repo root).
transformer = ZImageControlTransformer2DModel.from_pretrained(
    ".",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    add_control_noise_refiner=True,
)

# Build the pipeline on top of the base Z-Image model.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)

pipe.enable_model_cpu_offload()

image = pipe(
    prompt="some beach wood washed up on the sunny sand, spelling the words z-image, with footprints and waves crashing",
    # Negative prompt (translated): "Low resolution, low quality, deformed limbs,
    # deformed fingers, oversaturated, waxy look, faces lacking detail, overly
    # smooth, AI-generated feel. Chaotic composition. Blurry, distorted text."
    negative_prompt="低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",
    control_image=load_image("assets/z-image.png"),
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(45),
).images[0]

image.save("output.png")
```
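
Since the model was trained at 1024x1024 but infers best at 1.5k and up, it can help to upscale the control image before inference (and pass matching `height`/`width` to the pipeline). A small sketch, assuming a flat-color segmentation map; the helper name is mine, and nearest-neighbor resampling is used so segment colors don't blend at boundaries:

```python
from PIL import Image

def upscale_control_image(img: Image.Image, target: int = 1536) -> Image.Image:
    """Upscale a flat-color segmentation map without blending segment colors."""
    if max(img.size) >= target:
        return img
    scale = target / max(img.size)
    new_size = (round(img.size[0] * scale), round(img.size[1] * scale))
    # NEAREST keeps each output pixel an exact copy of a source pixel, so no
    # new in-between colors appear along segment edges.
    return img.resize(new_size, Image.NEAREST)
```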