---
license: apache-2.0
datasets:
- opendiffusionai/laion2b-squareish-1536px
base_model:
- Tongyi-MAI/Z-Image
tags:
- z-image
- controlnet
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/assets/stacked_vertical.png
---

# Z-Image-SAM-ControlNet

## Fun Facts
- This ControlNet was trained exclusively on images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- The base model is [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image)
- Takes SAM-style segmentation images as input and outputs photorealistic images
- Trained at 1024x1024 resolution; inference works best at 1.5k and up
- Trained on 220K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Trained with [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)

# Showcases
<table style="width:100%; table-layout:fixed;">
<tr>
<td><img src="./assets/resized_kitten_seg.png"></td>
<td><img src="./assets/resized_kitten.png"></td>
</tr>
<tr>
<td><img src="./assets/resized_dread_girl_seg.png"></td>
<td><img src="./assets/resized_dread_girl.png"></td>
</tr>
<tr>
<td><img src="./assets/resized_house_seg.png"></td>
<td><img src="./assets/resized_house.png"></td>
</tr>
</table>

# ComfyUI Usage
1) Copy the weights from [comfy-ui-patch/z-image-sam-controlnet.safetensors](comfy-ui-patch/z-image-sam-controlnet.safetensors) to `ComfyUI/models/model_patches`
2) Use `ModelPatchLoader` to load the patch
3) Plug its `MODEL_PATCH` output into the `model_patch` input on `ZImageFunControlnet`
4) Plug the model, VAE, and control image into `ZImageFunControlnet`
5) Plug the `ZImageFunControlnet` output into a `KSampler`

## Add Auto Segmentation (optional)
1) Use the ComfyUI Manager to add [ComfyUI-segment-anything-2](https://github.com/kijai/ComfyUI-segment-anything-2)
2) Use the `Sam2AutoSegmentation` node to create the segmented input image

Here's an example workflow JSON: [comfy-ui-patch/z-image-control.json](https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/blob/main/comfy-ui-patch/z-image-control.json) (it includes an option that performs segmentation first).
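
Outside ComfyUI, a SAM-style control image can also be built directly with the [segment-anything](https://github.com/facebookresearch/segment-anything) Python API. Below is a minimal sketch of the flattening step; the helper name `masks_to_control_image` is my own, not part of SAM, and it assumes you pass the boolean `"segmentation"` arrays returned by `SamAutomaticMaskGenerator.generate`:

```python
import numpy as np

def masks_to_control_image(masks, height, width, seed=0):
    """Flatten boolean SAM masks into a single flat-color RGB control image."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    # Paint larger masks first so smaller segments stay visible on top.
    for mask in sorted(masks, key=lambda m: m.sum(), reverse=True):
        canvas[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas
```

With the real API you would call it roughly as `masks_to_control_image([m["segmentation"] for m in mask_generator.generate(img)], h, w)`.
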
# Hugging Face Usage

## Compatibility
```bash
pip install -U diffusers==0.37.0
```

## Download
```bash
sudo apt-get install git-lfs
git lfs install

git clone https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet

cd Z-Image-SAM-ControlNet
```

## Inference
```python
import torch
from diffusers.utils import load_image
from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel

# Load the ControlNet transformer from this repo (run from the repo root).
transformer = ZImageControlTransformer2DModel.from_pretrained(
    ".",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    add_control_noise_refiner=True,
)

# Build the pipeline on top of the base Z-Image model.
pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)

pipe.enable_model_cpu_offload()

image = pipe(
    prompt="some beach wood washed up on the sunny sand, spelling the words z-image, with footprints and waves crashing",
    # Negative prompt (translated): "Low resolution, low quality, deformed limbs,
    # deformed fingers, oversaturated, waxy look, faces lacking detail, overly
    # smooth, AI-generated feel. Chaotic composition. Blurry, distorted text."
    negative_prompt="低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",
    control_image=load_image("assets/z-image.png"),
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(45),
).images[0]

image.save("output.png")
```
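
Since the model was trained at 1024x1024 but infers best at 1.5k and up, it can help to upscale the control image before inference (and pass matching `height`/`width` to the pipeline). A small sketch, assuming a flat-color segmentation map; the helper name is mine, and nearest-neighbor resampling is used so segment colors don't blend at boundaries:

```python
from PIL import Image

def upscale_control_image(img: Image.Image, target: int = 1536) -> Image.Image:
    """Upscale a flat-color segmentation map without blending segment colors."""
    if max(img.size) >= target:
        return img
    scale = target / max(img.size)
    new_size = (round(img.size[0] * scale), round(img.size[1] * scale))
    # NEAREST keeps each output pixel an exact copy of a source pixel, so no
    # new in-between colors appear along segment edges.
    return img.resize(new_size, Image.NEAREST)
```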