---
license: apache-2.0
datasets:
- opendiffusionai/laion2b-squareish-1536px
base_model:
- Tongyi-MAI/Z-Image
tags:
- z-image
- controlnet
thumbnail: https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/resolve/main/assets/stacked_vertical.png
---

# Z-Image-SAM-ControlNet

![side by side](assets/side_by_side_d.png)

## Fun Facts

- This ControlNet is trained exclusively on segmentation images generated by [Segment Anything (SAM)](https://aidemos.meta.com/segment-anything/)
- The base model is [Tongyi-MAI/Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image)
- Takes SAM-style segmentation images as input and outputs photorealistic images
- Trained at 1024x1024 resolution; inference works best at 1.5k and up
- Trained on 220K segmented images from [laion2b-squareish-1536px](https://huggingface.co/datasets/opendiffusionai/laion2b-squareish-1536px)
- Trained using this repo: [https://github.com/aigc-apps/VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)

# Showcases
# ComfyUI Usage

1) Copy the weights from [comfy-ui-patch/z-image-sam-controlnet.safetensors](comfy-ui-patch/z-image-sam-controlnet.safetensors) to `ComfyUI/models/model_patches`
2) Use `ModelPatchLoader` to load the patch
3) Plug `MODEL_PATCH` into `model_patch` on `ZImageFunControlnet`
4) Plug the model, VAE, and image into `ZImageFunControlnet`
5) Plug the `ZImageFunControlnet` output into `KSampler`

![videoXFun Nodes](assets/comfyui.png)

## Add Auto Segmentation (optional)

1) Use the ComfyUI Manager to add [ComfyUI-segment-anything-2](https://github.com/kijai/ComfyUI-segment-anything-2)
2) Use the `Sam2AutoSegmentation` node to create the segmented image

Here's an example workflow JSON: [comfy-ui-patch/z-image-control.json](https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet/blob/main/comfy-ui-patch/z-image-control.json) (includes an option which performs segmentation first)

# Hugging Face Usage

## Compatibility

```bash
pip install -U diffusers==0.37.0
```

## Download

```bash
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/neuralvfx/Z-Image-SAM-ControlNet
cd Z-Image-SAM-ControlNet
```

## Inference

```python
import torch
from diffusers.utils import load_image

from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel

transformer = ZImageControlTransformer2DModel.from_pretrained(
    ".",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    add_control_noise_refiner=True,
)

pipe = ZImageControlUnifiedPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="some beach wood washed up on the sunny sand, spelling the words z-image, with footprints and waves crashing",
    # Negative prompt (Chinese): "low resolution, low quality, deformed limbs, deformed fingers,
    # oversaturated, waxy look, faces lacking detail, overly smooth, AI-generated feel,
    # chaotic composition, blurry or distorted text."
    negative_prompt="低分辨率,低画质,肢体畸形,手指畸形,画面过饱和,蜡像感,人脸无细节,过度光滑,画面具有AI感。构图混乱。文字模糊,扭曲。",
    control_image=load_image("assets/z-image.png"),
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    controlnet_conditioning_scale=1.0,
    generator=torch.Generator("cuda").manual_seed(45),
).images[0]

image.save("output.png")
```
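The `control_image` above is a SAM-style segmentation image. If you generate masks programmatically (e.g. with SAM's automatic mask generator, which returns a list of dicts containing a boolean `"segmentation"` array), a minimal sketch of rendering them into a flat-color control image might look like this. The function name and rendering choices here are illustrative assumptions, not part of this repo:

```python
# Hypothetical helper: paint each SAM mask a random flat color to build a control image.
# Assumes masks shaped like SamAutomaticMaskGenerator output: dicts with a boolean
# "segmentation" array. Larger masks are painted first so smaller segments stay visible.
import numpy as np

def render_sam_control_image(masks, height, width, seed=0):
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    for mask in sorted(masks, key=lambda m: m["segmentation"].sum(), reverse=True):
        color = rng.integers(0, 256, size=3, dtype=np.uint8)
        canvas[mask["segmentation"]] = color
    return canvas

# Toy example with two fake masks covering the top and bottom halves
h, w = 64, 64
top = np.zeros((h, w), dtype=bool)
top[:32] = True
bottom = ~top
control = render_sam_control_image([{"segmentation": top}, {"segmentation": bottom}], h, w)
```

The result can be converted with `PIL.Image.fromarray(control)` and passed as `control_image`.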
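Since the card notes that inference works best at 1.5k resolution and up, it can help to upscale the control image before running the pipeline. A sketch, assuming a PIL control image; the multiple-of-16 rounding is an assumption chosen to keep dimensions friendly to latent-space downsampling, and the helper name is hypothetical:

```python
# Sketch: upscale a control image so its shorter side is at least 1536.
# NEAREST resampling keeps segment boundaries crisp; flat-color regions
# in a segmentation map should not be blended by interpolation.
from PIL import Image

def upscale_control_image(image, target_short_side=1536, multiple=16):
    w, h = image.size
    scale = max(1.0, target_short_side / min(w, h))  # never downscale
    new_w = int(round(w * scale / multiple)) * multiple
    new_h = int(round(h * scale / multiple)) * multiple
    return image.resize((new_w, new_h), resample=Image.NEAREST)

control = upscale_control_image(Image.new("RGB", (1024, 768)))
```

Pass the resulting width and height to the pipeline's `width`/`height` arguments so the output matches the control image.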