---
frameworks:
- Pytorch
license: apache-2.0
tags: []
tasks:
- text-to-image-synthesis
base_model:
- Tongyi-MAI/Z-Image
base_model_relation: adapter
---
## Model Introduction

The i2L (Image to LoRA) model is a model architecture born from a rather wild idea: it takes an image as input and outputs a LoRA model trained on that image. This model builds on our earlier Qwen-Image-i2L ([model](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L), [technical blog](https://modelscope.cn/learn/3343)), further refined and ported to [Z-Image](https://modelscope.cn/models/Tongyi-MAI/Z-Image), with a particular focus on strengthening style preservation.
To ensure the quality of the generated images, we recommend using the LoRA models produced by this model with the following settings:

* Use a negative prompt
  * Chinese: `"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"`
  * English: `"Yellowed, green-tinted, blurry, low-resolution, low-quality image, distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like artifacts, JPEG compression artifacts, abnormal limbs, watermark, garbled text, meaningless characters"`
* `cfg_scale = 4`
* `sigma_shift = 8`
* Enable the LoRA only on the positive-prompt side and disable it on the negative-prompt side; this improves image quality

Online demo: https://modelscope.cn/studios/DiffSynth-Studio/Z-Image-i2L
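For convenience, the recommended settings above can be collected into a single dict of keyword arguments for the pipeline call. This is a minimal sketch; the argument names `cfg_scale`, `sigma_shift`, and `positive_only_lora` follow the inference code in this card, and the helper function is purely illustrative:

```python
# Recommended inference settings for LoRA models produced by Z-Image-i2L.
# The negative prompt is the English version from the list above.
NEGATIVE_PROMPT = (
    "Yellowed, green-tinted, blurry, low-resolution, low-quality image, "
    "distorted limbs, eerie appearance, ugly, AI-looking, noise, grid-like "
    "artifacts, JPEG compression artifacts, abnormal limbs, watermark, "
    "garbled text, meaningless characters"
)

RECOMMENDED_KWARGS = {
    "negative_prompt": NEGATIVE_PROMPT,
    "cfg_scale": 4,
    "sigma_shift": 8,
}

def build_call_kwargs(prompt: str, lora_state_dict: dict) -> dict:
    """Merge a prompt and a LoRA state dict with the recommended settings.

    Passing the LoRA via `positive_only_lora` applies it only on the
    positive-prompt side, as recommended above.
    """
    return {
        "prompt": prompt,
        "positive_only_lora": lora_state_dict,
        **RECOMMENDED_KWARGS,
    }
```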
## Results

The Z-Image-i2L model can quickly produce a style LoRA from just a few stylistically consistent input images. The results below were all generated with random seed 0.
### Style 1: Watercolor Painting

Input images: *(4 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
### Style 2: Realistic Detail

Input images: *(5 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
### Style 3: Vibrant Color Blocks

Input images: *(6 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
### Style 4: Girl with Flowers

Input images: *(4 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
### Style 5: Black-and-White Minimalism

Input images: *(4 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
### Style 6: Fantasy World

Input images: *(6 style reference images, omitted here)*

Generated images for the prompts `a cat`, `a dog`, and `a girl`: *(omitted here)*
## Inference Code

Install [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio):

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

Model inference:
```python
from diffsynth.pipelines.z_image import (
    ZImagePipeline, ModelConfig,
    ZImageUnit_Image2LoRAEncode, ZImageUnit_Image2LoRADecode
)
from modelscope import snapshot_download
from safetensors.torch import save_file
import torch
from PIL import Image

# Use `vram_config` to enable LoRA hot-loading
vram_config = {
    "offload_dtype": torch.bfloat16,
    "offload_device": "cuda",
    "onload_dtype": torch.bfloat16,
    "onload_device": "cuda",
    "preparing_dtype": torch.bfloat16,
    "preparing_device": "cuda",
    "computation_dtype": torch.bfloat16,
    "computation_device": "cuda",
}

# Load models
pipe = ZImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Tongyi-MAI/Z-Image", origin_file_pattern="transformer/*.safetensors", **vram_config),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="text_encoder/*.safetensors"),
        ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="SigLIP2-G384/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/General-Image-Encoders", origin_file_pattern="DINOv3-7B/model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Z-Image-i2L", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"),
)

# Load images
snapshot_download(
    model_id="DiffSynth-Studio/Z-Image-i2L",
    allow_file_pattern="assets/style/*",
    local_dir="data/Z-Image-i2L_style_input"
)
images = [Image.open(f"data/Z-Image-i2L_style_input/assets/style/1/{i}.jpg") for i in range(4)]

# Image to LoRA
with torch.no_grad():
    embs = ZImageUnit_Image2LoRAEncode().process(pipe, image2lora_images=images)
    lora = ZImageUnit_Image2LoRADecode().process(pipe, **embs)["lora"]
save_file(lora, "lora.safetensors")

# Generate images
prompt = "a cat"
negative_prompt = "泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    seed=0, cfg_scale=4, num_inference_steps=50,
    positive_only_lora=lora,
    sigma_shift=8
)
image.save("image.jpg")
```