Commit d74fa98 by huangfeice · verified · 1 Parent(s): 6d034c4

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,110 @@
  ---
  license: apache-2.0
+ pipeline_tag: image-to-image
+ tags:
+ - comfyui
+ - image-editing
+ - joyai
+ - multi-image
  ---
+
+ # JoyAI-Image-Edit-Plus (ComfyUI weights)
+
+ Single-file `.safetensors` checkpoints of [JoyAI-Image-Edit-Plus](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers), repackaged for **native ComfyUI** support (no custom node required).
+
+ JoyAI-Image-Edit-Plus is the multi-image instruction-guided editing model of the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It accepts **1–6 reference images** and a text instruction, and generates a new image that combines elements from the references according to the instruction.
+
+ ## Files
+
+ | File | Size | Goes into | Component |
+ |------|------|-----------|-----------|
+ | `diffusion_models/joy_image_edit_plus_bf16.safetensors` | ~32.5 GB | `ComfyUI/models/diffusion_models/` | `JoyImageEditPlusTransformer3DModel` (bf16) |
+ | `text_encoders/qwen3vl_joyimage_bf16.safetensors` | ~17.5 GB | `ComfyUI/models/text_encoders/` | Qwen3-VL-8B text encoder (bf16) |
+ | `vae/joy_image_edit_vae.safetensors` | ~254 MB | `ComfyUI/models/vae/` | `AutoencoderKLWan` |
+
+ The repo layout already matches `ComfyUI/models/`, so a single `hf download` into your models root drops every file where it needs to go.
+
+ ## Model architecture
+
+ - **Transformer**: 40-layer DiT, hidden size 4096, 32 heads, in/out channels 16, patch size `[1, 2, 2]`, 3D RoPE (`rope_dim_list = [16, 56, 56]`, theta 10000). Each reference image is patchified independently and concatenated on the sequence dimension with a per-image temporal offset in the 3D RoPE grid, so references may differ in resolution.
+ - **Text encoder**: `Qwen3VLForConditionalGeneration` (text dim 4096). The instruction is wrapped with one `<|vision_start|><|image_pad|><|vision_end|>` block per reference image.
+ - **VAE**: `AutoencoderKLWan` (z_dim 16, spatial downscale 8, temporal downscale 4). This is the same VAE used by the single-image edit model.
+ - **Scheduler**: FlowMatch (Euler), sampling shift 1.5.
+
+ Weight names are byte-identical to the diffusers checkpoint (894 transformer keys, zero renaming); ComfyUI auto-detects the model as `joyimage`.
+
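The numbers in the architecture list can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the model code. It assumes that the heads split the hidden size evenly, that the three `rope_dim_list` entries together fill one attention head, and that a reference is a single still frame with height and width divisible by the VAE downscale times the patch size. The helper names `head_dim` and `tokens_per_reference` are hypothetical.

```python
# Values quoted in the architecture list of this README.
HIDDEN_SIZE = 4096
NUM_HEADS = 32
ROPE_DIM_LIST = (16, 56, 56)   # rotary dims for the three RoPE axes
PATCH_SIZE = (1, 2, 2)         # assumed order: (temporal, height, width)
VAE_SPATIAL_DOWNSCALE = 8


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head channel count, assuming heads split the hidden size evenly."""
    if hidden_size % num_heads:
        raise ValueError("hidden size must be divisible by the head count")
    return hidden_size // num_heads


def tokens_per_reference(height: int, width: int) -> int:
    """Sequence length contributed by one still reference image (assumption:
    one latent frame, pixels -> latents by /8, latents -> tokens by 2x2 patches)."""
    factor_h = VAE_SPATIAL_DOWNSCALE * PATCH_SIZE[1]
    factor_w = VAE_SPATIAL_DOWNSCALE * PATCH_SIZE[2]
    if height % factor_h or width % factor_w:
        raise ValueError("image size must be a multiple of downscale * patch size")
    return (height // factor_h) * (width // factor_w)


print(head_dim(HIDDEN_SIZE, NUM_HEADS), sum(ROPE_DIM_LIST))  # 128 128
print(tokens_per_reference(1024, 1024))                      # 4096
```

Under these assumptions, each 1024×1024 reference adds 4096 tokens to the sequence, so six references of that size would contribute 24576 reference tokens before the text and target tokens are counted.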
+ ## Installation
+
+ Native ComfyUI support for this model is proposed upstream in [Comfy-Org/ComfyUI#14428](https://github.com/Comfy-Org/ComfyUI/pull/14428); until that PR is merged, install the fork branch:
+
+ ```bash
+ git clone -b joyimage-edit-pr https://github.com/feice-huang/ComfyUI.git
+ cd ComfyUI
+ pip install -r requirements.txt
+ ```
+
+ Once the PR is merged upstream, the stock ComfyUI release will run these weights with no fork needed.
+
+ Then download the weights straight into `ComfyUI/models/`:
+
+ ```bash
+ hf download jdopensource/JoyAI-Image-Edit-Plus-ComfyUI \
+ --local-dir /path/to/ComfyUI/models
+ ```
+
+ Restart ComfyUI.
+
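Before restarting, it can help to confirm that each file landed in the folder the Files table names. The following is a minimal sketch using only the standard library; the helper name `missing_weights` is hypothetical, and the relative paths are copied from the Files table.

```python
from pathlib import Path

# Relative paths taken from the Files table of this README.
EXPECTED_FILES = (
    "diffusion_models/joy_image_edit_plus_bf16.safetensors",
    "text_encoders/qwen3vl_joyimage_bf16.safetensors",
    "vae/joy_image_edit_vae.safetensors",
)


def missing_weights(models_root: str) -> list[str]:
    """Return the expected relative paths that are absent under models_root."""
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_weights("/path/to/ComfyUI/models")` returns an empty list when all three checkpoints are in place, and otherwise lists the paths that still need to be downloaded.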
+ ## Usage
+
+ Example workflow: [workflow_joyimage_edit_plus.json](https://github.com/user-attachments/files/29588811/workflow_joyimage_edit_plus.json)
+
+ Build the graph from these native nodes:
+
+ 1. **Load Diffusion Model** (`UNETLoader`) → `diffusion_models/joy_image_edit_plus_bf16.safetensors`
+ 2. **Load CLIP** (`CLIPLoader`) → `text_encoders/qwen3vl_joyimage_bf16.safetensors`, type `joyimage`
+ 3. **Load VAE** (`VAELoader`) → `vae/joy_image_edit_vae.safetensors`
+ 4. **Load Image** (`LoadImage`) for each reference (1–6)
+ 5. **TextEncodeJoyImageEditPlus**: feed `clip`, `vae`, the instruction, and the reference images into `image1`…`image6`. Wire one instance for the positive prompt and one (empty prompt, same images) for the negative. Each node bucket-resizes the references to the 1024-base buckets, VAE-encodes them, and appends the reference latents to the conditioning; its `image` output feeds `VAEDecode` / empty-latent sizing.
+ 6. **KSampler** → **VAEDecode** → **SaveImage**
+
+ ## Recommended parameters
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Steps | 30 |
+ | CFG | 4.0 |
+ | Sampler | `euler` |
+ | Scheduler | `simple` |
+ | dtype | bf16 |
+ | Resolution | auto (1024-base buckets, per reference) |
+
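For scripted runs, the sampler rows of the parameters table map onto the stock `KSampler` node's inputs. The fragment below is a sketch rather than part of the published workflow: the keys follow the `KSampler` input names, `ksampler_inputs` is a hypothetical helper, and `denoise=1.0` is an assumption because the table does not state it. The `model`, `positive`, `negative`, and `latent_image` links are left out since they depend on how the graph is wired.

```python
# Sampler settings from the "Recommended parameters" table.
KSAMPLER_SETTINGS = {
    "steps": 30,
    "cfg": 4.0,
    "sampler_name": "euler",
    "scheduler": "simple",
}


def ksampler_inputs(seed: int, denoise: float = 1.0) -> dict:
    """Merge the recommended settings with per-run values.

    denoise=1.0 is an assumption (full denoising); the table does not give it.
    """
    return {"seed": seed, "denoise": denoise, **KSAMPLER_SETTINGS}
```

Keeping the table values in one dictionary makes it harder for a batch script to drift away from the recommended settings.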
+ ## Example
+
+ **Prompt:** "The woman is lovingly holding the cute puppy in her arms"
+
+ | Input 0 | Input 1 | Output |
+ |---------|---------|--------|
+ | ![input_0](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/input_0.png) | ![input_1](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/input_1.png) | ![output](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/output.png) |
+
+ ## Model details
+
+ - **Developed by**: JD.com
+ - **License**: Apache-2.0
+ - **Framework**: PyTorch / ComfyUI
+
+ ## Links
+
+ - Source code and documentation: [github.com/jd-opensource/JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image)
+ - Original Diffusers-format weights: [jdopensource/JoyAI-Image-Edit-Plus-Diffusers](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers)
+ - Single-image edit model (ComfyUI): [jdopensource/JoyAI-Image-Edit-ComfyUI](https://huggingface.co/jdopensource/JoyAI-Image-Edit-ComfyUI)
+
+ ## Citation
+
+ ```bibtex
+ @misc{joyai-image-2025,
+ title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
+ author={Joy Future Academy, JD},
+ year={2025},
+ url={https://github.com/jd-opensource/JoyAI-Image}
+ }
+ ```
diffusion_models/joy_image_edit_plus_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80372f34c379b96a1d8cecc57a3816c5ae2c4d3844483016930a7806875c8103
+ size 32527457192
text_encoders/qwen3vl_joyimage_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1eee8233f3063466dd3b06e857b0a2cf13a77c543908374771d51a546beb2e0
+ size 17534340584
vae/joy_image_edit_vae.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
+ size 253815318
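
Each of the three pointer files above records the SHA-256 digest and byte size of the real checkpoint, so a downloaded file can be checked against them. The sketch below assumes the pointer text is available without the leading `+` diff markers; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only the standard library is used.

```python
import hashlib
from pathlib import Path

# Pointer for vae/joy_image_edit_vae.safetensors, copied from this commit.
VAE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
size 253815318
"""


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Extract (sha256 hex digest, size in bytes) from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: str, digest: str, size: int, chunk: int = 1 << 20) -> bool:
    """Compare the file size first (cheap), then its streamed SHA-256."""
    file = Path(path)
    if file.stat().st_size != size:
        return False
    sha = hashlib.sha256()
    with file.open("rb") as fh:
        while block := fh.read(chunk):  # read in 1 MiB chunks to bound memory
            sha.update(block)
    return sha.hexdigest() == digest
```

For example, `matches_pointer("/path/to/ComfyUI/models/vae/joy_image_edit_vae.safetensors", *parse_lfs_pointer(VAE_POINTER))` returns `True` only when both the size and the digest agree. Hashing the two larger checkpoints reads tens of gigabytes and can take several minutes.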