Upload folder using huggingface_hub
README.md
CHANGED
@@ -1,3 +1,110 @@
---
license: apache-2.0
pipeline_tag: image-to-image
tags:
- comfyui
- image-editing
- joyai
- multi-image
---

# JoyAI-Image-Edit-Plus (ComfyUI weights)

Single-file `.safetensors` checkpoints of [JoyAI-Image-Edit-Plus](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers), repackaged for **native ComfyUI** support (no custom node required).

JoyAI-Image-Edit-Plus is the multi-image instruction-guided editing model of the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It accepts **1–6 reference images** and a text instruction, and generates a new image that combines elements from the references according to the instruction.

## Files

| File | Size | Goes into | Component |
|------|------|-----------|-----------|
| `diffusion_models/joy_image_edit_plus_bf16.safetensors` | ~31 GB | `ComfyUI/models/diffusion_models/` | `JoyImageEditPlusTransformer3DModel` (bf16) |
| `text_encoders/qwen3vl_joyimage_bf16.safetensors` | ~17 GB | `ComfyUI/models/text_encoders/` | Qwen3-VL-8B text encoder (bf16) |
| `vae/joy_image_edit_vae.safetensors` | ~243 MB | `ComfyUI/models/vae/` | `AutoencoderKLWan` |

The repo layout already matches `ComfyUI/models/`, so a single `hf download` into your models root drops every file where it needs to go.
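
As a quick sanity check after downloading, the layout from the Files table can be verified with a few lines of Python. This is a minimal sketch and not part of the repo: `EXPECTED_FILES` and `missing_weights` are hypothetical names, and the models root is whatever directory you downloaded into.

```python
from pathlib import Path

# Relative paths copied from the Files table; they mirror ComfyUI/models/.
EXPECTED_FILES = (
    "diffusion_models/joy_image_edit_plus_bf16.safetensors",
    "text_encoders/qwen3vl_joyimage_bf16.safetensors",
    "vae/joy_image_edit_vae.safetensors",
)


def missing_weights(models_root):
    """Return the expected weight files that are absent under models_root."""
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_weights("/path/to/ComfyUI/models")` should return an empty list once all three files are in place.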

## Model architecture

- **Transformer**: 40-layer DiT, hidden size 4096, 32 heads, in/out channels 16, patch size `[1, 2, 2]`, 3D RoPE (`rope_dim_list = [16, 56, 56]`, theta 10000). Each reference image is patchified independently and concatenated on the sequence dimension with a per-image temporal offset in the 3D RoPE grid, so references may differ in resolution.
- **Text encoder**: `Qwen3VLForConditionalGeneration` (text dim 4096). The instruction is wrapped with one `<|vision_start|><|image_pad|><|vision_end|>` block per reference image.
- **VAE**: `AutoencoderKLWan` (z_dim 16, spatial downscale 8, temporal downscale 4), the same VAE used by the single-image edit model.
- **Scheduler**: FlowMatch (Euler), sampling shift 1.5.

Weight names are byte-identical to the diffusers checkpoint (894 transformer keys, zero renaming); ComfyUI auto-detects the model as `joyimage`.
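
The figures in the Transformer bullet can be cross-checked with a little arithmetic: the per-head dimension is 4096 / 32 = 128, which matches the sum of `rope_dim_list` (16 + 56 + 56), and the token count per reference follows from the VAE spatial downscale and the patch size. The sketch below is illustrative only. The function names are hypothetical, it covers reference tokens only (not the latent being generated), it assumes pixel sizes divisible by 16, and it assumes reference `i` sits at temporal index `i` in the RoPE grid, which this README does not specify.

```python
HIDDEN_SIZE = 4096
NUM_HEADS = 32
ROPE_DIM_LIST = (16, 56, 56)   # assumed order: temporal, height, width
VAE_SPATIAL_DOWNSCALE = 8
PATCH_SIZE = (1, 2, 2)         # assumed order: temporal, height, width


def head_dim():
    """Per-head channel count of the transformer."""
    return HIDDEN_SIZE // NUM_HEADS


def tokens_per_reference(width, height):
    """Number of transformer tokens for one reference image of the given pixel size."""
    latent_h = height // VAE_SPATIAL_DOWNSCALE
    latent_w = width // VAE_SPATIAL_DOWNSCALE
    return (latent_h // PATCH_SIZE[1]) * (latent_w // PATCH_SIZE[2])


def sequence_layout(sizes):
    """Return (temporal_index, start, end) token spans for references concatenated in order."""
    layout, start = [], 0
    for t_index, (w, h) in enumerate(sizes):  # assumption: offset == position in the list
        n = tokens_per_reference(w, h)
        layout.append((t_index, start, start + n))
        start += n
    return layout


print(head_dim(), sum(ROPE_DIM_LIST))
print(sequence_layout([(1024, 1024), (768, 1344)]))
```

Because each reference contributes its own span, two references of different resolution simply produce spans of different length on the same sequence axis.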

## Installation

The model runs natively in ComfyUI. Native support is proposed upstream in [Comfy-Org/ComfyUI#14428](https://github.com/Comfy-Org/ComfyUI/pull/14428); until that PR is merged, install the fork branch:

```bash
git clone -b joyimage-edit-pr https://github.com/feice-huang/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```

Once the PR is merged upstream, the stock ComfyUI release will run these weights with no fork needed.

Next, download the weights straight into `ComfyUI/models/`:

```bash
hf download jdopensource/JoyAI-Image-Edit-Plus-ComfyUI \
  --local-dir /path/to/ComfyUI/models
```

Restart ComfyUI.

## Usage

Example workflow: [workflow_joyimage_edit_plus.json](https://github.com/user-attachments/files/29588811/workflow_joyimage_edit_plus.json)

Build the graph from these native nodes:

1. **Load Diffusion Model** (`UNETLoader`) → `diffusion_models/joy_image_edit_plus_bf16.safetensors`
2. **Load CLIP** (`CLIPLoader`) → `text_encoders/qwen3vl_joyimage_bf16.safetensors`, type `joyimage`
3. **Load VAE** (`VAELoader`) → `vae/joy_image_edit_vae.safetensors`
4. **Load Image** (`LoadImage`) for each reference (1–6)
5. **TextEncodeJoyImageEditPlus**: feed `clip`, `vae`, the instruction, and the reference images into `image1`…`image6`. Wire one instance for the positive prompt and one (empty prompt, same images) for the negative. Each node bucket-resizes the references to the 1024-base buckets, VAE-encodes them, and appends the reference latents to the conditioning; its `image` output feeds `VAEDecode` / empty-latent sizing.
6. **KSampler** → **VAEDecode** → **SaveImage**

## Recommended parameters

| Parameter | Value |
|-----------|-------|
| Steps | 30 |
| CFG | 4.0 |
| Sampler | `euler` |
| Scheduler | `simple` |
| dtype | bf16 |
| Resolution | auto (1024-base buckets, per reference) |
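
For scripted runs, the sampler rows of the table can be kept in one place and merged into a `KSampler` input dict. This is a hedged sketch: the key names (`steps`, `cfg`, `sampler_name`, `scheduler`) are assumed to match the stock `KSampler` inputs and should be checked against your ComfyUI version, and `apply_recommended` is a hypothetical helper.

```python
# Values taken from the "Recommended parameters" table.
RECOMMENDED_KSAMPLER = {
    "steps": 30,
    "cfg": 4.0,
    "sampler_name": "euler",
    "scheduler": "simple",
}


def apply_recommended(ksampler_inputs):
    """Return a copy of a KSampler input dict with the recommended values applied."""
    merged = dict(ksampler_inputs)
    merged.update(RECOMMENDED_KSAMPLER)
    return merged


# Example: override stale settings while keeping unrelated inputs such as the seed.
print(apply_recommended({"seed": 0, "steps": 20, "cfg": 7.0}))
```

The helper returns a new dict, so the original workflow settings stay untouched.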

## Example

**Prompt:** "The woman is lovingly holding the cute puppy in her arms"

| Input 0 | Input 1 | Output |
|---------|---------|--------|
|  |  |  |

## Model details

- **Developed by**: JD.com
- **License**: Apache-2.0
- **Framework**: PyTorch / ComfyUI

## Links

- Source code and documentation: [github.com/jd-opensource/JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image)
- Original Diffusers-format weights: [jdopensource/JoyAI-Image-Edit-Plus-Diffusers](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers)
- Single-image edit model (ComfyUI): [jdopensource/JoyAI-Image-Edit-ComfyUI](https://huggingface.co/jdopensource/JoyAI-Image-Edit-ComfyUI)

## Citation

```bibtex
@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}
```

diffusion_models/joy_image_edit_plus_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:80372f34c379b96a1d8cecc57a3816c5ae2c4d3844483016930a7806875c8103
size 32527457192

text_encoders/qwen3vl_joyimage_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a1eee8233f3063466dd3b06e857b0a2cf13a77c543908374771d51a546beb2e0
size 17534340584

vae/joy_image_edit_vae.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
size 253815318
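
Each of the three entries above is a Git LFS pointer that records the SHA-256 digest and byte size of the real file, so a finished download can be checked against it. A minimal sketch using only the Python standard library follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and hashing the larger checkpoints will take a while.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into (sha256_hex, size_in_bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at path has the size and SHA-256 recorded in the pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != size:  # cheap check before hashing
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


# Pointer for vae/joy_image_edit_vae.safetensors, copied from this commit.
VAE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
size 253815318
"""
print(parse_lfs_pointer(VAE_POINTER))
```

For example, `matches_pointer("/path/to/ComfyUI/models/vae/joy_image_edit_vae.safetensors", VAE_POINTER)` should return `True` for an intact download.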