Upload folder using huggingface_hub

Files changed (4)

README.md +107 -0
diffusion_models/joy_image_edit_plus_bf16.safetensors +3 -0
text_encoders/qwen3vl_joyimage_bf16.safetensors +3 -0
vae/joy_image_edit_vae.safetensors +3 -0

README.md CHANGED

@@ -1,3 +1,110 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-image
+tags:
+  - comfyui
+  - image-editing
+  - joyai
+  - multi-image
 ---
+# JoyAI-Image-Edit-Plus (ComfyUI weights)
+Single-file `.safetensors` checkpoints of [JoyAI-Image-Edit-Plus](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers), repackaged for **native ComfyUI** support (no custom node required).
+JoyAI-Image-Edit-Plus is the multi-image instruction-guided editing model of the [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image) family. It accepts **1–6 reference images** and a text instruction, and generates a new image that combines elements from the references according to the instruction.
+## Files
+| File | Size | Goes into | Component |
+|------|------|-----------|-----------|
+| `diffusion_models/joy_image_edit_plus_bf16.safetensors` | ~32.5 GB | `ComfyUI/models/diffusion_models/` | `JoyImageEditPlusTransformer3DModel` (bf16) |
+| `text_encoders/qwen3vl_joyimage_bf16.safetensors` | ~17.5 GB | `ComfyUI/models/text_encoders/` | Qwen3-VL-8B text encoder (bf16) |
+| `vae/joy_image_edit_vae.safetensors` | ~254 MB | `ComfyUI/models/vae/` | `AutoencoderKLWan` |
+The repo layout already matches `ComfyUI/models/`, so a single `hf download` into your models root drops every file where it needs to go.
+## Model architecture
+- **Transformer**: 40-layer DiT, hidden size 4096, 32 heads, in/out channels 16, patch size `[1, 2, 2]`, 3D RoPE (`rope_dim_list = [16, 56, 56]`, theta 10000). Each reference image is patchified independently, and the resulting token sequences are concatenated along the sequence dimension with a per-image temporal offset in the 3D RoPE grid, so references may differ in resolution.
+- **Text encoder**: `Qwen3VLForConditionalGeneration` (text dim 4096). The instruction is wrapped with one `<|vision_start|><|image_pad|><|vision_end|>` block per reference image.
+- **VAE**: `AutoencoderKLWan` (z_dim 16, spatial downscale 8, temporal downscale 4) — the same VAE used by the single-image edit model.
+- **Scheduler**: FlowMatch (Euler), sampling shift 1.5.
+Weight names are byte-identical to the diffusers checkpoint (894 transformer keys, zero renaming); ComfyUI auto-detects the model as `joyimage`.
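As a quick sanity check on the numbers above, the shell arithmetic below confirms that `rope_dim_list` sums to the per-head dimension (4096 / 32 = 128) and estimates how many tokens each reference image contributes to the sequence. It is a sketch, not the model's code: it assumes each side length is already a multiple of 16 (VAE spatial downscale 8 times spatial patch 2), counts only the reference images, and the two resolutions used are hypothetical examples rather than the model's actual bucket list.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope token counts for the JoyAI-Image-Edit-Plus transformer.
# Constants come from the architecture notes; the reference sizes are hypothetical.
set -euo pipefail

hidden=4096
heads=32
rope_dims=(16 56 56)   # rope_dim_list, presumably (temporal, height, width)

head_dim=$(( hidden / heads ))
rope_total=$(( rope_dims[0] + rope_dims[1] + rope_dims[2] ))
echo "head_dim=${head_dim} rope_dim_sum=${rope_total}"

vae_down=8   # spatial downscale of AutoencoderKLWan
patch_hw=2   # spatial part of patch size [1, 2, 2]

# Tokens produced by one reference of width w and height h.
# A single image encodes to one latent frame, and the temporal patch size is 1.
# Assumes w and h are multiples of vae_down * patch_hw = 16.
tokens_for() {
  local w=$1 h=$2
  echo $(( (w / vae_down / patch_hw) * (h / vae_down / patch_hw) ))
}

# Each reference is patchified on its own and the token sequences are concatenated,
# so the reference part of the sequence is the sum of the per-image counts.
total=0
for size in 1024x1024 832x1248; do   # hypothetical reference sizes
  w=${size%x*}
  h=${size#*x}
  t=$(tokens_for "$w" "$h")
  echo "${size}: ${t} tokens"
  total=$(( total + t ))
done
echo "reference tokens in total: ${total}"
```

With these inputs it reports `head_dim=128 rope_dim_sum=128`, 4096 tokens for the 1024x1024 reference, 4056 for the 832x1248 one, and 8152 in total. The fact that sequence length grows roughly linearly with the number and size of references is why using all six slots costs noticeably more memory and time than using one.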
+## Installation
+These weights target native ComfyUI support (no custom node), which is proposed upstream in [Comfy-Org/ComfyUI#14428](https://github.com/Comfy-Org/ComfyUI/pull/14428). Until that PR is merged, install the fork branch:
+```bash
+git clone -b joyimage-edit-pr https://github.com/feice-huang/ComfyUI.git
+cd ComfyUI
+pip install -r requirements.txt
+```
+Once the PR is merged, the stock ComfyUI release should run these weights without the fork.
+Next, download the weights straight into `ComfyUI/models/`:
+```bash
+hf download jdopensource/JoyAI-Image-Edit-Plus-ComfyUI \
+  --local-dir /path/to/ComfyUI/models
+```
+Restart ComfyUI.
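Optionally, check the download against the Git LFS pointers recorded in this commit: each `.safetensors` entry lists an `oid sha256` and a `size` in bytes. The function below is a minimal sketch that compares both. It assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` is the usual substitute), and `verify_models` and `EXPECTED` are names chosen for this example, not part of any tool. Hashing the ~32.5 GB transformer takes a while, so the cheap size check runs first.

```bash
#!/usr/bin/env bash
# Compare downloaded checkpoints with the Git LFS pointers (oid sha256 + size).
# verify_models and EXPECTED are illustrative names, not part of any tool.

# relative path | sha256 | size in bytes (copied from the LFS pointer files)
EXPECTED=(
  "diffusion_models/joy_image_edit_plus_bf16.safetensors|80372f34c379b96a1d8cecc57a3816c5ae2c4d3844483016930a7806875c8103|32527457192"
  "text_encoders/qwen3vl_joyimage_bf16.safetensors|a1eee8233f3063466dd3b06e857b0a2cf13a77c543908374771d51a546beb2e0|17534340584"
  "vae/joy_image_edit_vae.safetensors|2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b|253815318"
)

# Usage: verify_models /path/to/ComfyUI/models   (returns non-zero on any problem)
verify_models() {
  local root=$1 status=0 entry rel sha size file actual_size actual_sha
  for entry in "${EXPECTED[@]}"; do
    IFS='|' read -r rel sha size <<< "$entry"
    file="${root}/${rel}"
    if [[ ! -f "$file" ]]; then
      echo "MISSING       ${rel}"
      status=1
      continue
    fi
    # Size check first: cheap, and catches truncated downloads.
    actual_size=$(wc -c < "$file" | tr -d ' ')
    if [[ "$actual_size" != "$size" ]]; then
      echo "SIZE MISMATCH ${rel} (${actual_size} != ${size})"
      status=1
      continue
    fi
    # Full hash: slow for the multi-GB files.
    actual_sha=$(sha256sum "$file" | cut -d' ' -f1)
    if [[ "$actual_sha" == "$sha" ]]; then
      echo "OK            ${rel}"
    else
      echo "SHA MISMATCH  ${rel}"
      status=1
    fi
  done
  return "$status"
}
```

Source the snippet and run `verify_models /path/to/ComfyUI/models`. A non-zero return means at least one file is missing, truncated, or corrupted, and re-running the `hf download` command is the simplest fix.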
+## Usage
+Example workflow: [workflow_joyimage_edit_plus.json](https://github.com/user-attachments/files/29588811/workflow_joyimage_edit_plus.json)
+Build the graph from these native nodes:
+1. **Load Diffusion Model** (`UNETLoader`) → `diffusion_models/joy_image_edit_plus_bf16.safetensors`
+2. **Load CLIP** (`CLIPLoader`) → `text_encoders/qwen3vl_joyimage_bf16.safetensors`, type `joyimage`
+3. **Load VAE** (`VAELoader`) → `vae/joy_image_edit_vae.safetensors`
+4. **Load Image** (`LoadImage`) for each reference (1–6)
+5. **TextEncodeJoyImageEditPlus** — feed `clip`, `vae`, the instruction, and the reference images into `image1`…`image6`. Wire one instance for the positive prompt and one (empty prompt, same images) for the negative. Each node bucket-resizes the references to the 1024-base buckets, VAE-encodes them, and appends the reference latents to the conditioning; its `image` output feeds `VAEDecode` / empty-latent sizing.
+6. **KSampler** → **VAEDecode** → **SaveImage**
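The **Model architecture** notes say the instruction is wrapped with one `<|vision_start|><|image_pad|><|vision_end|>` block per reference image, and `TextEncodeJoyImageEditPlus` takes care of this for you. When debugging prompts, a rough picture of that string can still help. The sketch below assumes the blocks are simply prepended to the instruction with no separator; the actual node may order them differently or apply a chat template, and `wrap_instruction` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of "one vision block per reference image".
# Assumption: blocks are prepended to the instruction with no separator.

VISION_BLOCK='<|vision_start|><|image_pad|><|vision_end|>'

wrap_instruction() {
  local n_refs=$1 instruction=$2 prefix="" i
  # The Plus model accepts 1-6 reference images.
  if (( n_refs < 1 || n_refs > 6 )); then
    echo "expected 1-6 reference images, got ${n_refs}" >&2
    return 1
  fi
  for (( i = 0; i < n_refs; i++ )); do
    prefix+="$VISION_BLOCK"
  done
  printf '%s%s\n' "$prefix" "$instruction"
}
```

For the example below, `wrap_instruction 2 "The woman is lovingly holding the cute puppy in her arms"` prints two vision blocks followed by the instruction. The negative branch (empty prompt, same images) corresponds to `wrap_instruction 2 ""`, which under this assumption yields only the placeholders.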
+## Recommended parameters
+| Parameter | Value |
+|-----------|-------|
+| Steps | 30 |
+| CFG | 4.0 |
+| Sampler | `euler` |
+| Scheduler | `simple` |
+| dtype | bf16 |
+| Resolution | auto (1024-base buckets, per reference) |
+## Example
+**Prompt:** "The woman is lovingly holding the cute puppy in her arms"
+| Input 0 | Input 1 | Output |
+|---------|---------|--------|
+| ![input_0](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/input_0.png) | ![input_1](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/input_1.png) | ![output](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers/resolve/main/examples/output.png) |
+## Model details
+- **Developed by**: JD.com
+- **License**: Apache-2.0
+- **Framework**: PyTorch / ComfyUI
+## Links
+- Source code and documentation: [github.com/jd-opensource/JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image)
+- Original Diffusers-format weights: [jdopensource/JoyAI-Image-Edit-Plus-Diffusers](https://huggingface.co/jdopensource/JoyAI-Image-Edit-Plus-Diffusers)
+- Single-image edit model (ComfyUI): [jdopensource/JoyAI-Image-Edit-ComfyUI](https://huggingface.co/jdopensource/JoyAI-Image-Edit-ComfyUI)
+## Citation
+```bibtex
+@misc{joyai-image-2025,
+  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
+  author={{Joy Future Academy, JD}},
+  year={2025},
+  url={https://github.com/jd-opensource/JoyAI-Image}
+}
+```

diffusion_models/joy_image_edit_plus_bf16.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80372f34c379b96a1d8cecc57a3816c5ae2c4d3844483016930a7806875c8103
+size 32527457192

text_encoders/qwen3vl_joyimage_bf16.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a1eee8233f3063466dd3b06e857b0a2cf13a77c543908374771d51a546beb2e0
+size 17534340584

vae/joy_image_edit_vae.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
+size 253815318