Update README.md
Add technical details.
README.md
CHANGED
|
@@ -30,6 +30,23 @@ You can find my bitpoet-ideogram4-refimages branch [here on GitHub](https://gith
|
|
| 30 |
|
| 31 |
It also includes a fix for the UTF-8 / ANSI encoding error that has recently been popping up on Windows and causes jobs to fail at startup.
|
| 32 |
|
| 33 |
-
Note that this AI-Toolkit adaption is targeted at Ideogram 4 with reference images and JSON prompts in the dataset editor, so you may not be able to use it to
|
| 34 |
|
| 35 |
-
I will add a small example dataset at some point.
|
| 30 |
|
| 31 |
It also includes a fix for the UTF-8 / ANSI encoding error that has recently been popping up on Windows and causes jobs to fail at startup.
|
| 32 |
|
| 33 |
+
Note that this AI-Toolkit adaptation targets Ideogram 4 with reference images and JSON prompts in the dataset editor, so you may not be able to use it to
|
| 34 |
+
train regular LoRAs.
|
| 35 |
|
| 36 |
+
I will add a small example dataset at some point.
|
| 37 |
+
|
| 38 |
+
### Buzzwords (technical details)
|
| 39 |
+
|
| 40 |
+
What we changed in AI-Toolkit besides the dataset editor:
|
| 41 |
+
|
| 42 |
+
We added reference-latent token concatenation for Ideogram 4: each clean reference image is VAE-encoded and appended to the packed sequence as
|
| 43 |
+
[text | noisy target | clean reference], with its own indicator, MRoPE time coordinate, and clean timestep. The transformer output and
|
| 44 |
+
diffusion loss are sliced to target tokens only, while bounding-box JSON prompts provide spatial edit conditioning.
|
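The packing and loss slicing described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the branch: the names `pack_sequence`, `target_only_mse`, and `PackedSequence`, the indicator values, the reference time coordinate of 1, the clean timestep of 0.0, and giving text tokens the target's timestep are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical segment indicators; the actual values used in the branch are not stated.
TEXT, TARGET, REFERENCE = 0, 1, 2


@dataclass
class PackedSequence:
    tokens: List[List[float]]  # one feature vector per token
    indicator: List[int]       # segment id per token
    time_coord: List[int]      # MRoPE time coordinate per token (assumed integer)
    timestep: List[float]      # per-token diffusion timestep
    target_slice: slice        # position of the noisy target tokens


def pack_sequence(text, noisy_target, clean_reference, t,
                  ref_time_coord=1, clean_t=0.0):
    """Build [text | noisy target | clean reference] with per-token metadata."""
    n_txt, n_tgt, n_ref = len(text), len(noisy_target), len(clean_reference)
    tokens = list(text) + list(noisy_target) + list(clean_reference)
    indicator = [TEXT] * n_txt + [TARGET] * n_tgt + [REFERENCE] * n_ref
    # Assumption: the reference gets its own time coordinate, everything else 0.
    time_coord = [0] * (n_txt + n_tgt) + [ref_time_coord] * n_ref
    # Assumption: reference tokens carry a fixed "clean" timestep.
    timestep = [t] * (n_txt + n_tgt) + [clean_t] * n_ref
    return PackedSequence(tokens, indicator, time_coord, timestep,
                          slice(n_txt, n_txt + n_tgt))


def target_only_mse(prediction, target, target_slice):
    """Mean squared error over target tokens only; text and reference are ignored."""
    pred = prediction[target_slice]
    assert len(pred) == len(target), "target length mismatch"
    total, count = 0.0, 0
    for p_tok, y_tok in zip(pred, target):
        for p, y in zip(p_tok, y_tok):
            total += (p - y) ** 2
            count += 1
    return total / count
```

Because the loss only reads `prediction[target_slice]`, whatever the model outputs at text or reference positions does not contribute to the gradient signal in this sketch.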
| 45 |
+
|
| 46 |
+
These changes have to be mirrored in ComfyUI as well:
|
| 47 |
+
|
| 48 |
+
ComfyUI core: Extended the native Ideogram 4 model to accept reference latents and reproduce the training sequence [text | noisy output | clean reference],
|
| 49 |
+
including the separate indicator, MRoPE coordinate, clean timestep, and output-only prediction slicing.
|
| 50 |
+
|
| 51 |
+
Custom node: Ideogram4ReferenceConditioning resizes and VAE-encodes a reference image to match the target latent, then attaches it only to positive
|
| 52 |
+
conditioning so the separate unconditional model remains unchanged.
|
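The idea of attaching the reference to positive conditioning only can be sketched as follows. This is a simplified stand-in, not the actual node: conditioning is modeled as a list of `(embedding, options)` pairs, the key name `reference_latents` and the helpers `resize_nearest` and `attach_reference` are hypothetical, and a nearest-neighbour resize of a 2D grid stands in for the image resize while the VAE encoding step is omitted.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid to (out_h, out_w)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def attach_reference(conditioning, reference_latent):
    """Return a copy of the conditioning with the reference attached.

    The input list and its option dicts are left untouched, so any other
    conditioning (for example the negative one) is unaffected.
    """
    out = []
    for embedding, options in conditioning:
        new_options = dict(options)  # shallow copy, do not mutate the input
        existing = list(options.get("reference_latents", []))
        new_options["reference_latents"] = existing + [reference_latent]
        out.append((embedding, new_options))
    return out


# Usage: only the positive branch receives the reference.
positive = [("positive-embedding", {})]
negative = [("negative-embedding", {})]
reference = resize_nearest([[1, 2], [3, 4]], 4, 4)
positive_with_ref = attach_reference(positive, reference)
```

In this sketch the negative conditioning is never passed through `attach_reference`, so the unconditional pass sees exactly the sequence it would see without a reference.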