BitPoet committed on
Commit 8717741 · verified · Parent: 9d0ddd6

Update README.md

Add technical details.

Files changed (1)
  1. README.md +19 -2
README.md CHANGED
@@ -30,6 +30,23 @@ You can find my bitpoet-ideogram4-refimages branch [here on GitHub](https://gith
 
 It also includes a fix for the UTF-8 / ANSI error lately popping up on Windows that has jobs fail at startup.
 
-Note that this AI-Toolkit adaption is targeted at Ideogram 4 with reference images and JSON prompts in the dataset editor, so you may not be able to use it to train regular LoRAs.
+Note that this AI-Toolkit adaption is targeted at Ideogram 4 with reference images and JSON prompts in the dataset editor, so you may not be able to use it to
+train regular LoRAs.
 
-I will add a small example dataset at some point.
+I will add a small example dataset at some point.
+
+### Buzzwords (technical details)
+
+What we changed in AI-Toolkit besides the dataset editor:
+
+We added reference-latent token concatenation for Ideogram 4: each clean reference image is VAE-encoded and appended to the packed sequence as
+[text | noisy target | clean reference], with its own indicator, MRoPE time coordinate, and clean timestep. The transformer output and
+diffusion loss are sliced to target tokens only, while bounding-box JSON prompts provide spatial edit conditioning.
+
+These changes have to be mirrored in ComfyUI as well:
+
+ComfyUI core: Extended the native Ideogram 4 model to accept reference latents and reproduce the training sequence [text | noisy output | clean reference],
+including the separate indicator, MRoPE coordinate, clean timestep, and output-only prediction slicing.
+
+Custom node: Ideogram4ReferenceConditioning resizes and VAE-encodes a reference image to match the target latent, then attaches it only to positive
+conditioning so the separate unconditional model remains unchanged.
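The reference-latent packing and target-only loss described under "Buzzwords (technical details)" can be illustrated with a minimal, dependency-free Python sketch. This is not the AI-Toolkit code. Tokens are plain lists of floats instead of tensors, and the indicator values (0/1/2), the MRoPE time coordinates (0 for text and target, 1 for the reference), and the names `pack` and `target_only_mse` are all hypothetical. It only shows the shape of the idea: the clean reference travels through the sequence with timestep 0 but contributes nothing to the loss.

```python
from dataclasses import dataclass
from typing import List, Optional

Token = List[float]  # one text or latent token as a flat feature vector

# Hypothetical indicator ids; the real Ideogram 4 values are not given in the README.
TEXT, TARGET, REFERENCE = 0, 1, 2


@dataclass
class PackedSequence:
    tokens: List[Token]
    indicator: List[int]              # segment id per token
    time_coord: List[int]             # MRoPE time-axis coordinate (h/w axes omitted)
    timestep: List[Optional[float]]   # diffusion timestep per image token, None for text
    target: slice                     # position of the noisy target tokens


def pack(text: List[Token], noisy_target: List[Token], clean_reference: List[Token],
         t: float, reference_time: int = 1) -> PackedSequence:
    """Lay tokens out as [text | noisy target | clean reference]."""
    n_txt, n_tgt, n_ref = len(text), len(noisy_target), len(clean_reference)
    return PackedSequence(
        tokens=text + noisy_target + clean_reference,
        indicator=[TEXT] * n_txt + [TARGET] * n_tgt + [REFERENCE] * n_ref,
        # Assumption: text and target share time coordinate 0; the reference gets
        # its own coordinate so the positions of the two images do not collide.
        time_coord=[0] * (n_txt + n_tgt) + [reference_time] * n_ref,
        # The target carries the sampled noise level; the reference is clean (t = 0).
        timestep=[None] * n_txt + [t] * n_tgt + [0.0] * n_ref,
        target=slice(n_txt, n_txt + n_tgt),
    )


def target_only_mse(output: List[Token], seq: PackedSequence,
                    velocity_target: List[Token]) -> float:
    """Slice the transformer output to target tokens and average the squared error there."""
    pred = output[seq.target]
    if len(pred) != len(velocity_target):
        raise ValueError("prediction and target token counts differ")
    errors = [(p - v) ** 2 for pt, vt in zip(pred, velocity_target) for p, v in zip(pt, vt)]
    return sum(errors) / len(errors)


# Usage with an identity "transformer" (output == input tokens):
seq = pack(text=[[0.1, 0.2]] * 3,
           noisy_target=[[1.0, 1.0], [2.0, 2.0]],
           clean_reference=[[5.0, 5.0], [6.0, 6.0]], t=0.7)
loss = target_only_mse(seq.tokens, seq, velocity_target=[[1.0, 0.0], [2.0, 2.0]])
```

Because the loss only reads `output[seq.target]`, whatever the model emits at the reference positions is discarded, which is the point of slicing to target tokens.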
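The custom node's job, as described in the commit, comes down to two steps: pick a resize target from the target latent's size, and add the encoded reference to the positive conditioning only. The pure-Python sketch below mimics ComfyUI's conditioning layout (a list of `[embedding, options]` pairs) without importing ComfyUI. The helper names, the `reference_latents` key, and the 8x VAE downscale factor are assumptions for illustration, not the actual `Ideogram4ReferenceConditioning` implementation. The resize and VAE encode themselves are left out.

```python
from typing import Any, Dict, List, Tuple

# ComfyUI-style CONDITIONING: a list of [embedding, options] pairs.
# Embeddings stay opaque here; only the options dict is extended.
Conditioning = List[List[Any]]


def reference_pixel_size(latent_h: int, latent_w: int, downscale: int = 8) -> Tuple[int, int]:
    """Pixel size to resize the reference to so its latent matches the target latent grid.
    The 8x spatial VAE downscale is an assumption, not a documented Ideogram 4 value."""
    return latent_h * downscale, latent_w * downscale


def attach_reference(conditioning: Conditioning, reference_latent: Any,
                     key: str = "reference_latents") -> Conditioning:
    """Return a copy with the reference appended under `key` (a hypothetical key name)."""
    out: Conditioning = []
    for embedding, options in conditioning:
        new_options: Dict[str, Any] = dict(options)  # shallow copy: the input stays untouched
        new_options[key] = list(options.get(key, [])) + [reference_latent]
        out.append([embedding, new_options])
    return out


def apply_reference(positive: Conditioning, negative: Conditioning,
                    reference_latent: Any) -> Tuple[Conditioning, Conditioning]:
    """Attach the reference to the positive side only; the negative passes through as-is."""
    return attach_reference(positive, reference_latent), negative


# Usage with placeholder strings standing in for tensors:
positive = [["pos_embedding", {}]]
negative = [["neg_embedding", {}]]
new_pos, new_neg = apply_reference(positive, negative, "ref_latent")
```

Returning the negative list untouched means the unconditional pass sees no reference at all, which is how the node leaves the separate unconditional model unchanged.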