---
license: other
license_name: ideogram-non-commercial-model-agreement
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
tags:
- comfyui
- ai-toolkit
- lora
---

This is an early experimental LoRA that adds bbox-guided inpainting / editing to the Ideogram 4 model.

It is a work in progress, so the files here are snapshots taken at different points in time while I adjust training parameters and build a better dataset. I currently get the most stable results with the [checkpoint at step 4000](https://huggingface.co/BitPoet/Ideogram4-Inpaint-LoRA/blob/main/IdoInpaint_2_00004000.safetensors) of the second training run.

The dataset is very small, so do not expect any magic or precision. It is a starting point that will hopefully evolve over the next weeks as I prepare a bigger dataset and restart training with a larger rank and fine-tuned parameters.

## Prerequisites

### Custom Node

You can find my custom node set on GitHub at [ComfyUI-bitpoet-IG4Inpaint](https://github.com/BitPoet/ComfyUI-bitpoet-IG4Inpaint). The necessary workflow is included with the node set or can be downloaded [here](https://github.com/BitPoet/ComfyUI-bitpoet-IG4Inpaint/blob/main/workflows/ideogram4_reference_workflow.json).

### ComfyUI Changes

Check out or download the [dev-ideogram4-inpaint branch](https://github.com/BitPoet/ComfyUI/tree/dev-ideogram4-inpaint) of my Comfy fork.

## Training

To train with reference images, you currently need to use a slightly adapted fork of AI-Toolkit. You can find my bitpoet-ideogram4-refimages branch [here on GitHub](https://github.com/BitPoet/ai-toolkit/tree/bitpoet-ideogram4-refimages). It also includes a fix for the UTF-8 / ANSI error that has recently been making jobs fail at startup on Windows.

Note that this AI-Toolkit adaptation has a switch for reference image support at the top of the dataset editor. You have to switch this on every time you open a dataset with reference images.

An example training config for AI-Toolkit is also [in this repository](https://huggingface.co/BitPoet/Ideogram4-Inpaint-LoRA/blob/main/ai-toolkit_example_job_config.json). I will add a small example dataset at some point.

If you want to assemble your own dataset, you might find my simple [node.js based dataset editor IdeoInCap](https://github.com/BitPoet/IdeoInCap) handy. (That's short for Ideogram4 Inpaint Captioning. I know, not my most creative moment.) It is tailored specifically to Ideogram 4 image-reference-prompt datasets and comes with a graphical bbox editor and completion indication.

### Buzzwords (technical details)

What we changed in AI-Toolkit besides the dataset editor:

We added reference-latent token concatenation for Ideogram 4: each clean reference image is VAE-encoded and appended to the packed sequence as `[text | noisy target | clean reference]`, with its own indicator, MRoPE time coordinate, and clean timestep. The transformer output and diffusion loss are sliced to the target tokens only, while bounding-box JSON prompts provide spatial edit conditioning.

These changes have to be mirrored in ComfyUI as well:

- ComfyUI core: Extended the native Ideogram 4 model to accept reference latents and reproduce the training sequence `[text | noisy output | clean reference]`, including the separate indicator, MRoPE coordinate, clean timestep, and output-only prediction slicing.
- Custom node: Ideogram4ReferenceConditioning resizes and VAE-encodes a reference image to match the target latent, then attaches it only to the positive conditioning so the separate unconditional model remains unchanged.

## Credits

Credits go to:

- [ideogram-ai](https://huggingface.co/ideogram-ai) for releasing a highly interesting, high-quality new image model.
- Ostris for [AI-Toolkit](https://github.com/ostris/ai-toolkit)
- [Comfy-Org](https://github.com/Comfy-Org) and [Kijai](https://huggingface.co/Kijai) for [ComfyUI](https://github.com/comfy-org/ComfyUI) itself and zero-day support for Ideogram 4

## Disclaimer

I am in no way affiliated with Ideogram, Inc. The LoRAs provided here are my own experimental work. Please see the license linked above.
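
## Packing Sketch

As a rough illustration of the token layout described under "Buzzwords (technical details)", here is a minimal pure-Python sketch. It is not code from AI-Toolkit or ComfyUI. The names `PackedSequence`, `pack_sequence`, and `target_only_mse`, the indicator values, the time coordinates, and the convention that a timestep of 0.0 means "clean" are all assumptions made for this example, as is giving the text tokens the same timestep as the target.

```python
from dataclasses import dataclass
from typing import List

Token = List[float]  # one feature vector per token

# Hypothetical segment ids; the real indicator values are not stated here.
TEXT, TARGET, REFERENCE = 0, 1, 2
CLEAN_T = 0.0  # assumption: 0.0 stands for "no noise" in this sketch


@dataclass
class PackedSequence:
    tokens: List[Token]     # [text | noisy target | clean reference]
    indicator: List[int]    # segment id per token
    time_coord: List[int]   # stand-in for the MRoPE time coordinate
    timestep: List[float]   # per-token diffusion timestep
    target: slice           # position of the target tokens in the sequence


def pack_sequence(text: List[Token], noisy_target: List[Token],
                  clean_reference: List[Token], t: float) -> PackedSequence:
    """Concatenate the three segments and record per-token metadata."""
    n_text, n_tgt, n_ref = len(text), len(noisy_target), len(clean_reference)
    return PackedSequence(
        tokens=text + noisy_target + clean_reference,
        indicator=[TEXT] * n_text + [TARGET] * n_tgt + [REFERENCE] * n_ref,
        # The reference gets its own time coordinate (1 is an arbitrary choice).
        time_coord=[0] * (n_text + n_tgt) + [1] * n_ref,
        # The reference is not noised, so it carries the clean timestep.
        timestep=[t] * (n_text + n_tgt) + [CLEAN_T] * n_ref,
        target=slice(n_text, n_text + n_tgt),
    )


def target_only_mse(model_output: List[Token], wanted: List[Token],
                    packed: PackedSequence) -> float:
    """Mean squared error over the target tokens; text and reference are ignored."""
    predicted = model_output[packed.target]
    squared = [(p - w) ** 2
               for p_tok, w_tok in zip(predicted, wanted)
               for p, w in zip(p_tok, w_tok)]
    return sum(squared) / len(squared)
```

Whatever the model emits at the text and reference positions never enters the loss in this sketch, which mirrors the "sliced to target tokens only" behavior described above.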