# Training for FLUX
## Environment Setup

Create and activate a new conda environment:

```bash
conda create -n omini python=3.10
conda activate omini
```

Install the required packages:

```bash
pip install -r requirements.txt
```
## Dataset Preparation

Download the Subject200K dataset for subject-driven generation:

```bash
bash train/script/data_download/data_download1.sh
```

Download the text-to-image-2M dataset for spatial alignment control tasks:

```bash
bash train/script/data_download/data_download2.sh
```

Note: By default, only a few files are downloaded. You can edit `data_download2.sh` to download more data, and update the config file accordingly.
## Quick Start

Use these scripts to start training immediately.

Subject-driven generation:

```bash
bash train/script/train_subject.sh
```

Spatial control tasks (Canny-to-image, colorization, depth map, etc.):

```bash
bash train/script/train_spatial_alignment.sh
```

Multi-condition training:

```bash
bash train/script/train_multi_condition.sh
```

Feature reuse (OminiControl2):

```bash
bash train/script/train_feature_reuse.sh
```

Compact token representation (OminiControl2):

```bash
bash train/script/train_compact_token_representation.sh
```

Token integration (OminiControl2):

```bash
bash train/script/train_token_intergration.sh
```
## Basic Training

### Tasks from OminiControl

Subject-driven generation:

```bash
bash train/script/train_subject.sh
```

Spatial control tasks (using canny-to-image as an example):

```bash
bash train/script/train_spatial_alignment.sh
```

Supported tasks:

- Canny edge to image (`canny`)
- Image colorization (`coloring`)
- Image deblurring (`deblurring`)
- Depth map to image (`depth`)
- Image to depth map (`depth_pred`)
- Image inpainting (`fill`)
- Super resolution (`sr`)

🌟 Change the `condition_type` parameter in the config file to switch between tasks.
Note: Check the script files (`train/script/`) and config files (`train/configs/`) for WandB and GPU settings.
## Creating Your Own Task

You can create a custom task by building a new dataset and modifying the test code:

1. Create a custom dataset: your dataset should follow the format of `Subject200KDataset` in `omini/train_flux/train_subject.py`. Each sample should contain:
   - Image: the target image (`image`)
   - Text: a description of the image (`description`)
   - Conditions: image conditions for generation
   - Position delta:
     - Use `position_delta = (0, 0)` to align the condition with the generated image
     - Use `position_delta = (0, -a)` to separate them (`a` = condition width / 16)

   Explanation: the model places both the condition and the generated image in a shared coordinate system, and `position_delta` shifts the condition image within this space. Each unit equals one patch (16 pixels). For a 512px-wide condition image (32 patches), `position_delta = (0, -32)` moves it fully to the left. This controls whether conditions and generated images share space or appear side by side.

2. Modify the test code: define `test_function()` in `train_custom.py`. Refer to the function in `train_subject.py` for examples. Make sure to keep the `position_delta` parameter consistent with your dataset.
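As a minimal sketch of the rule above, `position_delta` can be derived from the condition width; the helper below is hypothetical and not part of the repo:

```python
# Hypothetical helper illustrating the position_delta rule above;
# it is not part of the OminiControl codebase.
PATCH_SIZE = 16  # one position unit corresponds to one 16px patch

def position_delta(condition_width, side_by_side):
    """(0, 0) overlays the condition on the generated image;
    (0, -condition_width / 16) places it fully to the left."""
    if side_by_side:
        return (0, -(condition_width // PATCH_SIZE))
    return (0, 0)

print(position_delta(512, side_by_side=True))   # (0, -32)
print(position_delta(512, side_by_side=False))  # (0, 0)
```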
## Training Configuration

### Batch Size

We recommend a batch size of 1 for stable training. To simulate a batch size of n, set `accumulate_grad_batches` to n.
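Gradient accumulation steps the optimizer once every `accumulate_grad_batches` micro-batches; a schematic sketch (the real loop is handled by the training framework):

```python
# Schematic sketch of gradient accumulation; the actual loop is run by
# the training framework, not written by hand like this.
batch_size = 1
accumulate_grad_batches = 8

optimizer_steps = []
for micro_batch in range(1, 33):  # 32 micro-batches of size 1
    # gradients accumulate here; the optimizer only steps every 8th batch
    if micro_batch % accumulate_grad_batches == 0:
        optimizer_steps.append(micro_batch)

effective_batch_size = batch_size * accumulate_grad_batches
print(effective_batch_size)   # 8
print(len(optimizer_steps))   # 4 optimizer steps for 32 micro-batches
```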
### Optimizer

The default optimizer is Prodigy. To use AdamW instead, modify the config file:

```yaml
optimizer:
  type: AdamW
  lr: 1e-4
  weight_decay: 0.001
```
### LoRA Configuration

The default LoRA rank is 4. Increase it for complex tasks (keep the `r` and `lora_alpha` parameters equal):

```yaml
lora_config:
  r: 128
  lora_alpha: 128
```
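Keeping `r` and `lora_alpha` equal fixes the LoRA scaling factor at 1, so raising the rank does not also rescale the update. A one-line illustration of the standard LoRA scaling rule:

```python
# LoRA applies W + (lora_alpha / r) * B @ A; with lora_alpha == r the
# scaling factor stays 1.0 no matter which rank you pick.
for r in (4, 16, 128):
    lora_alpha = r
    scaling = lora_alpha / r
    print(r, scaling)  # scaling is 1.0 for every rank
```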
### Trainable Modules

The `target_modules` parameter uses regex patterns to specify which modules to train. See the PEFT documentation for details.

The default configuration trains all modules affecting image tokens:

```yaml
target_modules: "(.*x_embedder|.*(?<!single_)transformer_blocks\\.[0-9]+\\.norm1\\.linear|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_k|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_q|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_v|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_out\\.0|.*(?<!single_)transformer_blocks\\.[0-9]+\\.ff\\.net\\.2|.*single_transformer_blocks\\.[0-9]+\\.norm\\.linear|.*single_transformer_blocks\\.[0-9]+\\.proj_mlp|.*single_transformer_blocks\\.[0-9]+\\.proj_out|.*single_transformer_blocks\\.[0-9]+\\.attn.to_k|.*single_transformer_blocks\\.[0-9]+\\.attn.to_q|.*single_transformer_blocks\\.[0-9]+\\.attn.to_v|.*single_transformer_blocks\\.[0-9]+\\.attn.to_out)"
```

To train only the attention components (`to_q`, `to_k`, `to_v`), use:

```yaml
target_modules: "(.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_k|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_q|.*(?<!single_)transformer_blocks\\.[0-9]+\\.attn\\.to_v|.*single_transformer_blocks\\.[0-9]+\\.attn.to_k|.*single_transformer_blocks\\.[0-9]+\\.attn.to_q|.*single_transformer_blocks\\.[0-9]+\\.attn.to_v)"
```
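When `target_modules` is a string, PEFT matches it against full module names with `re.fullmatch`. A quick way to sanity-check which modules the attention-only pattern selects (the module names below are illustrative, not an exhaustive list of FLUX modules):

```python
import re

# The attention-only pattern from the config above, split for readability.
pattern = (
    r"(.*(?<!single_)transformer_blocks\.[0-9]+\.attn\.to_k"
    r"|.*(?<!single_)transformer_blocks\.[0-9]+\.attn\.to_q"
    r"|.*(?<!single_)transformer_blocks\.[0-9]+\.attn\.to_v"
    r"|.*single_transformer_blocks\.[0-9]+\.attn\.to_k"
    r"|.*single_transformer_blocks\.[0-9]+\.attn\.to_q"
    r"|.*single_transformer_blocks\.[0-9]+\.attn\.to_v)"
)

# Illustrative module names: the lookbehind keeps double-stream blocks,
# the last three alternatives keep single-stream blocks, and everything
# outside attention (e.g. the feed-forward net) is skipped.
selected = re.fullmatch(pattern, "transformer.transformer_blocks.3.attn.to_q")
also_selected = re.fullmatch(pattern, "transformer.single_transformer_blocks.5.attn.to_k")
skipped = re.fullmatch(pattern, "transformer.transformer_blocks.3.ff.net.2")

print(bool(selected), bool(also_selected), bool(skipped))  # True True False
```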
## Advanced Training

### Multi-condition

A basic multi-condition implementation is available in `train_multi_condition.py`:

```bash
bash train/script/train_multi_condition.sh
```
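An illustrative sample layout for multi-condition training; the field names are an assumption extrapolated from the single-condition format described above, not the repo's exact schema. The key point is that each condition carries its own `position_delta`:

```python
# Assumed sample layout (field names are hypothetical); each condition
# gets its own position_delta so conditions can be placed independently
# in the shared coordinate system.
sample = {
    "image": "target.png",              # target image
    "description": "a photo of a cup",  # text prompt
    "conditions": [
        {"condition": "subject.png", "position_delta": (0, -32)},  # side by side
        {"condition": "depth.png", "position_delta": (0, 0)},      # aligned
    ],
}
print(len(sample["conditions"]))  # 2
```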
### Efficient Generation (OminiControl2)

OminiControl2 introduces techniques to improve generation efficiency.
#### Feature Reuse (KV-Cache)

Enable `independent_condition` in the config file during training:

```yaml
model:
  independent_condition: true
```

During inference, set `kv_cache=True` in the `generate` function to speed up generation.

Example:

```bash
bash train/script/train_feature_reuse.sh
```
Note: Feature reuse speeds up generation but may slightly reduce performance and increase training time.
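The idea in miniature: when the condition is independent of the denoising state, its key/value features can be computed once and reused across all diffusion steps. A toy sketch (the function names are hypothetical, not the repo's API):

```python
# Toy illustration of KV caching for an independent condition branch.
# encode_condition is a hypothetical stand-in for the condition K/V pass.
calls = {"encode": 0}

def encode_condition(cond):
    calls["encode"] += 1
    return ("keys", "values")  # stand-in for the condition's K/V tensors

def generate(cond, num_steps, kv_cache=False):
    cached = None
    for _ in range(num_steps):
        if kv_cache and cached is not None:
            kv = cached          # reuse the condition features
        else:
            kv = encode_condition(cond)
            if kv_cache:
                cached = kv
    return kv

generate("edge_map", num_steps=28)                 # re-encodes every step
encodes_without_cache = calls["encode"]
calls["encode"] = 0
generate("edge_map", num_steps=28, kv_cache=True)  # encodes once, reuses 27 times
encodes_with_cache = calls["encode"]
print(encodes_without_cache, encodes_with_cache)   # 28 1
```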
#### Compact Encoding Representation

Reduce the condition image resolution and use `position_scale` to align it with the output image:

```diff
 train:
   dataset:
     condition_size:
-      - 512
-      - 512
+      - 256
+      - 256
+    position_scale: 2
     target_size:
       - 512
       - 512
```

Example:

```bash
bash train/script/train_compact_token_representation.sh
```
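The token-count arithmetic behind this config, using the one-token-per-16px-patch layout described in the position-delta explanation earlier:

```python
# Condition tokens at 16px per patch: halving the resolution cuts the
# condition token count by 4x, while position_scale=2 keeps the smaller
# condition aligned with the 512px output.
def num_condition_tokens(width, height, patch=16):
    return (width // patch) * (height // patch)

full = num_condition_tokens(512, 512)     # original condition_size
compact = num_condition_tokens(256, 256)  # reduced condition_size
print(full, compact, full // compact)     # 1024 256 4
```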
#### Token Integration (for the Fill Task)

Further reduce tokens by merging the condition and generation tokens into a unified sequence (refer to the paper for details).

Example:

```bash
bash train/script/train_token_intergration.sh
```
## Citation

If you find this code useful, please cite our papers:

```bibtex
@article{tan2024ominicontrol,
  title={OminiControl: Minimal and Universal Control for Diffusion Transformer},
  author={Tan, Zhenxiong and Liu, Songhua and Yang, Xingyi and Xue, Qiaochu and Wang, Xinchao},
  journal={arXiv preprint arXiv:2411.15098},
  year={2024}
}

@article{tan2025ominicontrol2,
  title={OminiControl2: Efficient Conditioning for Diffusion Transformers},
  author={Tan, Zhenxiong and Xue, Qiaochu and Yang, Xingyi and Liu, Songhua and Wang, Xinchao},
  journal={arXiv preprint arXiv:2503.08280},
  year={2025}
}
```