---
license: unlicense
tags:
- flux2
- flux
- image-to-image
- image-upscaling
- latent-upscaling
- super-resolution
- diffusion
- flow-matching
- rectified-flow
- generative-models
- image-generation
- pytorch
library_name: pytorch
base_model:
- black-forest-labs/FLUX.2-klein-4B
pipeline_tag: image-to-image
---

# Flow Upscaler

**Flow Upscaler** is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space. Under the hood, it is a lightweight **Rectified Flow** model with **59M** parameters that generates upscaled latents in a single denoising step.

**[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)**

Features:

* Upscaling from **512x512** to **1024x1024** takes **8ms**\*
* The model is trained for **2X** upscaling, but multiple passes can be chained to reach up to **8K** resolution
* A full pipeline with Flux generation, upscaling to **8K**, and decoding runs in just **25 seconds** (on RTX 5090)
* The training process uses **Flow Distillation** with Flux.2 as a teacher, forcing the model to learn strong image semantics

\*On RTX 5090, in latent space, without decoding, see benchmark [here](https://github.com/tensorforger/CTGMWorkshop).

Here is one **4X** upscaled image (two passes):

![comparison](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaler_comparison.png)

## How it works

Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space. The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning. No attention layers are used, so compute scales linearly with image area. This makes generation at **8K** resolution possible.

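
To make the two mechanisms in this section concrete, here is a minimal PyTorch sketch of FiLM conditioning and of single-step sampling with a velocity model. It is an illustration, not the released architecture: the `FiLM` and `one_step_sample` names, the channel counts, the 1x1 convolution, the `1 + gamma` parameterization, and the time convention (noise at `t = 0`, data at `t = 1`) are all assumptions, and the conditioning features are assumed to already match the spatial size of the U-Net features.

```python
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Feature-wise linear modulation: per-channel scale and shift from a control signal."""

    def __init__(self, cond_channels: int, feat_channels: int):
        super().__init__()
        # A single 1x1 conv predicts both the scale (gamma) and the shift (beta).
        self.to_gamma_beta = nn.Conv2d(cond_channels, 2 * feat_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        # (1 + gamma) leaves the features unchanged when gamma and beta are zero.
        return feat * (1 + gamma) + beta


@torch.no_grad()
def one_step_sample(velocity_model, noise: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """One Euler step over the whole interval, assuming noise at t=0 and data at t=1."""
    t = torch.zeros(noise.shape[0], device=noise.device)
    # With a step size of 1, the update is simply x_1 = x_0 + v(x_0, t=0, cond).
    return noise + velocity_model(noise, t, cond)


# Illustrative shapes only; they are not the model's real channel counts.
film = FiLM(cond_channels=8, feat_channels=16)
feat = torch.randn(1, 16, 32, 32)   # features inside the U-Net encoder
cond = torch.randn(1, 8, 32, 32)    # control signal from the conditioning encoder
out = film(feat, cond)              # same shape as feat: (1, 16, 32, 32)
```

Because FiLM and the convolutions are all local operations, doubling the image side quadruples the work and nothing more, which is the linear-in-area scaling noted above.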
![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)

The model is trained using **Flow Distillation** with Flux.2-klein-4B as a teacher. We generated **20K** diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning. The downscaled latents are created by decoding high-resolution latents, downscaling them in pixel space, and encoding them back into latents. Direct latent downscaling introduces artifacts and breaks latent patterns, resulting in blurry decoded images.

![example](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)

## Training code

If you want to explore the training code or use the model outside ComfyUI, see: `notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop)
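
For a quick orientation before opening the notebook, the data preparation and training objective described under "How it works" can be sketched roughly as follows. This is one plausible reading of the description, not the code from the repository: `make_conditioning_latents` and `flow_distillation_loss` are hypothetical names, `decode` and `encode` stand in for whatever VAE wrapper is used, antialiased bilinear resizing is an assumed choice, and the loss assumes a straight path from the stored noise (`t = 0`) to the teacher latents (`t = 1`).

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def make_conditioning_latents(hr_latents, decode, encode, factor: int = 2):
    """Decode -> downscale in pixel space -> re-encode (hypothetical helper)."""
    pixels = decode(hr_latents)
    # Resize in pixel space rather than resizing the latents directly.
    small = F.interpolate(pixels, scale_factor=1 / factor, mode="bilinear",
                          align_corners=False, antialias=True)
    return encode(small)


def flow_distillation_loss(model, noise, hr_latents, cond):
    """Velocity regression on the straight line between stored noise and teacher latents."""
    b = noise.shape[0]
    t = torch.rand(b, device=noise.device)
    t_b = t.view(b, 1, 1, 1)
    x_t = (1 - t_b) * noise + t_b * hr_latents   # point on the noise-to-data line
    target_v = hr_latents - noise                # constant velocity along that line
    return F.mse_loss(model(x_t, t, cond), target_v)


# Toy stand-ins so the sketch runs without a real VAE (assumptions, not the Flux.2 VAE).
toy_decode = lambda z: F.interpolate(z[:, :3], scale_factor=8, mode="nearest")
toy_encode = lambda x: F.avg_pool2d(x, kernel_size=8)

hr = torch.randn(2, 4, 16, 16)       # teacher latents (illustrative shape)
noise = torch.randn_like(hr)         # the initial noise stored with them
cond = make_conditioning_latents(hr, toy_decode, toy_encode)   # shape (2, 3, 8, 8)

# A model that already outputs the exact velocity gives zero loss.
perfect = lambda x_t, t, c: hr - noise
loss = flow_distillation_loss(perfect, noise, hr, cond)
```

Storing the teacher's initial noise alongside its output is what makes the pairing in `flow_distillation_loss` possible: each training example supplies both endpoints of the path the student is asked to follow.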