---
license: unlicense
tags:
  - flux2
  - flux
  - image-to-image
  - image-upscaling
  - latent-upscaling
  - super-resolution
  - diffusion
  - flow-matching
  - rectified-flow
  - generative-models
  - image-generation
  - pytorch
library_name: pytorch
base_model:
  - black-forest-labs/FLUX.2-klein-4B
pipeline_tag: image-to-image
---


# Flow Upscaler


**Flow Upscaler** is a fast latent upscaler model that works in the [Flux.2](https://bfl.ai/models/flux-2) latent space.

Under the hood, it is a lightweight **Rectified Flow** model with **59M** parameters that generates upscaled latents in a single denoising step.

**[ComfyUI Node](https://github.com/TensorForger/comfyui-flow-upscaler)**

Features:

* Upscaling from **512x512** to **1024x1024** takes **8 ms**\*
* The model is trained for **2X** upscaling, but multiple passes can be chained to reach up to **8K** resolution
* A full pipeline with Flux generation, upscaling to **8K**, and decoding runs in just **25 seconds** (on an RTX 5090)
* The training process uses **Flow Distillation** with Flux.2 as a teacher, forcing the model to learn strong image semantics

\*Measured on an RTX 5090 in latent space, without decoding. See the benchmark [here](https://github.com/tensorforger/CTGMWorkshop).

Here is an example of a **4X** upscale (two passes):

![comparison](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/upscaler_comparison.png)
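Because each pass doubles the resolution, the number of passes needed for a given target follows from simple arithmetic. The sketch below is only an illustration: `chain_plan` is a hypothetical helper, not part of the released code, and it assumes that "8K" means a side length of 8192 pixels.

```python
def chain_plan(start_px: int, target_px: int, factor: int = 2) -> list[int]:
    """Return the side lengths visited when a fixed-factor upscaler is applied repeatedly."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    if start_px <= 0 or target_px < start_px:
        raise ValueError("expected 0 < start_px <= target_px")
    sizes = [start_px]
    # Keep applying the upscaler until the target side length is reached.
    while sizes[-1] < target_px:
        sizes.append(sizes[-1] * factor)
    return sizes


plan = chain_plan(1024, 8192)   # assumption: "8K" = 8192 px per side
print(plan)                     # [1024, 2048, 4096, 8192]
print(len(plan) - 1)            # 3 passes
```

Under the same assumption, going from 512 to 2048 pixels takes two passes, which matches the 4X example above.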

## How it works

Architecturally, Flow Upscaler is a U-Net with SDXL-style ResNet blocks. It takes a noisy sample as input and predicts velocity as output. The generation process happens directly in high-resolution latent space.
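To make "a single denoising step" concrete, here is a minimal sketch of one Euler step for a velocity-predicting model, written in plain Python on toy vectors. It assumes the convention that noise sits at `t = 0` and data at `t = 1` along a straight path; the released model may use the opposite time convention. `one_step_sample` and `make_ideal_velocity` are hypothetical names, and the "ideal" velocity function stands in for the trained U-Net, which would also receive the low-resolution conditioning.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature: velocity_fn(x_t, t) -> velocity.
VelocityFn = Callable[[Sequence[float], float], list[float]]


def one_step_sample(velocity_fn: VelocityFn, noise: Sequence[float]) -> list[float]:
    """Single Euler step across t in [0, 1]: x_1 = x_0 + (1 - 0) * v(x_0, 0)."""
    velocity = velocity_fn(noise, 0.0)
    return [x + v for x, v in zip(noise, velocity)]


def make_ideal_velocity(target: Sequence[float]) -> VelocityFn:
    """Toy stand-in for a trained network: the exact velocity of the straight
    path x_t = (1 - t) * noise + t * target, i.e. (target - x_t) / (1 - t)."""
    def velocity_fn(x_t: Sequence[float], t: float) -> list[float]:
        return [(z - x) / (1.0 - t) for x, z in zip(x_t, target)]
    return velocity_fn


target = [0.5, -1.0, 2.0]   # made-up "high-resolution latent"
noise = [0.3, 1.7, -0.4]    # made-up starting noise
sample = one_step_sample(make_ideal_velocity(target), noise)
print(all(math.isclose(s, z) for s, z in zip(sample, target)))
```

With a perfectly straight path, one step is enough to land on the target, which is the property a one-step rectified flow model tries to approximate.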

The low-resolution latents are passed through a separate conditioning encoder that produces control signals, which are injected into the main U-Net encoder using FiLM conditioning.
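FiLM (feature-wise linear modulation) scales and shifts each feature channel using values produced by the conditioning branch. The toy sketch below shows only the modulation itself, with per-channel scalars on plain lists. It is an assumption-laden illustration: `film` is a hypothetical helper, the actual model may use spatially varying scale and shift maps, and some FiLM variants use `(1 + gamma)` as the scale instead.

```python
from typing import Sequence


def film(
    features: Sequence[Sequence[float]],
    gammas: Sequence[float],
    betas: Sequence[float],
) -> list[list[float]]:
    """Apply y = gamma * x + beta to every value of each channel.

    `features` holds one flat list of values per channel; `gammas` and `betas`
    hold one scale and one shift per channel.
    """
    if not (len(features) == len(gammas) == len(betas)):
        raise ValueError("need one gamma and one beta per channel")
    return [
        [gamma * value + beta for value in channel]
        for channel, gamma, beta in zip(features, gammas, betas)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]   # two channels, two values each
modulated = film(features, gammas=[2.0, 0.5], betas=[0.0, 1.0])
print(modulated)                       # [[2.0, 4.0], [2.5, 3.0]]
```

In a real network the scale and shift would be predicted from the low-resolution latents by the conditioning encoder rather than passed in by hand.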

No attention layers are used, so compute scales linearly with image area. This makes generation at **8K** resolution possible.
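A rough back-of-the-envelope comparison shows why this matters. The sketch below compares asymptotic growth only: convolution cost grows with the number of pixels, while the pairwise term of global self-attention grows with its square. `relative_cost` is a hypothetical helper, constants and memory effects are ignored, and "8K" is assumed to mean 8192 pixels per side.

```python
def relative_cost(side_px: int, base_side_px: int = 1024) -> dict[str, float]:
    """Cost growth relative to a square image with side `base_side_px`."""
    if side_px <= 0 or base_side_px <= 0:
        raise ValueError("side lengths must be positive")
    area_ratio = (side_px / base_side_px) ** 2
    return {
        "convolution": area_ratio,          # linear in image area
        "self_attention": area_ratio ** 2,  # quadratic in image area
    }


print(relative_cost(8192))
# {'convolution': 64.0, 'self_attention': 4096.0}
```

Going from 1024 to 8192 pixels per side multiplies the area by 64, so a convolution-only network does about 64 times the work, whereas global attention over the same tokens would need on the order of 4096 times.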

![architecture](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_architecture.PNG)

The model is trained using **Flow Distillation** with Flux.2-klein-4B as a teacher. We generated **20K** diverse images with Flux, storing the initial noise, generated latents, and downscaled latents used for conditioning.

The downscaled latents are created by decoding the high-resolution latents, downscaling the result in pixel space, and encoding it back into latents. Downscaling the latents directly, by contrast, introduces artifacts and breaks latent patterns, resulting in blurry decoded images.
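The order of operations can be sketched as a small function that takes the three stages as arguments. Everything here is hypothetical: `make_conditioning_latent` is not part of the released code, and the toy `decode`, `downscale`, and `encode` stand-ins below only mimic the shape of the real VAE decoder, image resizer, and VAE encoder.

```python
from typing import Callable, Sequence

Values = Sequence[float]


def make_conditioning_latent(
    high_res_latent: Values,
    decode: Callable[[Values], Values],
    downscale: Callable[[Values], Values],
    encode: Callable[[Values], Values],
) -> Values:
    """Decode to pixels, downscale in pixel space, then encode back to latents."""
    image = decode(high_res_latent)
    small_image = downscale(image)
    return encode(small_image)


# Toy stand-ins so the sketch runs without a real VAE.
def toy_decode(latent: Values) -> list[float]:
    return [v * 10.0 for v in latent]


def toy_downscale(image: Values) -> list[float]:
    # Average neighbouring pairs, halving the length.
    return [(image[i] + image[i + 1]) / 2.0 for i in range(0, len(image) - 1, 2)]


def toy_encode(image: Values) -> list[float]:
    return [v / 10.0 for v in image]


cond = make_conditioning_latent([1.0, 2.0, 3.0, 4.0], toy_decode, toy_downscale, toy_encode)
print(cond)   # [1.5, 3.5]
```

Passing the stages in as callables keeps the sketch independent of any particular VAE or resizing library.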

![training approach](https://raw.githubusercontent.com/tensorforger/tensorforger/main/assets/flow_upscaler_training_approach.PNG)

## Training code

If you want to explore the training code or use the model outside ComfyUI, see:

`notebooks/flow_upscaler` in [https://github.com/tensorforger/CTGMWorkshop](https://github.com/tensorforger/CTGMWorkshop)