# Quickstart

Modular Diffusers is a framework for quickly building flexible and customizable pipelines. These pipelines can go beyond what standard `DiffusionPipeline`s can do. At the core of Modular Diffusers are [ModularPipelineBlocks](/docs/diffusers/main/en/api/modular_diffusers/pipeline_blocks#diffusers.ModularPipelineBlocks) that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [ModularPipeline](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline), a friendly user-facing interface for running generation tasks.

This guide shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it.

## Run a pipeline

[ModularPipeline](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline) is the main interface for loading, running, and managing modular pipelines.

```py
import torch
from diffusers import ModularPipeline, ComponentsManager

# Use ComponentsManager to enable auto CPU offloading for memory efficiency
manager = ComponentsManager()
manager.enable_auto_cpu_offload(device="cuda:0")

pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image", components_manager=manager)
pipe.load_components(torch_dtype=torch.bfloat16)

image = pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
).images[0]
image
```

[from_pretrained()](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline.from_pretrained) uses lazy loading - it reads the configuration to learn where to load each component from, but doesn't actually load the model weights until you call [load_components()](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline.load_components). This gives you control over when and how components are loaded.
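For example, a minimal sketch of the loading sequence (the `None` check is an assumption about how unloaded components are represented, not confirmed API behavior):

```py
pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image", components_manager=manager)

# No weights have been loaded yet; only the loading configuration was read.
# Assumption: components that haven't been loaded resolve to None.
print(pipe.transformer)  # None

pipe.load_components(torch_dtype=torch.bfloat16)
print(type(pipe.transformer).__name__)  # QwenImageTransformer2DModel
```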
> [!TIP]
> `ComponentsManager` with `enable_auto_cpu_offload` automatically moves models between CPU and GPU as needed, reducing memory usage for large models like Qwen-Image. Learn more in the [ComponentsManager](./components_manager) guide.
>
> If you don't need offloading, remove the `components_manager` argument and move the pipeline to your device manually with `to("cuda")`.
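A minimal sketch of the no-offload variant (assumes the full model fits on your GPU):

```py
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")  # move all loaded components to the GPU
```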
Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides.

## Understand the structure

A [ModularPipeline](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline) has two parts: a **definition** (the blocks) and a **state** (the loaded components and configs).

Print the pipeline to see its state — the components and their loading status and configuration.

```py
print(pipe)
```

```
QwenImageModularPipeline {
  "_blocks_class_name": "QwenImageAutoBlocks",
  "_class_name": "QwenImageModularPipeline",
  "_diffusers_version": "0.37.0.dev0",
  "transformer": [
    "diffusers",
    "QwenImageTransformer2DModel",
    {
      "pretrained_model_name_or_path": "Qwen/Qwen-Image",
      "revision": null,
      "subfolder": "transformer",
      "type_hint": [
        "diffusers",
        "QwenImageTransformer2DModel"
      ],
      "variant": null
    }
  ],
  ...
}
```
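Each entry above corresponds to a loading spec you can also inspect programmatically via `get_component_spec()` (used again later in this guide). A small sketch; the `subfolder` attribute is an assumption based on the printed config:

```py
spec = pipe.get_component_spec("transformer")
print(spec.pretrained_model_name_or_path)  # Qwen/Qwen-Image
print(spec.subfolder)                      # transformer (assumed attribute)
```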
Access the definition through `pipe.blocks` — this is the [ModularPipelineBlocks](/docs/diffusers/main/en/api/modular_diffusers/pipeline_blocks#diffusers.ModularPipelineBlocks) that defines the pipeline's workflows, inputs, outputs, and computation logic.

```py
print(pipe.blocks)
```

```
QwenImageAutoBlocks(
  Class: SequentialPipelineBlocks

  Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using QwenImage.

  Supported workflows:
  - `text2image`: requires `prompt`
  - `image2image`: requires `prompt`, `image`
  - `inpainting`: requires `prompt`, `mask_image`, `image`
  - `controlnet_text2image`: requires `prompt`, `control_image`
  ...

  Components:
    text_encoder (`Qwen2_5_VLForConditionalGeneration`)
    vae (`AutoencoderKLQwenImage`)
    transformer (`QwenImageTransformer2DModel`)
    ...

  Sub-Blocks:
    [0] text_encoder (QwenImageAutoTextEncoderStep)
    [1] vae_encoder (QwenImageAutoVaeEncoderStep)
    [2] controlnet_vae_encoder (QwenImageOptionalControlNetVaeEncoderStep)
    [3] denoise (QwenImageAutoCoreDenoiseStep)
    [4] decode (QwenImageAutoDecodeStep)
)
```
The output shows:

- The supported workflows (text2image, image2image, inpainting, etc.)
- The components the blocks expect (text_encoder, vae, transformer, etc.)
- The Sub-Blocks it's composed of (text_encoder, vae_encoder, denoise, decode)
### Workflows

This pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image. Learn more about how this works under the hood in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide.

```py
from diffusers.utils import load_image

input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

image = pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
    image=input_image,
).images[0]
```

Use `get_workflow()` to extract the blocks for a specific workflow. Pass the workflow name (e.g., `"image2image"`, `"inpainting"`, `"controlnet_text2image"`) to get only the blocks relevant to that workflow. This is useful when you want to customize or debug a specific workflow. You can check `pipe.blocks.available_workflows` to see all available workflows.

```py
img2img_blocks = pipe.blocks.get_workflow("image2image")
```
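For example, to see which names you can pass to `get_workflow()` (the printed list below is illustrative, based on the workflows shown earlier):

```py
print(pipe.blocks.available_workflows)
# e.g. ['text2image', 'image2image', 'inpainting', 'controlnet_text2image', ...]
```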
### Sub-blocks

Blocks can contain other blocks. `pipe.blocks` gives you the top-level block definition (here, `QwenImageAutoBlocks`), while `sub_blocks` lets you access the smaller blocks inside it.

`QwenImageAutoBlocks` is composed of: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`.

These sub-blocks run one after another and data flows linearly from one block to the next — each block's `intermediate_outputs` become available as `inputs` to the next block. This is how [`SequentialPipelineBlocks`](./sequential_pipeline_blocks) work.

You can access them through the `sub_blocks` property. The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.

```py
vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
print(vae_encoder_block.doc)
```
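To see this hand-off concretely, you can compare what one block produces with what a later block consumes. A minimal sketch, assuming each block exposes its `inputs` and `intermediate_outputs` as lists of named parameters:

```py
# What the VAE encoder step makes available to later blocks
print([out.name for out in vae_encoder_block.intermediate_outputs])

# What the denoise step consumes
denoise_block = pipe.blocks.sub_blocks["denoise"]
print([inp.name for inp in denoise_block.inputs])
```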
The `vae_encoder` block can be converted into a standalone pipeline with [init_pipeline()](/docs/diffusers/main/en/api/modular_diffusers/pipeline_blocks#diffusers.ModularPipelineBlocks.init_pipeline) so it can run on its own.

```py
vae_encoder_pipe = vae_encoder_block.init_pipeline()

# Reuse the VAE we already loaded with the update_components() method
vae_encoder_pipe.update_components(vae=pipe.vae)

# Run just this block
image_latents = vae_encoder_pipe(image=input_image).image_latents
print(image_latents.shape)
```

This reuses the VAE from our original pipeline instead of reloading it, keeping memory usage efficient. Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide.
Since blocks are composable, you can modify the pipeline's definition by adding, removing, or swapping blocks to create new workflows. In the next section, we'll add a canny edge detection block to a ControlNet pipeline, so you can pass a regular image instead of a pre-processed canny edge map.
## Compose new workflows

Let's add a canny edge detection block to a ControlNet pipeline. First, load a pre-built canny block from the Hub (see [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) to create your own).

```py
from diffusers.modular_pipelines import ModularPipelineBlocks

# Load a canny block from the Hub
canny_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/canny-filtering",
    trust_remote_code=True,
)
print(canny_block.doc)
```

```
class CannyBlock

  Inputs:
    image (`Union[Image, ndarray]`):
        Image to compute canny filter on
    low_threshold (`int`, *optional*, defaults to 50):
        Low threshold for the canny filter.
    high_threshold (`int`, *optional*, defaults to 200):
        High threshold for the canny filter.
    ...

  Outputs:
    control_image (`PIL.Image`):
        Canny map for input image
```
Use `get_workflow()` to extract the ControlNet workflow from `QwenImageAutoBlocks`.

```py
# Get the controlnet workflow that we want to work with
blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks.doc)
```

```
class SequentialPipelineBlocks

  Inputs:
    prompt (`str`):
        The prompt or prompts to guide image generation.
    control_image (`Image`):
        Control image for ControlNet conditioning.
    ...
```
The extracted workflow is a [`SequentialPipelineBlocks`](./sequential_pipeline_blocks) and it currently requires `control_image` as input. Insert the canny block at the beginning so the pipeline accepts a regular image instead.

```py
# Insert canny at the beginning
blocks.sub_blocks.insert("canny", canny_block, 0)

# Check the updated structure: CannyBlock is now listed as the first sub-block
print(blocks)

# Check the updated doc
print(blocks.doc)
```

```
class SequentialPipelineBlocks

  Inputs:
    image (`Union[Image, ndarray]`):
        Image to compute canny filter on
    low_threshold (`int`, *optional*, defaults to 50):
        Low threshold for the canny filter.
    high_threshold (`int`, *optional*, defaults to 200):
        High threshold for the canny filter.
    prompt (`str`):
        The prompt or prompts to guide image generation.
    ...
```

Now the pipeline takes `image` as input instead of `control_image`. Because blocks in a sequence share data automatically, the canny block's output (`control_image`) flows to the denoise block that needs it, and the canny block's input (`image`) becomes a pipeline input since no earlier block provides it.
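You can sanity-check the new input set programmatically. A small sketch, assuming the combined blocks expose an `inputs` list of named parameters:

```py
input_names = [inp.name for inp in blocks.inputs]
print("image" in input_names)          # True: consumed by the canny block
print("control_image" in input_names)  # False: now produced internally
```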
Create a pipeline from the modified blocks and load a ControlNet model. The ControlNet isn't part of the original model repository, so load it separately and add it with [update_components()](/docs/diffusers/main/en/api/modular_diffusers/pipeline#diffusers.ModularPipeline.update_components).

```py
pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)

# Load the ControlNet model
controlnet_spec = pipeline.get_component_spec("controlnet")
controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union"
controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)
```

Now run the pipeline - the canny block preprocesses the image for ControlNet.

```py
from diffusers.utils import load_image

prompt = "cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")

output = pipeline(
    prompt=prompt,
    image=image,
).images[0]
output
```
## Next steps

Understand the core building blocks of Modular Diffusers:

- [ModularPipelineBlocks](./pipeline_block): The basic unit for defining a step in a pipeline.
- [SequentialPipelineBlocks](./sequential_pipeline_blocks): Chain blocks to run in sequence.
- [AutoPipelineBlocks](./auto_pipeline_blocks): Create pipelines that support multiple workflows.
- [States](./modular_diffusers_states): How data is shared between blocks.

Learn how to create your own blocks with custom logic in the [Building Custom Blocks](./custom_blocks) guide.

Use [`ComponentsManager`](./components_manager) to share models across multiple pipelines and manage memory efficiently.

Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon - no UI code required. Read more in the Mellon guide.