---
license: mit
---
# Flow Matching & Diffusion Prediction Types
## A Practical Guide to Sol, Lune, and Epsilon Prediction

---

## Overview

This document covers three distinct prediction paradigms used in diffusion and flow-matching models. Each was designed for different purposes and requires specific sampling procedures.

| Model | Prediction Type | What It Learned | Output Character |
|-------|----------------|-----------------|------------------|
| **Standard SD1.5** | ε (epsilon/noise) | Remove noise | General purpose |
| **Sol** | v (velocity) via DDPM | Geometric structure | Flat silhouettes, mass placement |
| **Lune** | v (velocity) via flow | Texture and detail | Rich, detailed images |

---

**SD15-Flow-Sol** (velocity prediction, converted to epsilon at sampling time):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_sol.py



**SD15-Flow-Lune** (rectified flow, shift=2):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_lune.py



**TinyFlux-Lailah**

TinyFlux is still in training and planning, and is not yet ready for production use.

https://huggingface.co/AbstractPhil/tiny-flux-deep



## 1. Epsilon (ε) Prediction – Standard Diffusion

### Core Concept
> **"Predict the noise that was added"**

The model learns to identify and remove noise from corrupted images.

### The Formula (Simplified)

```
TRAINING:
  x_noisy = √(α) * x_clean + √(1-α) * noise
      ↓
  Model predicts: ε̂ = "what noise was added?"
      ↓
  Loss = ||ε̂ - noise||²

SAMPLING:
  Start with pure noise
  Repeatedly ask: "what noise is in this?"
  Subtract a fraction of the predicted noise
  Repeat until clean
```

### Reading the Math

- **α (alpha)**: "How much original image remains" (1 = all original, 0 = all noise)
- **√(1-α)**: "How much noise was mixed in"
- **ε**: The actual noise that was added
- **ε̂**: The model's guess of what noise was added

### Training Process

```python
# Forward diffusion (corruption)
noise = torch.randn_like(x_clean)
alpha_bar = scheduler.alphas_cumprod[t]
x_noisy = alpha_bar.sqrt() * x_clean + (1 - alpha_bar).sqrt() * noise

# Model predicts the noise
eps_pred = model(x_noisy, t)

# Loss: "Did you correctly identify the noise?"
loss = F.mse_loss(eps_pred, noise)
```
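
As a sanity check on the corruption step, the sketch below builds a toy linear-beta schedule (an assumption for illustration, not taken from any particular checkpoint) and confirms the forward mix is exactly invertible when `x_clean` is known. This inversion is the identity an ε-model must approximate from `x_noisy` alone.

```python
import torch

torch.manual_seed(0)

# Toy linear-beta schedule (assumed for illustration)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha at each t

x_clean = torch.randn(4, 8)
noise = torch.randn_like(x_clean)
t = 500
a = alphas_cumprod[t]

# Forward corruption: x_noisy = √α·x_clean + √(1-α)·noise
x_noisy = a.sqrt() * x_clean + (1 - a).sqrt() * noise

# Invert the mix (possible here only because we know x_clean)
eps_recovered = (x_noisy - a.sqrt() * x_clean) / (1 - a).sqrt()
assert torch.allclose(eps_recovered, noise, atol=1e-5)
```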

### Sampling Process

```python
# DDPM/DDIM sampling
for t in reversed(timesteps):  # 999 → 0
    eps_pred = model(x, t)
    x = scheduler.step(eps_pred, t, x).prev_sample  # removes predicted noise
```
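
To watch this loop converge without a trained network, here is a toy sketch: the hypothetical `oracle_eps` returns the exact noise (it can, because it peeks at `x0`), and a deterministic DDIM-style update stands in for `scheduler.step`. With perfect ε predictions the trajectory lands back on `x0`.

```python
import torch

torch.manual_seed(0)

x0 = torch.randn(4, 8)
betas = torch.linspace(1e-4, 0.02, 1000)   # assumed toy schedule
abar = torch.cumprod(1.0 - betas, dim=0)

def oracle_eps(x, t):
    # A "perfect model": recovers the exact noise because it knows x0
    return (x - abar[t].sqrt() * x0) / (1 - abar[t]).sqrt()

x = torch.randn_like(x0)                   # start from pure noise
ts = [999, 750, 500, 250]
for i, t in enumerate(ts):
    eps = oracle_eps(x, t)
    x0_hat = (x - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
    abar_prev = abar[ts[i + 1]] if i + 1 < len(ts) else torch.tensor(1.0)
    # Deterministic DDIM update toward the next, less noisy, level
    x = abar_prev.sqrt() * x0_hat + (1 - abar_prev).sqrt() * eps

assert torch.allclose(x, x0, atol=1e-4)
```

A real model only approximates `oracle_eps`, which is why practical sampling needs many steps and a scheduler that handles the variance terms.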

### Utility & Behavior

- **Strength**: General-purpose image generation
- **Weakness**: No explicit understanding of image structure
- **Use case**: Standard text-to-image generation

---

## 2. Velocity (v) Prediction – Sol (DDPM Framework)

### Core Concept
> **"Predict the direction from noise to data"**

Sol predicts velocity but operates within the DDPM scheduler framework, requiring conversion from velocity to epsilon for sampling.

### The Formula (Simplified)

```
TRAINING:
  x_t = α * x_clean + σ * noise    (same as DDPM)
  v   = α * noise - σ * x_clean    (velocity target)
      ↓
  Model predicts: v̂ = "which way is the image?"
      ↓
  Loss = ||v̂ - v||²

SAMPLING:
  Convert velocity → epsilon
  Use standard DDPM scheduler stepping
```

### Reading the Math

- **v (velocity)**: Direction vector in latent space
- **α (alpha)**: √(α_cumprod) → signal strength
- **σ (sigma)**: √(1 - α_cumprod) → noise strength
- **The velocity formula**: `v = α * ε - σ * x₀`
  - "Velocity is the signal-weighted noise minus the noise-weighted data"

### Why Velocity in DDPM?

Sol was trained with David (the geometric assessor) providing loss weighting. This setup used:
- the DDPM noise schedule for interpolation
- velocity prediction as the training target
- knowledge distillation from a teacher

The result: Sol learned **geometric structure** rather than textures.

### Training Process (David-Weighted)

```python
# DDPM-style corruption
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (batch,))
alpha = scheduler.alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
sigma = (1 - scheduler.alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)

x_t = alpha * latents + sigma * noise

# Velocity target (NOT epsilon!)
v_target = alpha * noise - sigma * latents

# Model predicts velocity
v_pred = model(x_t, t)

# David assesses geometric quality → adjusts the loss weights
loss_weights = david_assessor(features, t)
loss = weighted_MSE(v_pred, v_target, loss_weights)
```

### Sampling Process (CRITICAL: v → ε conversion)

```python
# Must convert velocity to epsilon for the DDPM scheduler
scheduler = DDPMScheduler(num_train_timesteps=1000)

for t in scheduler.timesteps:  # 999, 966, 933, ... → 0
    v_pred = model(x, t)

    # Convert velocity → epsilon
    alpha = scheduler.alphas_cumprod[t].sqrt()
    sigma = (1 - scheduler.alphas_cumprod[t]).sqrt()

    # Solve: v = α·ε - σ·x₀ and x_t = α·x₀ + σ·ε
    # Result: x₀ = (α·x_t - σ·v) / (α² + σ²)   (α² + σ² = 1; kept for clarity)
    #         ε  = (x_t - α·x₀) / σ

    x0_hat = (alpha * x - sigma * v_pred) / (alpha**2 + sigma**2)
    eps_hat = (x - alpha * x0_hat) / sigma

    x = scheduler.step(eps_hat, t, x).prev_sample  # standard DDPM step with epsilon
```
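
The conversion algebra can be verified numerically in isolation. A minimal sketch, using an arbitrary α with σ chosen so that α² + σ² = 1 as in DDPM:

```python
import torch

torch.manual_seed(0)

alpha = torch.tensor(0.8)
sigma = torch.sqrt(1 - alpha**2)   # DDPM constraint: α² + σ² = 1

x0 = torch.randn(4, 8)
eps = torch.randn_like(x0)
x_t = alpha * x0 + sigma * eps     # forward mix
v = alpha * eps - sigma * x0       # velocity target

# Apply the conversion used at sampling time
x0_hat = (alpha * x_t - sigma * v) / (alpha**2 + sigma**2)
eps_hat = (x_t - alpha * x0_hat) / sigma

# Both the clean latent and the noise are recovered exactly
assert torch.allclose(x0_hat, x0, atol=1e-5)
assert torch.allclose(eps_hat, eps, atol=1e-5)
```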

### Utility & Behavior

- **What Sol learned**: Platonic forms, silhouettes, mass distribution
- **Visual output**: Flat geometric shapes, correct spatial layout, no texture
- **Why this happened**: David rewarded geometric coherence, so Sol optimized for clean David classification
- **Use case**: Structural guidance, composition anchoring, "what goes where"

### Sol's Unique Property

Sol never "collapsed" – it learned the **skeleton** of images:
- Castle prompt → castle silhouette, horizon line, sky gradient
- Portrait prompt → head oval, shoulder mass, figure-ground separation
- City prompt → building masses, street perspective, light positions

This is the "WHAT before HOW" that most diffusion models skip.

---

## 3. Velocity (v) Prediction – Lune (Rectified Flow)

### Core Concept
> **"Predict the straight-line direction from noise to data"**

Lune uses true rectified-flow matching, in which data travels in straight lines through latent space.

### The Formula (Simplified)

```
TRAINING:
  x_t = σ * noise + (1-σ) * data   (linear interpolation)
  v   = noise - data               (constant velocity)
      ↓
  Model predicts: v̂ = "straight line to noise"
      ↓
  Loss = ||v̂ - v||²

SAMPLING:
  Start at σ=1 (noise)
  Walk OPPOSITE to the velocity (toward data)
  End at σ=0 (clean image)
```

### Reading the Math

- **σ (sigma)**: Interpolation parameter (1 = noise, 0 = data)
- **x_t = σ·noise + (1-σ)·data**: Linear blend between noise and data
- **v = noise - data**: The velocity is CONSTANT along the path
- **Shift function**: `σ' = shift·σ / (1 + (shift-1)·σ)`
  - With shift > 1, interior sigmas move upward, so the schedule spends more of its steps in the high-noise region where global structure forms
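
The shift function is easy to probe directly. A minimal sketch (the values below follow from the formula; `shift_sigma` is the name used in the sampling code later in this guide):

```python
def shift_sigma(sigma, shift):
    # σ' = shift·σ / (1 + (shift-1)·σ)
    return (shift * sigma) / (1 + (shift - 1) * sigma)

# Endpoints are preserved: pure noise stays at 1, clean stays at 0
assert shift_sigma(0.0, 3.0) == 0.0
assert shift_sigma(1.0, 3.0) == 1.0

# Interior values move toward the noisy end for shift > 1
assert shift_sigma(0.5, 3.0) == 0.75
assert shift_sigma(0.25, 2.0) == 0.4
```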

### Key Difference from Sol

| Aspect | Sol | Lune |
|--------|-----|------|
| Interpolation | DDPM (α, σ from scheduler) | Linear (σ, 1-σ) |
| Velocity meaning | Complex (α·ε - σ·x₀) | Simple (noise - data) |
| Sampling | Convert v→ε, use scheduler | Direct Euler integration |
| Output | Geometric skeletons | Detailed images |

### Training Process

```python
# Linear interpolation (NOT the DDPM schedule!)
noise = torch.randn_like(latents)
sigma = torch.rand(batch)  # random sigma in [0, 1]

# Apply the shift during training
sigma_shifted = (shift * sigma) / (1 + (shift - 1) * sigma)
timesteps = sigma_shifted * 1000        # timestep fed to the model
sigma_b = sigma_shifted.view(-1, 1, 1, 1)

x_t = sigma_b * noise + (1 - sigma_b) * latents

# Velocity target: direction FROM data TO noise
v_target = noise - latents

# Model predicts velocity
v_pred = model(x_t, timesteps)

loss = F.mse_loss(v_pred, v_target)
```

### Sampling Process (Direct Euler)

```python
def shift_sigma(sigma, shift):
    return (shift * sigma) / (1 + (shift - 1) * sigma)

# Start from pure noise (σ = 1)
x = torch.randn(1, 4, 64, 64)

# Sigma schedule: 1 → 0, with shift applied
sigmas = torch.linspace(1, 0, steps + 1)
sigmas = shift_sigma(sigmas, shift=3.0)

for i in range(steps):
    sigma = sigmas[i]
    sigma_next = sigmas[i + 1]
    dt = sigma - sigma_next  # positive (going from 1 toward 0)

    timestep = sigma * 1000
    v_pred = model(x, timestep)

    # SUBTRACT the velocity (v points toward noise, we go toward data)
    x = x - v_pred * dt

# x is now a clean image latent
```
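
Because the rectified-flow velocity is constant along each path, Euler integration with a perfect model is exact at any step count. A toy sketch in which the hypothetical `oracle_model` stands in for a trained Lune:

```python
import torch

torch.manual_seed(0)

data = torch.randn(4, 8)
noise = torch.randn_like(data)

def oracle_model(x, timestep):
    # An ideal flow model would predict the constant velocity noise - data
    return noise - data

x = noise                              # σ = 1: pure noise
sigmas = torch.linspace(1, 0, 11)      # 10 Euler steps, no shift
for i in range(10):
    dt = sigmas[i] - sigmas[i + 1]
    x = x - oracle_model(x, sigmas[i] * 1000) * dt

# The straight-line walk lands exactly on the data
assert torch.allclose(x, data, atol=1e-5)
```

A trained model's velocity field is only approximately straight, which is why real sampling still benefits from multiple steps and the sigma shift.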

### Why SUBTRACT the Velocity?

```
v = noise - data   (points FROM data TO noise)

We want to go FROM noise TO data (the opposite direction!)

So: x_new = x_current - v * dt
          = x_current - (noise - data) * dt
          = x_current + (data - noise) * dt   → moving toward data ✓
```

### Utility & Behavior

- **What Lune learned**: Rich textures, fine details, realistic rendering
- **Visual output**: Full detailed images with lighting, materials, depth
- **Training focus**: Portrait/pose data with caption augmentation
- **Use case**: High-quality image generation, detail refinement

---

## Comparison Summary

### Training Targets

```
EPSILON (ε):     target = noise
                 "What random noise was added?"

VELOCITY (Sol):  target = α·noise - σ·data
                 "What's the DDPM-weighted direction?"

VELOCITY (Lune): target = noise - data
                 "What's the straight-line direction?"
```
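
Each target pairs with its interpolation so that the clean data is linearly recoverable from `(x_t, target)`. The sketch below is a compact consistency check of the two velocity conventions:

```python
import torch

torch.manual_seed(0)
data = torch.randn(4, 8)
noise = torch.randn_like(data)

# Sol / DDPM path: x_t = α·data + σ·noise, v = α·noise - σ·data
alpha, sigma = 0.6, 0.8            # α² + σ² = 1
x_t = alpha * data + sigma * noise
v_sol = alpha * noise - sigma * data
x0 = (alpha * x_t - sigma * v_sol) / (alpha**2 + sigma**2)
assert torch.allclose(x0, data, atol=1e-5)

# Lune / linear path: x_s = s·noise + (1-s)·data, v = noise - data
s = 0.3
x_s = s * noise + (1 - s) * data
v_lune = noise - data
assert torch.allclose(x_s - s * v_lune, data, atol=1e-5)
```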

### Sampling Directions

```
EPSILON:         x_new = scheduler.step(ε_pred, t, x)
                 Scheduler handles noise removal internally

VELOCITY (Sol):  Convert v → ε, then scheduler.step(ε, t, x)
                 Must translate to epsilon for the DDPM math

VELOCITY (Lune): x_new = x - v_pred * dt
                 Direct Euler integration, subtract the velocity
```

### Visual Intuition

```
EPSILON:
  "There's noise hiding the image"
  "I'll predict and remove the noise layer by layer"
  → General-purpose denoising

VELOCITY (Sol):
  "I know which direction the image is"
  "But I speak through DDPM's noise schedule"
  → Learned structure, outputs skeletons

VELOCITY (Lune):
  "Straight line from noise to image"
  "I'll walk that line step by step"
  → Learned detail, outputs rich images
```

---

## Practical Implementation Checklist

### For Epsilon Models (Standard SD1.5)
- [ ] Use a DDPM/DDIM/Euler scheduler
- [ ] Pass the timestep as an integer in [0, 999]
- [ ] The scheduler handles everything

### For Sol (Velocity + DDPM)
- [ ] Use DDPMScheduler
- [ ] The model outputs velocity, NOT epsilon
- [ ] Convert: `x0 = (α·x - σ·v) / (α² + σ²)`, then `ε = (x - α·x0) / σ`
- [ ] Call `scheduler.step(ε, t, x)`
- [ ] Expect geometric/structural output

### For Lune (Velocity + Flow)
- [ ] NO scheduler needed – direct Euler
- [ ] Sigma goes 1 → 0 (not 0 → 1!)
- [ ] Apply the shift: `σ' = shift·σ / (1 + (shift-1)·σ)`
- [ ] Timestep passed to the model: `σ * 1000`
- [ ] SUBTRACT the velocity: `x = x - v * dt`
- [ ] Expect detailed, textured output

---

## Why This Matters for TinyFlux

TinyFlux can leverage both experts:

1. **Sol (early timesteps)**: Provides geometric anchoring
   - "Where should the castle be?"
   - "What's the horizon line?"
   - "How is mass distributed?"

2. **Lune (mid/late timesteps)**: Provides detail refinement
   - "What texture is the stone?"
   - "How does light fall?"
   - "What color is the sky?"

By combining geometric structure (Sol) with textural detail (Lune), TinyFlux can achieve better composition AND quality than either expert alone.

---

## Quick Reference Card

```
┌────────────────────────────────────────────────────────┐
│ PREDICTION TYPES                                       │
├────────────────────────────────────────────────────────┤
│ EPSILON (ε)                                            │
│   Train: target = noise                                │
│   Sample: scheduler.step(ε_pred, t, x)                 │
│   Output: General images                               │
├────────────────────────────────────────────────────────┤
│ VELOCITY - SOL (DDPM framework)                        │
│   Train: target = α·ε - σ·x₀                           │
│   Sample: v→ε conversion, then scheduler.step(ε, t, x) │
│   Output: Geometric skeletons                          │
├────────────────────────────────────────────────────────┤
│ VELOCITY - LUNE (Rectified Flow)                       │
│   Train: target = noise - data                         │
│   Sample: x = x - v·dt (Euler, σ: 1→0)                 │
│   Output: Detailed, textured images                    │
└────────────────────────────────────────────────────────┘
```

---

*Document Version: 1.0*
*Last Updated: January 2026*
*Authors: AbstractPhil & Claude OPUS 4.5*

License: MIT