DiscreteDDIMScheduler

The DiscreteDDIMScheduler samples each canvas position from the exact discrete posterior of the uniform corruption process (D3PM), following Structured Denoising Diffusion Models in Discrete State-Spaces. It is parameter-free, and its final step deterministically commits the predicted tokens. An optional predictor-corrector mode, enabled through corrector_steps, adds the leave-one-out Gibbs sweeps of Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation.

This scheduler is used by DiffusionGemmaPipeline.

DiscreteDDIMScheduler

class diffusers.DiscreteDDIMScheduler

(num_inference_steps: int = 32, corrector_steps: int = 0, corrector_k: int = 1, corrector_selection: str = 'lowest_log_margin', corrector_selection_tau: float = 1.0)

Parameters

num_inference_steps (int, defaults to 32) — The number of denoising steps, defining the linear time grid the posterior is evaluated on.
corrector_steps (int, defaults to 0) — Number of Gibbs corrector sweeps run after each predictor step. The default of 0 disables the corrector and recovers plain ancestral DDIM sampling.
corrector_k (int, defaults to 1) — Number of positions resampled per corrector sweep.
corrector_selection (str, defaults to "lowest_log_margin") — How the resampled positions are chosen: "lowest_log_margin", "lowest_maxprob", "lowest_current_prob", or "random".
corrector_selection_tau (float, defaults to 1.0) — Temperature of the Gumbel-top-k position selection (lower is greedier).
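The four corrector_selection modes and the Gumbel-top-k draw can be sketched in plain Python. This is our reading of the parameter descriptions, not the diffusers implementation: the helpers selection_scores and gumbel_top_k are hypothetical, each mode is assumed to yield a per-position confidence score where lower means "resample first", and the draw is assumed to pick corrector_k positions without replacement from softmax(-score / tau).

```python
import math
import random

EPS = 1e-12  # guards log(0)


def selection_scores(probs, current, mode):
    """Hypothetical per-position confidence score; lower means resample first.

    probs: per-position probability lists over a vocabulary of size K >= 2.
    current: the current token id at each position.
    """
    scores = []
    for p, tok in zip(probs, current):
        top1, top2 = sorted(p, reverse=True)[:2]
        if mode == "lowest_log_margin":
            # Gap between the two largest log-probabilities.
            scores.append(math.log(max(top1, EPS)) - math.log(max(top2, EPS)))
        elif mode == "lowest_maxprob":
            scores.append(top1)
        elif mode == "lowest_current_prob":
            scores.append(p[tok])
        elif mode == "random":
            scores.append(0.0)  # equal scores give a uniform choice of positions
        else:
            raise ValueError(f"unknown selection mode: {mode!r}")
    return scores


def gumbel_top_k(scores, k, tau, rng):
    """Hypothetical helper: pick k distinct positions, favoring low scores."""
    keys = []
    for i, s in enumerate(scores):
        u = min(max(rng.random(), EPS), 1.0 - EPS)
        gumbel = -math.log(-math.log(u))  # standard Gumbel noise
        keys.append((-s / tau + gumbel, i))
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.60, 0.30, 0.10]]
current = [0, 2, 1]
rng = random.Random(0)
scores = selection_scores(probs, current, "lowest_maxprob")
print(gumbel_top_k(scores, k=1, tau=1e-6, rng=rng))  # near-greedy: [1]
```

Adding Gumbel noise to -score / tau and keeping the top k is equivalent to sampling k positions without replacement with probabilities proportional to exp(-score / tau), so a small tau approaches a deterministic pick of the k lowest scores.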

Discrete DDIM scheduler for the uniform corruption process, following “Structured Denoising Diffusion Models in Discrete State-Spaces” (D3PM, https://huggingface.co/papers/2107.03006).

On the linear schedule, the survival probability of a clean token at time t is alpha(t) = 1 - t. One denoising step from time t to s < t samples every block position from the exact posterior q(x_s | x_t, x0), which for the uniform kernel decomposes into three routes: jump to the predicted clean token x0, stay on the current token x_t, or jump to a token drawn uniformly from the vocabulary of K tokens. Unlike masked diffusion, there is no mask token; uncommitted positions carry random tokens.
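A minimal sketch of the forward corruption on the linear schedule, in plain Python over token-id lists. The helper names are hypothetical, and the grid t_i = 1 - i / N, s_i = 1 - (i + 1) / N is an assumption about how the linear grid is laid out, chosen so that the last step lands on s = 0, where alpha = 1.

```python
import random


def alpha(t):
    """Linear schedule: survival probability of a clean token at time t."""
    return 1.0 - t


def time_grid(num_inference_steps):
    """Hypothetical layout: step i moves from t_i = 1 - i/N to s_i = 1 - (i+1)/N."""
    n = num_inference_steps
    return [(1 - i / n, 1 - (i + 1) / n) for i in range(n)]


def corrupt(x0, t, vocab_size, rng):
    """Draw x_t ~ q(x_t | x0) independently at each position."""
    x_t = []
    for tok in x0:
        if rng.random() < alpha(t):
            x_t.append(tok)  # the clean token survives
        else:
            # Replaced by a uniform token, which may coincide with tok.
            x_t.append(rng.randrange(vocab_size))
    return x_t


print(time_grid(4))  # [(1.0, 0.75), (0.75, 0.5), (0.5, 0.25), (0.25, 0.0)]
rng = random.Random(0)
print(corrupt([5, 1, 7, 3], t=0.5, vocab_size=8, rng=rng))
```

A replaced position still lands on x0 with probability 1 / K, so q(x_t = x0 | x0) = alpha(t) + (1 - alpha(t)) / K rather than alpha(t) alone.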

An optional predictor-corrector mode follows “Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation” (https://huggingface.co/papers/2605.22765) and is built on its leave-one-out (LOO) denoiser. When corrector_steps > 0, the pipeline runs that many Gibbs corrector sweeps after each predictor step (see step_correct()). Each sweep resamples the least-confident positions from the one-coordinate conditional Cat(alpha_s * x0_loo + (1 - alpha_s) / K) while holding the rest fixed. This leaves the marginal p_s invariant and improves generation without requiring any additional training.
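The one-coordinate conditional is a mixture of the LOO prediction and the uniform distribution over the K tokens. The sketch below assumes x0_loo is already available as a probability list for each position (how it is obtained is described under step_correct()); one_coordinate_conditional and resample_positions are hypothetical names working on plain lists.

```python
import random


def one_coordinate_conditional(x0_loo, alpha_s):
    """Cat(alpha_s * x0_loo + (1 - alpha_s) / K) for a single position."""
    k = len(x0_loo)
    return [alpha_s * p + (1.0 - alpha_s) / k for p in x0_loo]


def resample_positions(sample, x0_loo_per_pos, positions, alpha_s, rng):
    """Hypothetical helper: resample the chosen positions, hold the rest fixed."""
    new = list(sample)  # leave the caller's sample untouched
    for pos in positions:
        probs = one_coordinate_conditional(x0_loo_per_pos[pos], alpha_s)
        new[pos] = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return new


print(one_coordinate_conditional([0.7, 0.2, 0.1], alpha_s=1.0))  # [0.7, 0.2, 0.1]
```

At alpha_s = 1 the conditional is the LOO prediction itself, and at alpha_s = 0 it is uniform, matching the two ends of the linear schedule.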

step

(model_output: torch.Tensor, timestep: int | torch.Tensor, sample: torch.LongTensor, temperature: float = 0.0, generator: torch.Generator | None = None, return_dict: bool = True)

Parameters

model_output (torch.Tensor of shape (batch_size, block_length, vocab_size)) — Raw logits from the model for the current block.
timestep (int or torch.Tensor) — Current step index within the denoising schedule, in [0, num_inference_steps - 1].
sample (torch.LongTensor of shape (batch_size, block_length)) — Current block token IDs x_t.
temperature (float) — Sampling temperature applied to the logits when drawing x0.
generator (torch.Generator, optional) — RNG for sampling.
return_dict (bool) — Whether to return a DiscreteDDIMSchedulerOutput or a plain tuple.
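For illustration, drawing x0 at one position from its logits could look like the sketch below. It assumes that temperature = 0.0 (the default) means greedy argmax decoding and that a positive temperature divides the logits before a softmax; sample_x0 is a hypothetical helper that works on plain lists rather than torch.Tensor.

```python
import math
import random


def sample_x0(logits, temperature, rng):
    """Hypothetical helper: draw a clean-token id from one position's logits."""
    if temperature == 0.0:
        # Assumed greedy mode: take the highest-scoring token.
        return max(range(len(logits)), key=lambda v: logits[v])
    # Softmax of logits / temperature, shifted by the max for numerical stability.
    m = max(logits)
    weights = [math.exp((z - m) / temperature) for z in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
print(sample_x0([0.1, 2.0, -1.0], temperature=0.0, rng=rng))  # 1
print(sample_x0([0.1, 2.0, -1.0], temperature=1.0, rng=rng))
```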

Sample the next block from the posterior q(x_s | x_t, x0) of the uniform corruption process.

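With a = alpha_t / alpha_s (the survival probability from s to t) and b = alpha_s, the unnormalized posterior mass of each route is

clean: b * (1 - a) / K + a * b * 1[x_t = x0]
stay: a * (1 - b) / K
noise: (1 - a) * (1 - b) / K, spread evenly over all K tokens

The three masses sum to q(x_t | x0) = alpha_t * 1[x_t = x0] + (1 - alpha_t) / K, which is the normalizing constant. On the last step (b = 1) the stay and noise masses vanish, so the step deterministically commits the predicted clean tokens.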
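The route masses above translate directly into a per-position sampler. This is a plain-Python sketch, not the scheduler's tensorized code; route_masses and posterior_step are hypothetical names, and alpha_t and alpha_s are passed in directly rather than derived from a timestep index.

```python
import random


def route_masses(a, b, x_t, x0, vocab_size):
    """Unnormalized masses of the clean, stay and noise routes."""
    clean = b * (1 - a) / vocab_size + a * b * (1.0 if x_t == x0 else 0.0)
    stay = a * (1 - b) / vocab_size
    noise = (1 - a) * (1 - b) / vocab_size  # total mass, spread over all tokens
    return clean, stay, noise


def posterior_step(x_t, x0, alpha_t, alpha_s, vocab_size, rng):
    """Hypothetical helper: draw x_s ~ q(x_s | x_t, x0) for one position."""
    a, b = alpha_t / alpha_s, alpha_s  # requires alpha_s > 0
    route = rng.choices(("clean", "stay", "noise"),
                        weights=route_masses(a, b, x_t, x0, vocab_size), k=1)[0]
    if route == "clean":
        return x0
    if route == "stay":
        return x_t
    return rng.randrange(vocab_size)  # uniform token; may equal x0 or x_t


# The masses sum to q(x_t | x0); here x_t != x0, so the sum is (1 - 0.4) / 5.
print(sum(route_masses(0.8, 0.5, x_t=2, x0=4, vocab_size=5)))  # about 0.12
```

Setting b = alpha_s = 1 zeroes the stay and noise masses, so posterior_step always returns x0, which is the deterministic commit on the last step.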

step_correct

(model_output: torch.Tensor, timestep: int | torch.Tensor, sample: torch.LongTensor, generator: torch.Generator | None = None, return_dict: bool = True)

Parameters

model_output (torch.Tensor of shape (batch_size, block_length, vocab_size)) — Raw logits from the model recomputed on the current (post-predictor) sample.
timestep (int or torch.Tensor) — The predictor step index just completed; the corrector runs at the following grid point s.
sample (torch.LongTensor of shape (batch_size, block_length)) — Current block token IDs to refine.
generator (torch.Generator, optional) — RNG for sampling.
return_dict (bool) — Whether to return a DiscreteDDIMSchedulerOutput or a plain tuple.

Run one Gibbs corrector sweep at the post-predictor time s, following the leave-one-out predictor-corrector of https://huggingface.co/papers/2605.22765.

The sweep converts the model logits (recomputed on the current sample) to the LOO denoiser x0_loo, forms the one-coordinate conditional p_s(x^l | x^{-l}) = Cat(alpha_s * x0_loo + (1 - alpha_s) / K), selects the corrector_k least-confident positions according to corrector_selection, and resamples those positions while holding the rest fixed. The sweep preserves p_s, so it refines the sample without changing its marginal and needs no extra training.
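One way to obtain the LOO denoiser from an ordinary denoiser is Bayes' rule: because x^l depends on the clean sequence only through x0^l, p(x0^l = v | x) is proportional to p(x0^l = v | x^{-l}) * q(x^l | x0^l = v). Dividing out the self-likelihood alpha * 1[x^l = v] + (1 - alpha) / K and renormalizing therefore recovers x0_loo. Whether step_correct() uses exactly this conversion is an assumption; the sketch works on one position as a plain probability list, and loo_from_denoiser is a hypothetical name.

```python
def loo_from_denoiser(p_x0, current_token, alpha):
    """Hypothetical helper: convert p(x0^l | x) into p(x0^l | x^{-l}).

    p_x0: softmax of the model logits at position l.
    current_token: the token x^l currently at that position.
    alpha: survival probability at the sample's time (alpha_s for the corrector).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for the LOO inversion")
    k = len(p_x0)
    unnorm = []
    for v, p in enumerate(p_x0):
        # q(x^l = current_token | x0^l = v) under the uniform kernel.
        lik = alpha * (1.0 if v == current_token else 0.0) + (1.0 - alpha) / k
        unnorm.append(p / lik)
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Round trip: build a denoiser from a known LOO prediction, then invert it.
x0_loo, tok, alpha = [0.5, 0.3, 0.2], 1, 0.6
lik = [alpha * (v == tok) + (1 - alpha) / 3 for v in range(3)]
joint = [p * q for p, q in zip(x0_loo, lik)]
p_x0 = [j / sum(joint) for j in joint]
print([round(p, 6) for p in loo_from_denoiser(p_x0, tok, alpha)])  # [0.5, 0.3, 0.2]
```

The guard on alpha reflects that at alpha = 1 the current token reveals x0^l exactly: the self-likelihood is zero for every other v, so the division is undefined.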