DiscreteDDIMScheduler
The DiscreteDDIMScheduler samples each canvas position from the exact discrete posterior of the uniform corruption
process (D3PM), following “Structured Denoising Diffusion Models in Discrete State-Spaces”.
It is parameter-free, and the final step deterministically commits the predicted tokens. An optional predictor-corrector
mode, enabled through corrector_steps, adds the leave-one-out Gibbs sweeps of “Uniform Diffusion Models Revisited:
Leave-One-Out Denoiser and Absorbing State Reformulation”.
This scheduler is used by DiffusionGemmaPipeline.
DiscreteDDIMScheduler
class diffusers.DiscreteDDIMScheduler
(num_inference_steps: int = 32, corrector_steps: int = 0, corrector_k: int = 1, corrector_selection: str = 'lowest_log_margin', corrector_selection_tau: float = 1.0)
Parameters
- num_inference_steps (int, defaults to 32): The number of denoising steps, defining the linear time grid the posterior is evaluated on.
- corrector_steps (int, defaults to 0): Number of Gibbs corrector sweeps run after each predictor step. 0 recovers plain ancestral DDIM sampling.
- corrector_k (int, defaults to 1): Number of positions resampled per corrector sweep.
- corrector_selection (str, defaults to "lowest_log_margin"): How the resampled positions are chosen: "lowest_log_margin", "lowest_maxprob", "lowest_current_prob", or "random".
- corrector_selection_tau (float, defaults to 1.0): Temperature of the Gumbel-top-k position selection (lower is greedier).
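The Gumbel-top-k selection named under corrector_selection_tau can be illustrated with a minimal standard-library sketch. This is not the scheduler's code: the function name gumbel_top_k, the choice of score (here the negated maximum probability, in the spirit of "lowest_maxprob"), and the way tau divides the score are all assumptions made for illustration.

```python
import math
import random


def gumbel_top_k(scores, k, tau, rng):
    """Pick k distinct indices, favouring high scores (hypothetical helper).

    Each score is divided by tau and perturbed with standard Gumbel noise;
    the k largest perturbed values win. A small tau makes the scores dominate
    the noise, so the choice approaches a plain top-k (greedier).
    """
    perturbed = []
    for index, score in enumerate(scores):
        u = max(rng.random(), 1e-12)  # keep log(u) finite
        gumbel = -math.log(-math.log(u))
        perturbed.append((score / tau + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


# Assumed scoring: negated max probability, so uncertain positions score high.
max_probs = [0.9, 0.2, 0.6, 0.3]
scores = [-p for p in max_probs]
greedy_pick = gumbel_top_k(scores, k=2, tau=1e-6, rng=random.Random(0))
```

With tau close to zero the pick reduces to the two least-confident positions; with tau = 1.0 more confident positions are occasionally chosen as well.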
Discrete DDIM scheduler for the uniform corruption process, following “Structured Denoising Diffusion Models in Discrete State-Spaces” (D3PM, https://huggingface.co/papers/2107.03006).
On the linear schedule the survival probability of a clean token at time t is alpha(t) = 1 - t. One denoising
step from time t to s < t samples every block position from the exact posterior q(x_s | x_t, x0), which for
the uniform kernel decomposes into three routes: jump to the predicted clean token x0, stay on the current token,
or jump to a uniformly random token. Unlike masked diffusion, there is no mask token; uncommitted positions carry
random tokens.
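As a rough illustration of the schedule described above, the sketch below computes alpha(t) = 1 - t on an evenly spaced grid and the per-step ratios used by the posterior. The grid layout (t running from 1 down to 0 in num_inference_steps equal steps) and the helper names are assumptions, not taken from the scheduler's source.

```python
def alpha(t):
    """Survival probability of a clean token on the linear schedule."""
    return 1.0 - t


def linear_grid(num_inference_steps):
    """Assumed grid: t_0 = 1 > t_1 > ... > t_N = 0, evenly spaced."""
    n = num_inference_steps
    return [1.0 - i / n for i in range(n + 1)]


def step_coefficients(num_inference_steps):
    """Return (a, b) per step t -> s, with a = alpha_t / alpha_s and b = alpha_s."""
    grid = linear_grid(num_inference_steps)
    coeffs = []
    for t, s in zip(grid[:-1], grid[1:]):
        b = alpha(s)
        a = alpha(t) / b  # b > 0 because s < 1 on every step
        coeffs.append((a, b))
    return coeffs


coeffs = step_coefficients(4)
```

Under these assumptions the first step has a = 0 (nothing survives from pure noise) and the last step has b = 1, which is the step that commits the predicted tokens.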
An optional predictor-corrector mode follows “Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and
Absorbing State Reformulation” via the leave-one-out (LOO) denoiser (https://huggingface.co/papers/2605.22765).
When corrector_steps > 0, the pipeline runs that many Gibbs corrector sweeps after each predictor step (see
step_correct()), resampling the least-confident positions from the one-coordinate
conditional Cat(alpha_s * x0_loo + (1 - alpha_s) / K) while holding the rest fixed, which leaves the marginal
p_s invariant and improves generation at no training cost.
step
(model_output: torch.Tensor, timestep: int | torch.Tensor, sample: torch.LongTensor, temperature: float = 0.0, generator: torch.Generator | None = None, return_dict: bool = True)
Parameters
- model_output (torch.Tensor of shape (batch_size, block_length, vocab_size)): Raw logits from the model for the current block.
- timestep (int or torch.Tensor): Current step index within the denoising schedule, in [0, num_inference_steps - 1].
- sample (torch.LongTensor of shape (batch_size, block_length)): Current block token IDs x_t.
- temperature (float): Sampling temperature applied to the logits when drawing x0.
- generator (torch.Generator, optional): RNG for sampling.
- return_dict (bool): Whether to return a DiscreteDDIMSchedulerOutput or a plain tuple.
Sample the next block from the posterior q(x_s | x_t, x0) of the uniform corruption process.
With a = alpha_t / alpha_s (the survival probability from s to t), b = alpha_s, and K the vocabulary size, the
unnormalized posterior mass of each route is
- clean: b * (1 - a) / K + a * b * 1[x_t = x0]
- stay: a * (1 - b) / K
- noise: (1 - a) * (1 - b) / K
At the last step b = 1, so the stay and noise masses vanish and the step deterministically commits the predicted
clean tokens.
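The route masses can be checked numerically with a small standard-library sketch. It handles a single position with a known x0 and is only an illustration of the formulas above; the function names are hypothetical and the real scheduler operates on batched tensors.

```python
import random


def route_masses(a, b, K, x_t, x0):
    """Unnormalized masses of the clean, stay, and noise routes for one position."""
    same = 1.0 if x_t == x0 else 0.0
    clean = b * (1.0 - a) / K + a * b * same
    stay = a * (1.0 - b) / K
    noise = (1.0 - a) * (1.0 - b) / K
    return clean, stay, noise


def sample_position(a, b, K, x_t, x0, rng):
    """Draw x_s for one position by first choosing a route, then a token."""
    weights = route_masses(a, b, K, x_t, x0)
    # random.choices normalizes the weights internally.
    route = rng.choices(("clean", "stay", "noise"), weights=weights)[0]
    if route == "clean":
        return x0
    if route == "stay":
        return x_t
    return rng.randrange(K)  # jump to a uniformly random token


# Last step (b = 1): only the clean route has mass, so x0 is always committed.
rng = random.Random(0)
final_tokens = [sample_position(0.75, 1.0, 10, x_t=3, x0=7, rng=rng) for _ in range(5)]
```

The three masses sum to alpha_t * 1[x_t = x0] + (1 - alpha_t) / K with alpha_t = a * b, which is the marginal q(x_t | x0) of the uniform process and therefore the normalizer of the posterior.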
step_correct
(model_output: torch.Tensor, timestep: int | torch.Tensor, sample: torch.LongTensor, generator: torch.Generator | None = None, return_dict: bool = True)
Parameters
- model_output (torch.Tensor of shape (batch_size, block_length, vocab_size)): Raw logits from the model recomputed on the current (post-predictor) sample.
- timestep (int or torch.Tensor): The predictor step index just completed; the corrector runs at the following grid points.
- sample (torch.LongTensor of shape (batch_size, block_length)): Current block token IDs to refine.
- generator (torch.Generator, optional): RNG for sampling.
- return_dict (bool): Whether to return a DiscreteDDIMSchedulerOutput or a plain tuple.
Run one Gibbs corrector sweep at the post-predictor time s, following the leave-one-out predictor-corrector
of https://huggingface.co/papers/2605.22765.
The model logits (recomputed on the current sample) are converted to the LOO denoiser, the one-coordinate
conditional p_s(x^l | x^{-l}) = Cat(alpha_s * x0_loo + (1 - alpha_s) / K) is formed, the least-confident
corrector_k positions are selected, and those positions are resampled while the rest are held fixed. The
sweep preserves p_s, so it refines the sample without changing its marginal and needs no extra training.
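The resampling part of a sweep can be sketched in plain Python as follows. The sketch takes the LOO denoiser probabilities x0_loo as given (how they are derived from the model logits is not shown here) and takes the positions to resample as an input; the function names are hypothetical, and this is not the scheduler's implementation.

```python
import random


def loo_conditional(x0_loo, alpha_s):
    """One-coordinate conditional Cat(alpha_s * x0_loo + (1 - alpha_s) / K)."""
    K = len(x0_loo)
    return [alpha_s * p + (1.0 - alpha_s) / K for p in x0_loo]


def corrector_sweep(tokens, x0_loo_probs, alpha_s, positions, rng):
    """Resample the selected positions; all other tokens are held fixed."""
    new_tokens = list(tokens)
    for l in positions:
        cond = loo_conditional(x0_loo_probs[l], alpha_s)
        new_tokens[l] = rng.choices(range(len(cond)), weights=cond)[0]
    return new_tokens


# Toy block of 3 positions over a vocabulary of K = 4 tokens.
tokens = [3, 1, 0]
x0_loo_probs = [[0.7, 0.1, 0.1, 0.1]] * 3
refined = corrector_sweep(tokens, x0_loo_probs, alpha_s=0.5, positions=[0], rng=random.Random(0))
```

Because x0_loo is a probability vector, the mixture with the uniform term always sums to 1, and positions outside the selected set come back unchanged.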
DiscreteDDIMSchedulerOutput
class diffusers.DiscreteDDIMSchedulerOutput
(prev_sample: torch.LongTensor, sampled_tokens: torch.LongTensor, sampled_probs: torch.Tensor, pred_logits: torch.Tensor)
Parameters
- prev_sample (torch.LongTensor of shape (batch_size, block_length)): Updated block tokens after the current denoising step.
- sampled_tokens (torch.LongTensor of shape (batch_size, block_length)): Token IDs sampled from the model logits, i.e. the predicted clean tokens x0.
- sampled_probs (torch.Tensor of shape (batch_size, block_length)): Probabilities of the sampled tokens.
- pred_logits (torch.Tensor of shape (batch_size, block_length, vocab_size)): The denoiser logits, passed through for self-conditioning the next step.
Output class for the discrete DDIM scheduler.