Drops to single thread for a long time after iterations
I run StableDiffusion3Pipeline from a checkout of stable-diffusion-3.5-large at commit ceddf0a7 with num_inference_steps=80 on CPU, not GPU. It loads the checkpoint shards and pipeline components, then runs through the 80 iterations on a few CPU cores. However, once the 80 iterations finish, it drops to a single core for a very long time before actually writing the image.
I don't see the same from FluxPipeline: that runs through the 80 iterations on a few cores then promptly writes the image (pipeline.images[0].save).
Any idea why Stable Diffusion adds this extra single-core phase after the iterations are complete, and whether there's anything I can do about it?
In case it matters, here are my current versions of some relevant packages:
accelerate==1.4.0
diffusers==0.32.2
huggingface-hub==0.29.1
mpmath==1.3.0
numpy==2.2.3
peft==0.14.0
protobuf==5.29.3
safetensors==0.5.2
sentencepiece==0.2.0
tokenizers==0.21.0
torch==2.6.0+cpu
transformers==4.49.0
xformers==0.0.29.post3
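
To put a number on the gap, here is a minimal timing sketch. It assumes the pipeline accepts a `callback_on_step_end` argument that is called as `(pipe, step, timestep, callback_kwargs)` and must return the kwargs dict, which is the convention diffusers pipelines use. `PhaseTimer` and its method names are hypothetical helpers for illustration, not part of diffusers.

```python
import time


class PhaseTimer:
    """Records when each denoising step ends so post-loop time can be isolated."""

    def __init__(self):
        self.start = None
        self.step_ends = []
        self.finish = None

    def on_step_end(self, pipe, step, timestep, callback_kwargs):
        # Assumed to match the diffusers `callback_on_step_end` convention:
        # the callback returns the (possibly modified) kwargs dict.
        self.step_ends.append(time.perf_counter())
        return callback_kwargs

    def run(self, fn, *args, **kwargs):
        # Call the pipeline (or any callable) with the timing callback attached.
        self.start = time.perf_counter()
        result = fn(*args, callback_on_step_end=self.on_step_end, **kwargs)
        self.finish = time.perf_counter()
        return result

    def report(self):
        # Everything before the last step end includes prompt encoding and
        # the loop; everything after it is post-loop work.
        last_step = self.step_ends[-1] if self.step_ends else self.start
        return {
            "steps": len(self.step_ends),
            "until_last_step_s": last_step - self.start,
            "after_last_step_s": self.finish - last_step,
        }
```

Usage would look like `timer = PhaseTimer(); out = timer.run(pipeline, prompt, num_inference_steps=80); print(timer.report())`. Running it a second time with `output_type="latent"` skips decoding the latents into an image, so if `after_last_step_s` collapses in that run, the single-core phase is most likely the decode step. That is a guess to be checked, not something I have confirmed.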