| license: mit | |
| pipeline_tag: text-to-image | |
| widget: | |
| - output: | |
| url: images/ddim_images and images/ddpm_images | |
| text: A futuristic city | |
| ## Model Description | |
| A Stable Diffusion model implemented and trained from scratch (CLIP + VAE + UNet + cross-attention). The schedulers (DDPM, DDIM, Euler Ancestral, DPM-Solver++) and attention mechanisms (Flash Attention, xFormers) were optimized to reduce generation time by 22% and improve image clarity, using MPS on RunPod GPUs. |
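
The schedulers listed above differ in how they step from noise back to an image. As a rough illustration only, and not this repository's actual code, the deterministic DDIM update (eta = 0) can be sketched in NumPy. The function names `make_alpha_bars`, `ddim_timesteps`, and `ddim_step` are hypothetical, and the scaled-linear beta schedule with its start and end values is an assumption, not something stated in this model card.

```python
import numpy as np


def make_alpha_bars(num_train_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative products of (1 - beta_t) for an assumed scaled-linear schedule."""
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_steps) ** 2
    return np.cumprod(1.0 - betas)


def ddim_timesteps(num_train_steps, num_inference_steps):
    """Evenly strided timesteps, ordered from most noisy to least noisy."""
    stride = num_train_steps // num_inference_steps
    return np.arange(0, num_train_steps, stride)[::-1]


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t            : current noisy latent
    eps_pred       : noise predicted by the denoiser (e.g. a UNet) at this step
    alpha_bar_t    : cumulative alpha at the current timestep
    alpha_bar_prev : cumulative alpha at the next (less noisy) timestep
    """
    # Estimate the clean latent implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the noise level of the next timestep.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred


if __name__ == "__main__":
    alpha_bars = make_alpha_bars()
    steps = ddim_timesteps(1000, 50)
    print(len(steps), "inference steps instead of", len(alpha_bars))
```

Because the update has no random term, DDIM can skip timesteps: the sketch visits 50 of the 1000 training timesteps, whereas ancestral DDPM sampling typically walks every one. That is the general reason DDIM-style samplers tend to generate faster.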