---
license: mit
datasets:
- svjack/pokemon-blip-captions-en-zh
pipeline_tag: unconditional-image-generation
tags:
- diffusion
- tiny
- pokemon
- U-Net
- from_scratch
- 9m
- pokepixels
- pixels
- diff
- diffusers
---

# PokéPixels1-9M (CPU)

A minimal diffusion model trained **from scratch on CPU**.

This project explores the lower limits of diffusion models: **how small and simple can a diffusion model be while still producing recognizable images?**

---

Here are some "Fakemons" generated by the model (64x64 resolution):

![image](https://cdn-uploads.huggingface.co/production/uploads/68df176c403a7bf9e8ae85a8/V53ucdxepERZ0RDhVp_hl.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/68df176c403a7bf9e8ae85a8/pkP0-pWDjxDZkvXrBnJYQ.png)

## 🧠 Overview

TinyPokemonDiffusion is a lightweight DDPM-based generative model trained on Pokémon images. Despite its small size and CPU-only training, the model learns:

- Color distributions
- Basic shapes
- Early-stage object structure

---

## ⚙️ Specifications

| Component       | Value                 |
|-----------------|-----------------------|
| Parameters      | ~9M                   |
| Resolution      | 64x64                 |
| Training Device | CPU (Ryzen 5 5600G)   |
| Training Time   | ~5.5 hours            |
| Dataset         | pokemon-blip-captions |
| Architecture    | Custom UNet           |
| Precision       | float32               |

---

## 🧪 Features

- Full DDPM implementation from scratch
- Custom UNet with attention blocks
- CPU-optimized training
- Deterministic sampling (seed support)
- Config-driven architecture

---

## 🖼️ Results

The model generates:

- Coherent color palettes
- Recognizable Pokémon-like silhouettes
- Early-stage structure formation

Limitations:

- Blurry outputs
- Weak spatial consistency
- No semantic understanding

---

> **Note:** The initial idea was to distill a student U-Net from a teacher U-Net, but this was discontinued: the teacher was initialized with random weights, which would have prevented the student from learning anything useful. The model class is still named `StudentUNet` for that reason.

## 🚀 Usage

### Generate images

```python
import torch
from PIL import Image

# ===== CONFIG =====
CHECKPOINT = "model.pt"
N_IMAGES = 8
STEPS = 50
SEED = 42
OUT = "generated.png"

# ===== IMPORT MODEL =====
from train import StudentUNet, DDPMScheduler, Config

# ===== LOAD =====
torch.manual_seed(SEED)
ckpt = torch.load(CHECKPOINT, map_location="cpu")
cfg = ckpt.get("config", Config())

model = StudentUNet(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()

scheduler = DDPMScheduler(cfg.timesteps, cfg.beta_start, cfg.beta_end)

# ===== SAMPLING =====
@torch.no_grad()
def sample(model, scheduler, n, steps):
    # Start from pure Gaussian noise and denoise over a strided subset of timesteps.
    x = torch.randn(n, 3, cfg.image_size, cfg.image_size)
    step_size = scheduler.T // steps
    timesteps = list(range(0, scheduler.T, step_size))[::-1]
    for t_val in timesteps:
        t = torch.full((n,), t_val, dtype=torch.long)
        noise_pred = model(x, t)
        if t_val > 0:
            # Rescale the betas to the strided schedule, then take one DDPM step.
            ab = scheduler.alpha_bar[t_val]
            prev_t = max(t_val - step_size, 0)
            ab_prev = scheduler.alpha_bar[prev_t]
            beta_t = 1.0 - (ab / ab_prev)
            alpha_t = 1.0 - beta_t
            mean = (1.0 / alpha_t.sqrt()) * (
                x - (beta_t / (1.0 - ab).sqrt()) * noise_pred
            )
            x = mean + beta_t.sqrt() * torch.randn_like(x)
        else:
            # Final step: recover the clean image estimate directly.
            x = scheduler.predict_x0(x, noise_pred, t)
    return x.clamp(-1, 1)

samples = sample(model, scheduler, N_IMAGES, STEPS)

# ===== SAVE =====
samples = (samples + 1) / 2  # map [-1, 1] -> [0, 1]
samples = (samples * 255).byte().permute(0, 2, 3, 1).numpy()
grid = Image.new("RGB", (cfg.image_size * N_IMAGES, cfg.image_size))
for i, img in enumerate(samples):
    grid.paste(Image.fromarray(img), (i * cfg.image_size, 0))
grid.save(OUT)
print(f"✅ Saved to {OUT}")
```

```bash
python generate.py \
    --checkpoint model.pt \
    --n_images 8 \
    --steps 50 \
    --seed 42
```

## 📁 Output

Generated images are saved as a horizontal grid: `outputs/generated.png`

## ⚠️ Limitations

- Unconditional model (no prompts)
- Limited dataset diversity
- Early training stage
- No DDIM (yet)

## 🔬 Research Direction

This project demonstrates that diffusion models can learn meaningful visual structure even at extremely small scales.
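The generation script imports `DDPMScheduler` from `train.py`, which is not reproduced here. For readers without the repo, the following is a minimal sketch of a compatible scheduler with a linear beta schedule; the attribute and method names (`T`, `alpha_bar`, `predict_x0`) are inferred from how the sampling loop uses them, and the real implementation in `train.py` may differ:

```python
import torch

class MinimalDDPMScheduler:
    """Sketch of a linear-beta DDPM schedule (an assumption, not the repo's code)."""

    def __init__(self, T=1000, beta_start=1e-4, beta_end=0.02):
        self.T = T
        betas = torch.linspace(beta_start, beta_end, T)   # per-step noise variance
        alphas = 1.0 - betas
        self.alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative signal fraction

    def predict_x0(self, x_t, noise_pred, t):
        # Invert the forward process x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps for x0.
        ab = self.alpha_bar[t].view(-1, 1, 1, 1)
        return (x_t - (1.0 - ab).sqrt() * noise_pred) / ab.sqrt()
```

Because `alpha_bar` is the product of the per-step alphas, it decreases monotonically from nearly 1 (almost clean) to nearly 0 (almost pure noise), which is what the sampling loop's `ab / ab_prev` ratio relies on.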
Future work:

- Conditional generation (class-based)
- Text-to-image (v2.0)
- DDIM sampling
- Larger model variants

## 💡 Motivation

Most diffusion research focuses on scaling up. This project explores the opposite direction: what is the minimum viable diffusion model?

## 📜 License

MIT

## 🙌 Acknowledgments

- Hugging Face datasets
- PyTorch
- The open-source AI community

## ⭐ If you like this project

Give it a star and follow the evolution to v2.0 (conditional) 🚀
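Regarding the DDIM item on the roadmap: DDIM reuses the same trained noise-prediction network and `alpha_bar` schedule, but replaces the stochastic DDPM update with a deterministic one, so it could slot into the sampling loop above. A generic sketch of a single deterministic (eta = 0) DDIM step, based on the standard formulation rather than this repo's code:

```python
import torch

def ddim_step(x_t, noise_pred, ab_t, ab_prev):
    """One deterministic DDIM (eta=0) update from the current timestep to the
    previous one. ab_t / ab_prev are cumulative alpha_bar values, the same
    quantities the DDPM sampling loop reads from scheduler.alpha_bar."""
    # 1) Estimate the clean image from the current noisy sample.
    x0_pred = (x_t - (1.0 - ab_t).sqrt() * noise_pred) / ab_t.sqrt()
    # 2) Re-noise toward the previous timestep using the *predicted* noise,
    #    with no fresh randomness -- this is what makes DDIM deterministic.
    return ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * noise_pred
```

Because no fresh noise is injected, the same seed and checkpoint always produce the same image, and far fewer steps are typically needed than with ancestral DDPM sampling.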