---
license: mit
datasets:
- svjack/pokemon-blip-captions-en-zh
pipeline_tag: unconditional-image-generation
tags:
- diffusion
- tiny
- pokemon
- U-Net
- from_scratch
- 9m
- pokepixels
- pixels
- diff
- diffusers
---

# PokéPixels1-9M (CPU)

A minimal diffusion model trained **from scratch on CPU**.

This project explores the lower limits of diffusion models:
**How small and simple can a diffusion model be while still producing recognizable images?**

---

Here are some "Fakemons" generated by the model (64×64 resolution):




## Overview

PokéPixels1-9M is a lightweight DDPM-based generative model trained on Pokémon images.

Despite its small size and CPU-only training, the model learns:

- Color distributions
- Basic shapes
- Early-stage object structure

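For context, the DDPM recipe behind this is simple: corrupt a clean image with Gaussian noise at a random timestep and train the U-Net to predict that noise. Below is a minimal sketch of one training step; the linear schedule constants and the `model(x_t, t)` signature mirror the usage script further down, but this is not the exact code from `train.py`:

```python
import torch

# Linear beta schedule (illustrative values, as in the original DDPM paper).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_training_step(model, x0):
    """One DDPM training step: predict the noise added at a random timestep."""
    n = x0.shape[0]
    t = torch.randint(0, T, (n,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    # Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    eps_pred = model(x_t, t)
    return torch.nn.functional.mse_loss(eps_pred, eps)
```

Repeating this step over the dataset is all the "training loop" a model at this scale needs.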
---

## Specifications

| Component       | Value                 |
|-----------------|-----------------------|
| Parameters      | ~9M                   |
| Resolution      | 64×64                 |
| Training Device | CPU (Ryzen 5 5600G)   |
| Training Time   | ~5.5 hours            |
| Dataset         | pokemon-blip-captions |
| Architecture    | Custom U-Net          |
| Precision       | float32               |

---

## Features

- Full DDPM implementation from scratch
- Custom U-Net with attention blocks
- CPU-optimized training
- Deterministic sampling (seed support)
- Config-driven architecture

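"Config-driven" means the hyperparameters live in a single `Config` object that is saved inside the checkpoint and used to rebuild the network. The exact fields are defined in `train.py`; the sketch below only grounds the fields the generation script actually reads (`image_size`, `timesteps`, `beta_start`, `beta_end`) and the remaining names and defaults are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Fields referenced by the generation script; defaults are illustrative.
    image_size: int = 64
    timesteps: int = 1000
    beta_start: float = 1e-4
    beta_end: float = 0.02
    # Architecture knobs (assumed names) keeping the U-Net around ~9M params.
    base_channels: int = 64
    channel_mults: tuple = (1, 2, 4)
```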
---

## Results

The model generates:

- Coherent color palettes
- Recognizable Pokémon-like silhouettes
- Early-stage structure formation

Limitations:

- Blurry outputs
- Weak spatial consistency
- No semantic understanding

---

## Usage

### Generate images

```python
import torch
from PIL import Image

# ===== CONFIG =====
CHECKPOINT = "model.pt"
N_IMAGES = 8
STEPS = 50
SEED = 42
OUT = "generated.png"

# ===== IMPORT MODEL =====
from train import StudentUNet, DDPMScheduler, Config

# ===== LOAD =====
torch.manual_seed(SEED)

ckpt = torch.load(CHECKPOINT, map_location="cpu")
cfg = ckpt.get("config", Config())

model = StudentUNet(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()

scheduler = DDPMScheduler(cfg.timesteps, cfg.beta_start, cfg.beta_end)

# ===== SAMPLING =====
@torch.no_grad()
def sample(model, scheduler, n, steps):
    x = torch.randn(n, 3, cfg.image_size, cfg.image_size)

    # Stride through the full schedule so `steps` denoising steps cover all T timesteps.
    step_size = scheduler.T // steps
    timesteps = list(range(0, scheduler.T, step_size))[::-1]

    for t_val in timesteps:
        t = torch.full((n,), t_val, dtype=torch.long)

        noise_pred = model(x, t)

        if t_val > 0:
            ab = scheduler.alpha_bar[t_val]
            prev_t = max(t_val - step_size, 0)
            ab_prev = scheduler.alpha_bar[prev_t]

            # Effective beta/alpha for the strided step.
            beta_t = 1.0 - (ab / ab_prev)
            alpha_t = 1.0 - beta_t

            mean = (1.0 / alpha_t.sqrt()) * (
                x - (beta_t / (1.0 - ab).sqrt()) * noise_pred
            )

            x = mean + beta_t.sqrt() * torch.randn_like(x)
        else:
            # Final step: reconstruct x0 directly from the noise estimate.
            x = scheduler.predict_x0(x, noise_pred, t)

    return x.clamp(-1, 1)

samples = sample(model, scheduler, N_IMAGES, STEPS)

# ===== SAVE =====
samples = (samples + 1) / 2
samples = (samples * 255).byte().permute(0, 2, 3, 1).numpy()

grid = Image.new("RGB", (cfg.image_size * N_IMAGES, cfg.image_size))

for i, img in enumerate(samples):
    grid.paste(Image.fromarray(img), (i * cfg.image_size, 0))

grid.save(OUT)

print(f"✅ Saved to {OUT}")
```

```bash
python generate.py \
  --checkpoint model.pt \
  --n_images 8 \
  --steps 50 \
  --seed 42
```

### Output

Generated images are saved as a horizontal grid:

`generated.png`

## Limitations

- Unconditional model (no prompts)
- Limited dataset diversity
- Early training stage
- No DDIM (yet)

## Research Direction

This project demonstrates that diffusion models can learn meaningful visual structure even at extremely small scales.

Future work:

- Conditional generation (class-based)
- Text-to-image (v2.0)
- DDIM sampling
- Larger model variants
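DDIM sampling would reuse the same `alpha_bar` schedule as the DDPM sampler above but drop the stochastic term, stepping deterministically. A hedged sketch of what that could look like for this model (not yet part of the repo; the `model(x, t)` signature is assumed to match `train.py`):

```python
import torch

@torch.no_grad()
def ddim_sample(model, alpha_bar, n, image_size, steps=50):
    """Deterministic (eta = 0) DDIM sampling over a strided subset of timesteps."""
    T = alpha_bar.shape[0]
    x = torch.randn(n, 3, image_size, image_size)
    timesteps = list(range(0, T, T // steps))[::-1]

    for i, t_val in enumerate(timesteps):
        t = torch.full((n,), t_val, dtype=torch.long)
        eps = model(x, t)
        ab = alpha_bar[t_val]
        # Predict x0 from the current noisy sample and the noise estimate.
        x0 = (x - (1.0 - ab).sqrt() * eps) / ab.sqrt()
        if i + 1 < len(timesteps):
            ab_prev = alpha_bar[timesteps[i + 1]]
            # Deterministic DDIM update: re-noise x0 toward the previous timestep.
            x = ab_prev.sqrt() * x0 + (1.0 - ab_prev).sqrt() * eps
        else:
            x = x0

    return x.clamp(-1, 1)
```

Because no fresh noise is injected, the same seed always yields the same image, and far fewer steps are usually needed than with ancestral DDPM sampling.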

## Motivation

Most diffusion research focuses on scaling up. This project explores the opposite direction:

**What is the minimum viable diffusion model?**

## License

MIT

## Acknowledgments

- Hugging Face Datasets
- PyTorch
- The open-source AI community

If you like this project, give it a star and follow the evolution to v2.0 (conditional).

## Other

The initial idea was to distill a student U-Net from a teacher U-Net, but this was discontinued because the teacher was initialized with random weights, which would have prevented the student from learning anything useful.