tokenization
video generation
vae