# PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation

<div align="center">
<a href="https://plan-lab.github.io/pyratok"><img src="https://img.shields.io/badge/Project-Website-blue?style=for-the-badge&logo=googlechrome"></a>
<a href="https://arxiv.org/abs/2601.16210"><img src="https://img.shields.io/badge/arXiv-2601.16210-b31b1b.svg?style=for-the-badge"></a>
<a href="https://github.com/PLAN-Lab/PyraTok"><img src="https://img.shields.io/badge/Code-GitHub-black?style=for-the-badge&logo=github"></a>
</div>

---
### Official Announcement

**PyraTok** has been officially accepted to **CVPR 2026**!

This repository contains the pretrained weights and model implementation for the Language-aligned Pyramidal Tokenizer.

---
## Overview

**PyraTok** is a state-of-the-art video tokenizer that bridges the gap between video understanding and generation. Unlike traditional VAEs, which operate at a single visual scale, PyraTok introduces a **Language-aligned Pyramidal Quantization (LaPQ)** module that quantizes video at multiple spatiotemporal resolutions.

### Key Innovations:

* **Pyramidal Structure:** Learns semantically structured discrete latents across multiple spatiotemporal resolutions.
* **Language Alignment:** Tightly couples visual tokens with language using a shared, large binary codebook (up to 48K tokens).
* **Scalability:** Scales robustly from standard resolutions to **4K/8K video** processing.
* **Unified Backbone:** A single model that excels at Video QA, Zero-Shot Segmentation, and high-fidelity Text-to-Video generation.
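
As a rough illustration of the first two points above, the toy sketch below quantizes a small spatiotemporal feature grid at several resolutions against one shared binary codebook. It is not the PyraTok implementation: the function names (`make_binary_codebook`, `avg_pool`, `quantize`, `pyramidal_tokenize`), the use of average pooling, the nearest-code lookup by dot product, and the tiny codebook size are all assumptions made for illustration, and language alignment is not modeled at all.

```python
import itertools
import random


def make_binary_codebook(num_codes, dim, seed=0):
    """Return `num_codes` distinct codes with entries in {-1, +1} (hypothetical helper)."""
    assert num_codes <= 2 ** dim, "not enough distinct binary codes of this length"
    rng = random.Random(seed)
    codes = set()
    while len(codes) < num_codes:
        codes.add(tuple(rng.choice((-1, 1)) for _ in range(dim)))
    return sorted(codes)


def avg_pool(grid, shape, factor):
    """Average-pool a {(t, h, w): vector} grid by `factor` along every axis."""
    new_shape = tuple(n // factor for n in shape)
    pooled = {}
    for t, h, w in itertools.product(*(range(n) for n in new_shape)):
        cells = [
            grid[(t * factor + dt, h * factor + dh, w * factor + dw)]
            for dt, dh, dw in itertools.product(range(factor), repeat=3)
        ]
        pooled[(t, h, w)] = [sum(vals) / len(cells) for vals in zip(*cells)]
    return pooled, new_shape


def quantize(vector, codebook):
    """Index of the code with the largest dot product with `vector`.

    All codes have the same norm, so this is also the nearest code
    in Euclidean distance.
    """
    return max(
        range(len(codebook)),
        key=lambda k: sum(c * v for c, v in zip(codebook[k], vector)),
    )


def pyramidal_tokenize(grid, shape, codebook, factors=(4, 2, 1)):
    """Quantize the grid coarse-to-fine, reusing one codebook at every level."""
    levels = []
    for factor in factors:
        pooled, new_shape = avg_pool(grid, shape, factor)
        tokens = {pos: quantize(vec, codebook) for pos, vec in pooled.items()}
        levels.append((new_shape, tokens))
    return levels


# Toy input: a 4x4x4 (time, height, width) grid of 8-dimensional features.
rng = random.Random(1)
shape = (4, 4, 4)
dim = 8
grid = {
    pos: [rng.gauss(0, 1) for _ in range(dim)]
    for pos in itertools.product(*(range(n) for n in shape))
}
codebook = make_binary_codebook(16, dim)
levels = pyramidal_tokenize(grid, shape, codebook)
for level_shape, tokens in levels:
    print(level_shape, len(tokens))
```

Each pyramid level yields its own grid of token indices, from a single coarse token up to one token per input position, and every level indexes into the same codebook. How PyraTok actually builds its pyramid and aligns codes with language is described in the paper linked above.
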
```
@inproceedings{susladkar2026pyratok,
  title={PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation},
  author={Susladkar, Onkar and Prakash, Tushar and Juvekar, Adheesh and Nguyen, Kiet A. and Jang, Dong-Hwan and Dhillon, Inderjit S. and Lourentzou, Ismini},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```