GenLIP
Model weights for the paper "Let ViT Speak: Generative Language-Image Pre-training".
This repository is the official model zoo for "Let ViT Speak: Generative Language-Image Pre-training".
We use the SigLIP image preprocessor for our fixed low-resolution models (*-224) and a Qwen2-VL-style image preprocessor for our NaViT models (*-NaViT).
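The naming convention above can be turned into a small lookup, sketched here under stated assumptions. The function name `preprocessor_family`, the returned labels, and the example checkpoint names are hypothetical and used only for illustration; they are not taken from the GenLIP codebase. Only the `-224` and `-NaViT` suffixes come from this page.

```python
def preprocessor_family(model_name: str) -> str:
    """Guess the image preprocessor family from a checkpoint name suffix.

    Hypothetical helper: only the "-224" and "-NaViT" suffixes are taken
    from the model zoo description; the returned labels are illustrative.
    """
    # Normalize so a trailing slash or different casing does not matter.
    name = model_name.rstrip("/").lower()
    if name.endswith("-224"):
        # Fixed low-resolution models use the SigLIP preprocessor.
        return "siglip"
    if name.endswith("-navit"):
        # NaViT models use a Qwen2-VL-style preprocessor.
        return "qwen2-vl-style"
    raise ValueError(f"Unrecognized model name suffix: {model_name!r}")


if __name__ == "__main__":
    # Placeholder names, not real checkpoint identifiers.
    for example in ("example-model-224", "example-model-NaViT"):
        print(example, "->", preprocessor_family(example))
```

In Hugging Face `transformers`, the corresponding classes would plausibly be `SiglipImageProcessor` and `Qwen2VLImageProcessor`, but that mapping is an assumption; check each checkpoint's own preprocessor configuration before relying on it.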
Pretraining and implementation details can be found in our codebase [GenLIP].