This repository is the official model zoo for "Let ViT Speak: Generative Language-Image Pre-training".

Currently released models

Models from fixed low-resolution pretraining:

GenLIP-L16-224
GenLIP-So16-224
GenLIP-g16-224

NaViT models:

GenLIP-L16-NaViT
GenLIP-So16-NaViT
GenLIP-g16-NaViT

We use the SigLIP image preprocessor for our fixed low-resolution models (*-224) and a Qwen2-VL-style image preprocessor for our NaViT models (*-NaViT).
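The naming rule above can be sketched as a small lookup. This is only an illustration: the function `preprocessor_family` and the returned labels are hypothetical and are not part of the GenLIP codebase; the only thing taken from this README is the mapping from the name suffix to the preprocessor style.

```python
def preprocessor_family(model_name: str) -> str:
    """Return which preprocessor style a released checkpoint name implies.

    Hypothetical helper: the labels "siglip" and "qwen2-vl-style" are
    illustrative strings, not identifiers from any library.
    """
    if model_name.endswith("-NaViT"):
        # NaViT models use a Qwen2-VL-style image preprocessor.
        return "qwen2-vl-style"
    if model_name.endswith("-224"):
        # Fixed low-resolution models use the SigLIP image preprocessor.
        return "siglip"
    raise ValueError(f"Unrecognized model name suffix: {model_name!r}")


if __name__ == "__main__":
    for name in ("GenLIP-L16-224", "GenLIP-So16-NaViT"):
        print(name, "->", preprocessor_family(name))
```

A check like this may help avoid pairing a checkpoint with the wrong preprocessor when switching between the two model families.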

Pretraining and implementation details can be found in our codebase [GenLIP].

Weights format: Safetensors
Model size: 0.6B params
Tensor type: BF16
