question about number of parameters in vision encoder
#6
by weidu - opened
The base model of siglip2-base-patch16-224 is said to have 86M parameters, but when I count with the code below I get 92M. Why is that?
from transformers import pipeline

def count_parameters(model):
    # Sum the element counts of every parameter tensor in the module.
    return sum(p.numel() for p in model.parameters())

image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip2-base-patch16-224")
vision_model = image_classifier.model.vision_model
n_params = count_parameters(vision_model)
print(f"Parameters in vision encoder: {n_params:,}")
TL;DR: the difference is due to the MAP (multihead attention pooling) head, which the 86M figure does not include. More details here: https://github.com/google-research/big_vision/issues/159
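To make the gap concrete, here is a rough back-of-the-envelope count in plain Python. It is only a sketch under assumptions that are not stated in this thread: a ViT-B/16 layout (hidden size 768, MLP size 3072, 12 layers, 224x224 input, no class token, learned position embeddings) and a pooling head made of one learned probe, one multihead attention block, one LayerNorm, and one MLP. The helper names are made up for illustration.

```python
# Assumed ViT-B/16 dimensions (not taken from the thread).
HIDDEN = 768
MLP_DIM = 3072
LAYERS = 12
PATCH = 16
IMAGE = 224
CHANNELS = 3


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim):
    """Scale and shift vectors."""
    return 2 * dim


def attention(dim):
    """Query, key, value, and output projections."""
    return 4 * linear(dim, dim)


def mlp(dim, hidden):
    """Two dense layers: dim -> hidden -> dim."""
    return linear(dim, hidden) + linear(hidden, dim)


def encoder_layer(dim, hidden):
    """Pre-norm transformer block: two LayerNorms, attention, MLP."""
    return 2 * layer_norm(dim) + attention(dim) + mlp(dim, hidden)


num_patches = (IMAGE // PATCH) ** 2

# Backbone: patch embedding, position embedding, encoder stack, final LayerNorm.
backbone = (
    linear(CHANNELS * PATCH * PATCH, HIDDEN)
    + num_patches * HIDDEN
    + LAYERS * encoder_layer(HIDDEN, MLP_DIM)
    + layer_norm(HIDDEN)
)

# Assumed pooling head: learned probe, attention, LayerNorm, MLP.
head = HIDDEN + attention(HIDDEN) + layer_norm(HIDDEN) + mlp(HIDDEN, MLP_DIM)

print(f"backbone only: {backbone:,}")         # 85,797,120
print(f"pooling head:  {head:,}")             # 7,087,104
print(f"backbone+head: {backbone + head:,}")  # 92,884,224
```

Under these assumptions the backbone alone comes to about 86M and the pooling head adds roughly 7M, which is consistent with the 92M count reported above.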