Add TF weights
Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.
Maximum crossload output difference=5.970e-03; Maximum crossload hidden layer difference=1.053e-01;
Maximum conversion output difference=5.970e-03; Maximum conversion hidden layer difference=1.053e-01;
CAUTION: The maximum admissible error was manually increased to 0.2!
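For readers unfamiliar with these figures, the sketch below illustrates what a "maximum difference" check against an admissible error could look like. It is not the pt_to_tf implementation: the helper names `max_abs_difference` and `within_tolerance` are hypothetical, the arrays are made-up stand-ins for PyTorch and TensorFlow activations, and the stricter 5e-5 threshold is only an illustrative value.

```python
import numpy as np


def max_abs_difference(reference, converted):
    """Return the largest element-wise absolute difference between two arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    converted = np.asarray(converted, dtype=np.float64)
    if reference.shape != converted.shape:
        raise ValueError("arrays must have the same shape to be compared")
    return float(np.max(np.abs(reference - converted)))


def within_tolerance(reference, converted, max_error):
    """Check whether the largest difference stays at or below max_error."""
    return max_abs_difference(reference, converted) <= max_error


# Made-up stand-ins for one hidden layer from each framework (assumption).
pt_hidden = np.array([0.10, -0.25, 0.40])
tf_hidden = np.array([0.10, -0.20, 0.30])

diff = max_abs_difference(pt_hidden, tf_hidden)
print(f"max hidden layer difference = {diff:.3e}")

# With the admissible error raised to 0.2, as in the caution above, this passes.
print(within_tolerance(pt_hidden, tf_hidden, max_error=0.2))
# With a much stricter, purely illustrative threshold it would fail.
print(within_tolerance(pt_hidden, tf_hidden, max_error=5e-5))
```

Read this way, a hidden layer difference of 1.053e-01 only passes because the admissible error was raised to 0.2, which is why the caution is worth noting before relying on the TF weights.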
Thanks!
Hi, sorry to bump this thread.
Could you please give me some pointers on how to use the .h5 file with TF? It only contains weights so I understand the model structure needs to be created in code beforehand. What is the transformers module that the PyTorch example imports?