Instructions to use microsoft/xclip-base-patch32 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/xclip-base-patch32 with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("video-classification", model="microsoft/xclip-base-patch32")

    # Load model directly
    from transformers import AutoProcessor, AutoModel

    processor = AutoProcessor.from_pretrained("microsoft/xclip-base-patch32")
    model = AutoModel.from_pretrained("microsoft/xclip-base-patch32")

- Notebooks
- Google Colab
- Kaggle
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -31,6 +31,8 @@ Disclaimer: The team releasing X-CLIP did not write a model card for this model
 
 X-CLIP is a minimal extension of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for general video-language understanding. The model is trained in a contrastive way on (video, text) pairs.
 
+
+
 This allows the model to be used for tasks like zero-shot, few-shot or fully supervised video classification and video-text retrieval.
 
 ## Intended uses & limitations
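
The README excerpt above says the model is trained contrastively on (video, text) pairs, which is what makes zero-shot classification possible. A minimal sketch of that scoring step follows, assuming the video and each candidate label have already been embedded into the same vector space. The toy embeddings, the label names, the helper names `l2_normalize` and `zero_shot_probs`, and the `logit_scale` value of 100 are all illustrative assumptions, not values taken from X-CLIP.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(video_embedding, text_embeddings, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one video and each label text.

    logit_scale is a hypothetical temperature; a real model supplies its own value.
    """
    v = l2_normalize(video_embedding)
    logits = []
    for text_embedding in text_embeddings:
        t = l2_normalize(text_embedding)
        logits.append(logit_scale * sum(a * b for a, b in zip(v, t)))
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-dimensional embeddings, invented for illustration only.
labels = ["playing guitar", "swimming", "cooking"]
text_embeddings = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
video_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_probs(video_embedding, text_embeddings)
best_label = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best_label, probs)
```

Video-text retrieval uses the same similarity scores in the other direction: fix one text embedding and rank many video embeddings against it.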