Video Classification
Transformers
PyTorch
Safetensors
English
xclip
feature-extraction
vision
Eval Results
Instructions for using microsoft/xclip-base-patch16 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use microsoft/xclip-base-patch16 with Transformers (a fuller zero-shot sketch follows the notebook links below):

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("video-classification", model="microsoft/xclip-base-patch16")

# Load model directly
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("microsoft/xclip-base-patch16")
model = AutoModel.from_pretrained("microsoft/xclip-base-patch16")
```

- Notebooks
- Google Colab
- Kaggle
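Beyond the boilerplate snippet above, X-CLIP scores a video against free-form text prompts. Below is a minimal zero-shot sketch using the documented processor/model interface; the dummy 8-frame input and the candidate labels are illustrative assumptions, not part of the model card — in practice the frames would be sampled from a real clip.

```python
import numpy as np
import torch
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("microsoft/xclip-base-patch16")
model = AutoModel.from_pretrained("microsoft/xclip-base-patch16")

# Dummy input: 8 RGB frames of 224x224 (the model's training setup).
# In practice, sample these from a real video.
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))

# Candidate labels are illustrative, not from the model card.
inputs = processor(
    text=["playing guitar", "riding a bike", "cooking"],
    videos=video,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_video holds video-text similarity scores, one per prompt.
probs = outputs.logits_per_video.softmax(dim=1)
print(probs)
```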
Upload README.md with huggingface_hub
README.md CHANGED

```diff
@@ -21,7 +21,7 @@ model-index:
 
 # X-CLIP (base-sized model)
 
-X-CLIP model (base-sized, patch resolution of
+X-CLIP model (base-sized, patch resolution of 16) trained fully-supervised on [Kinetics-400](https://www.deepmind.com/open-source/kinetics). It was introduced in the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Ni et al. and first released in [this repository](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
 
 This model was trained using 8 frames per video, at a resolution of 224x224.
 
```
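The changed README text notes that training used 8 frames per video at 224x224. As a hedged illustration of matching that input format at inference time, here is a minimal frame-sampling sketch; the choice of OpenCV and the `video.mp4` path are assumptions, not part of the model card, and the processor itself handles resizing frames to 224x224.

```python
import cv2  # pip install opencv-python
import numpy as np

def sample_frames(path: str, num_frames: int = 8) -> list:
    """Sample `num_frames` evenly spaced RGB frames from a video file."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes to BGR; the processor expects RGB frames.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# "video.mp4" is a hypothetical local file; pass the result as the
# `videos` argument of the processor shown earlier.
frames = sample_frames("video.mp4")
```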