GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
Paper • 2112.07577 • Published
How to use GPL/msmarco-distilbert-margin-mse with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="GPL/msmarco-distilbert-margin-mse")
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("GPL/msmarco-distilbert-margin-mse")
model = AutoModel.from_pretrained("GPL/msmarco-distilbert-margin-mse")
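Loading the model with AutoModel yields one vector per token, so a pooling step is needed before queries and passages can be compared. The following is a minimal sketch of that step in plain Python, assuming mean pooling over non-padding tokens and dot-product scoring; both choices are assumptions here rather than something this card states, and the helper names `mean_pool` and `dot_score` are hypothetical.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    Assumption: the model expects mean pooling; the card does not say so.
    """
    if not token_embeddings:
        raise ValueError("token_embeddings is empty")
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def dot_score(query_vec: List[float], passage_vec: List[float]) -> float:
    """Dot-product relevance score between two pooled vectors (assumed metric)."""
    return sum(q * p for q, p in zip(query_vec, passage_vec))
```

In practice the same operation would be applied to the model's last hidden state and the tokenizer's attention mask; the list-based version above only illustrates the arithmetic.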
This is the zero-shot baseline model from the paper "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval".
The training setup:
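1. Start from distilbert-base-uncased;
2. Mine hard negatives on MS MARCO with sentence-transformers/msmarco-distilbert-base-v3 and sentence-transformers/msmarco-MiniLM-L-6-v3;
3. Train with Margin-MSE, using cross-encoder/ms-marco-MiniLM-L-6-v2 as the teacher, for 70K steps with batch size 75 and a maximum sequence length of 350.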
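The "margin-mse" in the model name refers to the Margin-MSE objective, in which the student retriever is trained so that its score margin between a positive and a negative passage matches the margin given by a teacher cross-encoder. A minimal sketch of that loss in plain Python follows; the function name `margin_mse_loss` and the list-based interface are illustrative assumptions, not the training code used for this model.

```python
from typing import Sequence


def margin_mse_loss(
    student_pos: Sequence[float],
    student_neg: Sequence[float],
    teacher_pos: Sequence[float],
    teacher_neg: Sequence[float],
) -> float:
    """Mean squared error between student and teacher score margins.

    Each argument holds one score per (query, positive, negative) tuple.
    """
    n = len(student_pos)
    if n == 0:
        raise ValueError("need at least one training tuple")
    if not (len(student_neg) == len(teacher_pos) == len(teacher_neg) == n):
        raise ValueError("all score lists must have the same length")
    total = 0.0
    for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg):
        # Difference between the student's margin and the teacher's margin.
        diff = (sp - sn) - (tp - tn)
        total += diff * diff
    return total / n
```

The loss is zero whenever the student reproduces the teacher's margins, even if the absolute scores differ, which is why a bi-encoder and a cross-encoder with different score scales can be paired this way.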