Fine-tuning and multilingual capability
First of all, I find this embedding model very interesting.
I have always been frustrated that there was no way to "explain" to an embedding model what the embedding is for. Thanks to your work, that is now possible.
Do you plan to make this model multilingual? And how could it be fine-tuned, both to become multilingual and to handle more specific tasks?
Thanks in advance.
Thank you very much for your interest!
We are considering making this model multilingual. Fine-tuning the model on more specific tasks is straightforward: prepare your data in the format described at https://github.com/HKUNLP/instructor-embedding#training, store it as a JSON file named medi-data.json, then follow the README at https://github.com/HKUNLP/instructor-embedding#train-instructor to train the model.
If you encounter any problems, feel free to leave your question here or contact me at hjsu@cs.hku.hk!
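For readers following the data-preparation step in the reply above, here is a minimal sketch of writing a medi-data.json file. It assumes that each training record holds a query, a positive, and a negative, each as an [instruction, text] pair, plus a task name. The field names `query`, `pos`, `neg`, and `task_name`, the helper functions, and the sample texts are all illustrative assumptions, so please check them against the README linked in the reply before training.

```python
import json
from pathlib import Path


def make_example(query_instruction, query, pos_instruction, pos,
                 neg_instruction, neg, task_name):
    """Build one training record.

    The field names are an assumption about the format described in the
    instructor-embedding README; verify them there before training.
    """
    return {
        "query": [query_instruction, query],
        "pos": [pos_instruction, pos],
        "neg": [neg_instruction, neg],
        "task_name": task_name,
    }


def write_medi_data(examples, path="medi-data.json"):
    """Write the records as a single JSON list and return the output path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English text readable in the file,
        # which is convenient when preparing multilingual data.
        json.dump(examples, f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    # Hypothetical sample record for a made-up task called "faq-retrieval".
    examples = [
        make_example(
            "Represent the support question for retrieving relevant answers:",
            "How do I reset my password?",
            "Represent the support answer for retrieval:",
            "Open the account settings page and choose 'Reset password'.",
            "Represent the support answer for retrieval:",
            "Our offices are closed on public holidays.",
            "faq-retrieval",
        )
    ]
    print(write_medi_data(examples))
```

Running the script writes medi-data.json to the current directory; from there, the training instructions in the README apply.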