---
library_name: model2vec
license: mit
model_name: cnmoro/custom-model2vec-tokenlearn-medium
tags:
- embeddings
- static-embeddings
- sentence-transformers
language:
- pt
- en
---
A custom model2vec model, trained using a modified version of the [tokenlearn](https://github.com/MinishLab/tokenlearn) library.

The base model is nomic-ai/nomic-embed-text-v2-moe.

The output dimension is 256, and the vocabulary size is 249,999.

The training process used a mix of English (10%) and Portuguese (90%) texts.
```python
from model2vec import StaticModel

# Load the pretrained static embedding model
model = StaticModel.from_pretrained("cnmoro/custom-model2vec-tokenlearn-medium")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
```
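Since the model produces fixed-size 256-dimensional vectors, semantic similarity between texts can be scored with cosine similarity. A minimal sketch using NumPy; the stand-in random vectors below take the place of `model.encode` outputs so the snippet runs without downloading the model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In practice these would come from model.encode([...]); here we use
# stand-in 256-dimensional vectors matching the model's output dimension.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(256)
emb_b = rng.standard_normal(256)

score = cosine_similarity(emb_a, emb_b)  # value in [-1, 1]
```

Scores closer to 1 indicate more semantically similar texts.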