---
inference: false
datasets:
- answerdotai/MMARCO-japanese-32-scored-triplets
- unicamp-dl/mmarco
language:
- ja
pipeline_tag: sentence-similarity
tags:
- ColBERT
base_model:
- cl-tohoku/bert-base-japanese-v3
- bclavie/JaColBERT
license: mit
library_name: RAGatouille
---

Model weights for the JaColBERTv2.4 checkpoint: the pre-post-training version of JaColBERTv2.5, trained with an entirely overhauled training recipe on just 40% of the data used for JaColBERTv2.

This model largely outperforms all previous approaches, including JaColBERTv2 and multilingual models such as BGE-M3, on all evaluated datasets.

This page will be updated with the full details and the model report in the next few days.

```
@misc{clavié2024jacolbertv25optimisingmultivectorretrievers,
      title={JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources},
      author={Benjamin Clavié},
      year={2024},
      eprint={2407.20750},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.20750},
}
```
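
A minimal usage sketch with RAGatouille, the library named in the metadata above. This follows RAGatouille's standard `RAGPretrainedModel` workflow; the model ID string below is illustrative and should be replaced with this repository's actual path, and the example documents and query are placeholders.

```python
from ragatouille import RAGPretrainedModel

# Load the checkpoint. NOTE: the model ID here is an assumption;
# substitute the actual repository path for this checkpoint.
RAG = RAGPretrainedModel.from_pretrained("answerdotai/JaColBERTv2.4")

# Build an index over a small collection of Japanese documents.
RAG.index(
    collection=[
        "JaColBERTは日本語向けの多ベクトル検索モデルです。",
        "東京は日本の首都です。",
    ],
    index_name="jacolbert_demo",
)

# Retrieve the top-k passages for a query.
results = RAG.search("日本の首都はどこですか？", k=2)
for r in results:
    print(r["content"], r["score"])
```

Note that indexing downloads the model weights on first use; for a handful of documents, RAGatouille's in-memory encoding (`RAG.encode(...)` followed by `RAG.search_encoded_docs(...)`) can also be used instead of building an on-disk index.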