| | --- |
| | license: cc-by-4.0 |
| | datasets: |
| | - NingLab/MMECInstruct |
| | base_model: |
| | - meta-llama/Llama-3.2-3B-Instruct |
| | --- |
| | |
| | # CASLIE-S |
| |
|
| | This repo contains the models for "Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data" |
| |
|
| | ## CASLIE Models |
| | The CASLIE-S model is instruction-tuned from the small base models [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). |
| |
|
| | ## Citation |
| | ```bibtex |
| | @article{ling2024captions, |
| | title={Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data}, |
| | author={Ling, Xinyi and Peng, Bo and Du, Hanwen and Zhu, Zhihui and Ning, Xia}, |
| | journal={arXiv preprint arXiv:2410.17337}, |
| | year={2024} |
| | } |
| | ``` |