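---
license: cc-by-sa-4.0
---
buffer-embedding-002 is a text embedding model that, without any fine-tuning, generates text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and any domain (e.g., science, finance).
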
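```
import torch
from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True lets transformers run the custom model code shipped in the repository
tokenizer = AutoTokenizer.from_pretrained("csdc-atl/buffer-embedding-002", trust_remote_code=True)
model = AutoModel.from_pretrained("csdc-atl/buffer-embedding-002", trust_remote_code=True)

# Sample input (Chinese): "Use the CLI of an overseas-region account to download the
# overseas-region AMI to S3, then upload the AMI to S3 in the China region"
text = '通过海外区账号的CLI将海外区的AMI下载到S3中,再将AMI上传到中国区S3中'
input_ids = tokenizer.encode(
    text,
    add_special_tokens=True,
    return_tensors='pt'
)
# Inference only, so gradient tracking is disabled
with torch.no_grad():
    embedding = model(input_ids)
# Assumes the model's forward pass returns a tensor, as the original snippet implies
print(embedding.shape)
```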
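
For retrieval or clustering, embeddings are typically compared with cosine similarity. The following is a minimal sketch of that comparison, not part of the model's own API: it assumes each embedding has already been converted to a flat list of floats, and the helper names `cosine_similarity` and `rank_by_similarity` as well as the 3-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Hypothetical 3-dimensional vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query_vec, doc_vecs)
print(ranking)
```

In practice the query and each candidate text would be embedded with the same model before being ranked this way.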