How to use lufercho/AxvBert-Sentente-Transformer with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lufercho/AxvBert-Sentente-Transformer")
sentences = [
"A Comprehensive Approach to Universal Piecewise Nonlinear Regression\n Based on Trees",
" In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of\nthe form $A X$ where $X$ is sparse, and the goal is to recover $X$. This is a\ncentral notion in signal processing, statistics and machine learning. But in\napplications such as sparse coding, edge detection, compression and super\nresolution, the dictionary $A$ is unknown and has to be learned from random\nexamples of the form $Y = AX$ where $X$ is drawn from an appropriate\ndistribution --- this is the dictionary learning problem. In most settings, $A$\nis overcomplete: it has more columns than rows. This paper presents a\npolynomial-time algorithm for learning overcomplete dictionaries; the only\npreviously known algorithm with provable guarantees is the recent work of\nSpielman, Wang and Wright who gave an algorithm for the full-rank case, which\nis rarely the case in applications. Our algorithm applies to incoherent\ndictionaries which have been a central object of study since they were\nintroduced in seminal work of Donoho and Huo. In particular, a dictionary is\n$\\mu$-incoherent if each pair of columns has inner product at most $\\mu /\n\\sqrt{n}$.\n The algorithm makes natural stochastic assumptions about the unknown sparse\nvector $X$, which can contain $k \\leq c \\min(\\sqrt{n}/\\mu \\log n, m^{1/2\n-\\eta})$ non-zero entries (for any $\\eta > 0$). This is close to the best $k$\nallowable by the best sparse recovery algorithms even if one knows the\ndictionary $A$ exactly. Moreover, both the running time and sample complexity\ndepend on $\\log 1/\\epsilon$, where $\\epsilon$ is the target accuracy, and so\nour algorithms converge very quickly to the true dictionary. Our algorithm can\nalso tolerate substantial amounts of noise provided it is incoherent with\nrespect to the dictionary (e.g., Gaussian). In the noisy setting, our running\ntime and sample complexity depend polynomially on $1/\\epsilon$, and this is\nnecessary.\n",
" In this paper, we investigate adaptive nonlinear regression and introduce\ntree based piecewise linear regression algorithms that are highly efficient and\nprovide significantly improved performance with guaranteed upper bounds in an\nindividual sequence manner. We use a tree notion in order to partition the\nspace of regressors in a nested structure. The introduced algorithms adapt not\nonly their regression functions but also the complete tree structure while\nachieving the performance of the \"best\" linear mixture of a doubly exponential\nnumber of partitions, with a computational complexity only polynomial in the\nnumber of nodes of the tree. While constructing these algorithms, we also avoid\nusing any artificial \"weighting\" of models (with highly data dependent\nparameters) and, instead, directly minimize the final regression error, which\nis the ultimate performance goal. The introduced methods are generic such that\nthey can readily incorporate different tree construction methods such as random\ntrees in their framework and can use different regressor or partitioning\nfunctions as demonstrated in the paper.\n",
" In this paper we propose a multi-task linear classifier learning problem\ncalled D-SVM (Dictionary SVM). D-SVM uses a dictionary of parameter covariance\nshared by all tasks to do multi-task knowledge transfer among different tasks.\nWe formally define the learning problem of D-SVM and show two interpretations\nof this problem, from both the probabilistic and kernel perspectives. From the\nprobabilistic perspective, we show that our learning formulation is actually a\nMAP estimation on all optimization variables. We also show its equivalence to a\nmultiple kernel learning problem in which one is trying to find a re-weighting\nkernel for features from a dictionary of basis (despite the fact that only\nlinear classifiers are learned). Finally, we describe an alternative\noptimization scheme to minimize the objective function and present empirical\nstudies to valid our algorithm.\n"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from lufercho/my-finetuned-bert-mlm. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
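The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence vector is the average of its token vectors. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the helper name `mean_pool` and the toy 4-dimensional vectors are assumptions made for illustration (the model itself produces 768 dimensions).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padding (mask == 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        # No real tokens: return a zero vector instead of dividing by zero.
        return [0.0] * dim
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy sequence: 3 token positions, 4 dimensions (hypothetical values).
tokens = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
]
mask = [1, 1, 0]  # the last position is padding and is ignored

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0, 5.0]
```

Only the first two token vectors contribute to the average, which is why padding does not shift the sentence embedding.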
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lufercho/AxvBert-Sentente-Transformer")
# Run inference
sentences = [
'Multi-Armed Bandits in Metric Spaces',
' In a multi-armed bandit problem, an online algorithm chooses from a set of\nstrategies in a sequence of trials so as to maximize the total payoff of the\nchosen strategies. While the performance of bandit algorithms with a small\nfinite strategy set is quite well understood, bandit problems with large\nstrategy sets are still a topic of very active investigation, motivated by\npractical applications such as online auctions and web advertisement. The goal\nof such research is to identify broad and natural classes of strategy sets and\npayoff functions which enable the design of efficient solutions. In this work\nwe study a very general setting for the multi-armed bandit problem in which the\nstrategies form a metric space, and the payoff function satisfies a Lipschitz\ncondition with respect to the metric. We refer to this problem as the\n"Lipschitz MAB problem". We present a complete solution for the multi-armed\nproblem in this setting. That is, for every metric space (L,X) we define an\nisometry invariant which bounds from below the performance of Lipschitz MAB\nalgorithms for X, and we present an algorithm which comes arbitrarily close to\nmeeting this bound. Furthermore, our technique gives even better results for\nbenign payoff functions.\n',
' Applications such as face recognition that deal with high-dimensional data\nneed a mapping technique that introduces representation of low-dimensional\nfeatures with enhanced discriminatory power and a proper classifier, able to\nclassify those complex features. Most of traditional Linear Discriminant\nAnalysis suffer from the disadvantage that their optimality criteria are not\ndirectly related to the classification ability of the obtained feature\nrepresentation. Moreover, their classification accuracy is affected by the\n"small sample size" problem which is often encountered in FR tasks. In this\nshort paper, we combine nonlinear kernel based mapping of data called KDDA with\nSupport Vector machine classifier to deal with both of the shortcomings in an\nefficient and cost effective manner. The proposed here method is compared, in\nterms of classification accuracy, to other commonly used FR methods on UMIST\nface database. Results indicate that the performance of the proposed method is\noverall superior to those of traditional FR approaches, such as the Eigenfaces,\nFisherfaces, and D-LDA methods and traditional linear classifiers.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
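The square matrix printed above holds one score per pair of inputs. As a rough sketch of what such a matrix contains, the example below computes pairwise cosine similarity in plain Python. It assumes the model scores with cosine similarity, which this card does not state explicitly; `cos_sim`, `similarity_matrix`, and the 2-dimensional toy vectors are illustrative names and values, not part of the library.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarity; entry [i][j] compares item i with item j."""
    return [[cos_sim(a, b) for b in embeddings] for a in embeddings]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = similarity_matrix(toy)

# Index of the item most similar to item 0, excluding item 0 itself.
best = max((j for j in range(len(toy)) if j != 0), key=lambda j: sims[0][j])
print(len(sims), len(sims[0]))  # 3 3
print(best)  # 2
```

The diagonal is always 1 because every vector is compared with itself, so retrieval-style lookups should skip the query's own index as done here.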
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | | |

| sentence_0 | sentence_1 |
|---|---|
| Validation of nonlinear PCA | Linear principal component analysis (PCA) can be extended to a nonlinear PCA |
| Learning Attitudes and Attributes from Multi-Aspect Reviews | The majority of online reviews consist of plain-text feedback together with a |
| Bayesian Differential Privacy through Posterior Sampling | Differential privacy formalises privacy-preserving mechanisms that provide |

MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
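To make the two parameters concrete, here is a hedged sketch of the idea behind a multiple-negatives ranking objective: each sentence_0 should score highest against its own sentence_1, and the other sentence_1 entries in the batch serve as negatives. Similarities are cosine scores multiplied by `scale` and fed to a cross-entropy loss. This is a plain-Python illustration, not the library's code; `mnrl_loss` and the toy vectors are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnrl_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # each anchor matches its pair
swapped = mnrl_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # pairs deliberately mismatched
print(aligned < swapped)  # True
```

Because cosine similarity is bounded by 1, the scale of 20 widens the gap between matching and non-matching pairs before the softmax, so correct pairings drive the loss close to zero.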
Non-default hyperparameters:
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 2
multi_dataset_batch_sampler: round_robin

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 1.5974 | 500 | 0.3039 |
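The single logged row can be combined with the hyperparameters to estimate the size of the run. The arithmetic below is a back-of-the-envelope sketch, not a reported result: it assumes training on one device (the card lists `gradient_accumulation_steps: 1` but not the device count), that the logged epoch value is rounded to four decimals, and that the linear schedule decays from `learning_rate` to zero with no warmup.

```python
# Values taken from the training log row and the hyperparameter list.
step, epoch = 500, 1.5974
batch_size, num_epochs = 16, 2
base_lr = 5e-05

# Epoch is step / steps_per_epoch, so invert and round to a whole step count.
steps_per_epoch = round(step / epoch)
total_steps = steps_per_epoch * num_epochs

# With dataloader_drop_last: False, the final batch of an epoch may be partial,
# so the number of training pairs is only bounded, not pinned down.
max_samples = steps_per_epoch * batch_size
min_samples = (steps_per_epoch - 1) * batch_size + 1

# Linear decay without warmup: the rate shrinks in proportion to remaining steps.
lr_at_step = base_lr * (total_steps - step) / total_steps

print(steps_per_epoch, total_steps)  # 313 626
print(min_samples, max_samples)      # 4993 5008
print(lr_at_step)
```

Under these assumptions the training set holds roughly 5,000 sentence pairs, and the learning rate at step 500 is about one fifth of its starting value.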
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: lufercho/my-finetuned-bert-mlm