---
license: apache-2.0
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---

# HAI - HelpingAI Semantic Similarity Model
This is a **custom Sentence Transformer model** fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Designed as part of the **HelpingAI ecosystem**, it enhances **semantic similarity and contextual understanding**, with an emphasis on **emotionally intelligent responses**.
## Model Highlights

- **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
## Model Details
### Features
- **Input Length:** Handles up to 256 tokens per input; longer inputs are truncated.
- **Output Dimensionality:** 384-dimensional dense embeddings.
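Both limits can be read directly from a loaded model; a minimal check, reusing the `HelpingAI/HAI` model ID from the Quick Start below:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HelpingAI/HAI")

# Inputs longer than this many tokens are truncated before encoding
print(model.max_seq_length)                      # 256

# Length of the dense vector returned for each input
print(model.get_sentence_embedding_dimension())  # 384
```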
### Full Architecture
```python
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False})
  (1): Pooling({'pooling_mode_mean_tokens': True})
  (2): Normalize()
)
```
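Because the final `Normalize()` module L2-normalizes every embedding, the cosine similarity between two embeddings reduces to a plain dot product. A quick sanity check, again a minimal sketch reusing the `HelpingAI/HAI` model ID from the Quick Start below:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HelpingAI/HAI")
emb = model.encode(["The weather is lovely.", "It is sunny outside."])

print(np.linalg.norm(emb, axis=1))    # ~[1.0, 1.0]: embeddings are unit length
print(float(np.dot(emb[0], emb[1])))  # dot product == cosine similarity here
```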
## Training Overview
### Dataset
- **Size:** 75,897 samples
- **Structure:** `<sentence_0, sentence_1, similarity_score>` (see the example below)
- **Labels:** Float values between 0 (no similarity) and 1 (high similarity).
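In the `sentence-transformers` training API, each such row corresponds to an `InputExample` holding the sentence pair and its float label; the pair and score below are illustrative only:

```python
from sentence_transformers import InputExample

# One training sample: a sentence pair plus a similarity label in [0, 1]
sample = InputExample(
    texts=["A woman is slicing a pepper.", "A woman is cutting a vegetable."],
    label=0.85,
)
```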
### Training Method
- **Loss Function:** Cosine Similarity Loss (see the training sketch below)
- **Batch Size:** 16
- **Epochs:** 20
- **Optimization:** AdamW optimizer with a learning rate of `5e-5`.
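A minimal sketch of how a comparable run could be set up with the `sentence-transformers` fit API; the training data itself is not distributed with this card, so `train_samples` below is a one-element placeholder:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder for the real 75,897-sample dataset of (sentence_0, sentence_1, score) rows
train_samples = [
    InputExample(texts=["A woman is slicing a pepper.",
                        "A woman is cutting a vegetable."], label=0.85),
]

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=20,
    optimizer_params={"lr": 5e-5},  # AdamW is the default optimizer class
)
model.save("HAI")
```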
## Getting Started
### Installation
Ensure you have the `sentence-transformers` library installed:
```bash
pip install -U sentence-transformers
```
### Quick Start
Load and use the model in your Python environment:
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the HelpingAI semantic similarity model
model = SentenceTransformer("HelpingAI/HAI")

# Encode sentences
sentences = [
    "A woman is slicing a pepper.",
    "A girl is styling her hair.",
    "The sun is shining brightly today."
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 384)

# Calculate similarity of the first sentence against the other two
similarity_scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(similarity_scores)
```
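If you prefer to stay within `sentence-transformers`, the bundled `util.cos_sim` helper computes the same cosine-similarity matrix without the scikit-learn dependency; a small variation on the example above:

```python
from sentence_transformers import util

# Cosine similarity of the first sentence against the other two (same values as above)
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # tensor of shape (1, 2)
```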
The model shows high accuracy in sentiment-informed response tests.
## Citation

If you use the HAI model, please cite the original Sentence-BERT paper:
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```