---
license: mit
language:
- en
tags:
- sentence-transformer
- embeddings
- mental-health
- intent-classification
pipeline_tag: feature-extraction
base_model: sentence-transformers/all-MiniLM-L6-v2
---

# Intent Encoder (MindPadi)

`intent_encoder` is a Sentence Transformers model that the MindPadi mental health assistant uses to **encode user messages into dense embeddings**. These embeddings feed intent classification, similarity search, and memory recall, so the encoder underpins the semantic understanding of user input across MindPadi features.

## Model Overview

- **Architecture:** Sentence-BERT (`all-MiniLM-L6-v2` base)
- **Task:** Sentence Embedding / Semantic Similarity
- **Purpose:** Embed user queries for intent classification, vector search, and memory retrieval
- **Size:** ~22.7M parameters (about 90 MB of float32 weights)
- **Files:**
  - `config.json`
  - `pytorch_model.bin` or `model.safetensors`
  - `tokenizer.json`, `vocab.txt`
  - `1_Pooling/`, `2_Normalize/` (Sentence-BERT components)

## Intended Use

### Primary Use Cases
- Semantic embedding of user inputs for intent recognition
- Matching new messages against known intent samples (`data/processed_intents.json`)
- Supporting vector similarity in MongoDB Atlas Search or ChromaDB
- Powering memory in LangGraph agentic workflows

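Matching a new message against known intent samples can be sketched as a nearest-neighbor lookup by cosine similarity. This is a minimal illustration, not MindPadi's actual routing code: `match_intent`, the `threshold` default, the `(label, vector)` layout, and the toy 3-dimensional vectors are all assumptions made for the example. In practice the vectors would come from `model.encode(...)` and have 384 dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_intent(query_vec, intent_samples, threshold=0.5):
    """Return (label, score) of the most similar sample.

    intent_samples is a list of (label, vector) pairs (hypothetical layout).
    The label is None when the best score falls below `threshold`.
    """
    best_label, best_score = None, -1.0
    for label, vec in intent_samples:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy vectors standing in for real sentence embeddings.
samples = [
    ("book_session", [0.9, 0.1, 0.0]),
    ("anxiety_support", [0.0, 0.2, 0.9]),
]
label, score = match_intent([0.8, 0.2, 0.1], samples)
print(label, round(score, 3))
```

Returning `None` below the threshold leaves room for a fallback path when no known intent is close enough; the right threshold depends on the data and would need tuning.
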
### Not Recommended For
- Direct intent classification (this model returns embeddings, not classes)
- Non-text inputs (e.g., images, audio)

## Integration in MindPadi

- `app/chatbot/intent_classifier.py`: Uses this model to compute sentence embeddings
- `app/chatbot/intent_router.py`: Uses vector similarity for intent matching
- `database/vector_search.py`: Stores embeddings in, and queries them from, a MongoDB vector index
- `app/utils/embedding_search.py`: Embeds utterances for real-time nearest-neighbor lookup

## Training Details

- **Base Model:** `sentence-transformers/all-MiniLM-L6-v2` (pretrained)
- **Fine-tuning:** Optional domain-specific contrastive learning using pairs in `training/datasets/fallback_pairs.json`
- **Script:** `training/fine_tune_encoder.py` (if fine-tuned)
- **Tokenizer:** BERT-based WordPiece tokenizer
- **Max Token Length:** 128

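This card does not state which contrastive objective `training/fine_tune_encoder.py` uses. As an illustration only, one common objective for fine-tuning on positive pairs is an in-batch contrastive (InfoNCE-style) loss. Each pair $(u_i, v_i)$ in a batch of $N$ pairs is treated as a positive, the other $v_j$ in the batch serve as negatives, and $\tau$ is an assumed temperature hyperparameter:

$$
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\left(\cos(u_i, v_i)/\tau\right)}{\sum_{j=1}^{N} \exp\left(\cos(u_i, v_j)/\tau\right)}
$$

Minimizing this loss pulls paired embeddings together and pushes non-paired ones apart.
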
## Evaluation

This model is not evaluated with classification metrics. Instead, its **embedding quality** was assessed through:

- **Cosine similarity tests** (similarity between intent embeddings)
- **Intent clustering accuracy** with `KMeans` in vector space
- **Recall@K** for correct intent retrieval
- **Visualizations:** UMAP plots (`logs/intent_umap.png`)

Results indicate:
- High-quality clustering of semantically similar intents
- ~91% Top-3 Recall for known intents

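Recall@K can be computed by ranking the known intent samples by similarity to each test query and checking whether the correct intent appears among the top K. The sketch below is illustrative only: `recall_at_k`, the `(label, vector)` layout, and the toy 2-dimensional vectors are assumptions, and this is not the evaluation script behind the reported figure. It uses the dot product as the similarity score, which matches cosine similarity when the vectors are L2-normalized, as the toy vectors here are.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(queries, index, k=3):
    """Fraction of queries whose true label is among the k most similar index entries.

    queries: list of (true_label, vector); index: list of (label, vector).
    """
    if not queries:
        return 0.0
    hits = 0
    for true_label, q_vec in queries:
        # Rank index entries from most to least similar to the query.
        ranked = sorted(index, key=lambda item: dot(q_vec, item[1]), reverse=True)
        top_labels = {label for label, _ in ranked[:k]}
        if true_label in top_labels:
            hits += 1
    return hits / len(queries)


# Toy unit vectors standing in for real sentence embeddings.
index = [
    ("greeting", [1.0, 0.0]),
    ("book_session", [0.0, 1.0]),
    ("anxiety_support", [0.6, 0.8]),
]
queries = [
    ("greeting", [0.8, 0.6]),
    ("book_session", [0.6, 0.8]),
]
print(recall_at_k(queries, index, k=1))
print(recall_at_k(queries, index, k=2))
```

In this toy data both queries rank `anxiety_support` first, so recall improves only once K is large enough to include the correct label.
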
## Example Usage

```python
from sentence_transformers import SentenceTransformer

# Load the encoder from the Hugging Face Hub
model = SentenceTransformer("mindpadi/intent_encoder")

texts = ["I want to talk to a therapist", "Book a session", "I'm feeling anxious"]
# encode() returns one embedding per input text
embeddings = model.encode(texts)

print(embeddings.shape)  # (3, 384)
```

## Deployment (API Example)

```python
import requests

endpoint = "https://api-inference.huggingface.co/models/mindpadi/intent_encoder"
# Replace <your-token> with a Hugging Face access token
headers = {"Authorization": "Bearer <your-token>"}
payload = {"inputs": "I need help managing stress"}

response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
embedding = response.json()
```

## Limitations

* English-only
* Short, clean sentences work best (not optimized for long documents)
* Does not directly return intent labels; must be paired with clustering or classification logic
* May yield ambiguous vectors for multi-intent or vague inputs

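One way to handle multi-intent or vague inputs is to treat a match as ambiguous when the best and second-best intent scores are close. The sketch below is an assumption about how such a check could look, not something this card describes MindPadi doing: `is_ambiguous` and the `margin` default are hypothetical, and the scores are made-up similarity values.

```python
def is_ambiguous(scores, margin=0.05):
    """Return True when the top two similarity scores differ by less than `margin`.

    scores: dict mapping intent label -> similarity score for one message.
    """
    if len(scores) < 2:
        return False
    top, runner_up = sorted(scores.values(), reverse=True)[:2]
    return (top - runner_up) < margin


clear = {"book_session": 0.82, "anxiety_support": 0.41, "greeting": 0.12}
vague = {"book_session": 0.58, "anxiety_support": 0.56, "greeting": 0.10}
print(is_ambiguous(clear))  # False
print(is_ambiguous(vague))  # True
```

A message flagged this way could be routed to a clarifying question rather than to either intent.
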
## License

MIT License: open for personal, academic, and commercial use, provided the copyright and license notice are retained.

## Contact

* **Project:** [MindPadi Mental Health Assistant](https://huggingface.co/mindpadi)
* **Team:** MindPadi Developers
* **Email:** [you@example.com](mailto:you@example.com)
* **GitHub:** [https://github.com/mindpadi](https://github.com/mindpadi)

*Last updated: May 2025*