---
license: apache-2.0
language:
- en
tags:
- bio-to-tags
- tag-generation
- smollm2
- text-generation
- personality
- interests
- spiceechat
pipeline_tag: text-generation
library_name: transformers
---
---
# 🏷️ Bio2Tags-Lite

**Because reading between the lines shouldn't require a psychology degree.**

Bio2Tags-Lite is a fine-tuned SmolLM2-360M model that reads personal biographies and returns clean, structured personality tags. Feed it a dating bio, a LinkedIn summary, or whatever someone wrote about themselves at 2am, and it'll tell you what kind of person they actually are.

No rambling. No fluff. Just tags.

---
## ✨ Features
- **Lightweight**: 360M parameters; runs on hardware that would make a gamer cry
- **Fast**: Inference in milliseconds, because nobody has time to wait
- **Structured Output**: Clean comma-separated tags, every time
- **Plug & Play**: Works with Transformers out of the box, no PhD required
- **SpiceeChat Pipeline**: Pairs with Cinder-1.5B like peanut butter and heartbreak
---
## 🧪 Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "SpiceeChat/Bio2Tags-Lite",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SpiceeChat/Bio2Tags-Lite")


def get_tags(bio):
    """Return the model's comma-separated personality tags for a bio."""
    prompt = f"Extract personality tags from the bio below. Output ONLY comma-separated tags, nothing else.\n\nBio: {bio}\n\nTags:"
    messages = [{"role": "user", "content": prompt}]
    formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()


# Try it
print(get_tags("I love hiking at dawn, painting watercolors, and deep conversations about philosophy."))
# Example output (sampling is enabled, so results vary between runs):
# nature-lover, artist, intellectual, deep-thinker
```
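
The model returns its tags as one comma-separated string, so most downstream code will want a list. Here is a minimal post-processing sketch. `parse_tags` and its `max_tags` default are hypothetical helpers for illustration, not part of the model or of Transformers, and the normalization rules (lowercasing, hyphenating spaces, dropping duplicates) are assumptions you may want to change.

```python
def parse_tags(raw: str, max_tags: int = 10) -> list[str]:
    """Turn raw model output into a clean, de-duplicated list of tags."""
    stripped = raw.strip()
    # Keep only the first line in case the model keeps writing after the tags.
    first_line = stripped.splitlines()[0] if stripped else ""
    tags = []
    seen = set()
    for piece in first_line.split(","):
        # Lowercase and join inner whitespace with hyphens: "Deep thinker" -> "deep-thinker"
        tag = "-".join(piece.lower().split())
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags[:max_tags]


print(parse_tags("nature-lover, artist, Intellectual, artist,\nextra text"))
# ['nature-lover', 'artist', 'intellectual']
```

In practice you would call it as `parse_tags(get_tags(bio))`.
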
---
## Sample Outputs
| Bio | Tags |
|-----|------|
| "I'm a software engineer who loves late-night coding and playing jazz piano." | tech-savvy, creative, night-owl, music-enthusiast, artistic |
| "I spend my weekends trail running and evenings reading classic literature." | adventurous, nature-lover, bookworm, intellectual, quiet |
| "I'm a retired teacher who gardens, reads history books, and bakes sourdough." | intellectual, family-oriented, gardener, history-buff, old-soul |
| "As a digital nomad, my office changes weekly, from Bali cafes to Alpine cabins." | adventurous, creative, digital-nomad, spontaneous, tech-savvy |
*(Yes, the sourdough one is a stereotype. Yes, it's also always accurate.)*
---
## 📦 Installation
```bash
pip install transformers torch accelerate
```
That's it. No ritual sacrifices, no config files, no Stack Overflow rabbit holes.
---
## 🎯 Use Cases
- **Dating Apps**: Tag user bios automatically for smarter matching, because "I like long walks on the beach" means something very different from "I like long walks on the beach at 3am alone"
- **Social Media**: Generate relevant hashtags from profile descriptions
- **Recommender Systems**: Build personality-based recommendation engines
- **Content Analysis**: Extract structured metadata from unstructured text
- **SpiceeChat Pipeline**: Feed extracted tags into Cinder-1.5B for personalized compatibility advice
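
For the matching and recommendation use cases, one simple way to compare two people is the overlap between their tag sets. The sketch below uses Jaccard similarity (shared tags divided by all distinct tags). This is an illustrative choice, not the method SpiceeChat uses, and the `jaccard` helper is a hypothetical name. The two tag lists are taken from the Sample Outputs table.

```python
def jaccard(tags_a, tags_b) -> float:
    """Jaccard similarity between two tag collections (0.0 to 1.0)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        # Two empty tag sets carry no signal, so treat them as no match.
        return 0.0
    return len(a & b) / len(a | b)


trail_runner = ["adventurous", "nature-lover", "bookworm", "intellectual", "quiet"]
digital_nomad = ["adventurous", "creative", "digital-nomad", "spontaneous", "tech-savvy"]

# One shared tag ("adventurous") out of nine distinct tags
print(round(jaccard(trail_runner, digital_nomad), 3))  # 0.111
```

Exact string matching treats near-synonyms such as `artist` and `artistic` as different tags, so a real system would likely need a tag vocabulary or embeddings on top of this.
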
---
## 🛠️ Technical Details
| Detail | Value |
|--------|-------|
| **Base Model** | [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| **Fine-tuning Method** | QLoRA (4-bit quantization, rank-16 adapters) |
| **Training Framework** | Unsloth |
| **Training Data** | 1,387 hand-crafted (bio, tags) pairs |
| **Epochs** | 3 |
| **Learning Rate** | 1e-4 |
| **Sequence Length** | 512 tokens |
| **Hardware Used** | Google Colab T4 (free tier; yes, really) |
| **Final Size** | 724 MB (FP16) |
| **Min VRAM Required** | ~1.5 GB |
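
The size figure in the table follows from simple arithmetic: FP16 stores each parameter in 2 bytes. The sketch below is a back-of-the-envelope check, assuming 1 MB = 10^6 bytes and counting weights only. The `weight_size_mb` helper is hypothetical, and the gap between the weight size and the ~1.5 GB VRAM figure (activations, KV cache, framework overhead) is not modeled here.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


# The nominal "360M" parameter count gives roughly the size in the table
print(weight_size_mb(360_000_000))  # 720.0

# Working backwards, 724 MB in FP16 implies about 362M parameters
print(724_000_000 / BYTES_PER_PARAM["fp16"])  # 362000000.0
```
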
---
## ⚠️ Limitations
- **English only**: Other languages may produce results ranging from "creative" to "confidently wrong"
- **Training data size**: 1,387 examples is a solid start; more data is always on the roadmap
- **Tag granularity**: Captures the salient stuff, not every quirk (the model can't detect if someone is secretly obsessed with true crime podcasts)
- **Edge cases**: Very short bios, emoji-heavy text, or deeply abstract descriptions may surprise you
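
If you want to catch some of these edge cases before they reach the model, a cheap pre-check can flag suspicious inputs. The sketch below is one possible heuristic, not something shipped with the model: `bio_warnings` is a hypothetical helper, and both thresholds (`min_words`, `min_letter_ratio`) are arbitrary assumptions to tune on your own data. It does not detect language, so non-English bios pass through unflagged.

```python
def bio_warnings(bio: str, min_words: int = 5, min_letter_ratio: float = 0.6) -> list[str]:
    """Return human-readable warnings for bios the model may handle poorly."""
    warnings = []
    text = bio.strip()
    # Very short bios give the model little to work with.
    if len(text.split()) < min_words:
        warnings.append("very short bio")
    # Emoji-heavy or symbol-heavy text has a low share of letters.
    visible = [ch for ch in text if not ch.isspace()]
    letters = sum(ch.isalpha() for ch in visible)
    if visible and letters / len(visible) < min_letter_ratio:
        warnings.append("mostly non-letter characters")
    return warnings


print(bio_warnings("hi"))  # ['very short bio']
```

You could log these warnings next to the generated tags, or skip tagging entirely when any are present.
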
---
## 🧠 Part of the SpiceeChat Ecosystem
Bio2Tags-Lite is a core component of the SpiceeChat AI pipeline:
- 🏷️ **Bio2Tags-Lite**: Extracts personality tags from bios
- 🔥 **[Cinder-1.5B](https://huggingface.co/SpiceeChat/Cinder-1.5B)**: Personalized dating advice powered by those tags
- **[dating-fatigue.com](https://dating-fatigue.com)**: Live tools for real humans trying to find real love
---
## License
Apache 2.0: use it, modify it, ship it. Just give SpiceeChat a nod.

---