---
license: apache-2.0
language:
- en
tags:
- bio-to-tags
- tag-generation
- smollm2
- text-generation
- personality
- interests
- spiceechat
pipeline_tag: text-generation
library_name: transformers
---

SpiceeChat


---

# 🏷️ Bio2Tags-Lite

**Because reading between the lines shouldn't require a psychology degree.**

Bio2Tags-Lite is a fine-tuned SmolLM2-360M model that reads personal biographies and returns clean, structured personality tags. Feed it a dating bio, a LinkedIn summary, or whatever someone wrote about themselves at 2am, and it'll tell you what kind of person they actually are.

No rambling. No fluff. Just tags.

---

## ✨ Features

- **Lightweight**: 360M parameters, so it runs on hardware that would make a gamer cry
- **Fast**: Inference in milliseconds, because nobody has time to wait
- **Structured Output**: Clean comma-separated tags, every time
- **Plug & Play**: Works with Transformers out of the box, no PhD required
- **SpiceeChat Pipeline**: Pairs with Cinder-1.5B like peanut butter and heartbreak

---

## 🧪 Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "SpiceeChat/Bio2Tags-Lite",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SpiceeChat/Bio2Tags-Lite")

def get_tags(bio):
    """Return the model's comma-separated tag string for a bio."""
    prompt = f"Extract personality tags from the bio below. Output ONLY comma-separated tags, nothing else.\n\nBio: {bio}\n\nTags:"
    messages = [{"role": "user", "content": prompt}]
    # Wrap the prompt in the model's chat template
    formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()

# Try it
print(get_tags("I love hiking at dawn, painting watercolors, and deep conversations about philosophy."))
# Example output (sampling is on, so results can vary):
# nature-lover, artist, intellectual, deep-thinker
```

---

## 📊 Sample Outputs

| Bio | Tags |
|-----|------|
| "I'm a software engineer who loves late-night coding and playing jazz piano." | tech-savvy, creative, night-owl, music-enthusiast, artistic |
| "I spend my weekends trail running and evenings reading classic literature." | adventurous, nature-lover, bookworm, intellectual, quiet |
| "I'm a retired teacher who gardens, reads history books, and bakes sourdough." | intellectual, family-oriented, gardener, history-buff, old-soul |
| "As a digital nomad, my office changes weekly, from Bali cafes to Alpine cabins." | adventurous, creative, digital-nomad, spontaneous, tech-savvy |

*(Yes, the sourdough one is a stereotype. Yes, it's also always accurate.)*

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

That's it. No ritual sacrifices, no config files, no Stack Overflow rabbit holes.

---

## 🎯 Use Cases

- **Dating Apps**: Tag user bios automatically for smarter matching, because "I like long walks on the beach" means something very different from "I like long walks on the beach at 3am alone"
- **Social Media**: Generate relevant hashtags from profile descriptions
- **Recommender Systems**: Build personality-based recommendation engines
- **Content Analysis**: Extract structured metadata from unstructured text
- **SpiceeChat Pipeline**: Feed extracted tags into Cinder-1.5B for personalized compatibility advice

---

## 🛠️ Technical Details

| Detail | Value |
|--------|-------|
| **Base Model** | [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| **Fine-tuning Method** | QLoRA (4-bit quantization, rank-16 adapters) |
| **Training Framework** | Unsloth |
| **Training Data** | 1,387 hand-crafted (bio, tags) pairs |
| **Epochs** | 3 |
| **Learning Rate** | 1e-4 |
| **Sequence Length** | 512 tokens |
| **Hardware Used** | Google Colab T4 (free tier; yes, really) |
| **Final Size** | 724 MB (FP16) |
| **Min VRAM Required** | ~1.5 GB |

---

## ⚠️ Limitations

- **English only**: Other languages may produce results ranging from "creative" to "confidently wrong"
- **Training data size**: 1,387 examples is a solid start; more data is always on the roadmap
- **Tag granularity**: Captures the salient stuff, not every quirk (the model can't detect if someone is secretly obsessed with true crime podcasts)
- **Edge cases**: Very short bios, emoji-heavy text, or deeply abstract descriptions may surprise you

---

## 🧠 Part of the SpiceeChat Ecosystem

Bio2Tags-Lite is a core component of the SpiceeChat AI pipeline:

- 🏷️ **Bio2Tags-Lite** → Extracts personality tags from bios
- 🔥 **[Cinder-1.5B](https://huggingface.co/SpiceeChat/Cinder-1.5B)** → Personalized dating advice powered by those tags
- 🌐 **[dating-fatigue.com](https://dating-fatigue.com)** → Live tools for real humans trying to find real love

---

## 📜 License

Apache 2.0: use it, modify it, ship it. Just give SpiceeChat a nod.

---
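
## 🧹 Tidying the Tag String

The model returns its tags as one comma-separated string, and the Limitations above note that unusual bios can produce surprises. If downstream code wants a plain Python list, a small clean-up step may help. The sketch below is one possible approach, not part of the model or of Transformers: `parse_tags` and `max_tags` are hypothetical names, and the rules (first line only, lowercase, ASCII letters, digits, and hyphens) are assumptions that fit the English-only tags shown in this card.

```python
import re


def parse_tags(raw: str, max_tags: int = 10) -> list[str]:
    """Turn a raw comma-separated completion into a clean, de-duplicated tag list.

    Hypothetical helper: the normalization rules are assumptions, not model behavior.
    """
    stripped = raw.strip()
    # Keep only the first line; anything after it is treated as noise here.
    first_line = stripped.splitlines()[0] if stripped else ""

    tags: list[str] = []
    seen: set[str] = set()
    for piece in first_line.split(","):
        # Lowercase, trim, and join inner whitespace with hyphens.
        tag = re.sub(r"\s+", "-", piece.strip().lower())
        # Drop characters other than ASCII letters, digits, and hyphens.
        tag = re.sub(r"[^a-z0-9-]", "", tag).strip("-")
        # Skip empty pieces and repeats while preserving order.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags[:max_tags]
```

With the `get_tags` function from the Example section, a call such as `parse_tags(get_tags(bio))` would yield a list that is ready to store or to pass along to the next stage of a pipeline.

---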
Built with ❤️ by SpiceeChat
🔗 huggingface.co/SpiceeChat