Instructions for using limloop/whiff-mamba2-20M-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use limloop/whiff-mamba2-20M-v2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="limloop/whiff-mamba2-20M-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("limloop/whiff-mamba2-20M-v2")
model = AutoModelForCausalLM.from_pretrained("limloop/whiff-mamba2-20M-v2")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use limloop/whiff-mamba2-20M-v2 with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "limloop/whiff-mamba2-20M-v2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
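The running server can also be queried from Python. Below is a minimal sketch assuming the openai client package is installed; the api_key value is a placeholder, since a default vLLM server does not check it.
# Minimal sketch: call the local vLLM server through its OpenAI-compatible API.
# Assumes the server started above is listening on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder key
response = client.chat.completions.create(
    model="limloop/whiff-mamba2-20M-v2",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)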
Use Docker
docker model run hf.co/limloop/whiff-mamba2-20M-v2
- SGLang
How to use limloop/whiff-mamba2-20M-v2 with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "limloop/whiff-mamba2-20M-v2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
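The same request can be sent from Python instead of curl. A minimal sketch, assuming the requests library is installed:
# Minimal sketch: the chat-completions call from the curl command above, sent from Python.
# Assumes the SGLang server is listening on localhost:30000.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])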
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "limloop/whiff-mamba2-20M-v2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use limloop/whiff-mamba2-20M-v2 with Docker Model Runner:
docker model run hf.co/limloop/whiff-mamba2-20M-v2
WHIFF 20M
A serpentine whisper in the bushes, carried by a gentle gust of wind
whiff-20M is a small experimental language model based on the Mamba2 architecture with 20.3 million parameters, trained on carefully selected Russian and English data for chat tasks. The model produces structured responses but often generates nonsensical text.
Technical Details
- Architecture: Mamba2ForCausalLM from 🤗 Transformers
- Parameters: 20.3M
- Languages: Russian/English (bilingual)
- Tokenizer: custom mini-BPE tokenizer (see the sketch after this list)
- License: Apache 2.0
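To get a feel for the mini-BPE tokenizer, it can be loaded and inspected on its own. A minimal sketch, assuming the tokenizer ships in the model repository (the usage example below relies on the same assumption):
# Sketch: load the bundled tokenizer and inspect it.
# len(tokenizer) should match vocab_size=8192 in the configuration below.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("limloop/whiff-mamba2-20M-v2")
print(len(tokenizer))                                  # vocabulary size
print(tokenizer.tokenize("Explain quantum physics."))  # how mini-BPE segments text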
Model Configuration
Mamba2Config(
vocab_size=8192,
hidden_size=512,
state_size=64,
num_heads=12,
num_hidden_layers=9,
conv_kernel=4,
expand=1.5,
n_groups=2
)
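The 20.3M figure can be sanity-checked by instantiating this configuration and counting parameters. A minimal sketch; tie_word_embeddings=False is an assumption of this example, chosen because an untied LM head is what brings the total to roughly 20.3M, while a tied one lands near 16M:
# Sketch: rebuild the architecture from the configuration above and count parameters.
# tie_word_embeddings=False is an assumption, not taken from the card.
from transformers import Mamba2Config, Mamba2ForCausalLM

config = Mamba2Config(
    vocab_size=8192,
    hidden_size=512,
    state_size=64,
    num_heads=12,
    num_hidden_layers=9,
    conv_kernel=4,
    expand=1.5,
    n_groups=2,
    tie_word_embeddings=False,
)
model = Mamba2ForCausalLM(config)  # randomly initialized, not the trained weights
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")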
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("limloop/whiff-mamba2-20M-v2")
model = AutoModelForCausalLM.from_pretrained("limloop/whiff-mamba2-20M-v2")

def chat(messages, temp=0.5):
    # Render the dialogue with the model's chat template and cue the assistant turn
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(
        inputs,
        max_length=512,
        top_k=40,
        top_p=0.9,
        repetition_penalty=1.1,
        num_return_sequences=1,
        temperature=temp,
        do_sample=True,
        eos_token_id=1,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example
dialog = [
    {"role": "system", "content": "You are a wise elf."},
    {"role": "user", "content": "Explain quantum physics."},
]
response = chat(dialog, temp=0.4)
print(response)
Training Data
19,927 carefully filtered dialogue rows, whose individual messages break down as follows:
- 154,306 (39.5%): English
- 187,204 (48.0%): Russian
- 48,528 (12.5%): Mixed
Sources:
- limloop/characters_dialogs
- IlyaGusev/gpt_roleplay_realm
- tamohannes/llm-roleplay
- radce/communication_dataset
- databricks/databricks-dolly-15k
- ch1eph/RuGeoBench
- nyuuzyou/ruschatgpt-qa
- 0x22almostEvil/ru-riddles-377
- 0x22almostEvil/tatoeba-mt-qna-oa
- Den4ikAI/ru_sberquad_long_answers
- Vikhrmodels/GrandMaster-PRO-MAX
- HuggingFaceH4/ultrachat_200k
- OpenAssistant/oasst1
- OpenAssistant/oasst2
- PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
- Arketov/hieunguyenminh_roleplay-deduped-ShareGPT_ru
- limloop/logic_duo
- limloop/ru_en_linguistic_exchange
- limloop/multi_engagement_roleplay_corpus
All datasets were additionally cleaned and filtered to improve chat interaction quality.
Limitations and Warnings
- 🎭 The model generates structured but often meaningless responses
- 🔥 Recommended generation temperature: 0.1-0.6
- ⚠️ May exhibit training artifacts (repetitions, contradictions)
- ⚠️ Not intended for production use
This model is like a forest stream: it seems to flow somewhere, but exactly where, only the squirrels know.
Model tree for limloop/whiff-mamba2-20M-v2
- Base model: limloop/whiff-mamba2-20M