LFM2.5-1.2B-SDFT: Self-Distillation Fine-Tuned Model
This model is a Self-Distillation Fine-Tuned (SDFT) version of LiquidAI/LFM2.5-1.2B-Instruct, trained using the methodology from the paper "Self-Distillation Enables Continual Learning".
Model Description
- Base Model: LiquidAI/LFM2.5-1.2B-Instruct
- Training Method: Self-Distillation Fine-Tuning (SDFT)
- Training Data: ~5K samples from OpenAssistant dataset
- Training Hardware: Single NVIDIA A100 GPU
- Parameters: LoRA rank=8, alpha=16, targeting q_proj and v_proj
What is SDFT?
Self-Distillation Fine-Tuning (SDFT) is a continual learning technique that:
- Uses the model's in-context learning ability to create a demonstration-aware teacher
- Generates training data on-policy from the student model
- Minimizes KL divergence between student and demonstration-conditioned teacher
- Enables learning new tasks while reducing catastrophic forgetting
Key advantages:
- ✅ Learns from demonstrations without explicit reward functions
- ✅ Maintains prior knowledge while acquiring new skills
- ✅ On-policy learning improves generalization
- ✅ Efficient training with EMA teacher updates
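The KL objective described above can be illustrated for a single token position. This is a minimal sketch, not the training code behind this model: the function names and toy logits are hypothetical, and the KL direction (student relative to teacher) is an assumption, since the card only says the divergence is minimized between the two.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_student_teacher(student_logits, teacher_logits):
    """KL(student || teacher) at one token position, summed over the vocabulary.

    Assumption: the student conditions on the query only, the teacher on
    query + demonstration; both produce logits over the same vocabulary.
    """
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy 4-token vocabulary with made-up logits.
student = [2.0, 0.5, 0.1, -1.0]
teacher = [1.0, 1.5, 0.1, -1.0]

print(kl_student_teacher(student, student))  # identical distributions give 0.0
print(kl_student_teacher(student, teacher))  # positive when the distributions differ
```

In training, a per-position value like this would be averaged over the tokens the student generated on-policy and used as the loss.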
Quick Start
Installation
pip install torch transformers peft accelerate bitsandbytes
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer from the base model
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
# Load the fine-tuned model in half precision (no quantization is applied here)
model = AutoModelForCausalLM.from_pretrained(
    "yasserrmd/lfm2.5-1.5b-sdft",
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()
# Generate
prompt = """<|im_start|>user
Explain how photosynthesis works.
<|im_end|>
<|im_start|>assistant
"""
# Move the inputs to whichever device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Use official LiquidAI parameters
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    pad_token_id=tokenizer.pad_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
With Demonstration (In-Context Learning)
prompt = """<|im_start|>user
Explain how databases work.
Here is an example response to guide you:
Example: Databases store data in tables. You can query them to get information back.
Now provide your own response following a similar approach:
<|im_end|>
<|im_start|>assistant
"""
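# Move the inputs to whichever device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    top_p=0.1,
    repetition_penalty=1.05,
    pad_token_id=tokenizer.pad_token_id,
)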
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Training Details
Dataset
- Source: OpenAssistant conversations
- Size: ~5,000 query-demonstration pairs
- Preprocessing:
  - Filtered demonstrations: 20-2048 characters
  - Train/Val/Test split: 75%/10%/15%
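As a rough worked example of the split above, the snippet below computes subset sizes for exactly 5,000 pairs. The helper name is hypothetical, and because the card gives the dataset size only as "~5,000", the resulting counts are approximate.

```python
def split_sizes(n_samples, percents=(75, 10, 15)):
    """Return (train, val, test) counts for integer percentage splits."""
    train = n_samples * percents[0] // 100
    val = n_samples * percents[1] // 100
    test = n_samples - train - val  # remainder goes to the test set
    return train, val, test

print(split_sizes(5000))  # (3750, 500, 750)
```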
Training Configuration
# Model Architecture
- Base: LiquidAI/LFM2.5-1.2B-Instruct
- Quantization: 8-bit with bitsandbytes
- LoRA: rank=8, alpha=16, dropout=0.05
- Target modules: q_proj, v_proj
# Training Parameters
- Learning rate: 5e-6
- Optimizer: AdamW (weight_decay=0.01)
- Batch size: 1 (with gradient accumulation)
- Gradient accumulation steps: 16
- Epochs: 3
- Max sequence length: 512
- Max generation length: 128
# SDFT-Specific
- EMA alpha: 0.02
- Temperature: 1.0
- KL divergence: Analytic (full vocabulary)
- On-policy generation: Yes
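The EMA teacher update listed above can be sketched as follows. This is an illustration under an assumed convention, namely that `alpha` is the weight given to the student at each update. The card does not state which side the 0.02 applies to, and the function name and single-weight "model" are hypothetical.

```python
def ema_update(teacher, student, alpha=0.02):
    """Move each teacher weight a fraction `alpha` toward the student weight.

    Assumed convention: teacher <- (1 - alpha) * teacher + alpha * student.
    """
    return {
        name: (1.0 - alpha) * teacher[name] + alpha * student[name]
        for name in teacher
    }

# Toy "models" with a single scalar weight each.
teacher = {"w": 1.0}
student = {"w": 0.0}

for step in range(3):
    teacher = ema_update(teacher, student)
    print(step, teacher["w"])
```

With a small `alpha`, the teacher changes slowly, so the target distribution stays stable while the student is updated.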
Prompt Format (Teacher vs Student)
Student Prompt (query only):
<|im_start|>user
{query}
<|im_end|>
<|im_start|>assistant
Teacher Prompt (query + demonstration):
<|im_start|>user
{query}
Here is an example response to guide you:
<|im_start|>assistant
{demonstration}
<|im_end|>
<|im_start|>user
Now provide your own response following a similar approach and reasoning:
<|im_end|>
<|im_start|>assistant
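The two templates above can be filled in programmatically. The sketch below copies them line for line. The constant and function names are hypothetical, and the trailing newline after the final `assistant` tag is an assumption taken from the Basic Usage prompt.

```python
STUDENT_TEMPLATE = (
    "<|im_start|>user\n"
    "{query}\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

TEACHER_TEMPLATE = (
    "<|im_start|>user\n"
    "{query}\n"
    "Here is an example response to guide you:\n"
    "<|im_start|>assistant\n"
    "{demonstration}\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "Now provide your own response following a similar approach and reasoning:\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)

def build_student_prompt(query):
    """Prompt seen by the student: the query only."""
    return STUDENT_TEMPLATE.format(query=query)

def build_teacher_prompt(query, demonstration):
    """Prompt seen by the teacher: the query plus one demonstration."""
    return TEACHER_TEMPLATE.format(query=query, demonstration=demonstration)

print(build_student_prompt("Explain how databases work."))
print(build_teacher_prompt(
    "Explain how databases work.",
    "Databases store data in tables. You can query them to get information back.",
))
```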
Evaluation Results
Tested on Multiple Dimensions:
| Category | Description | Performance |
|---|---|---|
| ICL Adaptation | Following demonstration style | ✅ Good |
| Task Improvement | Learning from examples | ✅ Good |
| Retention | No catastrophic forgetting | ✅ ~80% |
| Polarity Control | Following demo viewpoint | ⚠️ Moderate |
Key Findings:
- ✅ Maintains Knowledge: No significant forgetting on general tasks
- ✅ Adapts to Demos: Successfully follows demonstration styles
- ✅ Improved Over Training: Epoch 3 shows stable, coherent outputs
- ⚠️ Model Size Limitation: 1.2B parameters limit complex reasoning
Comparison to Base Model:
- With Demonstrations: SDFT shows better style matching and task following
- Without Demonstrations: Maintains base model capabilities
- Response Quality: More consistent and focused outputs
Generation Parameters
⚠️ Important: Use official LiquidAI parameters for best results:
generation_config = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.1,          # Official LiquidAI recommendation
    "top_k": 50,                 # Official LiquidAI recommendation
    "top_p": 0.1,                # Official LiquidAI recommendation
    "repetition_penalty": 1.05,  # Official LiquidAI recommendation
}
These parameters are tuned for LFM2.5. The low temperature and narrow top_p make sampling close to greedy decoding, which favors:
- Focused, factual responses
- Fewer hallucinations
- Consistent output quality
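To see why these settings behave almost deterministically, the sketch below applies temperature scaling, then top-k, then top-p filtering to a toy set of logits. It is a simplified illustration, not the Transformers implementation. The function name and logits are hypothetical, and `repetition_penalty` is left out.

```python
import math

def filtered_distribution(logits, temperature=0.1, top_k=50, top_p=0.1):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep at most top_k tokens, most likely first, and renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm_k = sum(probs[i] for i in ranked)
    top = {i: probs[i] / norm_k for i in ranked}

    # Keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += top[i]
        if cumulative >= top_p:
            break

    norm_p = sum(top[i] for i in kept)
    return {i: top[i] / norm_p for i in kept}

logits = [2.0, 1.5, 0.3, -1.0]  # made-up logits for a 4-token vocabulary
print(filtered_distribution(logits))                                # only token 0 survives
print(filtered_distribution(logits, temperature=1.0, top_p=0.9))    # three tokens survive
```

With `temperature=0.1` and `top_p=0.1`, the most likely token usually covers the whole nucleus on its own, so `do_sample=True` rarely changes the output.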
Limitations
Model Constraints:
- Size: 1.2B parameters (smaller capacity than 7B+ models)
- Training Data: 5K samples (vs paper's 20K+)
- Hardware: Single A100 (vs paper's multi-GPU setup)
- Complexity: Limited reasoning on very complex tasks
Known Issues:
- May require proper ChatML formatting for best results
- Performance degrades on tasks requiring deep technical knowledge
- Smaller model size limits polarity control effectiveness
Appropriate Use Cases:
- ✅ Conversational AI with example-guided responses
- ✅ Task learning from demonstrations
- ✅ Style-adaptive text generation
- ✅ Educational/research purposes
Not Recommended For:
- ❌ Production systems requiring 100% reliability
- ❌ Tasks requiring strong reasoning (use 7B+ models)
- ❌ Safety-critical applications
- ❌ Tasks outside training distribution without demonstrations
Bias and Ethical Considerations
- Inherits biases from base LFM2.5 model and OpenAssistant dataset
- May generate inconsistent responses on controversial topics
- Should not be used for medical, legal, or financial advice
- Outputs should be reviewed by humans for critical applications
Citation
If you use this model, please cite:
SDFT Paper:
@article{shenfeld2026sdft,
  title={Self-Distillation Enables Continual Learning},
  author={Shenfeld, Idan and Damani, Mehul and H{\"u}botter, Jonas and Agrawal, Pulkit},
  journal={arXiv preprint arXiv:2601.19897},
  year={2026}
}
Base Model:
@misc{lfm25,
  title={LFM2.5: Liquid Foundation Models},
  author={LiquidAI},
  year={2024},
  url={https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct}
}
Acknowledgments
- Paper: "Self-Distillation Enables Continual Learning" by Shenfeld et al.
- Base Model: LiquidAI/LFM2.5-1.2B-Instruct
- Dataset: OpenAssistant conversations
- Framework: HuggingFace Transformers, PEFT, bitsandbytes
License
This model is released under the Apache 2.0 license, following the base model's licensing.
Model Card Authors
[Your Name/Organization]
Contact
For questions or issues, please open an issue on the model repository.
Additional Resources
- SDFT Paper
- Training Code
- Base Model
- Evaluation Results
Version History
- v1.0 (2024-XX-XX): Initial release
  - Trained on 5K OpenAssistant samples
  - 3 epochs with gradient accumulation
  - LoRA rank 8, alpha 16