---
title: SmolLM2 360M Instruct
emoji: ๐Ÿƒ
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---
# SmolLM2 360M Instruct Demo
This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).
## Overview
A minimal but production-ready LLM service built on:
* **Model:** SmolLM2-360M-Instruct (approx. 269MB, Apache 2.0).
* **Efficiency:** Optimized to run on 2 vCPUs with a minimum of 2 GB RAM (the HF Free Tier provides up to 16 GB).
* **Scalability:** Small enough for local training and testing.
## Related Project: SmolLM2-customs
If you are interested in training small LLMs the lazy way, check out:
[https://github.com/VolkanSah/SmolLM2-customs](https://github.com/VolkanSah/SmolLM2-customs)
**Features of the custom implementation:**
* **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
* **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
* **HF Dataset Integration:** Logs every request for later analysis and finetuning.
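Since the endpoint follows the OpenAI chat-completions wire format, a plain HTTP client is enough to call it. A minimal sketch using only the standard library; `BASE_URL` and the model name are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder for wherever your SmolLM2-customs instance is running.
BASE_URL = "http://localhost:8000"

def build_chat_request(messages, model="SmolLM2-360M-Instruct", max_tokens=256):
    """Build the JSON body for POST /v1/chat/completions."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat(messages):
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library can be pointed at the same base URL instead.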
---
## Deployment & Usage
You do not need an API key for this public demo, but rate limits apply.
### How to run your own instance:
1. **Duplicate/Clone** this Space.
2. **Environment Variables:** To use your own model access or private weights, add one of the following keys to your **Secrets**:
* `HF_TOKEN`
* `TEST_TOKEN`
* `HUGGINGFACE_TOKEN`
* `HF_API_TOKEN`
The code uses flexible token-resolution logic that accepts whichever of these keys is set, so older or custom key names remain compatible.
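That resolution can be sketched as a first-match lookup over the candidate keys (hypothetical helper names; the actual `app.py` may differ):

```python
import os

# Candidate secret names, checked in order; first non-empty value wins.
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_token(env=os.environ):
    """Return the first non-empty token found, or None for anonymous access."""
    for key in TOKEN_KEYS:
        value = env.get(key)
        if value:
            return value
    return None
```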
## Technical Details
The inference pipeline uses `transformers` with `torch`. It automatically detects if a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
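A minimal sketch of that fallback, assuming the public `HuggingFaceTB/SmolLM2-360M-Instruct` checkpoint and hypothetical helper names; `torch` is imported defensively so the selection logic also runs on hosts without it:

```python
try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # no torch installed: behave like a CPU-only host
    _CUDA = False

def select_device() -> str:
    """Return "cuda" when a usable GPU is detected, otherwise "cpu"."""
    return "cuda" if _CUDA else "cpu"

def pipeline_kwargs() -> dict:
    """Keyword arguments for transformers.pipeline()."""
    return {
        "task": "text-generation",
        "model": "HuggingFaceTB/SmolLM2-360M-Instruct",
        # device=-1 keeps transformers on CPU; 0 selects the first GPU.
        "device": 0 if select_device() == "cuda" else -1,
    }
```

Because the device choice is resolved before the pipeline is constructed, the Gradio interface is identical on GPU and CPU hosts; only latency differs.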