---
title: SmolLM2 360M Instruct
emoji: 🏃
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---

# SmolLM2 360M Instruct Demo

This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).

## Overview

A minimal but production-ready LLM service built on:

* **Model:** SmolLM2-360M-Instruct (approx. 269 MB, Apache 2.0).
* **Efficiency:** Optimized to run on 2 vCPUs and as little as 2 GB RAM (the HF tier provides up to 16 GB).
* **Scalability:** Well suited to local training and testing.

## Related Project: SmolLM2-customs

If you are interested in training small LLMs the lazy way, check out:
[https://github.com/VolkanSah/SmolLM2-customs](https://github.com/VolkanSah/SmolLM2-customs)

**Features of the custom implementation:**

* **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
* **ADI (Anti-Dump Index):** Filters low-quality requests before they reach the model.
* **HF Dataset Integration:** Logs every request for later analysis and fine-tuning.

---

## Deployment & Usage

You do not need an API key for this public demo, but rate limits apply.

### How to run your own instance

1. **Duplicate/Clone** this Space.
2. **Environment Variables:** To use your own model access or private weights, add one of the following keys to your **Secrets**:
   * `HF_TOKEN`
   * `TEST_TOKEN`
   * `HUGGINGFACE_TOKEN`
   * `HF_API_TOKEN`

The code uses flexible token resolution logic to remain compatible with older or custom key names.

## Technical Details

The inference pipeline uses `transformers` with `torch`. It automatically detects whether a GPU is available; otherwise it falls back to CPU execution without breaking the Gradio interface.
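The flexible token resolution described under Deployment & Usage could look like the following sketch. The helper name `resolve_token` and the first-match-wins order are assumptions; only the four environment-variable names come from this README.

```python
import os

# Secret names accepted by this Space, checked in this order (order is an
# assumption; the names are the ones listed in the README above).
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_token(env=None):
    """Return the first configured token, or None for anonymous access."""
    env = os.environ if env is None else env
    for key in TOKEN_KEYS:
        value = env.get(key)
        if value:  # skip unset or empty secrets
            return value
    return None
```

Accepting several key names keeps older deployments working without forcing anyone to rename an existing secret.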
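The GPU-detection-with-CPU-fallback behaviour from Technical Details can be sketched roughly as below. The function names are hypothetical, and the exact pipeline setup in `app.py` may differ; the model id is the public SmolLM2-360M-Instruct checkpoint on the Hub.

```python
def select_device():
    """Return a `transformers` pipeline device id:
    0 for the first CUDA GPU, -1 to fall back to CPU
    (e.g. on the Free Tier's 2 vCPUs)."""
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:  # no torch at all -> CPU
        return -1

def build_pipeline(model_id="HuggingFaceTB/SmolLM2-360M-Instruct"):
    # Imported lazily so the sketch stays importable without transformers.
    from transformers import pipeline
    return pipeline("text-generation", model=model_id, device=select_device())
```

Because `select_device()` degrades to `-1` rather than raising, the Gradio interface keeps working on CPU-only hardware exactly as the README promises.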