---
title: SmolLM2 360M Instruct
emoji: 🏃
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---
# SmolLM2 360M Instruct Demo
This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).
## Overview
A minimal but production-ready LLM service built on:

- **Model**: SmolLM2-360M-Instruct (approx. 269 MB, Apache 2.0).
- **Efficiency**: Optimized to run on 2 vCPUs with a minimum of 2 GB RAM (the HF Free Tier provides up to 16 GB).
- **Scalability**: Perfect for local training and testing.
## Related Project: SmolLM2-customs
If you are interested in training small LLMs the lazy way, check out: https://github.com/VolkanSah/SmolLM2-customs
Features of the custom implementation:

- **FastAPI**: OpenAI-compatible `/v1/chat/completions` endpoint (see the example request below).
- **ADI (Anti-Dump Index)**: Filters low-quality requests before they hit the model.
- **HF Dataset Integration**: Logs every request for later analysis and fine-tuning.
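
Because the endpoint is OpenAI-compatible, any standard OpenAI client should be able to talk to it. The sketch below is illustrative only: the base URL, placeholder API key, and model name are assumptions, so adjust them to match your own SmolLM2-customs instance.

```python
# Hypothetical request against a local SmolLM2-customs instance.
# base_url, api_key, and model are assumptions -- adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of your instance
    api_key="not-needed",                 # placeholder; supply a real key if required
)

response = client.chat.completions.create(
    model="SmolLM2-360M-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "What is SmolLM2?"}],
)
print(response.choices[0].message.content)
```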
## Deployment & Usage
You do not need an API key for this public demo, but rate limits apply.
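
One way to query the public demo programmatically is the `gradio_client` library. This is only a sketch: the Space ID and endpoint name below are assumptions, so check this Space's "Use via API" page for the actual values.

```python
# Hypothetical sketch using gradio_client; Space ID and api_name are assumptions.
from gradio_client import Client

client = Client("VolkanSah/SmolLM2-360M-Instruct")  # assumed Space ID
result = client.predict(
    "Hello, who are you?",  # user message
    api_name="/chat",       # assumed endpoint name; see the Space's API docs
)
print(result)
```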
How to run your own instance:

- **Duplicate/Clone this Space.**
- **Environment Variables**: To use your own model access or private weights, add one of the following keys to your Secrets: `HF_TOKEN`, `TEST_TOKEN`, `HUGGINGFACE_TOKEN`, `HF_API_TOKEN`.

The code uses flexible token-resolution logic to stay compatible with older or custom key names.
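
A minimal sketch of what that token resolution might look like (the precedence order here is an assumption; the actual `app.py` may check the keys differently):

```python
import os

def resolve_hf_token() -> str | None:
    """Return the first Hugging Face token found among the supported secret names."""
    # Assumed precedence order; the first key that is set wins.
    for key in ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN"):
        token = os.environ.get(key)
        if token:
            return token
    return None
```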
## Technical Details
The inference pipeline uses `transformers` with `torch`. It automatically detects whether a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
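
A condensed sketch of that device-fallback logic, assuming the upstream `HuggingFaceTB/SmolLM2-360M-Instruct` checkpoint (the actual `app.py` may differ in details):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"  # assumed upstream checkpoint
# Prefer the GPU when available; otherwise run on plain CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)
output_ids = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```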