---
title: SmolLM2 360M Instruct
emoji: 🏃
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---

# SmolLM2 360M Instruct Demo

This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).

## Overview

A minimal but production-ready LLM service built on:

- **Model:** SmolLM2-360M-Instruct (approx. 269 MB, Apache 2.0).
- **Efficiency:** Optimized to run on 2 vCPUs and a minimum of 2 GB RAM (the HF Free Tier provides up to 16 GB).
- **Portability:** Well suited for local training and testing.

## Related Project: SmolLM2-customs

If you are interested in training small LLMs the lazy way, check out: https://github.com/VolkanSah/SmolLM2-customs

Features of the custom implementation:

- **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
- **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
- **HF Dataset Integration:** Logs every request for later analysis and finetuning.
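A client payload for such an OpenAI-compatible endpoint can be sketched as follows. The helper name, base URL, and default parameters below are illustrative only; they are not taken from the SmolLM2-customs code:

```python
# Hypothetical base URL; replace with the address of your own deployment.
BASE_URL = "http://localhost:8000"

def build_chat_request(user_message: str,
                       model: str = "SmolLM2-360M-Instruct") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }

payload = build_chat_request("Hello!")
# Send with e.g.:
#   requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
```

Because the endpoint follows the OpenAI schema, off-the-shelf OpenAI client libraries pointed at `BASE_URL` should also work against it.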

## Deployment & Usage

You do not need an API key for this public demo, but rate limits apply.

### How to run your own instance

1. Duplicate/clone this Space.
2. **Environment variables:** To use your own model access or private weights, add one of the following keys to your Secrets:
   - `HF_TOKEN`
   - `TEST_TOKEN`
   - `HUGGINGFACE_TOKEN`
   - `HF_API_TOKEN`

The code uses flexible token-resolution logic to stay compatible with older or custom key names.
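Such a fallback chain could look like the sketch below. The function name and the priority order are assumptions; the Space's actual code may differ:

```python
import os
from typing import Optional

# Checked in order; the first non-empty value wins (order is an assumption).
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_hf_token() -> Optional[str]:
    """Return the first non-empty token among the supported env keys."""
    for key in TOKEN_KEYS:
        value = os.environ.get(key)
        if value:
            return value
    return None
```

This lets a duplicated Space work no matter which of the four secret names the user configured.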

## Technical Details

The inference pipeline uses transformers with torch. It automatically detects if a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
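A minimal sketch of that fallback logic, factored into a pure helper so the device choice is easy to reason about (the function name is illustrative, not the Space's actual code):

```python
def select_device(cuda_available: bool) -> str:
    """Map GPU availability to a torch device string."""
    return "cuda" if cuda_available else "cpu"

# In the app this would be driven by torch, e.g.:
#   import torch
#   device = select_device(torch.cuda.is_available())
# and the model moved there with model.to(device) before generation,
# so the Gradio handlers never need to know which backend is active.
print(select_device(False))  # -> cpu
```

Keeping the decision in one place means the same code path serves both the Free Tier (CPU) and a GPU-backed Space without branching elsewhere.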