---
title: SmolLM2 360M Instruct
emoji: 🏃
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---

# SmolLM2 360M Instruct Demo

This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).

## Overview

A minimal but production-ready LLM service built on:

- **Model:** SmolLM2-360M-Instruct (approx. 269 MB, Apache 2.0).
- **Efficiency:** Optimized to run on 2 vCPUs and a minimum of 2 GB RAM (the HF Free Tier provides up to 16 GB).
- **Portability:** Well suited for local training and testing.

## Related Project: SmolLM2-customs

If you are interested in training small LLMs the lazy way, check out: https://github.com/VolkanSah/SmolLM2-customs

Features of the custom implementation:

- **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
- **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
- **HF Dataset Integration:** Logs every request for later analysis and finetuning.
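A client payload for such an OpenAI-compatible endpoint can be sketched as follows. The helper name, base URL, and default parameters below are illustrative only; they are not taken from the SmolLM2-customs code:

```python
# Hypothetical base URL; replace with the address of your own deployment.
BASE_URL = "http://localhost:8000"

def build_chat_request(user_message: str,
                       model: str = "SmolLM2-360M-Instruct") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }

payload = build_chat_request("Hello!")
# Send with e.g.:
#   requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)
```

Because the endpoint follows the OpenAI schema, off-the-shelf OpenAI client libraries pointed at `BASE_URL` should also work against it.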

## Deployment & Usage

You do not need an API key for this public demo, but rate limits apply.

### How to run your own instance

1. Duplicate/clone this Space.
2. **Environment variables:** To use your own model access or private weights, add one of the following keys to your Secrets:
   - `HF_TOKEN`
   - `TEST_TOKEN`
   - `HUGGINGFACE_TOKEN`
   - `HF_API_TOKEN`

The code uses flexible token-resolution logic to stay compatible with older or custom key names.
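Such a fallback chain could look like the sketch below. The function name and the priority order are assumptions; the Space's actual code may differ:

```python
import os
from typing import Optional

# Checked in order; the first non-empty value wins (order is an assumption).
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_hf_token() -> Optional[str]:
    """Return the first non-empty token among the supported env keys."""
    for key in TOKEN_KEYS:
        value = os.environ.get(key)
        if value:
            return value
    return None
```

This lets a duplicated Space work no matter which of the four secret names the user configured.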

## Technical Details

The inference pipeline uses transformers with torch. It automatically detects if a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
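A minimal sketch of that fallback logic, factored into a pure helper so the device choice is easy to reason about (the function name is illustrative, not the Space's actual code):

```python
def select_device(cuda_available: bool) -> str:
    """Map GPU availability to a torch device string."""
    return "cuda" if cuda_available else "cpu"

# In the app this would be driven by torch, e.g.:
#   import torch
#   device = select_device(torch.cuda.is_available())
# and the model moved there with model.to(device) before generation,
# so the Gradio handlers never need to know which backend is active.
print(select_device(False))  # -> cpu
```

Keeping the decision in one place means the same code path serves both the Free Tier (CPU) and a GPU-backed Space without branching elsewhere.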