---
title: SmolLM2 360M Instruct
emoji: ๐Ÿƒ
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: SmolLM2-360M-Instruct
---
# SmolLM2 360M Instruct Demo
This Space demonstrates the SmolLM2-360M-Instruct model with a CPU fallback mechanism. It is designed to run efficiently even on the Hugging Face Free Tier (2 vCPUs).
## Overview
A minimal but production-ready LLM service built on:
* **Model:** SmolLM2-360M-Instruct (approx. 269MB, Apache 2.0).
* **Efficiency:** Optimized to run on 2 vCPUs with a minimum of 2 GB RAM (the HF Free Tier provides up to 16 GB).
* **Scalability:** Small enough for local training and testing.
## Related Project: SmolLM2-customs
If you are interested in training small LLMs the lazy way, check out:
[https://github.com/VolkanSah/SmolLM2-customs](https://github.com/VolkanSah/SmolLM2-customs)
**Features of the custom implementation:**
* **FastAPI:** OpenAI-compatible `/v1/chat/completions` endpoint.
* **ADI (Anti-Dump Index):** Filters low-quality requests before they hit the model.
* **HF Dataset Integration:** Logs every request for later analysis and finetuning.
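Since the endpoint follows the OpenAI chat-completions wire format, a plain HTTP client is enough to call it. A minimal sketch using only the standard library; `BASE_URL` and the model name are placeholders for your own deployment:

```python
import json
import urllib.request

# Placeholder for wherever your SmolLM2-customs instance is running.
BASE_URL = "http://localhost:8000"

def build_chat_request(messages, model="SmolLM2-360M-Instruct", max_tokens=256):
    """Build the JSON body for POST /v1/chat/completions."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat(messages):
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client library can be pointed at the same base URL instead.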
---
## Deployment & Usage
You do not need an API key for this public demo, but rate limits apply.
### How to run your own instance:
1. **Duplicate/Clone** this Space.
2. **Environment Variables:** To use your own model access or private weights, add one of the following keys to your **Secrets**:
* `HF_TOKEN`
* `TEST_TOKEN`
* `HUGGINGFACE_TOKEN`
* `HF_API_TOKEN`
The code uses flexible token-resolution logic that accepts whichever of these keys is set, so older or custom key names remain compatible.
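That resolution can be sketched as a first-match lookup over the candidate keys (hypothetical helper names; the actual `app.py` may differ):

```python
import os

# Candidate secret names, checked in order; first non-empty value wins.
TOKEN_KEYS = ("HF_TOKEN", "TEST_TOKEN", "HUGGINGFACE_TOKEN", "HF_API_TOKEN")

def resolve_token(env=os.environ):
    """Return the first non-empty token found, or None for anonymous access."""
    for key in TOKEN_KEYS:
        value = env.get(key)
        if value:
            return value
    return None
```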
## Technical Details
The inference pipeline uses `transformers` with `torch`. It automatically detects if a GPU is available; otherwise, it falls back to CPU execution without breaking the Gradio interface.
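A minimal sketch of that fallback, assuming the public `HuggingFaceTB/SmolLM2-360M-Instruct` checkpoint and hypothetical helper names; `torch` is imported defensively so the selection logic also runs on hosts without it:

```python
try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # no torch installed: behave like a CPU-only host
    _CUDA = False

def select_device() -> str:
    """Return "cuda" when a usable GPU is detected, otherwise "cpu"."""
    return "cuda" if _CUDA else "cpu"

def pipeline_kwargs() -> dict:
    """Keyword arguments for transformers.pipeline()."""
    return {
        "task": "text-generation",
        "model": "HuggingFaceTB/SmolLM2-360M-Instruct",
        # device=-1 keeps transformers on CPU; 0 selects the first GPU.
        "device": 0 if select_device() == "cuda" else -1,
    }
```

Because the device choice is resolved before the pipeline is constructed, the Gradio interface is identical on GPU and CPU hosts; only latency differs.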