Update README.md
README.md
---
license: bsl-1.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- structured-generation
- function-calling
- tool-use
- json
- edge
- offline
- robotics
- iot
- agentic
- small-language-model
model-index:
- name: SAM-G
  results:
  - task:
      type: structured-action-generation
      name: Instruction-to-JSON (10 domains, zero-shot)
    metrics:
    - type: json_valid
      value: 100
      name: Valid JSON (%)
    - type: exact_match
      value: 76
      name: Exact match (%)
    - type: exact_match_fr
      value: 77
      name: Exact match, French (%)
  - task:
      type: text-generation
      name: Language modeling (FineWeb-Edu held-out)
    metrics:
    - type: bits_per_byte
      value: 1.179
      name: Bits per byte
---

# SAM-G

**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured
action generation**. Given a natural-language instruction, it emits compact,
schema-valid JSON for ten domains; given a question, it emits free text. Mode
selection is learned, not prompted. Built by **AMEFORGE** for robotics, IoT, and
embedded deployment where hosted-LLM APIs are too costly, too slow, or
unavailable.

- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
- **Context:** 1024 tokens · **Languages:** English, French (actions)
- **Throughput:** ~235 tok/s, 16 ms to first token (single GPU); runs on a
  Raspberry-Pi-class CPU
- **Released:** model weights + inference tokenizer. Training pipeline, data
  generators and architecture are proprietary.
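
The footprint figures follow from the parameter count and the bytes stored per weight. A rough check, assuming the checkpoint holds little beyond the weights themselves (the helper name `footprint_mb` is illustrative and not part of the release):

```python
def footprint_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


fp32_mb = footprint_mb(30.3e6, 4)  # 4 bytes per fp32 weight
int8_mb = footprint_mb(30.3e6, 1)  # 1 byte per int8 weight
print(f"fp32: {fp32_mb:.1f} MB, int8: {int8_mb:.1f} MB")
```

Quantization scales and other metadata would add a small amount on top of the int8 figure.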

## Two modes

| Input | Model emits |
|---|---|
| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |

Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`,
`home`, `cal`, `file`.
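
Downstream code has to tell the two modes apart before acting on the output. A minimal sketch, assuming the decoded text always begins with one of the two tags shown in the table (the function name `parse_output` is hypothetical):

```python
import json
from typing import Any, Tuple

ACTION_TAG = "[ACTION]"
CHAT_TAG = "[CHAT]"


def parse_output(text: str) -> Tuple[str, Any]:
    """Split decoded model text into (mode, payload)."""
    text = text.strip()
    if text.startswith(ACTION_TAG):
        # Everything after the tag is expected to be a single JSON object.
        return "action", json.loads(text[len(ACTION_TAG):])
    if text.startswith(CHAT_TAG):
        return "chat", text[len(CHAT_TAG):].strip()
    raise ValueError(f"no mode tag found in output: {text!r}")


mode, payload = parse_output(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
print(mode, payload["params"]["state"])
```

Raising on an untagged string, rather than guessing a mode, keeps malformed output away from anything that executes commands.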

## Benchmark

SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot**
through their chat template with a system instruction. `bpb` (bits per byte) is
tokenizer-fair, whereas per-token perplexity is not comparable across
vocabularies. `exact/M` is action exact-match per million parameters, the
efficiency axis.
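
Bits per byte normalizes the model's total loss by the size of the raw text instead of by the token count, which is why it can be compared across tokenizers. A sketch of the usual definition, assuming per-token negative log-likelihoods in nats; the exact evaluation script is not published, so this shows the metric and not the benchmark code:

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Total negative log-likelihood in bits divided by the UTF-8 byte count."""
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / len(text.encode("utf-8"))


# Toy numbers: 8 tokens costing exactly 1 bit each, spread over 4 bytes of text.
toy_bpb = bits_per_byte([math.log(2)] * 8, "abcd")
print(f"{toy_bpb:.3f}")
```

A tokenizer that splits the same text into more pieces raises the token count but not the byte count, so the denominator stays fixed.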

| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | Footprint (MB) | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |

<sub>*Qwen2.5-1.5B loaded in 4-bit. Larger general models lead on bits-per-byte
and cloze (they are 12–30× bigger and trained for general knowledge); SAM-G
leads decisively on structured action, French actions, footprint, speed, and
exact-match per parameter. Notably, Qwen2.5-1.5B scores *below* Qwen2.5-0.5B on
action exact-match: capability here comes from domain specialization, not
scale.</sub>
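
The `exact/M` column can be recomputed from the Exact % and Params columns. A short check using the values exactly as listed in the table (the helper name is illustrative):

```python
def exact_per_million(exact_pct: float, params_millions: float) -> float:
    """Action exact-match (%) divided by parameter count in millions."""
    return exact_pct / params_millions


# (Exact %, Params in millions) as reported in the benchmark table.
rows = {
    "SAM-G": (76, 30.3),
    "Pythia-70M": (0, 70),
    "Qwen2.5-0.5B-Instruct": (25, 494),
    "SmolLM2-360M-Instruct": (14, 362),
    "Qwen2.5-1.5B-Instruct": (21, 889),
}

scores = {name: round(exact_per_million(e, p), 2) for name, (e, p) in rows.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```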

## Per-domain exact match (%)

| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |

All general baselines score 0 on most domains, succeeding only partially on the
most generic ones (home, cal). `ros` (floating-point fields) is SAM-G's weakest
schema and benefits most from additional training data.

## Usage

```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model).
# The model weights are loaded separately.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, sequence_length)

# Greedy-decode with your loaded model until EOS, then call sp.DecodeIds(...).
# Expected output:
# {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
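
The snippet above leaves the decoding loop to the reader. The sketch below shows one way to write a greedy loop. Because the architecture and its call interface are not documented here, the loop is written against a hypothetical `next_token_logits` callable that you would wrap around your loaded model; the toy model exists only to make the example runnable.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 256,
) -> List[int]:
    """Repeatedly append the highest-scoring token until EOS or the length cap."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # hypothetical adapter around the model
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        if next_id == eos_id:
            break
        generated.append(next_id)
        ids.append(next_id)
    return generated


def toy_model(ids: List[int]) -> List[float]:
    """Stand-in model: always prefers (last token + 1), capped at token 5."""
    logits = [0.0] * 6
    logits[min(ids[-1] + 1, 5)] = 1.0
    return logits


decoded = greedy_decode(toy_model, prompt_ids=[1], eos_id=5)
print(decoded)
```

With the released tokenizer, `eos_id` would typically come from `sp.eos_id()` and the returned ids would be passed to `sp.DecodeIds(...)`, assuming the tokenizer defines an EOS piece. Prompt plus generated tokens should stay within the 1024-token context.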

Always parse output as JSON and validate against your schema before execution.
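
A minimal validation sketch using only the standard library. The required top-level keys (`domain`, `op`, `params`) are inferred from the two examples in this card, and the per-operation parameter schemas are not published, so the checks are a starting point; the function name `validate_action` is hypothetical.

```python
import json

DOMAINS = {
    "ros", "http", "mqtt", "db", "workflow",
    "ecommerce", "vehicle", "home", "cal", "file",
}
REQUIRED_KEYS = {"domain", "op", "params"}  # inferred from the card's examples


def validate_action(raw: str) -> dict:
    """Parse an action string and reject anything structurally unexpected."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    missing = REQUIRED_KEYS - action.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if action["domain"] not in DOMAINS:
        raise ValueError(f"unknown domain: {action['domain']!r}")
    if not isinstance(action["op"], str) or not isinstance(action["params"], dict):
        raise ValueError("'op' must be a string and 'params' an object")
    return action


checked = validate_action(
    '{"domain":"mqtt","op":"publish",'
    '"params":{"topic":"sensors/temp","payload":21.5,"qos":1}}'
)
print(checked["op"], checked["params"]["topic"])
```

A real deployment would add an allow-list of operations and per-operation parameter checks (types, ranges, permitted topics or devices) before executing anything.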

## Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways;
offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers;
and the structured tool-calling stage of agentic pipelines, either as a drop-in
replacement or as a fast router ahead of a larger hosted model.

## Limitations

- Not a general assistant: factual knowledge and open-ended reasoning are
  limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. The `ros` schema
  (floating-point fields) is the weakest and benefits most from more data.
- The action benchmark is synthetic, drawn from the training distribution
  family with a disjoint evaluation seed (999).

## Citation

```bibtex
@misc{samg2026,
  title = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year = {2026}
}
```