AMFORGE
/

samg

 ---
 license: bsl-1.0
+language:
+- en
+- fr
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- structured-generation
+- function-calling
+- tool-use
+- json
+- edge
+- offline
+- robotics
+- iot
+- agentic
+- small-language-model
+model-index:
+- name: SAM-G
+  results:
+  - task:
+      type: structured-action-generation
+      name: Instruction-to-JSON (10 domains, zero-shot)
+    metrics:
+    - type: json_valid
+      value: 100
+      name: Valid JSON (%)
+    - type: exact_match
+      value: 76
+      name: Exact match (%)
+    - type: exact_match_fr
+      value: 77
+      name: Exact match, French (%)
+  - task:
+      type: text-generation
+      name: Language modeling (FineWeb-Edu held-out)
+    metrics:
+    - type: bits_per_byte
+      value: 1.179
+      name: Bits per byte
 ---
+# SAM-G
+**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured
+action generation**. Given a natural-language instruction, it emits compact,
+schema-valid JSON for one of ten domains; given a question, it emits free text.
+Mode selection is learned, not prompted. It was built by **AMEFORGE** for
+robotics, IoT and embedded deployment where hosted-LLM APIs are too costly, too
+slow, or unavailable.
+- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
+- **Context:** 1024 tokens · **Languages:** English, French (actions)
+- **Throughput:** ~235 tok/s, 16 ms to first token (single GPU); also runs on a
+  Raspberry-Pi-class CPU
+- **Released:** model weights + inference tokenizer. Training pipeline, data
+  generators and architecture are proprietary.
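The footprint figures follow from the parameter count: four bytes per weight in fp32, one in int8. A quick check, assuming weights only and decimal megabytes (the helper name is illustrative):

```python
PARAMS = 30.3e6  # parameter count from the list above


def footprint_mb(n_params: float, bytes_per_param: int) -> float:
    """Weights-only size in decimal megabytes (assumption: no metadata overhead)."""
    return n_params * bytes_per_param / 1e6


print(f"fp32: {footprint_mb(PARAMS, 4):.0f} MB")  # fp32: 121 MB
print(f"int8: {footprint_mb(PARAMS, 1):.0f} MB")  # int8: 30 MB
```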
+## Two modes
+| Input | Model emits |
+|---|---|
+| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
+| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |
+
+Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`,
+`home`, `cal`, `file`.
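Downstream code has to tell the two modes apart before acting on a response. A minimal sketch, assuming the decoded text begins with the `[ACTION]` or `[CHAT]` tag exactly as in the table above; `split_mode` is a hypothetical helper, not part of the release:

```python
import json

MODE_TAGS = ("[ACTION]", "[CHAT]")  # tags as shown in the table; assumed to lead the output


def split_mode(decoded: str):
    """Return (mode, payload); mode is "action", "chat", or None if no tag is found."""
    text = decoded.strip()
    for tag in MODE_TAGS:
        if text.startswith(tag):
            body = text[len(tag):].strip()
            if tag == "[ACTION]":
                # Raises json.JSONDecodeError if the action body is not valid JSON.
                return "action", json.loads(body)
            return "chat", body
    return None, text
```

Returning `None` for untagged text keeps the decision about what to do with unexpected output in the caller's hands.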
+## Benchmark
+SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot**
+through their chat template with a system instruction. `bpb` (bits per byte) is
+normalized by byte count rather than token count, so it is comparable across
+tokenizers (per-token perplexity is not comparable across vocabularies).
+`exact/M` is action exact-match (%) divided by the parameter count in millions,
+the efficiency axis.
+| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
+|---|---|---|---|---|---|---|---|---|---|
+| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
+| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
+| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
+| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
+| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |
+
+<sub>*Qwen2.5-1.5B was loaded in 4-bit; its 889M figure is presumably the packed
+parameter count after quantization, since the model's nominal size is about
+1.5B. Larger general models lead on bits-per-byte and cloze (by the counts in
+the table they are 12–30× bigger, and they are trained for general knowledge);
+SAM-G leads on structured action, French actions, footprint, speed, and
+exact-match per parameter. Qwen2.5-1.5B scores *below* Qwen2.5-0.5B on action
+exact-match, which suggests that capability here comes from domain
+specialization rather than scale.</sub>
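The two derived columns can be reproduced with a few lines of arithmetic. The sketch below assumes `bpb` follows the usual definition (summed negative log-likelihood in nats, converted to bits and divided by the UTF-8 byte count of the evaluated text); the evaluation script itself is not part of this card, and the function names are illustrative.

```python
import math


def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (nats) into bits per byte."""
    return total_nll_nats / (math.log(2) * n_bytes)


def exact_per_million(exact_pct: float, params_millions: float) -> float:
    """Exact-match percentage divided by parameter count in millions."""
    return exact_pct / params_millions


# (exact %, params in millions) copied from the benchmark table
rows = {
    "SAM-G": (76, 30.3),
    "Qwen2.5-0.5B-Instruct": (25, 494),
    "SmolLM2-360M-Instruct": (14, 362),
}
for name, (exact, params) in rows.items():
    print(f"{name}: {exact_per_million(exact, params):.2f}")
```

Because the byte count of a text does not depend on the tokenizer, two models with different vocabularies can be compared on `bits_per_byte` directly.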
+## Per-domain exact match (%)
+| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
+|---|---|---|---|---|---|---|---|---|---|
+| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |
+
+These are SAM-G's scores. All general baselines score 0 on most domains,
+succeeding only partially on the most generic ones (home, cal). `ros`
+(floating-point fields) is SAM-G's weakest schema and would likely benefit most
+from additional training data.
+## Usage
+```python
+import sentencepiece as spm
+import torch
+
+# Load the released inference tokenizer (samg_tokenizer.model).
+# The model weights are loaded separately (not shown here).
+sp = spm.SentencePieceProcessor()
+sp.Load("samg_tokenizer.model")
+
+prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
+ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, sequence_length)
+# Greedy-decode with your loaded model until EOS, then call sp.DecodeIds(...)
+# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
+```
+Always parse output as JSON and validate against your schema before execution.
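A minimal sketch of that check using only the standard library. The `ALLOWED` table is a hypothetical stand-in for your own schema; only the `domain`/`op`/`params` layout is taken from the examples in this card.

```python
import json

# Hypothetical allow-list: accepted (domain, op) pairs and the params each one needs.
ALLOWED = {
    ("mqtt", "publish"): {"topic": str, "payload": (int, float, str), "qos": int},
    ("home", "set_state"): {"device": str, "name": str, "state": str},
}


def parse_action(raw: str) -> dict:
    """Parse model output and reject anything outside the allow-list."""
    action = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    domain, op = action.get("domain"), action.get("op")
    if not (isinstance(domain, str) and isinstance(op, str)):
        raise ValueError("domain and op must be strings")
    if (domain, op) not in ALLOWED:
        raise ValueError(f"unsupported domain/op: {domain}/{op}")
    params = action.get("params")
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    for name, expected in ALLOWED[(domain, op)].items():
        if name not in params:
            raise ValueError(f"missing param: {name}")
        if not isinstance(params[name], expected):
            raise ValueError(f"bad type for param: {name}")
    return action
```

Every failure surfaces as a `ValueError`, so the caller can wrap the call in a single `try`/`except` and refuse to execute anything that does not pass.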
+## Intended use
+On-device home automation; NL→ROS robot command layers; MQTT fleet gateways;
+offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers;
+and the structured tool-calling stage of agentic pipelines, either as a drop-in
+replacement for a larger hosted model or as a fast router ahead of one.
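The router pattern can be sketched as follows: try the local model first and fall back to a larger model only when the output is not a usable action. `local_generate` and `hosted_generate` are hypothetical callables that you would supply; nothing here is part of the released code.

```python
import json
from typing import Callable, Tuple


def route(instruction: str,
          local_generate: Callable[[str], str],
          hosted_generate: Callable[[str], str]) -> Tuple[str, object]:
    """Return ("local", action_dict) when the local output parses, else ("hosted", text)."""
    raw = local_generate(instruction)
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        action = None
    # Accept only objects that carry the three top-level keys used in this card's examples.
    if isinstance(action, dict) and {"domain", "op", "params"} <= action.keys():
        return "local", action
    return "hosted", hosted_generate(instruction)
```

The hosted model is called only on the fallback path, so well-formed local actions incur no network round trip.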
+## Limitations
+- Not a general assistant: factual knowledge and open-ended reasoning are
+  limited at this scale; larger general models lead on bits-per-byte and cloze.
+- French covers actions, not extended prose.
+- Schemas outside the ten domains need fine-tuning. The `ros` schema
+  (floating-point fields) is the weakest, at 0% exact match, and would likely
+  benefit most from more data.
+- The action benchmark is synthetic: it is drawn from the same distribution
+  family as the training data, using a disjoint evaluation seed (999).
+## Citation
+```bibtex
+@misc{samg2026,
+  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
+  author = {AMEFORGE Lab},
+  year   = {2026}
+}
+```