ameforge committed on
Commit 1f22aa4 · verified · 1 Parent(s): 1384bbf

Update README.md

Files changed (1)
  1. README.md +139 -0
README.md CHANGED
@@ -1,3 +1,142 @@
---
license: bsl-1.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- structured-generation
- function-calling
- tool-use
- json
- edge
- offline
- robotics
- iot
- agentic
- small-language-model
model-index:
- name: SAM-G
  results:
  - task:
      type: structured-action-generation
      name: Instruction-to-JSON (10 domains, zero-shot)
    metrics:
    - type: json_valid
      value: 100
      name: Valid JSON (%)
    - type: exact_match
      value: 76
      name: Exact match (%)
    - type: exact_match_fr
      value: 77
      name: Exact match, French (%)
  - task:
      type: text-generation
      name: Language modeling (FineWeb-Edu held-out)
    metrics:
    - type: bits_per_byte
      value: 1.179
      name: Bits per byte
---

# SAM-G

**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured
action generation**. Given a natural-language instruction, it emits compact,
schema-valid JSON for one of ten domains; given a question, it emits free text.
Mode selection is learned, not prompted. Built by **AMEFORGE** for robotics, IoT
and embedded deployment where hosted-LLM APIs are too costly, too slow, or
unavailable.

- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
- **Context:** 1024 tokens · **Languages:** English, French (actions)
- **Throughput:** ~235 tok/s, 16 ms to first token (single GPU); runs on a
  Raspberry-Pi-class CPU
- **Released:** model weights + inference tokenizer. Training pipeline, data
  generators and architecture are proprietary.
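
The footprint figures follow from the parameter count and the bytes stored per weight. A quick check, assuming 1 MB = 10^6 bytes and ignoring any non-weight overhead in the checkpoint (both are assumptions, not stated in this card):

```python
def footprint_mb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in MB, taking 1 MB = 1e6 bytes (assumption)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 30.3e6  # parameter count from the list above

fp32_mb = footprint_mb(N_PARAMS, 4)  # 32-bit floats: 4 bytes per weight
int8_mb = footprint_mb(N_PARAMS, 1)  # 8-bit integers: 1 byte per weight

print(f"fp32: {fp32_mb:.1f} MB, int8: {int8_mb:.1f} MB")
```

Both values round to the 121 MB and ~30 MB quoted in the list.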

## Two modes

| Input | Model emits |
|---|---|
| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |

Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`,
`home`, `cal`, `file`.
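
Downstream code can branch on the leading tag. The following is a minimal sketch, assuming the decoded output always begins with one of the two tags shown in the table; `split_mode` is a hypothetical helper and not part of the release.

```python
import json

MODE_TAGS = ("[ACTION]", "[CHAT]")


def split_mode(output: str):
    """Return (mode, payload): a dict for action output, a string for chat output."""
    text = output.strip()
    for tag in MODE_TAGS:
        if text.startswith(tag):
            body = text[len(tag):].strip()
            if tag == "[ACTION]":
                # Action payloads are JSON; json.loads raises on malformed output.
                return "action", json.loads(body)
            return "chat", body
    raise ValueError(f"no mode tag found in: {output!r}")


mode, payload = split_mode(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
print(mode, payload["domain"], payload["params"]["state"])
```

Raising on a missing tag keeps malformed generations from being silently treated as chat text.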

## Benchmark

SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot**
through their chat template with a system instruction. `bpb` (bits per byte) is
tokenizer-fair, whereas per-token perplexity is not comparable across
vocabularies. `exact/M` is the action exact-match percentage divided by the
parameter count in millions; it is the efficiency axis.
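
Bits per byte is tokenizer-fair because it normalizes by the byte length of the text rather than by the token count. The sketch below assumes the usual definition (total negative log-likelihood converted to bits, divided by the number of UTF-8 bytes); the card does not spell out its exact formula, and the loss values in the example are made up for illustration.

```python
import math


def bits_per_byte(token_nll_nats, text: str) -> float:
    """Total negative log-likelihood in bits divided by the UTF-8 byte length of text."""
    total_bits = sum(token_nll_nats) / math.log(2)  # convert nats to bits
    n_bytes = len(text.encode("utf-8"))
    return total_bits / n_bytes


# Illustrative only: two tokens costing 8 bits each over an 8-byte string.
example_nll = [8 * math.log(2), 8 * math.log(2)]
print(round(bits_per_byte(example_nll, "abcdefgh"), 3))  # -> 2.0
```

A tokenizer that split the same string into four tokens would report a different per-token loss, but the same total bits over the same eight bytes.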

| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |

<sub>*Qwen2.5-1.5B loaded in 4-bit; its 889M figure appears to be the count
reported after 4-bit packing, since the model nominally has about 1.5B
parameters. Larger general models lead on bits-per-byte and cloze (by the
parameter counts in the table they are 12–30× bigger, and they are trained for
general knowledge); SAM-G leads on structured action, French actions,
footprint, speed, and exact-match per parameter. Notably, Qwen2.5-1.5B scores
*below* Qwen2.5-0.5B on action exact-match (21 vs. 25), which suggests that
capability here comes from domain specialization, not scale.</sub>
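
The `exact/M` column can be reproduced from the `Params` and `Exact %` columns. A short check using the table's own figures (the dictionary layout and function name are illustrative):

```python
# (parameters in millions, action exact-match %) as listed in the table
ROWS = {
    "SAM-G": (30.3, 76),
    "Pythia-70M": (70, 0),
    "Qwen2.5-0.5B-Instruct": (494, 25),
    "SmolLM2-360M-Instruct": (362, 14),
    "Qwen2.5-1.5B-Instruct": (889, 21),
}


def exact_per_million(params_m: float, exact_pct: float) -> float:
    """Exact-match percentage per million parameters."""
    return exact_pct / params_m


for name, (params_m, exact_pct) in ROWS.items():
    print(f"{name}: {exact_per_million(params_m, exact_pct):.2f}")
```

The printed values match the last column of the table (2.51, 0.00, 0.05, 0.04, 0.02).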

## Per-domain exact match (%)

| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |

All general baselines score 0 on most domains, succeeding only partially on the
most generic ones (home, cal). `ros` (floating-point fields) is SAM-G's weakest
schema and benefits most from additional training data.
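
As a sanity check, the per-domain scores can be averaged and compared with the overall figure. The unweighted mean below comes to 75, one point under the 76% overall exact match; one possible explanation is that the domains contribute different numbers of test cases, but the card does not state the per-domain counts, so that is only an assumption.

```python
# Per-domain exact match (%) copied from the table above
PER_DOMAIN = {
    "ros": 0, "http": 100, "mqtt": 100, "db": 100, "workflow": 60,
    "ecommerce": 100, "vehicle": 100, "home": 50, "cal": 80, "file": 60,
}

macro_avg = sum(PER_DOMAIN.values()) / len(PER_DOMAIN)
weakest = min(PER_DOMAIN, key=PER_DOMAIN.get)

print(f"unweighted mean: {macro_avg:.1f}%, weakest domain: {weakest}")
```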

## Usage

```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model) and weights.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, prompt_length)
# greedy-decode with your loaded model until EOS, then sp.DecodeIds(...)
# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
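
The greedy-decode step is left as a comment in the snippet above. It can be sketched in a framework-agnostic way, assuming a hypothetical `next_logits(ids)` callable that wraps your loaded model and returns one score per vocabulary entry for the next token, and an `eos_id` taken from the tokenizer. Neither name comes from the release, and the toy model below exists only to exercise the loop.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 128,
) -> List[int]:
    """Repeatedly append the highest-scoring next token until EOS or the budget is hit."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated


def toy_next_logits(ids: List[int]) -> List[float]:
    """Stand-in model over a 5-token vocabulary: always prefers (last id + 1), capped at 4."""
    target = min(ids[-1] + 1, 4)
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_next_logits, [0], eos_id=4))  # -> [1, 2, 3]
```

With the real model, keep the prompt plus `max_new_tokens` within the 1024-token context, then pass the returned ids to `sp.DecodeIds(...)`.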

Always parse the output as JSON and validate it against your schema before
executing it.
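
One way to follow that advice with only the standard library is sketched below. The domain list is taken from this card; the parameter types for `mqtt`/`publish` are inferred from the single example in the usage snippet and are an assumption, as is the `validate_action` helper itself. Replace `PARAM_TYPES` with your own schemas.

```python
import json

ALLOWED_DOMAINS = {
    "ros", "http", "mqtt", "db", "workflow",
    "ecommerce", "vehicle", "home", "cal", "file",
}

# Hypothetical schema table, inferred from the mqtt example only.
PARAM_TYPES = {
    ("mqtt", "publish"): {"topic": str, "payload": (int, float), "qos": int},
}


def validate_action(raw: str) -> dict:
    """Parse raw model output and reject anything that does not match a known schema."""
    action = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if invalid
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    domain, op = action.get("domain"), action.get("op")
    if not isinstance(domain, str) or domain not in ALLOWED_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    expected = PARAM_TYPES.get((domain, op))
    if expected is None:
        raise ValueError(f"no schema registered for {domain}/{op}")
    params = action.get("params")
    if not isinstance(params, dict) or set(params) != set(expected):
        raise ValueError("params do not match the expected keys")
    for key, typ in expected.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(params[key], bool) or not isinstance(params[key], typ):
            raise ValueError(f"wrong type for {key!r}")
    return action


ok = validate_action(
    '{"domain":"mqtt","op":"publish",'
    '"params":{"topic":"sensors/temp","payload":21.5,"qos":1}}'
)
print(ok["params"]["topic"])
```

Anything that raises here should be dropped or escalated rather than executed.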

## Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways;
offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers;
and the structured tool-calling stage of agentic pipelines, either as a drop-in
replacement or as a fast router ahead of a larger hosted model.
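
The router pattern can be sketched as a local-first call with a fallback. Everything here is hypothetical: `local_generate` and `hosted_generate` stand in for whatever wrappers you write around SAM-G and a hosted model, and the acceptance test (a JSON object with `domain` and `op` keys) is a deliberately simple placeholder for real schema validation.

```python
import json
from typing import Callable


def route(
    instruction: str,
    local_generate: Callable[[str], str],
    hosted_generate: Callable[[str], str],
) -> dict:
    """Use the local model's action if it parses; otherwise defer to the hosted model."""
    raw = local_generate(instruction)
    try:
        action = json.loads(raw)
        if isinstance(action, dict) and "domain" in action and "op" in action:
            return {"source": "local", "action": action}
    except json.JSONDecodeError:
        pass  # fall through to the hosted model
    return {"source": "hosted", "action": json.loads(hosted_generate(instruction))}


# Stub generators for illustration only.
def good_local(_: str) -> str:
    return '{"domain":"home","op":"set_state","params":{"state":"on"}}'


def bad_local(_: str) -> str:
    return "I am not sure"


def hosted(_: str) -> str:
    return '{"domain":"http","op":"get","params":{}}'


print(route("turn on the lamp", good_local, hosted)["source"])
print(route("fetch the page", bad_local, hosted)["source"])
```

The hosted call is only made when the local output is rejected, which is what keeps the common path offline and fast.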

## Limitations

- Not a general assistant: factual knowledge and open-ended reasoning are
  limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. The `ros` schema
  (floating-point fields) is the weakest and benefits most from more data.
- The action benchmark is synthetic: it is drawn from the same distribution
  family as the training data, using a disjoint evaluation seed (999).

## Citation

```bibtex
@misc{samg2026,
  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year   = {2026}
}
```