etwk commited on
Commit ·
b69f4ce
1
Parent(s): 47e319e
Tier-7 256-bit carry-aware TCN cell: highest_tier 6->7, overall 0.602->0.698, tier-7 0.98
Browse filesRoutes p<2^256 to a new 256-bit TCN cell (12 blocks/256ch/dilations 1-128,
~4.7M params; CELL_WIDTHS now 16/32/64/128/256). Full public benchmark:
tier 7 = 0.98, overall 0.698, highest_tier_above_90 = 7, deterministic.
Compliance: 0.98@s0 -> 0.06@s0.25 -> 0.02@s0.5, untrained 0.00.
README + manifest updated to five cells / up to 2^256.
- README.md +30 -21
- manifest.json +2 -2
- model.py +8 -8
README.md
CHANGED
|
@@ -9,17 +9,18 @@ tags:
|
|
| 9 |
- neural-algorithm
|
| 10 |
---
|
| 11 |
|
| 12 |
-
# Horner-RNN — learned modular multiplication up to 2
|
| 13 |
|
| 14 |
A compliant **bit-sequential RNN** that computes `(a · b) mod p` for primes `p` up to
|
| 15 |
-
**2
|
| 16 |
multiplication tables. Entry for the
|
| 17 |
[Modular Arithmetic Challenge](https://github.com/SAIRcompetition/modular-arithmetic-challenge).
|
| 18 |
|
| 19 |
-
- **Saturates tiers 1–
|
| 20 |
-
- **overall_accuracy 0.
|
| 21 |
-
- The 128-bit (tier-6)
|
| 22 |
-
|
|
|
|
| 23 |
- Verifiably **generalises to primes never seen in training** (held-out-prime validation
|
| 24 |
accuracy tracks training accuracy — no memorisation gap)
|
| 25 |
|
|
@@ -33,8 +34,9 @@ t_{k+1} = (2·t_k + a_bit_k · b) mod p # one learned step (Horner)
|
|
| 33 |
answer = t_N (N = bit width of p)
|
| 34 |
```
|
| 35 |
|
| 36 |
-
The model is an RNN whose **transition function — an MLP
|
| 37 |
-
|
|
|
|
| 38 |
(a hard binary bottleneck), so the recurrence composes cleanly: if the cell is exact per
|
| 39 |
step, the chain is exact end-to-end. At inference the scan feeds the bits of `a mod p` one
|
| 40 |
per step, conditioned on `(b mod p, p)`, and the final hidden-state bits are emitted
|
|
@@ -46,7 +48,7 @@ The single-step function is **piecewise linear** (`2t + bit·b`, then subtract `
|
|
| 46 |
|
| 47 |
## Files / cells
|
| 48 |
|
| 49 |
-
The model ships **
|
| 50 |
holds the prime:
|
| 51 |
|
| 52 |
| File | Cell | Primes | Tiers | Arch | Params | Public benchmark |
|
|
@@ -55,22 +57,25 @@ holds the prime:
|
|
| 55 |
| `weights32.pt` | 32-bit | `< 2³²` | 4 | MLP, 6144 / 4 | ~114M | tier 4 = 0.99 |
|
| 56 |
| `weights64.pt` | 64-bit | `< 2⁶⁴` | 5 | MLP, 4096 / 7, residual | ~236M | tier 5 = 0.98 |
|
| 57 |
| `weights128.pt` | 128-bit | `< 2¹²⁸` | 6 | **carry-aware TCN**, 256ch / 10 blocks, dilations 1–64 | ~3.9M | **tier 6 = 0.97** |
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
|
|
|
|
|
|
|
|
|
| 66 |
|
| 67 |
The 64-bit cell needs **depth and residual connections** the narrower cells do not: a 64-bit
|
| 68 |
modular Horner step hides two long carry chains (the `2t + bit·b` addition and the
|
| 69 |
compare-and-subtract reduction), and exact n-bit carry propagation wants MLP depth ~log₂(n).
|
| 70 |
The last push from tier 5 = 0.74 to 0.98 came from training the 64-bit cell's single-step
|
| 71 |
examples on the **states the chain actually visits** (the true Horner trajectory) rather than
|
| 72 |
-
uniformly sampled `t` — see *Training*. For `p ≥ 2⁶
|
| 73 |
-
fallback without invoking the network.
|
| 74 |
|
| 75 |
Also in the repo: `model.py` (the `HornerRNN` entry class + `HornerCell`), `manifest.json`
|
| 76 |
(challenge manifest), `train.py` (the 16-bit trainer).
|
|
@@ -91,7 +96,7 @@ import torch
|
|
| 91 |
from model import HornerRNN # model.py from this repo
|
| 92 |
|
| 93 |
m = HornerRNN()
|
| 94 |
-
m.load(".") # auto-loads weights{16,32,64}.pt from this dir
|
| 95 |
# returns base-2 digits, MSB-first; the harness decodes them to the integer
|
| 96 |
digits = m.predict_digits_batch([(123456789, 987654321, 4294967291)])[0]
|
| 97 |
answer = int("".join(map(str, digits)), 2)
|
|
@@ -121,6 +126,7 @@ cell is *at* the floor. The capability therefore resides in the trained paramete
|
|
| 121 |
| tier 4 (32-bit cell) | 0.99 | 0.99 | 0.86 | 0.04 | 0.02 | 0.00 |
|
| 122 |
| tier 5 (64-bit cell) | 0.98 | 0.95 | 0.65 | 0.03 | 0.01 | 0.00 |
|
| 123 |
| tier 6 (128-bit TCN) | 0.97 | 0.96 | 0.98 | 0.19 | 0.02 | 0.00 |
|
|
|
|
| 124 |
|
| 125 |
Generalisation against memorisation: 10% of primes at each bit-width were held out of
|
| 126 |
training entirely; chain accuracy on them matches the training primes.
|
|
@@ -137,7 +143,10 @@ the recurrence (ordinary supervised BCE on the same single-step target). The 128
|
|
| 137 |
cell is trained the same single-step way but as the **carry-aware TCN** over a high-diversity
|
| 138 |
pool of thousands of distinct 124–128 bit primes; its weight-shared dilated-convolution bias
|
| 139 |
reaches a per-step error ~15× lower than the same-task MLP, giving **tier 6 = 0.97** in a single
|
| 140 |
-
short run.
|
|
|
|
|
|
|
|
|
|
| 141 |
metadata / challenge leaderboard).
|
| 142 |
|
| 143 |
## License
|
|
|
|
| 9 |
- neural-algorithm
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# Horner-RNN — learned modular multiplication up to 2²⁵⁶
|
| 13 |
|
| 14 |
A compliant **bit-sequential RNN** that computes `(a · b) mod p` for primes `p` up to
|
| 15 |
+
**2²⁵⁶**, by *learning the Horner step of double-and-add* rather than memorising
|
| 16 |
multiplication tables. Entry for the
|
| 17 |
[Modular Arithmetic Challenge](https://github.com/SAIRcompetition/modular-arithmetic-challenge).
|
| 18 |
|
| 19 |
+
- **Saturates tiers 1–7** (all primes `< 2²⁵⁶`): tiers 1–3 = 100%, tier 4 = 99%, tier 5 = 98%, tier 6 = 97%, **tier 7 = 98%**
|
| 20 |
+
- **overall_accuracy 0.698**, `highest_tier_above_90 = 7`
|
| 21 |
+
- The 128-bit (tier-6) and 256-bit (tier-7) cells are **carry-aware TCNs** (weight-shared dilated
|
| 22 |
+
convolutions over the bit-positions, ~4–5M params each) — a far better inductive bias for long
|
| 23 |
+
carry chains than the MLP, and the key to the per-step precision a 128/256-step chain demands
|
| 24 |
- Verifiably **generalises to primes never seen in training** (held-out-prime validation
|
| 25 |
accuracy tracks training accuracy — no memorisation gap)
|
| 26 |
|
|
|
|
| 34 |
answer = t_N (N = bit width of p)
|
| 35 |
```
|
| 36 |
|
| 37 |
+
The model is an RNN whose **transition function — an MLP for the 16/32/64-bit cells, a
|
| 38 |
+
carry-aware TCN for the 128/256-bit cells — is trained on exactly that single-step map**
|
| 39 |
+
over binary-encoded inputs. The hidden state is a quantized bit vector
|
| 40 |
(a hard binary bottleneck), so the recurrence composes cleanly: if the cell is exact per
|
| 41 |
step, the chain is exact end-to-end. At inference the scan feeds the bits of `a mod p` one
|
| 42 |
per step, conditioned on `(b mod p, p)`, and the final hidden-state bits are emitted
|
|
|
|
| 48 |
|
| 49 |
## Files / cells
|
| 50 |
|
| 51 |
+
The model ships **five cells** and routes each problem to the narrowest one whose state
|
| 52 |
holds the prime:
|
| 53 |
|
| 54 |
| File | Cell | Primes | Tiers | Arch | Params | Public benchmark |
|
|
|
|
| 57 |
| `weights32.pt` | 32-bit | `< 2³²` | 4 | MLP, 6144 / 4 | ~114M | tier 4 = 0.99 |
|
| 58 |
| `weights64.pt` | 64-bit | `< 2⁶⁴` | 5 | MLP, 4096 / 7, residual | ~236M | tier 5 = 0.98 |
|
| 59 |
| `weights128.pt` | 128-bit | `< 2¹²⁸` | 6 | **carry-aware TCN**, 256ch / 10 blocks, dilations 1–64 | ~3.9M | **tier 6 = 0.97** |
|
| 60 |
+
| `weights256.pt` | 256-bit | `< 2²⁵⁶` | 7 | **carry-aware TCN**, 256ch / 12 blocks, dilations 1–128 | ~4.7M | **tier 7 = 0.98** |
|
| 61 |
+
|
| 62 |
+
The 128- and 256-bit cells switch architecture: instead of a full-width MLP each is a **non-causal
|
| 63 |
+
dilated 1-D convolutional network over the bit-positions** (128 and 256 respectively). Carry
|
| 64 |
+
propagation is *position-invariant* — the same carry/borrow rule applies at every bit — so a
|
| 65 |
+
weight-shared convolution learns **one** rule applied everywhere (non-causal, so the addition carry
|
| 66 |
+
flows LSB→MSB and the mod-`p` compare/borrow flows MSB→LSB), rather than an MLP learning a separate
|
| 67 |
+
position-function per bit. This inductive bias drives the per-step error roughly **15× lower** than the
|
| 68 |
+
same-task MLP — the difference between a 128/256-step chain landing at ~0.26 and at **0.97 / 0.98** —
|
| 69 |
+
in cells **~60× smaller** than the wide MLPs (16–19 MB each vs ~950 MB). The receptive field of each
|
| 70 |
+
TCN spans its full width in both carry directions, so a carry can propagate across the entire word.
|
| 71 |
|
| 72 |
The 64-bit cell needs **depth and residual connections** the narrower cells do not: a 64-bit
|
| 73 |
modular Horner step hides two long carry chains (the `2t + bit·b` addition and the
|
| 74 |
compare-and-subtract reduction), and exact n-bit carry propagation wants MLP depth ~log₂(n).
|
| 75 |
The last push from tier 5 = 0.74 to 0.98 came from training the 64-bit cell's single-step
|
| 76 |
examples on the **states the chain actually visits** (the true Horner trajectory) rather than
|
| 77 |
+
uniformly sampled `t` — see *Training*. For `p ≥ 2²⁵⁶` (wider than the widest trained cell) the
|
| 78 |
+
model emits the honest `[0]` fallback without invoking the network.
|
| 79 |
|
| 80 |
Also in the repo: `model.py` (the `HornerRNN` entry class + `HornerCell`), `manifest.json`
|
| 81 |
(challenge manifest), `train.py` (the 16-bit trainer).
|
|
|
|
| 96 |
from model import HornerRNN # model.py from this repo
|
| 97 |
|
| 98 |
m = HornerRNN()
|
| 99 |
+
m.load(".") # auto-loads weights{16,32,64,128,256}.pt from this dir
|
| 100 |
# returns base-2 digits, MSB-first; the harness decodes them to the integer
|
| 101 |
digits = m.predict_digits_batch([(123456789, 987654321, 4294967291)])[0]
|
| 102 |
answer = int("".join(map(str, digits)), 2)
|
|
|
|
| 126 |
| tier 4 (32-bit cell) | 0.99 | 0.99 | 0.86 | 0.04 | 0.02 | 0.00 |
|
| 127 |
| tier 5 (64-bit cell) | 0.98 | 0.95 | 0.65 | 0.03 | 0.01 | 0.00 |
|
| 128 |
| tier 6 (128-bit TCN) | 0.97 | 0.96 | 0.98 | 0.19 | 0.02 | 0.00 |
|
| 129 |
+
| tier 7 (256-bit TCN) | 0.98 | 0.97 | 0.99 | 0.06 | 0.02 | 0.00 |
|
| 130 |
|
| 131 |
Generalisation against memorisation: 10% of primes at each bit-width were held out of
|
| 132 |
training entirely; chain accuracy on them matches the training primes.
|
|
|
|
| 143 |
cell is trained the same single-step way but as the **carry-aware TCN** over a high-diversity
|
| 144 |
pool of thousands of distinct 124–128 bit primes; its weight-shared dilated-convolution bias
|
| 145 |
reaches a per-step error ~15× lower than the same-task MLP, giving **tier 6 = 0.97** in a single
|
| 146 |
+
short run. The 256-bit (tier-7) cell is the same carry-aware TCN scaled to 256 bit-positions
|
| 147 |
+
(dilations cycling 1–128), trained identically on true-trajectory single steps over distinct
|
| 148 |
+
252–256 bit primes; its per-step error is low enough that the 256-step chain holds at **tier 7 =
|
| 149 |
+
0.98**. Training code and the full write-up live in the solutions repo (link in the model card
|
| 150 |
metadata / challenge leaderboard).
|
| 151 |
|
| 152 |
## License
|
manifest.json
CHANGED
|
@@ -2,6 +2,6 @@
|
|
| 2 |
"entry_class": "model.HornerRNN",
|
| 3 |
"output_base": 2,
|
| 4 |
"framework": "pytorch",
|
| 5 |
-
"model_description": "Bit-sequential RNN (~
|
| 6 |
-
"training_description": "Each transition cell trained from random init on (t, bit, b, p) -> (2t + bit*b) mod p single-step examples over its prime range (16-bit: all primes < 2^16; 32-bit and 64-bit: random primes sampled uniform-by-value in [2^16, 2^32) and [2^33, 2^64) to match the test generator's randrange+nextprime distribution), with half of each batch mined near the comparison boundary (2t + bit*b within +/-2 of a multiple of p) where errors concentrate. BCE per state bit, AdamW + cosine decay + gradient clipping + LR warmup, EMA weights checkpointed by full-chain validation accuracy on a held-out 10% of primes never seen in training — val accuracy tracks train accuracy, i.e. the cells generalise across primes rather than memorising them. The 64-bit cell additionally receives a second fine-tuning phase on single steps drawn from the TRUE Horner trajectory (each example is a (t, bit, b, p) -> (2t + bit*b) mod p step where t is an actual chain intermediate (a_{>=i}*b) mod p, not a uniform sample), which matches the training distribution to the states the chain visits at inference and lifts tier 5 from 0.74 to 0.98; still ordinary supervised BCE on the same single-step target, no backprop through the recurrence. The 128-bit (tier-6) cell is the carry-aware TCN, trained the same way — single-step BCE on TRUE Horner-trajectory states (t, bit, b, p) -> (2t + bit*b) mod p — from random init over a high-diversity pool of thousands of distinct 124-128 bit primes (so it generalises across primes rather than memorising the conditional subtraction for a few). Its weight-shared dilated-convolution inductive bias reaches a per-step error roughly 15x lower than the same-task MLP cell, giving 0.97 full-chain accuracy on held-out 124-128 bit primes; same supervised single-step objective, no backprop through the recurrence, AdamW + cosine decay + grad clip + EMA checkpointed by held-out full-chain accuracy. Weight-perturbation compliance (exploration/compliance_perturb.py):
|
| 7 |
}
|
|
|
|
| 2 |
"entry_class": "model.HornerRNN",
|
| 3 |
"output_base": 2,
|
| 4 |
"framework": "pytorch",
|
| 5 |
+
"model_description": "Bit-sequential RNN (~410M params across five cells) for primes up to 2^256. Reads the bits of a mod p MSB-first, one per step, conditioned on (b mod p, p) in binary; the hidden state is a quantized bit vector (hard binary bottleneck) and the transition function must learn the Horner step (t, bit, b, p) -> (2t + bit*b) mod p to make the recurrence end on the right answer. Five cells are shipped and routed by prime size: a 16-bit cell (MLP, width 4096 depth 4, ~50M params) for p < 2^16 covering tiers 1-3, a 32-bit cell (MLP, width 6144 depth 4, ~114M params) for p < 2^32 covering tier 4, a 64-bit cell (MLP, width 4096 depth 7 with pre-norm residual blocks, ~236M params) for p < 2^64 covering tier 5, a 128-bit cell for p < 2^128 covering tier 6 that is a CARRY-AWARE TCN: a non-causal dilated 1D-convolutional network over the 128 bit-positions (10 residual blocks, 256 channels, dilations cycling 1..64 so the receptive field spans all 128 bits, ~3.9M params), and a 256-bit cell for p < 2^256 covering tier 7 that uses the SAME carry-aware TCN architecture scaled to 256 bit-positions (12 residual blocks, 256 channels, dilations cycling 1..128 so the receptive field spans all 256 bits, ~4.7M params), lifting tier 7 to 0.98. The convolution is weight-shared across bit positions, so it learns ONE carry/borrow rule applied everywhere (non-causally, so the addition carry can flow LSB->MSB and the mod-p compare/borrow MSB->LSB) instead of a full-width MLP learning a separate position-function per bit; this inductive bias drives the per-step error far below what an MLP cell reaches and is what makes the 128- and 256-bit chains (which compound the per-step error over 128 and 256 steps) accurate. Final state bits are emitted MSB-first as the base-2 answer. For p >= 2^256 emits the honest [0] fallback without invoking the network.",
|
| 6 |
+
"training_description": "Each transition cell trained from random init on (t, bit, b, p) -> (2t + bit*b) mod p single-step examples over its prime range (16-bit: all primes < 2^16; 32-bit and 64-bit: random primes sampled uniform-by-value in [2^16, 2^32) and [2^33, 2^64) to match the test generator's randrange+nextprime distribution), with half of each batch mined near the comparison boundary (2t + bit*b within +/-2 of a multiple of p) where errors concentrate. BCE per state bit, AdamW + cosine decay + gradient clipping + LR warmup, EMA weights checkpointed by full-chain validation accuracy on a held-out 10% of primes never seen in training — val accuracy tracks train accuracy, i.e. the cells generalise across primes rather than memorising them. The 64-bit cell additionally receives a second fine-tuning phase on single steps drawn from the TRUE Horner trajectory (each example is a (t, bit, b, p) -> (2t + bit*b) mod p step where t is an actual chain intermediate (a_{>=i}*b) mod p, not a uniform sample), which matches the training distribution to the states the chain visits at inference and lifts tier 5 from 0.74 to 0.98; still ordinary supervised BCE on the same single-step target, no backprop through the recurrence. The 128-bit (tier-6) cell is the carry-aware TCN, trained the same way — single-step BCE on TRUE Horner-trajectory states (t, bit, b, p) -> (2t + bit*b) mod p — from random init over a high-diversity pool of thousands of distinct 124-128 bit primes (so it generalises across primes rather than memorising the conditional subtraction for a few). Its weight-shared dilated-convolution inductive bias reaches a per-step error roughly 15x lower than the same-task MLP cell, giving 0.97 full-chain accuracy on held-out 124-128 bit primes; same supervised single-step objective, no backprop through the recurrence, AdamW + cosine decay + grad clip + EMA checkpointed by held-out full-chain accuracy. The 256-bit (tier-7) cell is the same carry-aware TCN scaled to 256 bit-positions (dilations cycling 1..128), trained identically — single-step BCE on TRUE Horner-trajectory states over a high-diversity pool of distinct 252-256 bit primes — reaching a per-step error low enough that the 256-step chain holds at 0.98 full-chain accuracy on held-out 252-256 bit primes. Weight-perturbation compliance (exploration/compliance_perturb.py): each cell's accuracy at sigma=0 collapses toward the floor as the weights are perturbed and an untrained re-init scores 0.00 — e.g. tier 6 0.97 -> 0.19 (sigma=0.25) -> 0.02 (sigma=0.5) and tier 7 0.98 -> 0.06 (sigma=0.25) -> 0.02 (sigma=0.5), untrained 0.00 for both — so the arithmetic resides in the trained parameters. Training scripts: train.py (16-bit), exploration/train_horner32.py (32-bit), exploration/train_horner64.py (64-bit phase 1, --residual) then exploration/train_horner64_traj.py (64-bit phase 2, trajectory), exploration/train_horner128_bigru.py --arch tcn (128-bit carry-aware TCN), exploration/train_horner_tcn.py --bits 256 (256-bit carry-aware TCN)."
|
| 7 |
}
|
model.py
CHANGED
|
@@ -5,8 +5,8 @@ one per step, conditioned on ``(b mod p, p)`` in binary. The hidden state is a
|
|
| 5 |
quantized bit vector (a discrete bottleneck — a hard VQ layer with a fixed
|
| 6 |
binary codebook), and the transition function — an MLP for the 16/32/64-bit
|
| 7 |
cells, a weight-shared carry-aware dilated-conv TCN (TCNHornerCell) for the
|
| 8 |
-
128-bit
|
| 9 |
-
state bits ARE the answer, emitted MSB-first in base 2.
|
| 10 |
|
| 11 |
Why this is interesting: for the recurrence to end on the right answer, the
|
| 12 |
trained cell must *learn* the map ``(t, bit, b, p) -> (2t + bit*b) mod p`` —
|
|
@@ -24,16 +24,16 @@ line is respected here:
|
|
| 24 |
random weights the output is noise (Principle 2), and the emitted digits are
|
| 25 |
exactly the model's final hidden state (Principle 1).
|
| 26 |
- learned (all of the actual arithmetic): the transition function. Nothing in
|
| 27 |
-
the code adds, multiplies, compares against p, or carries; the cell's
|
| 28 |
-
weights had to learn all of that from data.
|
| 29 |
|
| 30 |
The two-operand reductions ``a mod p`` / ``b mod p`` in ``predict_digits`` are
|
| 31 |
the same legal input normalisation every other reference model uses.
|
| 32 |
|
| 33 |
The model ships one cell per bit-width (16 -> tiers 1-3, 32 -> tier 4, 64 ->
|
| 34 |
-
tier 5,
|
| 35 |
-
cell whose state holds the prime. For primes wider than the widest
|
| 36 |
-
it emits the honest ``[0]`` fallback without invoking the network.
|
| 37 |
"""
|
| 38 |
|
| 39 |
from __future__ import annotations
|
|
@@ -48,7 +48,7 @@ from modchallenge.interface.base_model import ModularMultiplicationModel
|
|
| 48 |
|
| 49 |
# Bit-widths we may ship a cell for, narrowest first. load() picks up whichever
|
| 50 |
# weights{W}.pt files are actually present, so adding a wider cell is drop-in.
|
| 51 |
-
CELL_WIDTHS = (16, 32, 64, 128)
|
| 52 |
|
| 53 |
# Default state width for the 16-bit trainer (train.py imports this).
|
| 54 |
BITS = 16
|
|
|
|
| 5 |
quantized bit vector (a discrete bottleneck — a hard VQ layer with a fixed
|
| 6 |
binary codebook), and the transition function — an MLP for the 16/32/64-bit
|
| 7 |
cells, a weight-shared carry-aware dilated-conv TCN (TCNHornerCell) for the
|
| 8 |
+
128- and 256-bit cells — is entirely trained parameters. After the last bit,
|
| 9 |
+
the hidden state bits ARE the answer, emitted MSB-first in base 2.
|
| 10 |
|
| 11 |
Why this is interesting: for the recurrence to end on the right answer, the
|
| 12 |
trained cell must *learn* the map ``(t, bit, b, p) -> (2t + bit*b) mod p`` —
|
|
|
|
| 24 |
random weights the output is noise (Principle 2), and the emitted digits are
|
| 25 |
exactly the model's final hidden state (Principle 1).
|
| 26 |
- learned (all of the actual arithmetic): the transition function. Nothing in
|
| 27 |
+
the code adds, multiplies, compares against p, or carries; the cell's trained
|
| 28 |
+
weights (MLP or carry-aware TCN) had to learn all of that from data.
|
| 29 |
|
| 30 |
The two-operand reductions ``a mod p`` / ``b mod p`` in ``predict_digits`` are
|
| 31 |
the same legal input normalisation every other reference model uses.
|
| 32 |
|
| 33 |
The model ships one cell per bit-width (16 -> tiers 1-3, 32 -> tier 4, 64 ->
|
| 34 |
+
tier 5, 128 -> tier 6, and 256 -> tier 7 when present) and routes each problem to
|
| 35 |
+
the narrowest cell whose state holds the prime. For primes wider than the widest
|
| 36 |
+
trained cell it emits the honest ``[0]`` fallback without invoking the network.
|
| 37 |
"""
|
| 38 |
|
| 39 |
from __future__ import annotations
|
|
|
|
| 48 |
|
| 49 |
# Bit-widths we may ship a cell for, narrowest first. load() picks up whichever
|
| 50 |
# weights{W}.pt files are actually present, so adding a wider cell is drop-in.
|
| 51 |
+
CELL_WIDTHS = (16, 32, 64, 128, 256)
|
| 52 |
|
| 53 |
# Default state width for the 16-bit trainer (train.py imports this).
|
| 54 |
BITS = 16
|