rob-rbyte-v4
Residue router for the SAIR Modular Arithmetic Challenge. Entry class
model.ResidueRouterV1, output base 256. Covers tiers 1-5.
Routing is by the size of p. Operands are reduced mod p inside
predict_digits (the two-argument normalization both reference models use: a
with p, then b with p, never all three).
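A minimal sketch of the normalization and the size-of-p routing, using the thresholds stated in the tier descriptions of this card. The function names and the tier labels are hypothetical; this is not the code in model.py.

```python
def normalize(a: int, b: int, p: int) -> tuple[int, int]:
    """Reduce each operand against p on its own: a with p, then b with p."""
    return a % p, b % p


def route_tier(p: int) -> str:
    """Pick a specialist from the size of p (labels are illustrative only)."""
    if p <= 251:
        return "t12"   # tiers 1-2: residue specialist
    if p < 1 << 16:
        return "t3"    # 16-bit step nets
    if p < 1 << 32:
        return "t4"    # 32-bit step nets
    if p < 1 << 64:
        return "t5"    # 64-bit step nets
    return "none"      # tiers 6-10: outside the trained regime


x, y = normalize(300, 7, 251)   # both residues now lie in [0, p)
tier = route_tier(251)
```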
Tiers 1-2 (p <= 251): the v1 residue specialist. Each operand residue is embedded through a shared per-(prime, residue) table; the two vectors are added (a discrete-log inductive bias: logs add under multiplication); a residual MLP trunk transforms the sum; logits score against a per-(prime, class) output table masked to the p classes of the current prime. The answer is one base-256 digit. ~2.9M parameters.
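The discrete-log bias can be illustrated without any learning. For a prime p with primitive root g, every nonzero residue is a power of g, so multiplying residues amounts to adding exponents mod p - 1. The sketch below builds explicit log tables to show that identity; the specialist itself learns embeddings and is not claimed to compute logs this way. The helper names are hypothetical, and zero residues (which have no log) are left out of the sketch.

```python
def log_tables(p: int) -> tuple[dict[int, int], dict[int, int]]:
    """Find a primitive root of prime p by search; return (log, exp) tables."""
    for g in range(1, p):
        log, x = {}, 1
        for k in range(p - 1):
            if x in log:          # g cycles early, so it is not a primitive root
                break
            log[x] = k
            x = x * g % p
        if len(log) == p - 1:     # g generated every nonzero residue
            exp = {k: r for r, k in log.items()}
            return log, exp
    raise ValueError("no primitive root found; p must be prime")


def multiply_via_logs(a: int, b: int, p: int) -> int:
    """a * b mod p for nonzero residues, computed by adding discrete logs."""
    log, exp = log_tables(p)
    return exp[(log[a] + log[b]) % (p - 1)]


assert multiply_via_logs(123, 200, 251) == 123 * 200 % 251
```

Adding the two operand embeddings before the trunk gives the network the same additive structure to work with.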
Tier 3 (251 < p < 65536): two trained shared local-rule step nets composed through fixed wiring. After reduction x, y are 16-bit residues. A MULTIPLY step learns the shared carry rule over the carry-save column sums and, composed closed-loop through a fixed parity readout, emits the exact 32-bit product t = x*y. A REDUCTION step learns the shared per-nibble borrow/compare rule and, composed through fixed restoring-division wiring, emits r = t mod p in [0, p). Plain GELU MLPs, width 96, depth 3, ~20k params each.
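A sketch of the MULTIPLY composition at 16-bit geometry. The column sums and the parity readout are the fixed scaffold; carry_rule here is an exact closed-form stand-in for the trained step net, written out only to show where the learned decision sits in the loop. Names are hypothetical.

```python
def column_sums(x: int, y: int, bits: int) -> list[int]:
    """Fixed scaffold: count the partial-product bits landing in each column."""
    sums = [0] * (2 * bits - 1)
    for i in range(bits):
        for j in range(bits):
            sums[i + j] += (x >> i) & (y >> j) & 1
    return sums


def carry_rule(column_sum: int, carry_in: int) -> int:
    """Stand-in for the trained step net: carry out of one column."""
    return (column_sum + carry_in) >> 1


def multiply(x: int, y: int, bits: int = 16) -> int:
    """Compose the carry rule closed-loop through the parity readout."""
    carry, product = 0, 0
    for k, s in enumerate(column_sums(x, y, bits)):
        product |= ((s + carry) & 1) << k   # fixed parity readout
        carry = carry_rule(s, carry)        # learned decision, fed back in
    return product | (carry << (2 * bits - 1))  # last carry is the top bit


assert multiply(65535, 65535) == 65535 * 65535
```

The same loop runs unchanged at wider geometry by passing bits=32 or bits=64; only the range of column sums and carries the rule must cover grows.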
Tier 4 (65536 <= p < 2^32): the SAME two rules at 32-bit geometry. After reduction x, y are 32-bit residues. The MULTIPLY step learns the carry rule over the 63 carry-save columns (sum <= 32, carry <= 31) and, composed through the parity readout widened to 64 bits, emits the 64-bit product as BITS. The REDUCTION step is the identical 512-case per-nibble borrow rule, composed over 64 division positions x 9 nibbles, emitting r = t mod p in [0, p). Multiply step GELU MLP width 128 depth 3 (~35k params), reduction step width 96 depth 3 (~20k params).
Tier 5 (2^32 <= p < 2^64): the SAME two rules at 64-bit geometry. After reduction x, y are 64-bit residues. The MULTIPLY step learns the carry rule over the 127 carry-save columns (sum <= 64, carry <= 63) and, composed through the parity readout widened to 128 bits, emits the 128-bit product as BITS. The REDUCTION step is the identical 512-case per-nibble borrow rule, composed over 128 division positions x 17 nibbles, emitting r = t mod p in [0, p). Because a 64-bit residue and the 65-bit division register both overflow signed int64, tier 5 carries operands, p, the product, and the division register as BIT tensors and never materializes a wide value as an int64 scalar. Multiply step GELU MLP width 160 depth 3 (~55k params), reduction step width 96 depth 3 (~20k params). The two techniques carried from tier 4: reciprocal-operand framing (each triple traced as both (x,y) and (y,x)) and Charton-Kempe two-set sampling (a small repeated set + a large fresh set).
Tiers 6-10 (p >= 2^64): outside the trained regime; returns [0].
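A sketch of the REDUCTION composition: restoring division driven by a per-nibble borrow rule. borrow_rule is an exact stand-in for the trained step net; the difference nibble is computed by fixed wiring here, which is an assumption of this sketch. Python integers are unbounded, so the sketch uses plain ints where tier 5 carries BIT tensors. Names are hypothetical.

```python
def borrow_rule(a_nib: int, b_nib: int, borrow_in: int) -> int:
    """Stand-in for the trained step net: borrow out of one nibble."""
    return 1 if a_nib < b_nib + borrow_in else 0


def trial_subtract(r: int, p: int, nibbles: int) -> tuple[int, int]:
    """Compute r - p nibble by nibble; return (difference, final borrow)."""
    diff, borrow = 0, 0
    for k in range(nibbles):
        a = (r >> (4 * k)) & 0xF
        b = (p >> (4 * k)) & 0xF
        diff |= ((a - b - borrow) & 0xF) << (4 * k)
        borrow = borrow_rule(a, b, borrow)
    return diff, borrow


def reduce_mod(t: int, p: int, bits: int) -> int:
    """Restoring division of a 2*bits-bit product t by a bits-bit modulus p."""
    nibbles = (bits + 1 + 3) // 4       # register is bits + 1 wide: 9 or 17 nibbles
    r = 0
    for pos in reversed(range(2 * bits)):   # 64 or 128 division positions
        r = (r << 1) | ((t >> pos) & 1)     # shift in the next product bit
        diff, borrow = trial_subtract(r, p, nibbles)
        if borrow == 0:                     # no final borrow means r >= p
            r = diff
    return r


p = 14481575096435149429  # one of the tier-5 gate primes listed in this card
assert reduce_mod((p - 1) * (p - 2), p, bits=64) == (p - 1) * (p - 2) % p
```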
Provenance
In every tier the carry-save column sums, parity readout, bit shifts, restoring-division topology, and ge-from-final-borrow decision are fixed scaffold. The two nontrivial decisions, the carry rule and the borrow/compare rule, reside in trained MLP parameters (separate nets per tier-3 / tier-4 / tier-5 geometry). Randomizing a step net collapses its tier:
- tier 3 random-weight pipeline: exact = 0.000000. See t3_collapse_receipt.json.
- tier 4 random-weight pipeline: exact = 0.000000. See t4_collapse_receipt.json.
- tier 5 random-weight pipeline: exact = 0.000000; trained mul + random red = 0.000000. See t5_collapse_receipt.json.
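The shape of a collapse check can be sketched at 16-bit geometry. This is an illustration only: a random lookup table stands in for a random-weight step net, the exact closed-form rule stands in for the trained one, and the numbers it produces are not the receipts above. Names, sample size, and seed are arbitrary choices of the sketch.

```python
import random

BITS = 16


def multiply(x, y, rule):
    """Fixed wiring (column sums + parity readout) around a pluggable carry rule."""
    sums = [0] * (2 * BITS - 1)
    for i in range(BITS):
        for j in range(BITS):
            sums[i + j] += (x >> i) & (y >> j) & 1
    carry, product = 0, 0
    for k, s in enumerate(sums):
        product |= ((s + carry) & 1) << k
        carry = rule(s, carry)
    return product | ((carry & 1) << (2 * BITS - 1))


def exact_rule(s, c):
    """Stand-in for a trained step net: the true carry out."""
    return (s + c) >> 1


def random_rule(rng):
    """Stand-in for a random-weight step net: arbitrary carry per (sum, carry)."""
    table = {(s, c): rng.randrange(BITS)
             for s in range(BITS + 1) for c in range(BITS)}
    return lambda s, c: table[(s, c)]


def exact_rate(rule, pairs):
    """Fraction of operand pairs whose composed product is exactly x * y."""
    return sum(multiply(x, y, rule) == x * y for x, y in pairs) / len(pairs)


rng = random.Random(0)
pairs = [(rng.randrange(1 << BITS), rng.randrange(1 << BITS)) for _ in range(200)]
trained_rate = exact_rate(exact_rule, pairs)
collapsed_rate = exact_rate(random_rule(rng), pairs)
print(f"exact rule: {trained_rate:.6f}  random rule: {collapsed_rate:.6f}")
```

The scaffold is identical in both runs, so any gap between the two rates comes from the rule alone.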
Every tier-3/4/5 multiply and reduction step net reaches per-case exactness 1.0
on its full enumerated domain (tier 5: mul 4160-case / red 512-case), so the
composed pipelines are exact by the fixed wiring. Five primes per tier are held
out by identity and appear in no training trace; the composed pipeline is exact
(1.0) on all five on uniform residue pairs and the four edge cases. The five
tier-5 gate primes (61-64 bits): 1690313788893089131, 6145258606915434311,
8963783833428354709, 11534118763423864511, 14481575096435149429
(t5_collapse_receipt.json and experiments/014-t5-lifted-step/).
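One reading of the enumerated domain sizes, consistent with the bounds stated for tier 5: a column sum in 0..64 paired with a carry in 0..63 gives 65 x 64 = 4160 multiply cases, and two nibbles plus a borrow bit give 16 x 16 x 2 = 512 reduction cases. The factorization is an inference from those counts, not taken from the receipts, and the helper names are hypothetical.

```python
def carry_domain(bits: int) -> list[tuple[int, int]]:
    """Every (column_sum, carry_in) pair the multiply step can see."""
    return [(s, c) for s in range(bits + 1) for c in range(bits)]


def borrow_domain() -> list[tuple[int, int, int]]:
    """Every (a_nibble, b_nibble, borrow_in) triple the reduction step can see."""
    return [(a, b, w) for a in range(16) for b in range(16) for w in (0, 1)]


def per_case_exactness(step, reference, domain) -> float:
    """Fraction of enumerated cases where a step function matches the reference."""
    return sum(step(*case) == reference(*case) for case in domain) / len(domain)


assert len(carry_domain(64)) == 4160
assert len(borrow_domain()) == 512
```

With a finite domain this small, exactness of a step can be checked by enumeration rather than sampling, which is what makes the composed pipeline's exactness follow from the wiring.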
Public benchmark (1100 problems, fixed seed)
Run through the official pipeline (modchallenge evaluate ./submission/rob-rbyte-v4 --total 1100); the per-tier accuracy and highest_tier_above_90 come from the
official decoder, not an internal tensor check:
- overall_accuracy = 0.510
- highest_tier_above_90 = 5
- deterministic = True (two full runs bit-identical per tier)
- tier 1 = 1.000, tier 2 = 1.000, tier 3 = 1.000, tier 4 = 1.000, tier 5 = 1.000
- tiers 6-10 = 0.020 (chance; outside the trained regime, returns [0])
- full eval wall ~12s
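The overall figure can be cross-checked from the per-tier numbers, assuming the 1100 problems are split evenly across the ten tiers (110 each). The even split, and the reading of highest_tier_above_90 as the largest tier whose accuracy exceeds 0.90, are assumptions of this sketch; the official decoder's definitions take precedence.

```python
# Per-tier accuracies as reported in this card.
per_tier = {tier: 1.000 for tier in range(1, 6)}
per_tier.update({tier: 0.020 for tier in range(6, 11)})

# Equal tier sizes assumed, so the overall accuracy is a plain mean.
overall = sum(per_tier.values()) / len(per_tier)

# Assumed reading: the largest tier index with accuracy above 0.90.
highest_above_90 = max(t for t, acc in per_tier.items() if acc > 0.90)

print(round(overall, 3), highest_above_90)
```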
See EVALS.log and eval_official_1100.json for the full breakdown and
manifest.json for the model and training descriptions.
Static check: clean. No sympy / gmpy2 / eval / exec / subprocess on any path.
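A check of this kind can be approximated with the standard-library ast module. The sketch below is not the challenge's checker; it only flags direct imports of the listed modules and direct calls to eval or exec by name, and the function name is hypothetical.

```python
import ast

BANNED_MODULES = {"sympy", "gmpy2", "subprocess"}
BANNED_CALLS = {"eval", "exec"}


def find_violations(source: str) -> list[str]:
    """Walk the syntax tree and report banned imports and calls."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BANNED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                violations.append(f"call {node.func.id}()")
    return violations


assert find_violations("import math\nx = math.sqrt(4)") == []
```

A name-based scan like this misses indirect access (for example through importlib or getattr), so it is a first pass rather than a guarantee.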
Files
model.py (architectures + routing + fixed wiring), weights.safetensors
(tier-1/2 specialist), t3_mul.safetensors / t3_red.safetensors (tier-3 step
nets), t4_mul.safetensors / t4_red.safetensors (tier-4 step nets),
t5_mul.safetensors / t5_red.safetensors (tier-5 step nets), config.json
(per-specialist hyperparameters), manifest.json, t3_collapse_receipt.json,
t4_collapse_receipt.json, t5_collapse_receipt.json, EVALS.log,
eval_official_1100.json.