AGILLM 4.3 — Autoregressive + DiffusionBlock + MoE Language Model

Single-file implementation: agillm41.py
Parameters: 1.22B (1,221,580,802)
Architecture: d_model=1280, layers=28, heads=20, d_k=64, rank=160 (2.5× expansion), tied weights

⚠️ CHECKPOINT PROVENANCE — READ FIRST

Checkpoint filenames (e.g. pretrain_step00050650.pt) reflect the step counter within the current training run, NOT total training steps.

This repo contains multiple checkpoint lineages. The 2026-06-24 pretrain_step00050650.pt artifact warm-started from step 2,182,564 (~2.18M) of a prior run, but that is historical provenance, not the current recovery base. Do not restart the current AGILLM4.3 recovery from the raw pretrain_step02182564.pt unless you are explicitly running a clean historical-rollback experiment.

Artifact / run	Meaning
`pretrain_step00050650.pt`	Historical: step 50,650 of the run that warm-started from step 2,182,564. Not the current recovery base.
`pretrain_step00243186_from00050650_20260630T1811Z.pt`	Later-lineage v100a0 checkpoint selected for the 2026-07-01 recovery because its June 30 inference was materially better than the July 1 latest delta.
`pretrain_step00359091.pt` + FedC delta `pretrain_delta_step00030961_from00359091_20260701T0522Z.pt`	July 1 path that produced a fragment/date-token regression in the AR/SAT/NAT smoke tests; do not report its output quality as healthy.

Current recovery checkpoint on HF:

checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt
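Because the step in a filename is run-local, it can help to split a checkpoint name into its parts before comparing lineages. The following is a minimal sketch, not part of agillm41.py. The regular expression is inferred from the filenames listed above, and reading `_from` as the base checkpoint's step and the trailing stamp as UTC are assumptions. `parse_ckpt_name` and `CkptName` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Pattern inferred from the filenames in this README (an assumption,
# not a documented naming scheme).
_CKPT_RE = re.compile(
    r"^pretrain_(?P<delta>delta_)?step(?P<step>\d+)"
    r"(?:_from(?P<base>\d+))?"
    r"(?:_(?P<stamp>\d{8}T\d{4}Z))?\.pt$"
)


class CkptName(NamedTuple):
    step: int                   # run-local step counter, NOT total steps
    base_step: Optional[int]    # assumed: step of the checkpoint it started from
    stamp: Optional[datetime]   # assumed: UTC save time
    is_delta: bool              # True for pretrain_delta_* files


def parse_ckpt_name(filename: str) -> CkptName:
    """Split a checkpoint filename into step, base step, timestamp, delta flag."""
    m = _CKPT_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised checkpoint name: {filename!r}")
    stamp = m["stamp"]
    return CkptName(
        step=int(m["step"]),
        base_step=int(m["base"]) if m["base"] else None,
        stamp=(
            datetime.strptime(stamp, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
            if stamp
            else None
        ),
        is_delta=m["delta"] is not None,
    )
```

A name without a `_from` part, such as pretrain_step00050650.pt, carries no lineage information in the filename at all, which is why the table above is needed.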

Architecture

Component	Value
Backbone	Autoregressive transformer (AR)
DiffusionBlocks	Active — layers cycle AR/SAT/NAT objectives
Mixture-of-Experts	Active — 14 slots per block
d_model	1280
Layers	28
Attention heads	20
Tied weights	Yes
Tokenizer	Llama-compatible (from checkpoint)
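The listed dimensions can be checked against each other with a few lines of Python. This is only a sketch: `ArchConfig` is a hypothetical container, not a class from agillm41.py, and treating the "2.5× expansion" as rank divided by d_k is an assumption based on 160 / 64 = 2.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Hypothetical holder for the values quoted in this README."""
    d_model: int = 1280
    layers: int = 28
    heads: int = 20
    d_k: int = 64
    rank: int = 160
    moe_slots: int = 14

    def check(self) -> None:
        # Per-head width times head count should equal the model width.
        if self.heads * self.d_k != self.d_model:
            raise ValueError("heads * d_k must equal d_model")

    @property
    def expansion(self) -> float:
        # Assumption: the expansion factor is measured relative to d_k.
        return self.rank / self.d_k


cfg = ArchConfig()
cfg.check()
print(f"d_model={cfg.d_model}, expansion={cfg.expansion}x")
```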

Training Fleet (as of 2026-06-24)

FedA (41441116): 2× V100-SXM2-32GB, ssh2.vast.ai:11116, $0.0593/hr
- a0: role=coverage, B=56, L=1536
- a1: role=hard-blocks, B=48, L=1536
Target: 67.2B tokens total
Budget runway: ~Jul 24, 2026
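As a rough guide to what the token target means in steps, the batch settings above can be turned into tokens per step. The sketch below assumes that B is the number of sequences per step on each worker, that L is the sequence length in tokens, that there is no gradient accumulation, and that both workers count toward the 67.2B total. None of these is confirmed here, so treat the result as an order-of-magnitude estimate.

```python
SEQ_LEN = 1536                     # L, tokens per sequence (from the list above)
WORKER_BATCH = {"a0": 56, "a1": 48}  # B per worker (from the list above)
TARGET_TOKENS = 67_200_000_000     # 67.2B tokens


def tokens_per_step(batch: int, seq_len: int = SEQ_LEN) -> int:
    """Tokens consumed by one step, assuming no gradient accumulation."""
    return batch * seq_len


per_worker = {name: tokens_per_step(b) for name, b in WORKER_BATCH.items()}
combined = sum(per_worker.values())

# Ceiling division: number of combined steps needed to reach the target.
steps_needed = -(-TARGET_TOKENS // combined)

for name, tokens in per_worker.items():
    print(f"{name}: {tokens:,} tokens/step")
print(f"combined: {combined:,} tokens/step, about {steps_needed:,} steps to target")
```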

Current Recovery Run (2026-07-01)

FedC Vast host: ae2bb300509f / RTX 3090 Ti.
Live recovery PID at verification: 7100.
Warm-start: checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt (v100a0 later-lineage checkpoint, SHA256 e65d65ba82239f28e10188767fe16ba091dad11c60bb57aac346ded684604349).
Corrected source mix: FineWeb, FineWeb-Edu sample-10BT, Wikipedia 20231101.en, C4 en, OpenWebText, Falcon-RefinedWeb, Proof-Pile-2.
Excluded from AGILLM4.3 pretraining: local AGILLM3 numeracy JSONL (/workspace/agillm_math_numeracy_synth/train.jsonl) and Dolma sample source.
Initial corrected validation: ce=9.1199; first stable progress line step=101, 61962.18 tok/s, loss=6.818.
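Before warm-starting from a downloaded checkpoint, it is worth confirming that the file matches the SHA256 quoted above. A minimal sketch using only the standard library follows. The helper names are hypothetical, and the file is read in chunks because checkpoints of this size should not be loaded into memory just to hash them.

```python
import hashlib

# SHA256 quoted in this README for the 2026-07-01 warm-start checkpoint.
EXPECTED_SHA256 = "e65d65ba82239f28e10188767fe16ba091dad11c60bb57aac346ded684604349"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.strip().lower()
```

Typical use would be `verify_checkpoint("checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt")`, aborting the launch if it returns False.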

Inference

# AR mode (standard autoregressive)
python3 agillm41.py infer \
  --ckpt checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt \
  --prompt "Your prompt here" \
  --mode ar --max_new 100 --plain-output --block_stream

# SAT mode (score-and-threshold diffusion)
python3 agillm41.py infer ... --mode sat

# NAT mode (non-autoregressive diffusion)
python3 agillm41.py infer ... --mode nat

Note: If both GPUs are busy with training, prefix the command with CUDA_VISIBLE_DEVICES="" to force CPU inference (slow but functional: ~1.2 tok/s).

Dependency: agillm_checkpoint_provenance.py must be in the same directory as agillm41.py.
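The commands, the CPU fallback, and the dependency note above can be wrapped in a small launcher. This is a sketch under stated assumptions: it only uses flags that appear in the AR example, and it assumes SAT and NAT accept the same flags, which the `...` in the commands above does not confirm. All three function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional

MODES = ("ar", "sat", "nat")


def build_infer_command(ckpt: str, prompt: str, mode: str = "ar",
                        max_new: int = 100,
                        script: str = "agillm41.py") -> List[str]:
    """Build the argv list for one inference call.

    Assumption: SAT and NAT take the same flags as the AR example.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return [
        "python3", script, "infer",
        "--ckpt", ckpt,
        "--prompt", prompt,
        "--mode", mode,
        "--max_new", str(max_new),
        "--plain-output", "--block_stream",
    ]


def cpu_only_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def missing_dependencies(script_path: str) -> List[str]:
    """Names of required sibling files that are absent next to the script."""
    script_dir = Path(script_path).resolve().parent
    required = ["agillm_checkpoint_provenance.py"]
    return [name for name in required if not (script_dir / name).is_file()]
```

With these helpers, a CPU run would be `subprocess.run(build_infer_command(ckpt, prompt), env=cpu_only_env(), check=True)`, preceded by a check that `missing_dependencies("agillm41.py")` is empty.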

Current Inference Quality / Recovery Status (2026-07-01)

See INFERENCE_QUALITY.md for AR/SAT/NAT benchmark outputs and regression notes.

The July 1 FedC latest-delta smoke test was not healthy: AR/SAT/NAT outputs were dominated by date/number/token fragments. Treat that as a quality regression, not as a pass.

A corrected FedC recovery run is live from later-lineage v100a0 checkpoint pretrain_step00243186_from00050650_20260630T1811Z.pt, whose archived June 30 AR sample was materially better than the regressed July 1 delta. At launch, the corrected run used the language/generic-math mix only and validated with language_mix=True numeracy=False.

Before reporting model quality healthy again, run AR + SAT + NAT inference on the next saved checkpoint from the recovery run and record it in INFERENCE_QUALITY.md.
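A crude automated gate can catch the kind of date/number-fragment output seen on July 1 before anyone reads the samples. The heuristic below is hypothetical and is not how the regression was originally diagnosed: it flags an output when more than half of its whitespace-separated tokens contain a digit, and it requires all three modes to be present. The 0.5 threshold is an arbitrary assumption, and the check does not replace reading the generated text.

```python
import re
from typing import Dict

REQUIRED_MODES = ("ar", "sat", "nat")
_HAS_DIGIT = re.compile(r"\d")


def digit_token_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that contain a digit."""
    tokens = text.split()
    if not tokens:
        return 1.0  # treat empty output as degenerate
    flagged = sum(1 for tok in tokens if _HAS_DIGIT.search(tok))
    return flagged / len(tokens)


def smoke_test_passes(outputs: Dict[str, str], max_ratio: float = 0.5) -> bool:
    """True only if AR, SAT and NAT outputs all exist and none is digit-dominated."""
    if any(mode not in outputs for mode in REQUIRED_MODES):
        return False
    return all(
        digit_token_ratio(outputs[mode]) <= max_ratio for mode in REQUIRED_MODES
    )
```

A run that is missing any one of the three modes fails the gate, which mirrors the rule that AR, SAT, and NAT must all be checked.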

Repositories

Repo	Type	Notes
`Marxist-Leninist/agillm4.3-private`	GitHub private	Source of truth for code
`Marxist-Leninist/AGILLM4.3`	GitHub public	Mirror
`Marxist-Leninist/AGILLM4.1`	GitHub public	Mirror (same codebase)
`Marxist-Leninist/agillm4.1-private`	GitHub private	Mirror
`OpenTransformer/AGILLM-4.3`	HuggingFace public	Code, inference artifacts, and active recovery checkpoints
`OpenTransformer/agillm4.3-private`	HuggingFace private	Historical/private mirror; do not use for active recovery checkpoint uploads unless explicitly requested

For Future Claude/AI Agents

MCP memory (Silicon Goddess) slot index for AGILLM4.3 state: slots 42, 95, 481–525+. Standing instruction: always run AR + SAT + NAT inference checks before reporting training healthy. See INFERENCE_QUALITY.md.

Latest Inference Smoke Test - 2026-06-26

Latest smoke-test artifacts were uploaded under training/agillm43_shared/inference/20260626T183400Z/.

Monolithic latest-checkpoint AR: /workspace/agillm4_v100a0_ckpts/pretrain_step00065633_from00050650_20260626T1811Z.pt, 32 tokens at 5.0 tok/s on CPU.
Distributed AR: existing 2026-06-06 split packages across GETH/MCP/Prime/communist-web, 32 tokens at 1.504 tok/s.
Status aliases: training/agillm43_shared/status/latest_inference.md and training/agillm43_shared/status/latest_inference.json.
