AGILLM 4.3: Autoregressive + DiffusionBlock + MoE Language Model
Single-file implementation: agillm41.py
Parameters: 1.22B (1,221,580,802)
Architecture: d_model=1280, layers=28, heads=20, d_k=64, rank=160 (2.5× expansion), tied weights
⚠️ CHECKPOINT PROVENANCE: READ FIRST
Checkpoint filenames (e.g. pretrain_step00050650.pt) reflect the step counter within the current training run, NOT total training steps.
This repo contains multiple checkpoint lineages. The 2026-06-24 pretrain_step00050650.pt artifact did warm-start from step 2,182,564 (~2.1M) of a prior run, but that is historical provenance, not the current recovery base. Do not restart current AGILLM4.3 recovery from raw pretrain_step02182564.pt unless explicitly doing a clean historical rollback experiment.
| Artifact / run | Meaning |
|---|---|
| pretrain_step00050650.pt | Historical current-run step 50,650 after the 2,182,564 warm-start. |
| pretrain_step00243186_from00050650_20260630T1811Z.pt | Later-lineage v100a0 checkpoint selected for the 2026-07-01 recovery because its June 30 inference was materially better than the July 1 latest delta. |
| pretrain_step00359091.pt + FedC delta pretrain_delta_step00030961_from00359091_20260701T0522Z.pt | July 1 path that produced fragment/date-token regression in AR/SAT/NAT smoke tests; do not report this quality as healthy. |
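The filenames in this table encode provenance: the run-local step, the step of the checkpoint the run was warm-started from, and a UTC save timestamp. Below is a sketch that splits them apart with a bash regex. The pattern is inferred from the names shown in this card, not taken from agillm41.py, and parse_ckpt_name is a hypothetical helper. As noted above, run_step is the counter within that run, not total training steps.

```bash
#!/usr/bin/env bash
# Sketch: split an AGILLM4.3 checkpoint filename into provenance fields.
# The regex is inferred from the filenames in this card (hypothetical helper).

parse_ckpt_name() {
  local name kind re
  name="$(basename -- "$1")"
  # Groups: 1=delta_ prefix, 2=run step, 4=warm-start step, 6=UTC timestamp.
  re='^pretrain_(delta_)?step([0-9]+)(_from([0-9]+))?(_([0-9]{8}T[0-9]{4}Z))?\.pt$'
  if [[ "$name" =~ $re ]]; then
    kind="full"
    if [[ -n "${BASH_REMATCH[1]}" ]]; then
      kind="delta"
    fi
    echo "kind=${kind}"
    # 10# forces base 10 so zero-padded counters are not parsed as octal.
    echo "run_step=$((10#${BASH_REMATCH[2]}))"
    if [[ -n "${BASH_REMATCH[4]}" ]]; then
      echo "from_step=$((10#${BASH_REMATCH[4]}))"
    fi
    if [[ -n "${BASH_REMATCH[6]}" ]]; then
      echo "saved_utc=${BASH_REMATCH[6]}"
    fi
  else
    echo "unrecognized checkpoint name: ${name}" >&2
    return 1
  fi
}
```

For example, parse_ckpt_name pretrain_delta_step00030961_from00359091_20260701T0522Z.pt reports a delta whose run-local step is 30961 on top of step 359091.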
Current recovery checkpoint on HF:
checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt
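Because this lineage has several similarly named checkpoints, it is worth confirming the downloaded file before warm-starting. A minimal sketch using GNU coreutils sha256sum; verify_ckpt_sha256 is a hypothetical helper, and the expected hash is the SHA256 listed for the warm-start under Current Recovery Run.

```bash
#!/usr/bin/env bash
# Sketch: confirm a checkpoint matches a published SHA256 before using it.
# verify_ckpt_sha256 is a hypothetical helper; needs bash 4+ and sha256sum.

verify_ckpt_sha256() {
  local file="$1" expected="$2" actual
  if [[ ! -f "$file" ]]; then
    echo "no such file: ${file}" >&2
    return 1
  fi
  # Hash via stdin so the output is "<hex>  -" regardless of the filename.
  actual="$(sha256sum < "$file" | cut -d' ' -f1)"
  # Compare case-insensitively (${var,,} lowercases in bash 4+).
  if [[ "${actual,,}" != "${expected,,}" ]]; then
    echo "SHA256 mismatch for ${file}: got ${actual}" >&2
    return 1
  fi
  echo "OK ${file}"
}

# Example with the warm-start checkpoint listed in this card:
# verify_ckpt_sha256 checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt \
#   e65d65ba82239f28e10188767fe16ba091dad11c60bb57aac346ded684604349
```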
Architecture
| Component | Value |
|---|---|
| Backbone | Autoregressive transformer (AR) |
| DiffusionBlocks | Active; layers cycle through AR/SAT/NAT objectives |
| Mixture-of-Experts | Active (14 slots per block) |
| d_model | 1280 |
| Layers | 28 |
| Attention heads | 20 |
| Tied weights | Yes |
| Tokenizer | Llama-compatible (from checkpoint) |
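The header quotes d_model=1280, heads=20, d_k=64 and rank=160 with a "2.5× expansion". A quick consistency check of those numbers, assuming (the card does not say so explicitly) that the expansion is rank measured against the per-head width d_k:

```bash
#!/usr/bin/env bash
# Sketch: check that the quoted AGILLM 4.3 shape numbers agree with each other.
# Assumption: "2.5x expansion" is rank divided by the per-head width d_k.
d_model=1280
heads=20
d_k=64
rank=160

# Per-head width implied by d_model and the head count.
head_width=$(( d_model / heads ))
echo "d_model/heads=${head_width} (quoted d_k=${d_k})"

# Bash arithmetic is integer-only, so use awk for the fractional ratio.
ratio=$(awk -v r="$rank" -v k="$d_k" 'BEGIN { printf "%.1f", r / k }')
echo "rank/d_k=${ratio}"
```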
Training Fleet (as of 2026-06-24)
- FedA (41441116): 2× V100-SXM2-32GB, ssh2.vast.ai:11116, $0.0593/hr
  - a0: role=coverage, B=56, L=1536
  - a1: role=hard-blocks, B=48, L=1536
- Target: 67.2B tokens total
- Budget runway: ~Jul 24, 2026
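The fleet entry lists per-worker B and L alongside a 67.2B-token target. A sketch of the arithmetic that connects them, under assumptions the card does not state: B is sequences per optimizer step, L is tokens per sequence, and a0 and a1 each consume B×L tokens per step in lockstep.

```bash
#!/usr/bin/env bash
# Sketch of the FedA token-budget arithmetic.
# Assumptions (not confirmed by the card): B = sequences per step per worker,
# L = tokens per sequence, and a0/a1 advance one step together.

tokens_per_step() {
  local batch="$1" seq_len="$2"
  echo $(( batch * seq_len ))
}

# Ceiling division: steps needed to cover a token target at a fixed tokens/step.
steps_for_target() {
  local target="$1" per_step="$2"
  echo $(( (target + per_step - 1) / per_step ))
}

a0=$(tokens_per_step 56 1536)   # coverage worker
a1=$(tokens_per_step 48 1536)   # hard-blocks worker
combined=$(( a0 + a1 ))
target=67200000000              # 67.2B tokens
echo "a0=${a0} a1=${a1} combined=${combined} tokens/step"
echo "steps_to_target=$(steps_for_target "$target" "$combined")"
```

Under those assumptions the script reports 159,744 tokens per combined step and 420,674 steps to reach 67.2B tokens. If a0 and a1 keep separate step counters or run at different speeds, compute per worker instead of summing.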
Current Recovery Run (2026-07-01)
- FedC Vast host: ae2bb300509f / RTX 3090 Ti.
- Live recovery PID at verification: 7100.
- Warm-start: checkpoints/pretrain_step00243186_from00050650_20260630T1811Z.pt (v100a0 later-lineage checkpoint, SHA256 e65d65ba82239f28e10188767fe16ba091dad11c60bb57aac346ded684604349).
- Corrected source mix: FineWeb, FineWeb-Edu sample-10BT, Wikipedia 20231101.en, C4 en, OpenWebText, Falcon-RefinedWeb, Proof-Pile-2.
- Excluded from AGILLM4.3 pretraining: local AGILLM3 numeracy JSONL (/workspace/agillm_math_numeracy_synth/train.jsonl) and the Dolma sample source.
- Initial corrected validation: ce=9.1199; first stable progress line: step=101, 61962.18 tok/s, loss=6.818.
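The ce and loss figures in the recovery list are cross-entropies; exponentiating them gives a perplexity that is easier to read across runs. A minimal sketch, assuming both are mean per-token losses in nats (natural log); ce_to_ppl is a hypothetical helper. If the logged loss also folds in auxiliary terms (for example MoE load balancing or the diffusion objectives), its exponential is not a true perplexity, and a validation ce and a training-step loss are not directly comparable in any case.

```bash
#!/usr/bin/env bash
# Sketch: convert a mean per-token cross-entropy (nats) into perplexity.
# Assumption: values are natural-log losses; for bits, use 2^ce instead.

ce_to_ppl() {
  awk -v ce="$1" 'BEGIN { printf "%.2f\n", exp(ce) }'
}

ce_to_ppl 9.1199   # initial corrected validation ce
ce_to_ppl 6.818    # loss on the first stable progress line
```

On that reading, the two values correspond to perplexities of roughly 9,135 and 914.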
Inference
```bash
# AR mode (standard autoregressive)
python3 agillm41.py infer \
  --ckpt checkpoints/warmstart_step2182564__current_step50650/pretrain_step00050650.pt \
  --prompt "Your prompt here" \
  --mode ar --max_new 100 --plain-output --block_stream

# SAT mode (score-and-threshold diffusion)
python3 agillm41.py infer ... --mode sat

# NAT mode (non-autoregressive diffusion)
python3 agillm41.py infer ... --mode nat
```
Note: If both GPUs are busy with training, prefix the command with CUDA_VISIBLE_DEVICES="" to force CPU inference (slow but functional: ~1.2 tok/s).
Dependency: agillm_checkpoint_provenance.py must be in the same directory as agillm41.py.
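A hypothetical preflight helper for the two notes above: check_infer_deps confirms the provenance module sits next to agillm41.py, and pick_device suggests CPU when every reported GPU is above a utilization threshold. The 90% threshold and the use of utilization (rather than free memory) as the "busy" signal are assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Sketch: preflight checks before running agillm41.py infer (hypothetical helpers).

# Fail early if agillm41.py or its provenance module is missing.
check_infer_deps() {
  local script="${1:-agillm41.py}" dir
  dir="$(dirname -- "$script")"
  if [[ ! -f "$script" ]]; then
    echo "missing: ${script}" >&2
    return 1
  fi
  if [[ ! -f "${dir}/agillm_checkpoint_provenance.py" ]]; then
    echo "missing: ${dir}/agillm_checkpoint_provenance.py" >&2
    return 1
  fi
}

# Read one GPU utilization percentage per line on stdin.
# Print "cpu" if no GPU is reported or all are at/above the threshold, else "gpu".
pick_device() {
  local threshold="${1:-90}" util busy=0 total=0
  while read -r util || [[ -n "$util" ]]; do
    # Skip blank or non-numeric lines such as "[N/A]".
    [[ "$util" =~ ^[[:space:]]*([0-9]+)[[:space:]]*$ ]] || continue
    total=$(( total + 1 ))
    if (( 10#${BASH_REMATCH[1]} >= threshold )); then
      busy=$(( busy + 1 ))
    fi
  done
  if (( total == 0 || busy == total )); then echo cpu; else echo gpu; fi
}

# Example:
# check_infer_deps ./agillm41.py || exit 1
# if [[ "$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | pick_device 90)" == cpu ]]; then
#   export CUDA_VISIBLE_DEVICES=""
# fi
```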
Current Inference Quality / Recovery Status (2026-07-01)
See INFERENCE_QUALITY.md for AR/SAT/NAT benchmark outputs and regression notes.
The July 1 FedC latest-delta smoke test was not healthy: AR/SAT/NAT outputs were dominated by date/number/token fragments. Treat that as a quality regression, not as a pass.
A corrected FedC recovery run is live from later-lineage v100a0 checkpoint pretrain_step00243186_from00050650_20260630T1811Z.pt, whose archived June 30 AR sample was materially better than the regressed July 1 delta. At launch, the corrected run used the language/generic-math mix only and validated with language_mix=True numeracy=False.
Before reporting model quality healthy again, run AR + SAT + NAT inference on the next saved checkpoint from the recovery run and record it in INFERENCE_QUALITY.md.
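A minimal sketch of that AR + SAT + NAT check, reusing the flags from the AR example in the Inference section. Assumptions: the card elides the SAT and NAT commands with "...", so passing --max_new, --plain-output and --block_stream in those modes is a guess; INFER and OUT_DIR are hypothetical overrides added for this sketch; and recording the results in INFERENCE_QUALITY.md remains a manual step.

```bash
#!/usr/bin/env bash
# Sketch: run AR, SAT and NAT inference on one checkpoint and keep each output.
# INFER (default python3) and OUT_DIR are hypothetical overrides for this sketch.

smoke_test_all_modes() {
  local ckpt="$1" prompt="$2" mode
  local out_dir="${OUT_DIR:-smoke_$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p -- "$out_dir"
  for mode in ar sat nat; do
    echo "=== mode=${mode} ckpt=${ckpt} ==="
    # Same flags as the AR example; assumed to be valid for sat/nat too.
    if ! "${INFER:-python3}" agillm41.py infer \
        --ckpt "$ckpt" \
        --prompt "$prompt" \
        --mode "$mode" --max_new 100 --plain-output --block_stream \
        > "${out_dir}/${mode}.txt"; then
      echo "mode ${mode} failed" >&2
      return 1
    fi
    cat -- "${out_dir}/${mode}.txt"
  done
}

# Example (NEXT_CKPT is a placeholder for the next checkpoint the recovery run saves):
# smoke_test_all_modes "$NEXT_CKPT" "Your prompt here"
```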
Repositories
| Repo | Type | Notes |
|---|---|---|
| Marxist-Leninist/agillm4.3-private | GitHub private | Source of truth for code |
| Marxist-Leninist/AGILLM4.3 | GitHub public | Mirror |
| Marxist-Leninist/AGILLM4.1 | GitHub public | Mirror (same codebase) |
| Marxist-Leninist/agillm4.1-private | GitHub private | Mirror |
| OpenTransformer/AGILLM-4.3 | HuggingFace public | Code, inference artifacts, and active recovery checkpoints |
| OpenTransformer/agillm4.3-private | HuggingFace private | Historical/private mirror; do not use for active recovery checkpoint uploads unless explicitly requested |
For Future Claude/AI Agents
MCP memory (Silicon Goddess) slot index for AGILLM4.3 state: slots 42, 95, 481–525+.
Standing instruction: always run AR + SAT + NAT inference checks before reporting training healthy. See INFERENCE_QUALITY.md.
Latest Inference Smoke Test - 2026-06-26
Latest smoke-test artifacts were uploaded under training/agillm43_shared/inference/20260626T183400Z/.
- Monolithic latest-checkpoint AR: /workspace/agillm4_v100a0_ckpts/pretrain_step00065633_from00050650_20260626T1811Z.pt, 32 tokens at 5.0 tok/s on CPU.
- Distributed AR: existing 2026-06-06 split packages across GETH/MCP/Prime/communist-web, 32 tokens at 1.504 tok/s.
- Status aliases: training/agillm43_shared/status/latest_inference.md and .json.