---
license: cc-by-sa-4.0
library_name: transformers
tags:
  - oeis
  - qwen3
  - causal-lm
  - checkpoint
---

# NextTerm-440M Checkpoints

This repository provides Transformers-compatible checkpoints from the OEIS NextTerm-440M training run.

These checkpoints use a Qwen3-style causal LM architecture with a 16-token OEIS digit vocabulary. They were converted from the training checkpoints by remapping the custom interleaved RoPE basis into the Hugging Face / Qwen split-half RoPE basis, so they can be loaded directly with `AutoModelForCausalLM`.
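The conversion script itself is not part of this card, so the following is only a minimal sketch of the index permutation that an interleaved-to-split-half remap usually involves. Interleaved RoPE rotates the pairs `(0, 1), (2, 3), ...` within a head, while the split-half convention rotates `(i, i + head_dim // 2)`. Listing the even indices first and the odd indices second maps one pairing onto the other. The function names and the idea of applying the permutation to per-head rows are assumptions for illustration; which tensors the actual conversion touched is not stated here.

```python
def interleaved_to_split_half(head_dim: int) -> list:
    """Indices that reorder one head from interleaved to split-half layout.

    New position j (j < head_dim // 2) takes old index 2*j, and new position
    j + head_dim // 2 takes old index 2*j + 1, so rotation pair j is preserved.
    """
    if head_dim % 2:
        raise ValueError("head_dim must be even")
    return list(range(0, head_dim, 2)) + list(range(1, head_dim, 2))


def permute_head_rows(rows: list, head_dim: int) -> list:
    """Apply the permutation to each consecutive block of head_dim rows.

    `rows` stands in for the output rows of a query or key projection
    (hypothetical usage; real weights would be tensors, not lists).
    """
    if len(rows) % head_dim:
        raise ValueError("number of rows must be a multiple of head_dim")
    perm = interleaved_to_split_half(head_dim)
    out = []
    for start in range(0, len(rows), head_dim):
        out.extend(rows[start + i] for i in perm)
    return out
```

Because the same permutation is applied to queries and keys, the dot products that form the attention scores are unchanged.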

## Checkpoints

| Folder | Tokens trained | Notes |
| --- | ---: | --- |
| `checkpoints/final_latest` | 13,999,999,995 | Final checkpoint; recommended default |
| `checkpoints/best_val` | 9,500,200,875 | Best validation-loss checkpoint |
| `checkpoints/checkpoint_tokens_012000258345` | 12,000,258,345 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_012500265837` | 12,500,265,837 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013000266889` | 13,000,266,889 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013500289737` | 13,500,289,737 | Historical checkpoint |

## OEIS Vocab

The model is token-ID based; no text tokenizer is included.

| Token ID | Meaning |
| ---: | --- |
| `0`-`9` | decimal digits |
| `10` | negative sign |
| `11` | term separator |
| `12` | BOS |
| `13` | EOS |
| `14` | PAD |
| `15` | reserved |

For next-term generation, stop at the first term separator, EOS, or PAD token, that is, at any of the IDs `[11, 13, 14]`.
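Since no tokenizer ships with the checkpoints, prompts have to be built by hand from the table above. The helper below is a minimal sketch of one way to do that. The function names are hypothetical, and two details are assumptions rather than documented facts: that the negative sign token precedes the digits of a negative term, and that every term in the prompt, including the last, is followed by a separator.

```python
NEG, SEP, BOS, EOS, PAD = 10, 11, 12, 13, 14  # IDs from the vocabulary table


def encode_term(value: int) -> list:
    """Encode one integer as digit tokens, with a leading sign token if negative."""
    ids = [NEG] if value < 0 else []  # assumption: sign comes before the digits
    ids.extend(int(ch) for ch in str(abs(value)))
    return ids


def encode_prefix(terms: list) -> list:
    """Encode a sequence prefix as BOS, then each term followed by a separator."""
    ids = [BOS]
    for term in terms:
        ids.extend(encode_term(term))
        ids.append(SEP)
    return ids
```

For example, `encode_prefix([1, 2, 3])` yields `[12, 1, 11, 2, 11, 3, 11]`.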

## Loading

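```python
import torch
from transformers import AutoModelForCausalLM

# Load the recommended final checkpoint from its subfolder in the repo.
model = AutoModelForCausalLM.from_pretrained(
    "N8Programs/NextTerm-440M-Checkpoints",
    subfolder="checkpoints/final_latest",
    dtype=torch.bfloat16,  # older transformers releases spell this `torch_dtype`
    device_map="auto",  # requires the `accelerate` package
)
```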

Example input IDs for the prefix `1, 2, 3, ...`, encoded as BOS (`12`) followed by each term's digits and a term separator (`11`):

```python
input_ids = torch.tensor([[12, 1, 11, 2, 11, 3, 11]], device=model.device)
out = model.generate(
    input_ids,
    max_new_tokens=192,
    do_sample=False,  # greedy decoding
    eos_token_id=[11, 13, 14],  # stop at separator, EOS, or PAD
    pad_token_id=14,
)
```
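The tensor returned by `generate` contains the prompt followed by the new tokens, so the predicted term still has to be cut out and converted back to an integer. The sketch below shows one way to do that on a plain list of IDs. `decode_next_term` is a hypothetical helper, and it assumes the same sign-before-digits convention as the vocabulary table suggests.

```python
from typing import Optional

STOP_IDS = {11, 13, 14}  # term separator, EOS, PAD
NEG = 10


def decode_next_term(new_ids: list) -> Optional[int]:
    """Turn newly generated token IDs into an integer, or None if malformed.

    Reading stops at the first stop token. If no stop token is present
    (for example, generation hit max_new_tokens), all tokens are used.
    """
    body = []
    for tok in new_ids:
        if tok in STOP_IDS:
            break
        body.append(tok)
    negative = bool(body) and body[0] == NEG
    digits = body[1:] if negative else body
    # Reject empty output and anything that is not a plain digit token.
    if not digits or any(not 0 <= d <= 9 for d in digits):
        return None
    value = int("".join(str(d) for d in digits))
    return -value if negative else value
```

With the call above, the new tokens can be obtained as `out[0, input_ids.shape[1]:].tolist()` and passed to `decode_next_term`.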

## Evaluation Notes

OEIS Eval Neo excludes exact packed-sequence overlaps with the training data and uses `max_new_tokens=192`, which is sufficient for every answer in that eval set.

Known OEIS Eval Neo results:

| Checkpoint | Accuracy |
| --- | ---: |
| `final_latest` | 6545 / 19034 = 34.39% |
| `best_val` | 6477 / 19034 = 34.03% |
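The scoring code for OEIS Eval Neo is not included in this card. As a rough sketch, assuming accuracy means the fraction of examples whose predicted next term exactly equals the reference term, it could be computed as follows. `exact_match_accuracy` is a hypothetical name, and treating an unparseable prediction (`None`) as wrong is also an assumption.

```python
def exact_match_accuracy(predictions: list, targets: list) -> float:
    """Fraction of predictions that exactly equal their target term."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    # A prediction of None (malformed output) never counts as correct.
    correct = sum(p is not None and p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```

The percentages in the table follow directly from the counts: `round(100 * 6545 / 19034, 2)` gives `34.39` and `round(100 * 6477 / 19034, 2)` gives `34.03`.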

Each checkpoint folder includes an `oeis_checkpoint_meta.json` file with training tokens, source checkpoint path, and conversion details.