---
license: cc-by-sa-4.0
library_name: transformers
tags:
- oeis
- qwen3
- causal-lm
- checkpoint
---
# NextTerm-440M Checkpoints
Transformers-compatible checkpoints from the OEIS NextTerm-440M run.
These checkpoints use a Qwen3-style causal LM architecture with a 16-token OEIS digit vocabulary. They were converted from the training checkpoints by remapping the custom interleaved RoPE basis into the Hugging Face / Qwen split-half RoPE basis, so they can be loaded directly with `AutoModelForCausalLM`.
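The conversion script itself is not included in this repository. Here is a minimal sketch of what the remap amounts to. It assumes the training code used the common interleaved layout, which rotates feature pairs `(2i, 2i+1)`, and the standard inverse-frequency schedule. Under those assumptions, each head's q/k projection rows are reordered so that the even-indexed features come first and the odd-indexed features second. The helper names and `base=10000.0` are illustrative assumptions, not values taken from this run, and plain Python lists stand in for weight tensors.

```python
import math


def interleaved_to_split_half(head_dim):
    """Index order mapping an interleaved RoPE layout to the split-half layout.

    Interleaved RoPE rotates pairs (2i, 2i + 1); split-half ("rotate_half") RoPE
    rotates pairs (i, i + head_dim // 2). New position j takes old position perm[j].
    """
    half = head_dim // 2
    return [2 * i for i in range(half)] + [2 * i + 1 for i in range(half)]


def permute_qk_rows(rows, n_heads, head_dim):
    """Reorder the output rows of a q or k projection, one head at a time."""
    perm = interleaved_to_split_half(head_dim)
    out = []
    for h in range(n_heads):
        head = rows[h * head_dim:(h + 1) * head_dim]
        out.extend(head[p] for p in perm)
    return out


def rope_interleaved(x, pos, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by pos * base**(-2i/d)."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_split_half(x, pos, base=10000.0):
    """Rotate pairs (x[i], x[i + d/2]) by the same angles, as rotate_half does."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i], out[i + half] = a * c - b * s, a * s + b * c
    return out


# Check: split-half RoPE on permuted features equals the permuted interleaved
# result, so q.k dot products (attention scores) are unchanged by the remap.
x = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 1.5, -0.8]
perm = interleaved_to_split_half(len(x))
lhs = rope_split_half([x[p] for p in perm], pos=5)
rhs = [rope_interleaved(x, pos=5)[p] for p in perm]
assert all(math.isclose(a, b) for a, b in zip(lhs, rhs))
```

With PyTorch weights of shape `(n_heads * head_dim, hidden)`, the same reordering can be written as `w.view(n_heads, head_dim // 2, 2, hidden).transpose(1, 2).reshape(n_heads * head_dim, hidden)`. This mirrors the `permute` helper in the Transformers LLaMA conversion script. Qwen3 applies per-head `q_norm`/`k_norm` before RoPE, so any such norm weights would presumably need the same per-head permutation. `v_proj` and `o_proj` are unaffected.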
## Checkpoints
| Folder | Tokens trained | Notes |
| --- | ---: | --- |
| `checkpoints/final_latest` | 13,999,999,995 | Final checkpoint; recommended default |
| `checkpoints/best_val` | 9,500,200,875 | Best validation-loss checkpoint |
| `checkpoints/checkpoint_tokens_012000258345` | 12,000,258,345 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_012500265837` | 12,500,265,837 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013000266889` | 13,000,266,889 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013500289737` | 13,500,289,737 | Historical checkpoint |
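The historical folder names appear to follow a `checkpoint_tokens_` prefix plus the token count zero-padded to 12 digits. This pattern is inferred from the table and is not documented elsewhere. A small hypothetical helper for building the `subfolder` argument:

```python
def checkpoint_subfolder(tokens: int) -> str:
    """Build a historical checkpoint's subfolder from its token count.

    Assumes the 12-digit zero-padded naming seen in the checkpoint table.
    """
    return f"checkpoints/checkpoint_tokens_{tokens:012d}"
```

For example, `checkpoint_subfolder(12_500_265_837)` gives `checkpoints/checkpoint_tokens_012500265837`, which can be passed as `subfolder=` to `from_pretrained`.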
## OEIS Vocab
The model is token-ID based; no text tokenizer is included.
| Token ID | Meaning |
| ---: | --- |
| `0`-`9` | decimal digits |
| `10` | negative sign |
| `11` | term separator |
| `12` | BOS |
| `13` | EOS |
| `14` | PAD |
| `15` | reserved |
For next-term generation, stop on any of `[11, 13, 14]`.
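Because no tokenizer ships with the checkpoints, prompts have to be built by hand. Below is a minimal encoder sketch. It follows the layout of the `1, 2, 3` example input IDs: BOS, then each term's digits followed by a separator. It also assumes that the negative-sign token sits directly before a term's digits. `encode_prefix` is a hypothetical helper, not part of the repository.

```python
NEG, SEP, BOS = 10, 11, 12  # token IDs from the vocab table


def encode_prefix(terms):
    """Encode integer terms as OEIS token IDs: BOS, then digits + separator per term."""
    ids = [BOS]
    for term in terms:
        if term < 0:
            ids.append(NEG)  # assumption: sign token directly precedes the digits
        ids.extend(int(ch) for ch in str(abs(term)))
        ids.append(SEP)
    return ids
```

`encode_prefix([1, 2, 3])` returns `[12, 1, 11, 2, 11, 3, 11]`, which matches the example input IDs. Ending on a separator leaves the model positioned to emit the next term.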
## Loading
```python
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "N8Programs/NextTerm-440M-Checkpoints",
    subfolder="checkpoints/final_latest",
    dtype=torch.bfloat16,  # older Transformers releases name this argument `torch_dtype`
    device_map="auto",
)
```
Example input IDs for the prefix `1, 2, 3, ...`:
```python
input_ids = torch.tensor([[12, 1, 11, 2, 11, 3, 11]], device=model.device)
out = model.generate(
    input_ids,
    max_new_tokens=192,
    do_sample=False,
    eos_token_id=[11, 13, 14],  # term separator, EOS, PAD
    pad_token_id=14,
)
```
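`generate` returns the prompt followed by the newly generated tokens. Here is a hedged sketch for turning the continuation back into an integer, under the same assumption that the sign token comes directly before the digits. `decode_next_term` is a hypothetical helper that reads until the first stop token and returns `None` for empty or malformed output.

```python
NEG_SIGN = 10
STOP_IDS = {11, 13, 14}  # term separator, EOS, PAD


def decode_next_term(new_ids):
    """Convert generated token IDs into an int, stopping at the first stop token."""
    sign, digits = 1, []
    for tok in new_ids:
        if tok in STOP_IDS:
            break
        if tok == NEG_SIGN and sign == 1 and not digits:
            sign = -1  # a leading sign token is accepted once
        elif 0 <= tok <= 9:
            digits.append(str(tok))
        else:
            return None  # BOS, reserved ID, or misplaced sign
    return sign * int("".join(digits)) if digits else None
```

With the generation example above, the continuation can be decoded with `decode_next_term(out[0, input_ids.shape[1]:].tolist())`.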
## Evaluation Notes
OEIS Eval Neo excludes exact packed-sequence overlaps with the training data and uses `max_new_tokens=192`, which is sufficient for every answer in that eval set.
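As a rough check on the token budget, assume an answer is written as an optional sign token, its digits, and one stop token. A term then needs `digits + 2` generated tokens if it is negative and `digits + 1` otherwise. `answer_token_count` and `fits_budget` are hypothetical helpers for checking a list of expected answers against `max_new_tokens`.

```python
def answer_token_count(term: int) -> int:
    """Generated tokens needed for one term: optional sign, digits, one stop token."""
    sign_tokens = 1 if term < 0 else 0
    return sign_tokens + len(str(abs(term))) + 1


def fits_budget(answers, max_new_tokens=192):
    """True if every expected answer can be emitted and terminated within the budget."""
    return all(answer_token_count(a) <= max_new_tokens for a in answers)
```

Under these assumptions, 192 tokens covers non-negative terms of up to 191 digits and negative terms of up to 190 digits.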
Known OEIS Eval Neo results:
| Checkpoint | Accuracy |
| --- | ---: |
| `final_latest` | 6545 / 19034 = 34.39% |
| `best_val` | 6477 / 19034 = 34.03% |
Each checkpoint folder includes an `oeis_checkpoint_meta.json` file with training tokens, source checkpoint path, and conversion details.