---
license: cc-by-sa-4.0
library_name: transformers
tags:
- oeis
- qwen3
- causal-lm
- checkpoint
---

# NextTerm-440M Checkpoints

Transformers-compatible checkpoints from the OEIS NextTerm-440M training run.

These checkpoints use a Qwen3-style causal LM architecture with a 16-token OEIS vocabulary (digits, sign, separator, and special tokens). They were converted from the original training checkpoints by remapping the custom interleaved RoPE basis into the Hugging Face / Qwen split-half RoPE basis, so they load directly with `AutoModelForCausalLM`.
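
The two RoPE conventions differ only in which coordinates are paired for rotation. The interleaved form rotates `(x[2i], x[2i+1])`, while the Hugging Face / Qwen split-half form (`rotate_half`) rotates `(x[i], x[i + d/2])`. The pure-Python sketch below is not the actual conversion script. It illustrates the general idea under the assumption that pair `i` uses angle `pos * base**(-2i/d)` in both conventions: reordering each head's coordinates as "even indices, then odd indices" makes split-half RoPE reproduce interleaved RoPE up to that same reordering. Because queries and keys are reordered identically and a permutation preserves dot products, attention scores are unchanged. All function names and the `base=10000.0` default are hypothetical.

```python
import math


def rope_interleaved(x, pos, base=10000.0):
    """Interleaved RoPE: rotate adjacent pairs (x[2i], x[2i+1]) by pos * base**(-2i/d)."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_split_half(x, pos, base=10000.0):
    """Split-half RoPE (HF rotate_half style): rotate pairs (x[i], x[i + d/2])."""
    d = len(x)
    h = d // 2
    out = list(x)
    for i in range(h):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out


def interleaved_to_split_half_perm(head_dim):
    """New position j reads old position perm[j]: even indices first, then odd."""
    return list(range(0, head_dim, 2)) + list(range(1, head_dim, 2))


def permute_qk_rows(rows, n_heads, head_dim):
    """Reorder the output rows of a q_proj or k_proj weight, one head at a time."""
    perm = interleaved_to_split_half_perm(head_dim)
    out = []
    for h in range(n_heads):
        block = rows[h * head_dim:(h + 1) * head_dim]
        out.extend(block[p] for p in perm)
    return out
```

Applying `permute_qk_rows` to the `q_proj` and `k_proj` weights (and any bias) yields the split-half layout. In a Qwen3-style model the per-head `q_norm` / `k_norm` weights scale those same coordinates before RoPE, so presumably they need the same reordering as well.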

## Checkpoints

| Folder | Tokens trained | Notes |
| --- | ---: | --- |
| `checkpoints/final_latest` | 13,999,999,995 | Final checkpoint; recommended default |
| `checkpoints/best_val` | 9,500,200,875 | Best validation-loss checkpoint |
| `checkpoints/checkpoint_tokens_012000258345` | 12,000,258,345 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_012500265837` | 12,500,265,837 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013000266889` | 13,000,266,889 | Historical checkpoint |
| `checkpoints/checkpoint_tokens_013500289737` | 13,500,289,737 | Historical checkpoint |

## OEIS Vocab

The model operates directly on token IDs; no text tokenizer is included.

| Token ID | Meaning |
| ---: | --- |
| `0`-`9` | decimal digits |
| `10` | negative sign |
| `11` | term separator |
| `12` | BOS |
| `13` | EOS |
| `14` | PAD |
| `15` | reserved |

For next-term generation, stop on any of `[11, 13, 14]` (term separator, EOS, or PAD).
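
Inputs therefore have to be built by hand. The helper below is a hypothetical sketch (`encode_prefix` is not part of this repository) that follows the layout of the generation example under Loading: BOS, then each term's decimal digits followed by the separator. Placing `10` directly before a negative term's digits is an assumption about how signs were encoded in training.

```python
# Token IDs taken from the vocabulary table.
NEG, SEP, BOS = 10, 11, 12


def encode_prefix(terms):
    """Encode integer terms as [BOS, digits..., SEP, digits..., SEP, ...].

    Negative terms get the NEG token before their digits (assumed layout).
    """
    ids = [BOS]
    for term in terms:
        if term < 0:
            ids.append(NEG)
        # Each decimal digit maps to the token ID with the same value.
        ids.extend(int(ch) for ch in str(abs(term)))
        ids.append(SEP)
    return ids
```

`encode_prefix([1, 2, 3])` returns `[12, 1, 11, 2, 11, 3, 11]`, the same IDs used in the generation example; wrap it as `torch.tensor([encode_prefix(terms)], device=model.device)` to pass it to `generate`.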

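## Loading

```python
import torch
from transformers import AutoModelForCausalLM

# Load the recommended final checkpoint; set `subfolder` to any folder from the table above.
# Older transformers releases take `torch_dtype=` instead of `dtype=`.
model = AutoModelForCausalLM.from_pretrained(
    "N8Programs/NextTerm-440M-Checkpoints",
    subfolder="checkpoints/final_latest",
    dtype=torch.bfloat16,
    device_map="auto",
)
```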

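Greedy next-term generation for the prefix `1, 2, 3`, encoded as BOS (`12`) followed by each term's digits and a separator (`11`):

```python
# BOS, "1", SEP, "2", SEP, "3", SEP
input_ids = torch.tensor([[12, 1, 11, 2, 11, 3, 11]], device=model.device)
out = model.generate(
    input_ids,
    max_new_tokens=192,
    do_sample=False,            # greedy decoding
    eos_token_id=[11, 13, 14],  # stop after the next term's separator, or on EOS/PAD
    pad_token_id=14,
)
```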
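
`generate` returns the prompt followed by the new tokens, so the continuation can be sliced off with `out[0, input_ids.shape[1]:].tolist()`. The helper below is a hypothetical sketch (not part of this repository) that converts those IDs back into an integer, treating a leading `10` as the minus sign and reading up to the first of `[11, 13, 14]`.

```python
NEG, SEP, EOS, PAD = 10, 11, 13, 14
STOP_IDS = {SEP, EOS, PAD}


def decode_next_term(new_ids):
    """Turn generated token IDs into an int, reading up to the first stop token.

    Returns None if no digits were produced or an unexpected token appears
    (e.g. a sign token after the first position, BOS, or the reserved ID).
    """
    sign = 1
    digits = []
    for i, tok in enumerate(new_ids):
        if tok in STOP_IDS:
            break
        if tok == NEG and i == 0:
            sign = -1
        elif 0 <= tok <= 9:
            digits.append(str(tok))
        else:
            return None
    if not digits:
        return None
    return sign * int("".join(digits))
```

Returning `None` instead of raising keeps malformed generations easy to count as misses when scoring many prompts.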

## Evaluation Notes

OEIS Eval Neo excludes exact packed-sequence overlaps with the training data. It decodes with `max_new_tokens=192`, which is long enough to emit every answer in that eval set.

OEIS Eval Neo results for the two checkpoints evaluated so far:

| Checkpoint | Accuracy |
| --- | ---: |
| `final_latest` | 6545 / 19034 = 34.39% |
| `best_val` | 6477 / 19034 = 34.03% |

Each checkpoint folder includes an `oeis_checkpoint_meta.json` file with the training token count, the source checkpoint path, and conversion details.
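
To inspect that metadata without downloading the weights, the file can be fetched on its own with `huggingface_hub.hf_hub_download`. The sketch below assumes `huggingface_hub` is installed. This card does not list the JSON field names, so the helpers (hypothetical names) return the parsed dictionary as-is rather than guessing keys.

```python
import json

REPO_ID = "N8Programs/NextTerm-440M-Checkpoints"


def read_checkpoint_meta(path):
    """Load an oeis_checkpoint_meta.json file from a local path into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def fetch_checkpoint_meta(subfolder="checkpoints/final_latest"):
    """Download one checkpoint's oeis_checkpoint_meta.json from the Hub and parse it."""
    # Imported lazily so read_checkpoint_meta works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=REPO_ID,
        filename="oeis_checkpoint_meta.json",
        subfolder=subfolder,
    )
    return read_checkpoint_meta(path)
```

For example, `sorted(fetch_checkpoint_meta("checkpoints/best_val"))` lists the fields recorded for the best-validation checkpoint.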