---
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-generation
tags:
- oeis
- integer-sequences
- qwen3
- causal-lm
- mlx
datasets:
- N8Programs/oeis-massive
---
# NextTerm-440M
![Radar chart](radar_chart.png)
## Model Summary
NextTerm-440M is a 440M-parameter causal transformer trained to continue integer
sequences. It uses a Qwen3 architecture with a compact 16-token digit
vocabulary: decimal digits, negative sign, comma separator, BOS, EOS, PAD, and one unused token.

The model was trained on an extended OEIS corpus in which many sequences were lengthened with additional terms from b-files (supplemental term files provided with OEIS entries). The data was then augmented with a variety of prefix-preserving transforms, selected empirically via small pilot experiments. Training ran for 14B tokens with sequence prefixes preserved rather than with distinct documents concatenated, as this was found to improve performance in pilot experiments.

NextTerm-440M improves dramatically over NextTerm-47M on long-context sequence continuation (it was trained with a context length of 4096), innate OEIS knowledge, and long-range in-context learning. The 47M model, however, remains ahead on very short prefixes that require simple rule induction without much context. This may be due to the 440M model's training on longer contexts and more complex sequences.

The tokenizer accepts integer sequences formatted as comma-separated values, for
example:
```text
1,-2,3,-4,
```
The tokenizer ignores characters other than digits, commas, and `-`. Digits are
tokenized individually, so there is no fixed integer-magnitude limit, but large
integers consume more context. The model was not trained on numbers with leading
zeros, so strings like `01,02,03,` should be treated as out of distribution.
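
The input conventions above can be checked before a prompt reaches the model. The following is a minimal sketch, not part of the released tokenizer: the helper names are hypothetical, and the token estimate assumes one token per kept character, excluding any special tokens such as BOS.

```python
import re

def format_sequence(terms):
    """Render integers as a comma-separated prompt with a trailing comma."""
    return "".join(f"{int(t)}," for t in terms)

def clean_prompt(text):
    """Keep only the characters the tokenizer reads: digits, commas, and '-'."""
    return re.sub(r"[^0-9,\-]", "", text)

def has_leading_zeros(text):
    """Flag terms such as '01' or '-007', which are out of distribution."""
    terms = [t for t in clean_prompt(text).split(",") if t]
    return any(re.fullmatch(r"-?0\d+", t) for t in terms)

def approx_token_count(text):
    """Assumption: one token per kept character; special tokens not counted."""
    return len(clean_prompt(text))

print(format_sequence([1, -2, 3, -4]))
print(has_leading_zeros("01,02,03,"))
```

Because every digit is its own token, `approx_token_count` grows with the magnitude of the terms, which is worth checking against the 4096-token training cap for long prompts.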
## Training Details
| Field | Value |
| --- | --- |
| Parameters | 440,500,224 |
| Architecture | Qwen3-style causal LM |
| Layers | 28 |
| Hidden size | 1024 |
| FFN size | 3072 |
| Attention heads | 16 |
| KV heads | 8 |
| Vocabulary size | 16 |
| Training tokens | 13,999,999,995 |
| Sequence length cap | 4096 training tokens per sequence |
| Batch mode | Length-bucketed sequence batches |
| Optimizer | Muon/AdamW hybrid |
| LR schedule | Linear warmup to `1e-2` for Muon, `1e-4` for AdamW, cosine decay to 0.1x, final cooldown to 0 |
| Training hardware | Single H100 |
| Export dtype | bfloat16 |
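
As a sanity check, the parameter count in the table can be reproduced from the other rows. This sketch relies on details the table does not state and that are assumed here: a head dimension of 128 (as in Qwen3-0.6B), per-head RMSNorm on queries and keys, no bias terms, and untied input and output embeddings.

```python
# Values taken from the table above.
layers, hidden, ffn = 28, 1024, 3072
heads, kv_heads, vocab = 16, 8, 16

# Assumption (not in the table): head_dim = 128, as in Qwen3-0.6B.
head_dim = 128

attn = hidden * heads * head_dim            # q_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj and v_proj
attn += heads * head_dim * hidden           # o_proj
mlp = 3 * hidden * ffn                      # gate, up, and down projections
norms = 2 * hidden + 2 * head_dim           # two layer norms plus q/k norms

per_layer = attn + mlp + norms
# Final norm, input embedding, and an untied output head (assumption).
total = layers * per_layer + hidden + 2 * vocab * hidden
print(f"{total:,}")  # 440,500,224
```

Under these assumptions the total matches the reported 440,500,224 exactly. With only 16 vocabulary entries, the embeddings contribute about 33K parameters, so nearly all capacity sits in the transformer blocks.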
A classic Muon/AdamW hybrid was used: Muon for 2D weight matrices and AdamW for 1D parameters and embedding matrices.
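
The split described above could be sketched as a routing function over parameter shapes. The parameter names below follow common Hugging Face naming and are assumptions, as is treating the output head like an embedding matrix. No optimizer is constructed here; only the grouping is shown.

```python
def route_parameters(shapes):
    """Split parameter names into (muon, adamw) groups.

    `shapes` maps a parameter name to its shape tuple. 2D weight matrices
    go to Muon; 1D parameters and embedding matrices go to AdamW.
    """
    muon, adamw = [], []
    for name, shape in shapes.items():
        # Assumption: embeddings are identified by name.
        is_embedding = "embed" in name or "lm_head" in name
        if len(shape) == 2 and not is_embedding:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

example = {
    "model.embed_tokens.weight": (16, 1024),
    "model.layers.0.self_attn.q_proj.weight": (2048, 1024),
    "model.layers.0.input_layernorm.weight": (1024,),
    "lm_head.weight": (16, 1024),
}
muon_names, adamw_names = route_parameters(example)
print(muon_names)
print(adamw_names)
```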
The model was trained on the following files in the `N8Programs/oeis-massive` dataset, randomly mixed:
- `oeis_train_bfile_prefix4096.packed`
- `oeis_synth_aug0_inv_len_13245370099_seed0.packed`
## Evaluation Results
### Main Benchmarks
| Model | OEIS-Eval-Neo | Ryskina & Knight | M1 Competition 111 MAPE |
| --- | ---: | ---: | ---: |
| **NextTerm-440M** | **34.43%** | 52.63% | **17.6239** |
| NextTerm-47M | 29.49% | **70.18%** | 18.7621 |
| Qwen3-0.6B | 18.44% | 33.33% | 22.7984 |
| Qwen3-1.7B | 20.77% | 49.12% | 22.2411 |
| Qwen3-4B | 23.74% | 63.16% | 19.1731 |
| Qwen3-8B | 24.62% | 57.89% | 18.4027 |
| Qwen3-14B | 26.00% | 59.65% | 17.9837 |
OEIS-Eval-Neo is a decontaminated held-out OEIS next-term evaluation. M1
Competition 111 reports macro MAPE, where lower is better. Ryskina & Knight
(2021) is a 57-sequence next-term benchmark based on psychometrics and puzzles. Note that the 47M model's strong performance on Ryskina & Knight is indicative of its strength on short-prefix sequences and rule induction.
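
For reference, macro MAPE can be sketched as follows: compute the mean absolute percentage error for each series, then average those values across series so that every series has equal weight. This assumes the reported numbers are percentages and that actual values are nonzero; the evaluation script in this repository is the authoritative definition.

```python
def mape(actual, predicted):
    """Mean absolute percentage error for one series, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def macro_mape(series):
    """Average of per-series MAPE over (actual, predicted) pairs."""
    scores = [mape(actual, predicted) for actual, predicted in series]
    return sum(scores) / len(scores)

print(macro_mape([([100], [90]), ([50, 50], [50, 100])]))
```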
### Polynomial Continuation
The polynomial continuation evaluation samples integer sequences from
polynomials of degree 1 through 4 and asks for the next term. Accuracy is exact
match across 200 samples for each prompt length `k`.
| Model | Arithmetic | Quadratic | Cubic | Quartic |
| --- | ---: | ---: | ---: | ---: |
| **NextTerm-440M** | 94.38% | **86.39%** | **75.20%** | **67.83%** |
| NextTerm-47M | 94.15% | 81.07% | 37.43% | 15.17% |
| Qwen3-0.6B | 90.31% | 8.72% | 0.30% | 0.02% |
| Qwen3-1.7B | 93.10% | 41.57% | 5.36% | 0.71% |
| Qwen3-4B | 93.90% | 77.26% | 28.18% | 5.98% |
| Qwen3-8B | **96.10%** | 80.59% | 32.93% | 7.95% |
| Qwen3-14B | 95.60% | 84.61% | 49.16% | 14.98% |
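
A task of this shape can be sketched as below. The coefficient range, the evaluation points, and the sampling scheme are assumptions made for illustration; `arithmetic_eval.py` defines the actual setup. The finite-difference helper gives the exact answer whenever the prompt has more terms than the polynomial's degree, which makes it a useful reference for scoring.

```python
import random

def sample_polynomial_sequence(degree, k, rng, coeff_range=10):
    """Return k prompt terms and the following term of a random polynomial."""
    coeffs = [rng.randint(-coeff_range, coeff_range) for _ in range(degree)]
    nonzero = [c for c in range(-coeff_range, coeff_range + 1) if c != 0]
    coeffs.append(rng.choice(nonzero))  # nonzero leading coefficient
    values = [sum(c * n**i for i, c in enumerate(coeffs)) for n in range(k + 1)]
    return values[:k], values[k]

def next_term_by_differences(terms):
    """Extrapolate one term, assuming polynomial degree < len(terms)."""
    rows = [list(terms)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    # The next term is the sum of the last entry of every difference row.
    return sum(row[-1] for row in rows)

rng = random.Random(0)
prompt_terms, target = sample_polynomial_sequence(degree=3, k=6, rng=rng)
print(prompt_terms, target, next_term_by_differences(prompt_terms))
```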
## Usage
### MLX
```bash
mlx_lm.generate --model N8Programs/NextTerm-440M --prompt "1,2,3,"
```
### Hugging Face Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "N8Programs/NextTerm-440M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "1,2,3,4,5,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; stop at the first comma (end of the next term) or at EOS.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    eos_token_id=[tokenizer.convert_tokens_to_ids(","), tokenizer.eos_token_id],
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
For strict next-term evaluation, stop generation on *comma or EOS* and parse the
text before the first comma as the predicted integer.
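
The parsing step can be sketched as a small helper. It assumes the decoded string may or may not include the prompt and may contain whitespace between tokens, so both are normalized first; `parse_next_term` is a hypothetical name, not a function from this repository.

```python
import re

def parse_next_term(prompt, decoded):
    """Return the first integer generated after `prompt`, or None."""
    # Normalize whitespace in case decoding inserts spaces between tokens.
    prompt = re.sub(r"\s+", "", prompt)
    decoded = re.sub(r"\s+", "", decoded)
    # Drop the echoed prompt if the decoded text includes it.
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    first = decoded.split(",", 1)[0]
    return int(first) if re.fullmatch(r"-?\d+", first) else None

print(parse_next_term("1,2,3,4,5,", "1,2,3,4,5,6,"))
```

Returning `None` for an empty or malformed continuation lets an evaluation loop count it as a miss instead of raising.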
## Reproducibility
This repository contains the local evaluation scripts and artifacts used for the
results above, including the small evaluation datasets needed to rerun them:
- `oeis_eval_mlx_neo.py` for OEIS-Eval-Neo with MLX batch generation.
- `arithmetic_eval.py` for arithmetic/quadratic/cubic/quartic continuation.
- `eval_m1_competition_mape_mlx.py` for M1 Competition 111 MAPE.
- `oeis_val_neo.jsonl` for OEIS-Eval-Neo.
- `m1_competition_111.jsonl` for M1 Competition 111.
- `eval_results.txt` for the compact result table.
The last three training checkpoints are available separately at
[N8Programs/NextTerm-440M-Checkpoints](https://huggingface.co/N8Programs/NextTerm-440M-Checkpoints).
The released `final_latest` checkpoint was trained for 14B tokens. The checkpoint with the best validation loss is also available, although it is excluded from the main results table because it performed worse on downstream evaluations.
The `.packed` files used for training are binary files containing the tokenized and augmented OEIS data, with tokens encoded as nibbles. A dedicated decoder is provided in this repo as `decode_packed_oeis.py`.
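
Since the vocabulary has 16 entries, each token ID fits in 4 bits and two tokens fit in one byte. The sketch below illustrates nibble packing in general; the nibble order (high nibble first) is an assumption, and any header, padding, or sequence-boundary convention in the real files is not modeled. Use `decode_packed_oeis.py` for the actual format.

```python
def unpack_nibbles(data):
    """Expand bytes into 4-bit token IDs (assumption: high nibble first)."""
    tokens = []
    for byte in data:
        tokens.append(byte >> 4)      # upper 4 bits
        tokens.append(byte & 0x0F)    # lower 4 bits
    return tokens

def pack_nibbles(tokens):
    """Inverse of unpack_nibbles for an even number of token IDs in 0..15."""
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens")
    if any(not 0 <= t <= 15 for t in tokens):
        raise ValueError("token IDs must fit in 4 bits")
    return bytes((hi << 4) | lo for hi, lo in zip(tokens[0::2], tokens[1::2]))

print(unpack_nibbles(bytes([0x12, 0xAB])))
```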
## Citation
```bibtex
@misc{nextterm440m2026,
author = {Nathan Breslow},
title = {NextTerm-440M: A Pretrained Transformer for Integer Sequence Prediction},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/N8Programs/NextTerm-440M}},
note = {440.5M parameter model trained on augmented OEIS data}
}
```
## Attribution
This model and its training dataset were created using data from the
**On-Line Encyclopedia of Integer Sequences (OEIS)**.
- Source: https://oeis.org/
- License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
- OEIS End-User License Agreement: https://oeis.org/wiki/The_OEIS_End-User_License_Agreement