Instructions for using N8Programs/NextTerm-440M with libraries and local apps.

Libraries

How to use N8Programs/NextTerm-440M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="N8Programs/NextTerm-440M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("N8Programs/NextTerm-440M")
model = AutoModelForCausalLM.from_pretrained("N8Programs/NextTerm-440M")
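The evaluation scripts in this repository prompt the model with comma-separated integers followed by a trailing comma (see `prompts_text` in `eval_m1_monthly_mape_mlx.py`), and the model card's evaluation notes take the text before the first comma as the predicted integer. Below is a minimal sketch of that prompt and parse convention. It assumes the same format works for Transformers inference. `format_prompt` and `first_predicted_term` are hypothetical helper names, not part of the repository.

```python
from __future__ import annotations


def format_prompt(terms: list[int]) -> str:
    """Join integer terms with commas and end with a trailing comma,
    mirroring how eval_m1_monthly_mape_mlx.py builds its prompts."""
    return ",".join(str(t) for t in terms) + ","


def first_predicted_term(completion: str) -> int | None:
    """Return the integer before the first comma in the generated text,
    or None if that text is not a valid integer."""
    head = completion.split(",", 1)[0].strip()
    try:
        return int(head)
    except ValueError:
        return None


prompt = format_prompt([1, 1, 2, 3, 5, 8])
print(prompt)  # 1,1,2,3,5,8,
print(first_predicted_term("13,21,34,"))  # 13
```

With the `pipe` object above, one possible call is `pipe(prompt, max_new_tokens=20, do_sample=False, return_full_text=False)[0]["generated_text"]`. You can then pass the result to `first_predicted_term`.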

MLX

How to use N8Programs/NextTerm-440M with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# if on a CUDA device, also pip install mlx[cuda]

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("N8Programs/NextTerm-440M")

prompt = "Once upon a time in"
text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

vLLM

How to use N8Programs/NextTerm-440M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "N8Programs/NextTerm-440M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "N8Programs/NextTerm-440M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
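You can also send the same OpenAI-compatible request from Python using only the standard library. Below is a minimal sketch. It assumes the server's response follows the OpenAI completions schema (`choices[0].text`). The helper names are hypothetical, and the base URL should match the server you started: port 8000 for the `vllm serve` command above, port 30000 for the SGLang commands.

```python
from __future__ import annotations

import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["text"]
```

For example, `complete(build_completion_request("http://localhost:8000", "N8Programs/NextTerm-440M", "1,2,4,8,16,", temperature=0.0))` would ask a running vLLM server for a greedy continuation of the sequence.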

Use Docker

docker model run hf.co/N8Programs/NextTerm-440M

SGLang

How to use N8Programs/NextTerm-440M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "N8Programs/NextTerm-440M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "N8Programs/NextTerm-440M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "N8Programs/NextTerm-440M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "N8Programs/NextTerm-440M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

MLX LM

How to use N8Programs/NextTerm-440M with MLX LM:

Generate text

# Install MLX LM
uv tool install mlx-lm
# Generate some text
mlx_lm.generate --model "N8Programs/NextTerm-440M" --prompt "Once upon a time"

Docker Model Runner
How to use N8Programs/NextTerm-440M with Docker Model Runner:
```
docker model run hf.co/N8Programs/NextTerm-440M
```

N8Programs committed 2 days ago

Commit a46649b (verified)

1 parent: 5db721a

Bundle evaluation datasets with model card scripts

Files changed (10):

README.md +3 -1
arithmetic_eval.py +14 -2
eval_m1_competition_mape_mlx.py +14 -2
eval_m1_monthly_mape_mlx.py +583 -0
m1_competition_111.jsonl +0 -0
m1_monthly_dataset.txt +0 -0
oeis_eval_mlx_neo.py +14 -2
oeis_val_neo.jsonl +0 -0
oeis_val_neo.meta.json +9 -0
oeis_val_neo_excluded_exact_packed_matches.tsv +74 -0

README.md CHANGED

@@ -142,11 +142,13 @@ text before the first comma as the predicted integer.
 ## Reproducibility
 This repository contains the local evaluation scripts and artifacts used for the
-results above, including:
+results above, including the small evaluation datasets needed to rerun them:
 - `oeis_eval_mlx_neo.py` for OEIS-Eval-Neo with MLX batch generation.
 - `arithmetic_eval.py` for arithmetic/quadratic/cubic/quartic continuation.
 - `eval_m1_competition_mape_mlx.py` for M1 Competition 111 MAPE.
+- `oeis_val_neo.jsonl` for OEIS-Eval-Neo.
+- `m1_competition_111.jsonl` for M1 Competition 111.
 - `eval_results.txt` for the compact result table.

arithmetic_eval.py CHANGED

@@ -19,6 +19,7 @@ import argparse
 import csv
 import random
 from dataclasses import dataclass
+from pathlib import Path
 from typing import Dict, List, Tuple
 from mlx_lm import load
@@ -26,8 +27,19 @@ from mlx_lm.generate import BatchGenerator
 from mlx_lm.tuner.utils import print_trainable_parameters
 from tqdm import tqdm
-# Step 335000
-MODEL_NAME = "NextTerm-47M"
+SCRIPT_DIR = Path(__file__).resolve().parent
+def default_model_path() -> str:
+    if (SCRIPT_DIR / "model.safetensors").exists():
+        return str(SCRIPT_DIR)
+    local_model = SCRIPT_DIR / "NextTerm-47M"
+    if local_model.exists():
+        return str(local_model)
+    return "N8Programs/NextTerm-47M"
+MODEL_NAME = default_model_path()
 MAX_NEW_TOKENS = 20
 DEFAULT_MAX_TERMS = 25
 SAMPLES_PER_K = 200

eval_m1_competition_mape_mlx.py CHANGED

@@ -28,8 +28,20 @@ from eval_m1_monthly_mape_mlx import (
 )
-DATA_PATH = Path("m1_competition_111.jsonl")
-MODEL_PATH = Path("NextTerm-440M")
+SCRIPT_DIR = Path(__file__).resolve().parent
+def default_model_path() -> Path:
+    if (SCRIPT_DIR / "model.safetensors").exists():
+        return SCRIPT_DIR
+    local_model = SCRIPT_DIR / "NextTerm-440M"
+    if local_model.exists():
+        return local_model
+    return Path("N8Programs/NextTerm-440M")
+DATA_PATH = SCRIPT_DIR / "m1_competition_111.jsonl"
+MODEL_PATH = default_model_path()
 OUTPUT_PATH = Path("m1_eval_results/m1_competition111_nextterm440m_greedy_per_series.jsonl")
 SUMMARY_PATH = Path("m1_eval_results/m1_competition111_nextterm440m_greedy_summary.json")

eval_m1_monthly_mape_mlx.py ADDED

@@ -0,0 +1,583 @@
+#!/usr/bin/env python3
+"""Evaluate NextTerm-style MLX models on the M1 monthly forecasting dataset."""
+from __future__ import annotations
+import argparse
+import gc
+import json
+import math
+import re
+import time
+from dataclasses import dataclass
+from decimal import Decimal, InvalidOperation, localcontext
+from pathlib import Path
+from statistics import mean, median
+import mlx.core as mx
+from mlx_lm import load
+from mlx_lm.generate import BatchGenerator
+from tqdm import tqdm
+SCRIPT_DIR = Path(__file__).resolve().parent
+def default_model_path() -> Path:
+    if (SCRIPT_DIR / "model.safetensors").exists():
+        return SCRIPT_DIR
+    local_model = SCRIPT_DIR / "NextTerm-47M"
+    if local_model.exists():
+        return local_model
+    return Path("N8Programs/NextTerm-47M")
+DATA_PATH = SCRIPT_DIR / "m1_monthly_dataset.txt"
+MODEL_PATH = default_model_path()
+OUTPUT_PATH = Path("m1_eval_results/m1_monthly_nextterm47m_per_series.jsonl")
+SUMMARY_PATH = Path("m1_eval_results/m1_monthly_nextterm47m_summary.json")
+@dataclass
+class SeriesRecord:
+    row_index: int
+    series_name: str
+    start_timestamp: str
+    raw_values: list[str]
+    values: list[Decimal]
+    context_values: list[Decimal]
+    target_values: list[Decimal]
+    scale: int
+    scaled_context: list[int]
+class SuppressTokenLogits:
+    """Set selected token logits to a large negative value."""
+    def __init__(self, token_ids: list[int]):
+        self.token_ids = sorted({int(t) for t in token_ids if t is not None and int(t) >= 0})
+        self._bias_by_width: dict[int, mx.array] = {}
+    def __call__(self, tokens: mx.array, logits: mx.array) -> mx.array:
+        width = int(logits.shape[-1])
+        bias = self._bias_by_width.get(width)
+        if bias is None:
+            values = [0.0] * width
+            for token_id in self.token_ids:
+                if token_id < width:
+                    values[token_id] = -1.0e9
+            bias = mx.array(values, dtype=logits.dtype)
+            self._bias_by_width[width] = bias
+        return logits + bias
+def parse_decimal(raw: str) -> Decimal:
+    raw = raw.strip()
+    try:
+        value = Decimal(raw)
+    except InvalidOperation as exc:
+        raise ValueError(f"Could not parse decimal value {raw!r}") from exc
+    if not value.is_finite():
+        raise ValueError(f"Non-finite decimal value {raw!r}")
+    return value
+def context_scale(raw_context_values: list[str]) -> int:
+    """Choose an integer scale using only visible decimal precision in context."""
+    max_places = 0
+    for raw in raw_context_values:
+        value = parse_decimal(raw)
+        exponent = value.as_tuple().exponent
+        if exponent < 0:
+            max_places = max(max_places, -exponent)
+    return 10**max_places
+def scale_decimal_to_int(value: Decimal, scale: int) -> int:
+    scaled = value * Decimal(scale)
+    with localcontext() as ctx:
+        ctx.prec = max(50, len(scaled.as_tuple().digits) + 10)
+        rounded = scaled.to_integral_value()
+    if rounded != scaled:
+        # This can only happen if the held-out side has more precision than the
+        # context-derived scale. It is fine for scoring, but not for prompting.
+        raise ValueError(f"Context scale {scale} does not make {value} integral")
+    return int(rounded)
+def load_m1_monthly(path: Path, horizon: int | None = None) -> tuple[list[SeriesRecord], int]:
+    records: list[SeriesRecord] = []
+    parsed_horizon: int | None = horizon
+    in_data = False
+    with path.open("r", encoding="latin-1", newline=None) as f:
+        for raw_line in f:
+            line = raw_line.strip()
+            if not line:
+                continue
+            if line.startswith("@horizon") and parsed_horizon is None:
+                parts = line.split()
+                if len(parts) >= 2:
+                    parsed_horizon = int(parts[1])
+                continue
+            if line == "@data":
+                in_data = True
+                continue
+            if not in_data or line.startswith("#") or line.startswith("@"):
+                continue
+            try:
+                series_name, start_timestamp, values_blob = line.split(":", 2)
+            except ValueError as exc:
+                raise ValueError(f"Malformed M1 data line: {line[:120]!r}") from exc
+            raw_values = [part.strip() for part in values_blob.split(",") if part.strip()]
+            values = [parse_decimal(part) for part in raw_values]
+            if parsed_horizon is None:
+                raise ValueError("No horizon provided and no @horizon metadata found")
+            if len(values) <= parsed_horizon:
+                continue
+            raw_context = raw_values[:-parsed_horizon]
+            scale = context_scale(raw_context)
+            context_values = values[:-parsed_horizon]
+            target_values = values[-parsed_horizon:]
+            scaled_context = [scale_decimal_to_int(v, scale) for v in context_values]
+            records.append(
+                SeriesRecord(
+                    row_index=len(records),
+                    series_name=series_name,
+                    start_timestamp=start_timestamp,
+                    raw_values=raw_values,
+                    values=values,
+                    context_values=context_values,
+                    target_values=target_values,
+                    scale=scale,
+                    scaled_context=scaled_context,
+                )
+            )
+    if parsed_horizon is None:
+        raise ValueError("No horizon provided and no @horizon metadata found")
+    return records, parsed_horizon
+def parse_generated_terms(text: str, limit: int) -> tuple[list[int], bool]:
+    terms: list[int] = []
+    current: list[str] = []
+    malformed = False
+    def flush_current() -> None:
+        nonlocal malformed
+        if not current:
+            return
+        token = "".join(current)
+        current.clear()
+        if token == "-":
+            malformed = True
+            return
+        try:
+            terms.append(int(token))
+        except ValueError:
+            malformed = True
+    for ch in text:
+        if len(terms) >= limit:
+            break
+        if ch.isdigit() or (ch == "-" and not current):
+            current.append(ch)
+        elif ch == ",":
+            flush_current()
+        elif ch.isspace():
+            continue
+        else:
+            malformed = True
+            flush_current()
+            break
+    if len(terms) < limit:
+        flush_current()
+    return terms[:limit], malformed
+def as_float(value: Decimal) -> float:
+    return float(value)
+def ape(actual: Decimal, prediction: Decimal) -> float | None:
+    if actual == 0:
+        return None
+    return float(abs(actual - prediction) / abs(actual) * Decimal(100))
+def mape_for_predictions(
+    actuals: list[Decimal],
+    predictions: list[Decimal | None],
+    *,
+    missing_policy: str,
+    fallback: Decimal,
+) -> tuple[float | None, list[float | None], int]:
+    apes: list[float | None] = []
+    missing = 0
+    for actual, prediction in zip(actuals, predictions):
+        pred = prediction
+        if pred is None:
+            missing += 1
+            if missing_policy == "skip":
+                apes.append(None)
+                continue
+            if missing_policy == "zero":
+                pred = Decimal(0)
+            elif missing_policy == "last_context":
+                pred = fallback
+            else:
+                raise ValueError(f"Unknown missing policy: {missing_policy}")
+        apes.append(ape(actual, pred))
+    valid = [x for x in apes if x is not None and math.isfinite(x)]
+    return (mean(valid) if valid else None), apes, missing
+def seasonal_naive_predictions(context: list[Decimal], horizon: int, season: int = 12) -> list[Decimal]:
+    if not context:
+        return [Decimal(0)] * horizon
+    if len(context) < season:
+        return [context[-1]] * horizon
+    last_season = context[-season:]
+    return [last_season[i % season] for i in range(horizon)]
+def last_value_predictions(context: list[Decimal], horizon: int) -> list[Decimal]:
+    fallback = context[-1] if context else Decimal(0)
+    return [fallback] * horizon
+def load_completed(path: Path) -> dict[int, dict]:
+    completed: dict[int, dict] = {}
+    if not path.exists():
+        return completed
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            if not line.strip():
+                continue
+            record = json.loads(line)
+            completed[int(record["row_index"])] = record
+    return completed
+def aggregate_summary(records: list[dict], horizon: int) -> dict:
+    def collect(key: str) -> list[float]:
+        return [
+            float(r[key])
+            for r in records
+            if r.get(key) is not None and math.isfinite(float(r[key]))
+        ]
+    model_series = collect("mape")
+    seasonal_series = collect("seasonal_naive_mape")
+    last_series = collect("last_value_mape")
+    per_horizon: list[dict] = []
+    for h in range(horizon):
+        vals = []
+        seasonal_vals = []
+        last_vals = []
+        for r in records:
+            for source, target in [
+                ("apes", vals),
+                ("seasonal_naive_apes", seasonal_vals),
+                ("last_value_apes", last_vals),
+            ]:
+                xs = r.get(source) or []
+                if h < len(xs) and xs[h] is not None and math.isfinite(float(xs[h])):
+                    target.append(float(xs[h]))
+        per_horizon.append(
+            {
+                "horizon": h + 1,
+                "mape": mean(vals) if vals else None,
+                "seasonal_naive_mape": mean(seasonal_vals) if seasonal_vals else None,
+                "last_value_mape": mean(last_vals) if last_vals else None,
+                "n": len(vals),
+            }
+        )
+    all_apes = [
+        float(x)
+        for r in records
+        for x in (r.get("apes") or [])
+        if x is not None and math.isfinite(float(x))
+    ]
+    all_seasonal_apes = [
+        float(x)
+        for r in records
+        for x in (r.get("seasonal_naive_apes") or [])
+        if x is not None and math.isfinite(float(x))
+    ]
+    all_last_apes = [
+        float(x)
+        for r in records
+        for x in (r.get("last_value_apes") or [])
+        if x is not None and math.isfinite(float(x))
+    ]
+    return {
+        "series_count": len(records),
+        "model_macro_mape": mean(model_series) if model_series else None,
+        "model_median_series_mape": median(model_series) if model_series else None,
+        "model_point_mape": mean(all_apes) if all_apes else None,
+        "seasonal_naive_macro_mape": mean(seasonal_series) if seasonal_series else None,
+        "seasonal_naive_point_mape": mean(all_seasonal_apes) if all_seasonal_apes else None,
+        "last_value_macro_mape": mean(last_series) if last_series else None,
+        "last_value_point_mape": mean(all_last_apes) if all_last_apes else None,
+        "parsed_full_series": sum(1 for r in records if r.get("parsed_terms", 0) >= horizon),
+        "total_missing_terms": sum(int(r.get("missing_terms", 0)) for r in records),
+        "malformed_series": sum(1 for r in records if r.get("malformed", False)),
+        "per_horizon": per_horizon,
+    }
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-path", type=Path, default=DATA_PATH)
+    parser.add_argument("--model", type=Path, default=MODEL_PATH)
+    parser.add_argument("--output", type=Path, default=OUTPUT_PATH)
+    parser.add_argument("--summary-output", type=Path, default=SUMMARY_PATH)
+    parser.add_argument("--horizon", type=int, default=None)
+    parser.add_argument("--batch-size", type=int, default=64)
+    parser.add_argument("--max-new-tokens", type=int, default=384)
+    parser.add_argument("--max-context-tokens", type=int, default=2048)
+    parser.add_argument("--max-series", type=int, default=0)
+    parser.add_argument(
+        "--suppress-eos-until-horizon",
+        action="store_true",
+        help="Suppress EOS/pad logits and stop only after the requested separator count or length cap.",
+    )
+    parser.add_argument(
+        "--rerun-incomplete",
+        action="store_true",
+        help="When resuming, rerun rows whose previous record parsed fewer than horizon terms.",
+    )
+    parser.add_argument(
+        "--missing-policy",
+        choices=["zero", "last_context", "skip"],
+        default="zero",
+        help="How to score missing/unparseable forecast terms.",
+    )
+    parser.add_argument("--overwrite", action="store_true")
+    return parser.parse_args()
+def main() -> None:
+    args = parse_args()
+    started = time.perf_counter()
+    series, horizon = load_m1_monthly(args.data_path, args.horizon)
+    if args.max_series > 0:
+        series = series[: args.max_series]
+    print(f"Loaded {len(series)} M1 monthly series from {args.data_path}; horizon={horizon}")
+    model, tokenizer = load(str(args.model))
+    print(f"Loaded model: {args.model}")
+    prompts_text = [",".join(str(x) for x in s.scaled_context) + "," for s in series]
+    prompts = [tokenizer.encode(text) for text in prompts_text]
+    sep_token = tokenizer.encode("1,")[-1]
+    eval_indices = [
+        i
+        for i, prompt in enumerate(prompts)
+        if args.max_context_tokens <= 0 or len(prompt) < args.max_context_tokens
+    ]
+    skipped_long = len(prompts) - len(eval_indices)
+    eval_indices = sorted(eval_indices, key=lambda i: len(prompts[i]))
+    print(
+        f"Evaluating {len(eval_indices)} series; skipped_long={skipped_long}; "
+        f"batch_size={args.batch_size}; max_new_tokens={args.max_new_tokens}; sep_token={sep_token}"
+    )
+    args.output.parent.mkdir(parents=True, exist_ok=True)
+    args.summary_output.parent.mkdir(parents=True, exist_ok=True)
+    if args.overwrite and args.output.exists():
+        args.output.unlink()
+    completed = load_completed(args.output)
+    if args.rerun_incomplete:
+        incomplete = [
+            idx
+            for idx, record in completed.items()
+            if int(record.get("parsed_terms", 0)) < horizon
+        ]
+        for idx in incomplete:
+            completed.pop(idx, None)
+        if incomplete:
+            print(
+                "Rerunning incomplete rows: "
+                + ", ".join(str(idx) for idx in sorted(incomplete))
+            )
+    if completed:
+        print(f"Resuming from {args.output}: {len(completed)} series already done")
+    todo_indices = [i for i in eval_indices if i not in completed]
+    stop_tokens = [] if args.suppress_eos_until_horizon else [[tokenizer.eos_token_id]]
+    suppress_token_ids: list[int] = []
+    if getattr(tokenizer, "eos_token_id", None) is not None:
+        suppress_token_ids.append(tokenizer.eos_token_id)
+    if getattr(tokenizer, "pad_token_id", None) is not None:
+        if not args.suppress_eos_until_horizon:
+            stop_tokens.append([tokenizer.pad_token_id])
+        suppress_token_ids.append(tokenizer.pad_token_id)
+    suppress_processor = (
+        SuppressTokenLogits(suppress_token_ids) if args.suppress_eos_until_horizon else None
+    )
+    gen = BatchGenerator(
+        model,
+        stop_tokens=stop_tokens or None,
+        completion_batch_size=args.batch_size,
+        prefill_batch_size=args.batch_size,
+    )
+    uid_to_idx: dict[int, int] = {}
+    generated_tokens: dict[int, list[int]] = {}
+    separator_counts: dict[int, int] = {}
+    if todo_indices:
+        logits_processors = (
+            [[suppress_processor] for _ in todo_indices]
+            if suppress_processor is not None
+            else None
+        )
+        uids = gen.insert(
+            [prompts[i] for i in todo_indices],
+            [args.max_new_tokens] * len(todo_indices),
+            logits_processors=logits_processors,
+        )
+        uid_to_idx = {uid: idx for uid, idx in zip(uids, todo_indices)}
+        generated_tokens = {uid: [] for uid in uids}
+        separator_counts = {uid: 0 for uid in uids}
+    finished: set[int] = set()
+    try:
+        with args.output.open("a", encoding="utf-8") as out:
+            with tqdm(total=len(todo_indices), desc="Generating") as progress:
+                def finalize_uid(uid: int, finish_reason: str) -> None:
+                    finished.add(uid)
+                    idx = uid_to_idx[uid]
+                    s = series[idx]
+                    generated_text = tokenizer.decode(generated_tokens[uid])
+                    scaled_terms, malformed = parse_generated_terms(generated_text, horizon)
+                    predictions: list[Decimal | None] = [
+                        Decimal(term) / Decimal(s.scale) for term in scaled_terms
+                    ]
+                    predictions.extend([None] * (horizon - len(predictions)))
+                    fallback = s.context_values[-1] if s.context_values else Decimal(0)
+                    mape, apes, missing = mape_for_predictions(
+                        s.target_values,
+                        predictions,
+                        missing_policy=args.missing_policy,
+                        fallback=fallback,
+                    )
+                    seasonal_preds = seasonal_naive_predictions(s.context_values, horizon)
+                    seasonal_mape, seasonal_apes, _ = mape_for_predictions(
+                        s.target_values,
+                        seasonal_preds,
+                        missing_policy="skip",
+                        fallback=fallback,
+                    )
+                    last_preds = last_value_predictions(s.context_values, horizon)
+                    last_mape, last_apes, _ = mape_for_predictions(
+                        s.target_values,
+                        last_preds,
+                        missing_policy="skip",
+                        fallback=fallback,
+                    )
+                    record = {
+                        "row_index": idx,
+                        "series_name": s.series_name,
+                        "start_timestamp": s.start_timestamp,
+                        "scale": s.scale,
+                        "context_length": len(s.context_values),
+                        "prompt_tokens": len(prompts[idx]),
+                        "target": [str(x) for x in s.target_values],
+                        "scaled_prediction_terms": scaled_terms,
+                        "prediction": [str(x) if x is not None else None for x in predictions],
+                        "parsed_terms": len(scaled_terms),
+                        "missing_terms": missing,
+                        "malformed": malformed,
+                        "mape": mape,
+                        "apes": apes,
+                        "seasonal_naive_mape": seasonal_mape,
+                        "seasonal_naive_apes": seasonal_apes,
+                        "last_value_mape": last_mape,
+                        "last_value_apes": last_apes,
+                        "generated_text": generated_text,
+                        "finish_reason": finish_reason,
+                    }
+                    out.write(json.dumps(record) + "\n")
+                    out.flush()
+                    progress.update(1)
+                while todo_indices:
+                    responses = gen.next()
+                    if not isinstance(responses, tuple) or len(responses) != 2:
+                        raise RuntimeError(
+                            "Unexpected mlx_lm BatchGenerator.next() API. "
+                            "Update your mlx_lm version."
+                        )
+                    prompt_responses, generation_responses = responses
+                    if not prompt_responses and not generation_responses:
+                        break
+                    remove_uids: list[int] = []
+                    for response in generation_responses:
+                        uid = response.uid
+                        if uid in finished:
+                            continue
+                        if response.finish_reason != "stop":
+                            token = int(response.token)
+                            generated_tokens[uid].append(token)
+                            if token == sep_token:
+                                separator_counts[uid] += 1
+                                if separator_counts[uid] >= horizon:
+                                    finalize_uid(uid, "separator_count")
+                                    remove_uids.append(uid)
+                                    continue
+                        if response.finish_reason is not None:
+                            finalize_uid(uid, str(response.finish_reason))
+                    if remove_uids:
+                        gen.remove(remove_uids)
+                if len(finished) != len(todo_indices):
+                    raise RuntimeError(f"Finished {len(finished)}/{len(todo_indices)} series")
+    finally:
+        gen.close()
+        mx.clear_cache()
+        gc.collect()
+    all_records = [
+        r
+        for idx, r in sorted(load_completed(args.output).items())
+        if idx in set(eval_indices)
+    ]
+    aggregate = aggregate_summary(all_records, horizon)
+    elapsed = time.perf_counter() - started
+    summary = {
+        "data_path": str(args.data_path),
+        "model": str(args.model),
+        "output": str(args.output),
+        "horizon": horizon,
+        "series_loaded": len(series),
+        "series_evaluated": len(all_records),
+        "skipped_long": skipped_long,
+        "max_context_tokens": args.max_context_tokens,
+        "max_new_tokens": args.max_new_tokens,
+        "batch_size": args.batch_size,
+        "missing_policy": args.missing_policy,
+        "suppress_eos_until_horizon": args.suppress_eos_until_horizon,
+        "seconds": elapsed,
+        **aggregate,
+    }
+    args.summary_output.write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
+    print(json.dumps({k: summary[k] for k in [
+        "series_evaluated",
+        "model_macro_mape",
+        "model_point_mape",
+        "seasonal_naive_macro_mape",
+        "last_value_macro_mape",
+        "parsed_full_series",
+        "total_missing_terms",
+        "malformed_series",
+        "seconds",
+    ]}, indent=2))
+    print(f"Wrote {args.output}")
+    print(f"Wrote {args.summary_output}")
+if __name__ == "__main__":
+    main()
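`eval_m1_monthly_mape_mlx.py` handles decimal M1 values in five steps:

1. Scale each value to an integer by multiplying by 10 raised to the largest number of decimal places visible in the context (`context_scale`).
2. Prompt the model with the scaled integers.
3. Parse the generated integers.
4. Divide them back by the same scale.
5. Score each horizon step with the absolute percentage error 100·|y − ŷ|/|y|, skipping steps whose actual value is 0.

Below is a condensed, standalone sketch of that round trip, with a hard-coded string standing in for the model's completion. The series values and the simplified comma-split parser are illustrative assumptions; the script's own `parse_generated_terms` also flags malformed output.

```python
from __future__ import annotations

from decimal import Decimal
from statistics import mean


def context_scale(raw_values: list[str]) -> int:
    """10 ** (largest number of decimal places visible in the context)."""
    places = 0
    for raw in raw_values:
        exponent = Decimal(raw).as_tuple().exponent
        if exponent < 0:
            places = max(places, -exponent)
    return 10**places


def ape(actual: Decimal, prediction: Decimal) -> float | None:
    """Absolute percentage error; undefined (None) when actual == 0."""
    if actual == 0:
        return None
    return float(abs(actual - prediction) / abs(actual) * 100)


# Illustrative series: values visible to the model, then held-out targets.
context_raw = ["12.5", "13", "14.25"]
targets = [Decimal("15"), Decimal("16.5")]

scale = context_scale(context_raw)  # two decimal places -> 100
# int() is exact here because the scale clears every visible decimal place;
# the script instead uses to_integral_value() and raises if rounding occurs.
prompt = ",".join(str(int(Decimal(v) * scale)) for v in context_raw) + ","

completion = "1500,1600,"  # stand-in for the model's generated text
scaled_terms = [int(t) for t in completion.split(",") if t.strip()][: len(targets)]
predictions = [Decimal(t) / Decimal(scale) for t in scaled_terms]

apes = [ape(a, p) for a, p in zip(targets, predictions)]
series_mape = mean(x for x in apes if x is not None)
```

In the summary, `model_macro_mape` averages these per-series MAPEs, while `model_point_mape` averages every individual APE across all series. Under the default `--missing-policy zero`, each missing term is scored as a prediction of 0, which gives an APE of 100% whenever the actual value is nonzero.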

m1_competition_111.jsonl ADDED

The diff for this file is too large to render.

m1_monthly_dataset.txt ADDED

The diff for this file is too large to render.

oeis_eval_mlx_neo.py CHANGED

@@ -12,8 +12,20 @@ from mlx_lm import load
 from mlx_lm.generate import BatchGenerator
 from tqdm import tqdm
-DATA_PATH = Path("/Users/natebreslow/Documents/khashabiLab/bigOEIS/oeis_val_neo.jsonl")
-MODEL_NAME = "/Users/natebreslow/Documents/khashabiLab/bigOEIS/NextTerm-440M"
+SCRIPT_DIR = Path(__file__).resolve().parent
+def default_model_path() -> str:
+    if (SCRIPT_DIR / "model.safetensors").exists():
+        return str(SCRIPT_DIR)
+    local_model = SCRIPT_DIR / "NextTerm-440M"
+    if local_model.exists():
+        return str(local_model)
+    return "N8Programs/NextTerm-440M"
+DATA_PATH = SCRIPT_DIR / "oeis_val_neo.jsonl"
+MODEL_NAME = default_model_path()
 MAX_NEW_TOKENS = 196
 MAX_CONTEXT_TOKENS = 4096
 BATCH_SIZE = 64
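All four scripts touched by this commit repeat the same three-step model lookup. They differ only in the local directory name and Hub id: `arithmetic_eval.py` and `eval_m1_monthly_mape_mlx.py` default to NextTerm-47M, while the other two default to NextTerm-440M. Below is a minimal sketch of a single parameterized helper. `resolve_model_path` is hypothetical and not part of the repository. It always returns a string, whereas `eval_m1_competition_mape_mlx.py` returns a `Path`.

```python
from __future__ import annotations

from pathlib import Path


def resolve_model_path(script_dir: Path, local_name: str, hub_id: str) -> str:
    """Pick a model location in the same order as default_model_path():

    1. the script's own directory, if it holds model.safetensors
       (e.g. the scripts are run from a clone of the model repo);
    2. a subdirectory of the script's directory named `local_name`;
    3. otherwise the Hugging Face Hub id, which mlx_lm.load can download.
    """
    if (script_dir / "model.safetensors").exists():
        return str(script_dir)
    local_model = script_dir / local_name
    if local_model.exists():
        return str(local_model)
    return hub_id
```

For example, `resolve_model_path(Path(__file__).resolve().parent, "NextTerm-440M", "N8Programs/NextTerm-440M")` reproduces the lookup used by `oeis_eval_mlx_neo.py`.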

oeis_val_neo.jsonl ADDED

The diff for this file is too large to render.

oeis_val_neo.meta.json ADDED

@@ -0,0 +1,9 @@
+{
+  "name": "oeis_val_neo",
+  "description": "OEIS Eval Neo: oeis_val_decontam with exact packed-sequence overlaps against the synthetic training shard removed.",
+  "source_jsonl": "/Users/natebreslow/Documents/khashabiLab/bigOEIS/oeis_val_decontam.jsonl",
+  "overlap_source_tsv": "/Users/natebreslow/Documents/khashabiLab/bigOEIS/packed_data/oeis_val_vs_synth_aug0_inv_len_13245370099_seed0.matches_with_ids.tsv",
+  "excluded_rows": 73,
+  "kept_rows": 19035,
+  "indexing": "needle_seq_id is 0-based and jsonl_line_1based = needle_seq_id + 1"
+}
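The metadata records 73 excluded rows and 19,035 kept rows, implying a source of 19,108 rows in `oeis_val_decontam.jsonl`. The excluded rows are listed in `oeis_val_neo_excluded_exact_packed_matches.tsv`. The source JSONL is referenced only by a local absolute path and is not among the files added in this commit. Below is a sketch of how the exclusion could be reapplied. It assumes only the TSV's `needle_seq_id` column and the indexing rule stated in the metadata. The function names are hypothetical.

```python
from __future__ import annotations

import csv
import io


def excluded_seq_ids(tsv_text: str) -> set[int]:
    """Collect the 0-based needle_seq_id column from the exclusion TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {int(row["needle_seq_id"]) for row in reader}


def filter_jsonl_lines(lines: list[str], excluded: set[int]) -> list[str]:
    """Drop JSONL lines whose 0-based index is in `excluded`.

    Per the metadata, jsonl_line_1based = needle_seq_id + 1, so the
    0-based enumerate() index of a line equals its needle_seq_id.
    """
    return [line for i, line in enumerate(lines) if i not in excluded]
```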

oeis_val_neo_excluded_exact_packed_matches.tsv ADDED

@@ -0,0 +1,74 @@
+needle_seq_id	jsonl_line_1based	oeis_id	match_pairs	first_haystack_seq_id	packed_content_tokens	terms	first_12_terms	last_3_terms
+545	546	A167176	2	15370293	209	105	[0, 1, 8, 0, 1, 8, 0, 1, 8, 0, 1, 8]	[0, 1, 8]
+812	813	A368056	11	5322373	13	6	[1, 2, 4, 8, 16, 17]	[8, 16, 17]
+892	893	A316981	7	4912247	17	7	[1, 1, 2, 6, 15, 40, 121]	[15, 40, 121]
+938	939	A000811	1	55273515	153	7	[16, 160, 13056, 183305216, 153746461690757120, 472614400151965797795362900461748224, 22974620880419880070513913016918297106276186594558904213019777370224066560]	[153746461690757120, 472614400151965797795362900461748224, 22974620880419880070513913016918297106276186594558904213019777370224066560]
+984	985	A166486	1	14419403	201	101	[0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]	[1, 1, 0]
+1418	1419	A044795	1	6999629	182	39	[82, 182, 282, 382, 482, 582, 682, 782, 829, 882, 982, 1082]	[3382, 3482, 3582]
+1561	1562	A273720	1	48019539	224	28	[1, 3, 8, 21, 57, 162, 479, 1458, 4528, 14259, 45349, 145289]	[2403295565913, 7966021263923, 26425616887971]
+1870	1871	A044758	1	41462953	182	39	[45, 145, 245, 345, 445, 459, 545, 645, 745, 845, 945, 1045]	[3345, 3445, 3459]
+2152	2153	A249541	1	23907330	55	9	[3, 4, 5, 17, 257, 65537, 83623937, 4294967297, 6992962672132097]	[83623937, 4294967297, 6992962672132097]
+2375	2376	A279069	1	57156	142	56	[1, 2, 2, 4, 16, 4, 2, 18, 4, 4, 6, 12]	[6, 22, 4]
+2429	2430	A010699	1	67423087	161	81	[2, 9, 2, 9, 2, 9, 2, 9, 2, 9, 2, 9]	[2, 9, 2]
+2537	2538	A048058	2	13019375	214	51	[11, 13, 17, 23, 31, 41, 53, 67, 83, 101, 121, 143]	[2363, 2461, 2561]
+3059	3060	A051637	28	1481960	10	5	[1, 2, 3, 7, 10]	[3, 7, 10]
+3298	3299	A179101	5	17275932	15	7	[2, 3, 5, 6, 8, 11, 14]	[8, 11, 14]
+3312	3313	A084528	1	23488569	24	8	[1, 2, 5, 17, 59, 201, 703, 2405]	[201, 703, 2405]
+3440	3441	A122059	3	4681413	17	9	[1, 0, 0, 1, 1, 2, 3, 0, 4]	[3, 0, 4]
+3507	3508	A047578	1	9038770	194	62	[2, 5, 6, 7, 10, 13, 14, 15, 18, 21, 22, 23]	[119, 122, 125]
+4130	4131	A228407	1	50962227	192	59	[0, 11, 1, 10, 100, 12, 2, 20, 101, 22, 3, 13]	[134, 143, 314]
+4236	4237	A063160	4	13631973	193	48	[10, 33, 57, 81, 105, 129, 153, 177, 201, 225, 249, 273]	[1089, 1113, 1137]
+4572	4573	A009836	2	58624055	170	14	[0, 2, 8, 368, 16512, 1583104, 199552000, 36445579264, 8620299812864, 2621816292114432, 987354046567284736, 452732308336619290624]	[452732308336619290624, 247917555997339251900416, 159904531672039230122491904]
+4682	4683	A145039	1	5184810	98	18	[3, 7, 19, 31, 107, 127, 607, 1279, 2203, 4423, 86243, 110503]	[20996011, 24036583, 25964951]
+5618	5619	A263858	1	45502052	36	17	[1, 1, 1, 1, 1, 3, 1, 1, 7, 6, 2, 1]	[18, 4, 2]
+6137	6138	A088689	2	17024337	209	105	[0, 1, 1, 0, 2, 2, 0, 1, 1, 0, 2, 2]	[0, 1, 1]
+7257	7258	A044615	1	29890298	186	41	[47, 111, 175, 239, 303, 367, 383, 431, 495, 559, 623, 687]	[2223, 2287, 2351]
+7653	7654	A062744	1	41839518	135	15	[1, 1, 10, 145, 2470, 46060, 910252, 18730855, 397089550, 8612835715, 190223180840, 4263421511271]	[96723482198980, 2216905597676000, 51256802757808320]
+8155	8156	A270840	1	58982447	204	26	[12, 36, 138, 270, 546, 4800, 7560, 12840, 14700, 358200, 678480, 16139970]	[5088964800, 6974736756, 9214178820]
+8251	8252	A155645	1	1385507	193	20	[1, 12, 84, 558, 3696, 24582, 164304, 1103478, 7444416, 50431302, 342941424, 2340123798]	[249557173431942, 1729973554578864, 12008254925383638]
+8279	8280	A290453	4	29575809	181	91	[1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 2]	[0, 1, 0]
+8477	8478	A044402	3	19293444	182	39	[70, 170, 270, 370, 470, 570, 670, 700, 770, 870, 970, 1070]	[3370, 3470, 3570]
+9025	9026	A068987	1	62874440	85	11	[2, 149, 1925, 13808, 49703, 2458886, 9470345, 186557267, 523551503, 191278379840, 4368196101672]	[523551503, 191278379840, 4368196101672]
+9707	9708	A212167	3	33046	191	66	[1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14, 15]	[79, 82, 83]
+9721	9722	A293077	3	14353436	199	35	[2, 4, 6, 10, 16, 26, 44, 74, 126, 214, 364, 620]	[46697496, 79717612, 136086476]
+9894	9895	A009705	5	24197587	164	14	[0, 2, 4, 302, 8104, 947642, 86855404, 16203909542, 3130092938704, 896924477276402, 290713861720990804, 121467176505314129822]	[121467176505314129822, 58492863120535523766904, 33925794100542844193202602]
+10305	10306	A135839	2	3672765	181	91	[1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]	[1, 0, 1]
+10600	10601	A101587	2	594093	59	14	[2, 4, 5, 20, 308, 815, 857, 1114, 1418, 1688, 12008, 18692]	[18692, 28097, 90964]
+10681	10682	A024036	3	21353804	219	25	[0, 3, 15, 63, 255, 1023, 4095, 16383, 65535, 262143, 1048575, 4194303]	[17592186044415, 70368744177663, 281474976710655]
+10706	10707	A100285	1	18949167	179	90	[1, 1, 5, 5, 1, 1, 5, 5, 1, 1, 5, 5]	[5, 1, 1]
+11007	11008	A055843	1	3608436	231	31	[1, 13, 85, 385, 1375, 4147, 11011, 26455, 58630, 121550, 238238, 445094]	[406833460, 536222500, 700950052]
+11079	11080	A017629	2	52873588	202	53	[9, 21, 33, 45, 57, 69, 81, 93, 105, 117, 129, 141]	[609, 621, 633]
+11234	11235	A007520	1	28251297	202	51	[3, 11, 19, 43, 59, 67, 83, 107, 131, 139, 163, 179]	[1163, 1171, 1187]
+11307	11308	A299729	1	7973728	201	54	[12, 24, 30, 36, 40, 48, 60, 63, 70, 72, 80, 84]	[320, 324, 325]
+11450	11451	A010126	1	52767868	161	81	[4, 1, 2, 4, 2, 1, 8, 1, 2, 4, 2, 1]	[8, 1, 2]
+12022	12023	A380991	9	1893358	9	4	[1, 4, 15, 48]	[4, 15, 48]
+12091	12092	A047612	2	10789238	196	63	[0, 2, 4, 5, 8, 10, 12, 13, 16, 18, 20, 21]	[120, 122, 124]
+12305	12306	A063144	1	8491582	189	49	[8, 27, 47, 67, 87, 107, 127, 147, 167, 187, 207, 227]	[927, 947, 967]
+12646	12647	A129756	1	3311430	207	76	[1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 5]	[37, 37, 37]
+13501	13502	A050621	1	20013751	188	18	[2, 12, 104, 1008, 10016, 100032, 1000064, 10000128, 100000256, 1000000512, 10000001024, 100000002048]	[1000000000032768, 10000000000065536, 100000000000131072]
+13614	13615	A126048	3	11143698	95	48	[2, 3, 5, 7, 5, 1, 3, 7, 5, 1, 3, 7]	[1, 1, 1]
+13703	13704	A163471	2	46480945	194	20	[3, 18, 114, 744, 4932, 32952, 221016, 1485216, 9989808, 67223328, 452457504, 3045661824]	[283481998053888, 1908413999583744, 12847536038651904]
+13704	13705	A309224	3	10124493	99	35	[1, 1, 0, 0, 2, 1, 1, 2, 2, 3, 5, 7]	[185, 202, 222]
+13737	13738	A101540	1	33989933	43	10	[4, 56, 500, 514, 626, 640, 724, 53110, 65330, 109672]	[53110, 65330, 109672]
+14074	14075	A160124	1	19649603	206	57	[0, 0, 0, 2, 4, 4, 8, 18, 24, 24, 28, 36]	[932, 1000, 1028]
+14081	14082	A010704	2	33797649	161	81	[3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6]	[3, 6, 3]
+14379	14380	A327424	2	605474	23	8	[1, 1, 2, 4, 10, 33, 234, 16579]	[33, 234, 16579]
+14391	14392	A212611	2	5704190	167	24	[135, 3375, 25137, 68607, 22113, 557375, 451737, 1278316, 273, 6739509, 188325, 8775]	[366776, 1487640245, 417725]
+14435	14436	A139302	5	8333360	100	4	[2, 512, 144115188075855872, 904625697166532776746648320380374280103671755200316906558262375061821325312]	[512, 144115188075855872, 904625697166532776746648320380374280103671755200316906558262375061821325312]
+14960	14961	A057556	1	65334021	335	168	[0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0]	[5, 0, 0]
+15368	15369	A056263	1	51823997	59	13	[1, 3, 27, 155, 321, 351, 1211, 1283, 7983, 15191, 84771, 119929]	[84771, 119929, 148859]
+15953	15954	A007097	1	484792	261	24	[1, 2, 3, 5, 11, 31, 127, 709, 5381, 52711, 648391, 9737333]	[9332039515881088707361, 499720579610303128776791, 28785866289100396890228041]
+16105	16106	A082705	4	11152753	23	6	[7, 9, 895, 1525, 3037, 21157]	[1525, 3037, 21157]
+16931	16932	A255771	53	771418	21	11	[1, 1, 1, 2, 2, 1, 2, 2, 4, 2, 2]	[4, 2, 2]
+17007	17008	A389124	2	37692541	230	29	[1, 3, 6, 17, 46, 133, 386, 1137, 3366, 10013, 29866, 89257]	[1270955283786, 3812843481737, 11438485705966]
+17160	17161	A117995	1	52396131	207	51	[0, 0, 1, 1, 2, 3, 4, 6, 8, 11, 14, 20]	[26252, 30700, 35717]
+17170	17171	A271449	3	5003778	13	6	[0, 3, 6, 9, 12, 18]	[9, 12, 18]
+17255	17256	A176393	1	8944574	205	61	[1, 3, 9, 13, 17, 19, 21, 25, 29, 31, 33, 37]	[161, 163, 165]
+17300	17301	A085070	11	9630624	8	4	[1, 2, 9, 26]	[2, 9, 26]
+17347	17348	A295745	1	37424184	8	3	[18, 20, 24]	[18, 20, 24]
+18185	18186	A380287	4	8734831	113	18	[4, 6, 16, 48, 142, 472, 1670, 6364, 24604, 97668, 390070, 1570560]	[419930444, 1701635046, 6898183050]
+18227	18228	A133384	1	15280426	227	19	[12, 102, 1002, 10002, 100002, 1000002, 10000002, 100000002, 1000000002, 10000000002, 100000000002, 1000000000002]	[100000000000000002, 1000000000000000002, 10000000000000000002]
+18258	18259	A011803	2	34513693	58	10	[1, 2, 8, 64, 904, 20926, 753994, 40412530, 3099627142, 329518779600]	[40412530, 3099627142, 329518779600]
+18304	18305	A137444	2	13209959	217	41	[1, 4, 6, 4, -4, -16, -24, -16, 16, 64, 96, 64]	[-1572864, -1048576, 1048576]
+18559	18560	A083318	1	45434387	198	32	[1, 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049]	[536870913, 1073741825, 2147483649]
+18605	18606	A141044	2	3030285	209	105	[2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]	[1, 1, 1]
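The commit does not include the script that produced `oeis_val_neo.jsonl`. The meta.json description says the neo split is `oeis_val_decontam.jsonl` with the exact matches removed. Assuming this means the listed rows are dropped and the remaining rows keep their original order, the split could be regenerated roughly as follows. The function names are hypothetical. The 0-based indexing follows the `indexing` note in the meta file.

```python
import csv
from collections.abc import Iterable, Iterator
from pathlib import Path


def load_excluded_ids(tsv_path: Path) -> set[int]:
    """Read the 0-based needle_seq_id column from the excluded-matches TSV."""
    with tsv_path.open(newline="", encoding="utf-8") as f:
        return {int(row["needle_seq_id"]) for row in csv.DictReader(f, delimiter="\t")}


def filter_lines(lines: Iterable[str], excluded: set[int]) -> Iterator[str]:
    """Yield every JSONL line whose 0-based index is not in `excluded`."""
    for idx, line in enumerate(lines):
        if idx not in excluded:
            yield line


def rebuild_neo(source_jsonl: Path, excluded_tsv: Path, out_jsonl: Path) -> int:
    """Write source_jsonl minus the excluded rows to out_jsonl; return rows kept."""
    excluded = load_excluded_ids(excluded_tsv)
    kept = 0
    with source_jsonl.open(encoding="utf-8") as src, \
            out_jsonl.open("w", encoding="utf-8") as dst:
        for line in filter_lines(src, excluded):
            dst.write(line)
            kept += 1
    return kept
```

If the assumption holds, running `rebuild_neo` on the full decontaminated file should report 19,035 kept lines, matching `kept_rows`.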