PermuFormer

PermuFormer is a small Llama-style causal language model trained on symbolic permutation tasks from algebraic combinatorics. It is intended as a specialist base model for permutation representation, reasoning, and finetuning experiments rather than as a general natural-language assistant.

The model operates on a compact word-level vocabulary for permutation syntax. Training examples are stored as pre-tokenized lists of tokens; at inference time, the Hugging Face tokenizer can also consume equivalent whitespace-separated strings. Prompts are formulaic equations: the left side specifies a permutation task and generation begins after the = token.

Model Details

Architecture: LlamaForCausalLM
Parameters: about 75.7M
Layers: 12
Hidden size: 768
Attention heads: 12 query heads, 4 key/value heads
MLP intermediate size: 2048
Activation: SiLU/SwiGLU
Position encoding: RoPE, theta 10000
Vocabulary size: 186
Context length used by tokenizer: 1000 tokens
Checkpoint: step_2600000
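
As a sanity check on the figures above, the parameter count can be reproduced from the listed hyperparameters. This is a minimal sketch that assumes the standard Llama layer layout (no bias terms, two RMSNorm weights per layer, one final norm); whether the input and output embeddings are tied is not stated in this card, so both totals are computed.

```python
# Hyperparameters copied from the Model Details list.
hidden = 768
layers = 12
q_heads = 12
kv_heads = 4
mlp = 2048
vocab = 186

head_dim = hidden // q_heads   # 64
kv_dim = kv_heads * head_dim   # 256: grouped-query attention shrinks k/v projections

# Assumption: standard Llama layout with no bias terms.
attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj, k_proj, v_proj
ffn = 3 * hidden * mlp                            # gate, up, and down projections
norms = 2 * hidden                                # two RMSNorm weights per layer
per_layer = attn + ffn + norms

embed = vocab * hidden
final_norm = hidden

tied_total = layers * per_layer + embed + final_norm
untied_total = tied_total + embed  # adds a separate lm_head matrix

print(f"tied:   {tied_total:,}")
print(f"untied: {untied_total:,}")
```

Under these assumptions the tied total is 75,659,520, which rounds to the stated 75.7M, while the untied total is 75,802,368. This suggests, but does not confirm, that the checkpoint ties its embeddings.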

Training Data

PermuFormer was trained autoregressively on synthetic permutation examples generated with exact combinatorial algorithms. The paper describes a dataset of 39.8M instances, approximately 2.66B tokens, over the symmetric groups S_2 through S_11.
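
For a rough sense of sequence length, the two dataset figures above imply an average example size. This is only arithmetic on the rounded numbers quoted from the paper, not a measured statistic.

```python
# Rounded figures quoted above.
instances = 39.8e6
tokens = 2.66e9

# Mean tokens per example, prompt and target combined.
avg_tokens = tokens / instances
print(f"{avg_tokens:.1f} tokens per instance on average")
```

An average of roughly 67 tokens per instance sits far below the 1000-token tokenizer limit, although individual examples may be much longer than the mean.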

Training tasks cover three broad families:

Translation between encodings: one-line notation, cycle notation, reduced Coxeter expressions, RSK tableaux, inversion vectors, and Lehmer codes.
Permutation statistics and properties: length, descents, fixed points, sign/parity, cycle type, RSK shape, pattern avoidance, longest increasing/decreasing subsequences, and related statistics.
Algebraic operations and comparisons: product/composition, inverse, powers, conjugation, commutator, relative products, multiplication by simple transpositions, complement, reverse, descent tests, and Bruhat order.

Some targets include computational witnesses before the final answer, for example inversion lists before a length answer or pattern witnesses before an avoidance answer.
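
The idea of a witness preceding an answer can be illustrated with the length statistic, which equals the number of inversions. The sketch below computes the inversion list first and derives the length from it. The 1-indexed position pairs are an assumption made for illustration; the token format the model actually uses for witnesses is not shown in this card.

```python
from itertools import combinations


def inversions(perm):
    """Return 1-indexed position pairs (i, j) with i < j and perm[i] > perm[j]."""
    return [
        (i + 1, j + 1)
        for i, j in combinations(range(len(perm)), 2)
        if perm[i] > perm[j]
    ]


perm = [3, 2, 1]
witness = inversions(perm)  # the intermediate evidence
length = len(witness)       # the final answer follows from the witness
print(witness, length)
```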

Usage

Use deterministic (greedy) decoding for evaluation-style tasks, since targets are scored by exact match. Read the EOS and padding token IDs from the tokenizer rather than hard-coding them.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "YOUR_ORG/permuformer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = (
    "<|endoftext|> n3 "
    "1linebegin [ 3 , 1 , 2 ] 1lineend "
    "in cyclenotationmake ="
)

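# Tokenize the whitespace-separated prompt into input IDs.
inputs = tokenizer(prompt, return_tensors="pt")

# Fall back to EOS for padding if the tokenizer defines no pad token.
pad_id = tokenizer.pad_token_id
if pad_id is None:
    pad_id = tokenizer.eos_token_id

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=False,  # greedy decoding
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=pad_id,
    )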

print(tokenizer.decode(output_ids[0], skip_special_tokens=False))

Prompt Format

Training data is represented as lists of token strings. When writing prompts as plain text, separate every token with spaces. Multi-digit integers, delimiters, and task names are individual tokens. A typical example starts with <|endoftext|>, then a size token such as n7, then the task expression, then =.

Translation example:

<|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =

Property example:

<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake =

Algebraic operation example:

<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend inversemake =
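
The three examples above share one shape, so prompts can be assembled programmatically. The helper below is a hypothetical convenience function, not part of the repository. It assumes the size token is `n` followed by the permutation length, and it covers only one-line input with the task tokens shown in the examples; token sequences for other tasks are not documented here.

```python
def build_prompt(perm, task_tokens):
    """Build a whitespace-separated prompt for a permutation in one-line notation.

    perm: list of ints, e.g. [3, 1, 2]
    task_tokens: task tokens placed before "=", e.g. ["in", "cyclenotationmake"]
    """
    entries = " , ".join(str(x) for x in perm)
    parts = [
        "<|endoftext|>",
        f"n{len(perm)}",  # assumed size token: "n" plus the permutation length
        "1linebegin", "[", entries, "]", "1lineend",
        *task_tokens,
        "=",
    ]
    return " ".join(parts)


print(build_prompt([3, 1, 2], ["in", "cyclenotationmake"]))
print(build_prompt([3, 2, 1], ["property", "lengthmake"]))
print(build_prompt([2, 1, 3], ["inversemake"]))
```

Each call reproduces the corresponding example prompt above exactly, including the spaces around commas and brackets.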

Evaluation Notes

The training code evaluates by exact match on the generated right-hand side after =. The local training log for this repository reports, at step 2,522,000 on a 2,560-example stratified evaluation sample:

Overall exact match: 98.44%
Translation: 97.78%
Property/statistic tasks: 99.17%
Algebraic tasks: 98.36%

These figures come from the local log at a step earlier than the released checkpoint (step_2600000). Treat them as indicative repository metadata, not as a full benchmark report for every downstream setting.
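
Exact-match scoring on the right-hand side can be sketched as follows. This is not the repository's evaluation code. It assumes the first `=` token separates prompt from target and that generation is cut at the first `<|endoftext|>` after it; the tokens `X` and `Y` are placeholders, not real model output.

```python
def right_hand_side(text, eos="<|endoftext|>"):
    """Return the tokens after the first "=", truncated at the first EOS, or None."""
    tokens = text.split()
    if "=" not in tokens:
        return None
    rhs = tokens[tokens.index("=") + 1:]
    if eos in rhs:
        rhs = rhs[: rhs.index(eos)]
    return rhs


def exact_match(generated, target):
    """True only if both right-hand sides exist and agree token for token."""
    gen, ref = right_hand_side(generated), right_hand_side(target)
    return gen is not None and gen == ref


# Placeholder strings for illustration only.
target = "<|endoftext|> n3 sometask = X Y <|endoftext|>"
print(exact_match("<|endoftext|> n3 sometask = X Y <|endoftext|> Z", target))
print(exact_match("<|endoftext|> n3 sometask = X", target))
```

Comparing token lists rather than raw strings makes the check insensitive to repeated spaces while remaining strict about token order and content.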

The paper also reports that PermuFormer is substantially more accurate than frontier general-purpose LLMs on a small held-out sample from the model's symbolic test distribution, while noting that the comparison is imperfect because PermuFormer was trained directly in this syntax.

Finetuning

PermuFormer is designed to be finetuned on specialized permutation tasks. Experiments in the paper include:

231-avoidance and 2143-avoidance
mHeight
Schubert polynomial structure constants
Kazhdan-Lusztig polynomial degree prediction

The repository's finetuning scripts compare starting from this pretrained checkpoint with training the same architecture from scratch.

Limitations

This is a specialist symbolic model. It expects the exact whitespace-tokenized syntax used during training and is brittle to natural-language paraphrases or malformed prompts.
The training data consists primarily of permutations in S_2 through S_11; behavior outside that range is not guaranteed.
Exact-match accuracy depends on canonical output formatting. Some mathematical tasks may have multiple valid answers, but evaluation expects the chosen canonical form.
The model focuses on permutations. It does not natively handle broader combinatorial structures such as arbitrary graphs or partitions unless encoded through the supported task syntax.
Outputs should be verified by exact combinatorial software for research-critical use.
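
For small cases, independent reference answers are easy to compute. The sketch below gives plain-Python reference implementations of the inverse and the cycle decomposition, under the assumption that one-line notation lists the images of 1, 2, ..., n in order. The canonical cycle form the model emits (starting element, treatment of fixed points) is not specified in this card, so compare outputs as permutations and not as strings where possible.

```python
def inverse(perm):
    """Inverse of a permutation in one-line notation (values 1..n)."""
    inv = [0] * len(perm)
    for pos, val in enumerate(perm, start=1):
        inv[val - 1] = pos  # perm sends pos to val, so the inverse sends val to pos
    return inv


def cycles(perm):
    """Cycle decomposition, each cycle starting at its smallest element."""
    seen, out = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cyc, cur = [], start
        while cur not in seen:
            seen.add(cur)
            cyc.append(cur)
            cur = perm[cur - 1]  # follow the permutation
        out.append(tuple(cyc))
    return out


print(cycles([3, 1, 2]))   # fixed points appear as 1-cycles
print(inverse([2, 1, 3]))
```

For research-critical work, a dedicated computer algebra system remains the safer reference than ad hoc helpers like these.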

Citation

If you use this model, please cite the accompanying PermuFormer paper once citation details are available.
