---
library_name: transformers
pipeline_tag: text-generation
tags:
- math
- combinatorics
- permutations
- algebraic-combinatorics
- llama
- causal-lm
---

# PermuFormer

PermuFormer is a small Llama-style causal language model trained on symbolic permutation tasks from algebraic combinatorics. It is intended as a specialist base model for permutation representation, reasoning, and finetuning experiments rather than as a general natural-language assistant.

The model operates on a compact word-level vocabulary for permutation syntax. Training examples are stored as pre-tokenized lists of tokens; at inference time, the Hugging Face tokenizer can also consume equivalent whitespace-separated strings. Prompts are formulaic equations: the left side specifies a permutation task and generation begins after the `=` token.

## Model Details

- **Architecture:** `LlamaForCausalLM`
- **Parameters:** about 75.7M
- **Layers:** 12
- **Hidden size:** 768
- **Attention heads:** 12 query heads, 4 key/value heads
- **MLP intermediate size:** 2048
- **Activation:** SiLU/SwiGLU
- **Position encoding:** RoPE, theta 10000
- **Vocabulary size:** 186
- **Context length used by tokenizer:** 1000 tokens
- **Checkpoint:** `step_2600000`
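
As a rough sanity check on the figures above, the parameter count can be reproduced by hand. The sketch below assumes a standard bias-free Llama block (query, key, value, and output projections, a three-matrix gated MLP, and two RMSNorm weights per layer) and tied input/output embeddings. Neither assumption is stated in this card, and the helper name `llama_param_count` is made up for illustration.

```python
def llama_param_count(vocab, hidden, layers, q_heads, kv_heads, mlp, tied=True):
    """Count weights in a bias-free Llama-style decoder (assumed layout)."""
    head_dim = hidden // q_heads
    kv_dim = kv_heads * head_dim                      # width of the K and V projections
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, plus k and v
    ffn = 3 * hidden * mlp                            # gate, up, and down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    embed = vocab * hidden
    total = layers * (attn + ffn + norms) + hidden + embed  # plus final norm and embedding
    if not tied:
        total += embed                                # separate output head
    return total


tied = llama_param_count(186, 768, 12, 12, 4, 2048, tied=True)
untied = llama_param_count(186, 768, 12, 12, 4, 2048, tied=False)
print(f"tied: {tied:,}  untied: {untied:,}")
# tied: 75,659,520  untied: 75,802,368
```

Under these assumptions the tied count rounds to the 75.7M listed above, while the untied count rounds to 75.8M.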

## Training Data

PermuFormer was trained autoregressively on synthetic permutation examples generated with exact combinatorial algorithms. The paper describes a dataset of 39.8M instances, approximately 2.66B tokens, over the symmetric groups `S_2` through `S_11`.

Training tasks cover three broad families:

- **Translation between encodings:** one-line notation, cycle notation, reduced Coxeter expressions, RSK tableaux, inversion vectors, and Lehmer codes.
- **Permutation statistics and properties:** length, descents, fixed points, sign/parity, cycle type, RSK shape, pattern avoidance, longest increasing/decreasing subsequences, and related statistics.
- **Algebraic operations and comparisons:** product/composition, inverse, powers, conjugation, commutator, relative products, multiplication by simple transpositions, complement, reverse, descent tests, and Bruhat order.
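
For readers less familiar with these objects, the following plain-Python sketch illustrates a few of the encodings and operations named above. It is not the generator used to build the dataset. The conventions chosen here (fixed points written as 1-cycles, the Lehmer code counting smaller entries to the right, and `compose(u, v)` applying `v` first) are assumptions and may differ from the ones the model was trained on.

```python
def to_cycles(w):
    """One-line notation (values 1..n) -> list of cycles, fixed points included."""
    seen, cycles = set(), []
    for start in range(1, len(w) + 1):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = w[i - 1]          # follow i -> w(i)
        cycles.append(cycle)
    return cycles


def lehmer_code(w):
    """Entry i counts the later entries that are smaller than w[i]."""
    return [sum(1 for b in w[i + 1:] if b < a) for i, a in enumerate(w)]


def inverse(w):
    """Inverse permutation in one-line notation."""
    inv = [0] * len(w)
    for i, a in enumerate(w, start=1):
        inv[a - 1] = i
    return inv


def compose(u, v):
    """Product u*v with (u*v)(i) = u(v(i)), so v is applied first."""
    return [u[x - 1] for x in v]


w = [3, 1, 2]
print(to_cycles(w))            # [[1, 3, 2]]
print(lehmer_code(w))          # [2, 0, 0]
print(inverse(w))              # [2, 3, 1]
print(compose(w, inverse(w)))  # [1, 2, 3]
```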

Some targets include computational witnesses before the final answer, for example inversion lists before a length answer or pattern witnesses before an avoidance answer.
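
To make the idea of a witness concrete, here is a minimal sketch of the length case: the inversions are listed first, and the length is their count. Recording inversions as 1-indexed position pairs is an assumption; the exact witness format in the training data is not shown in this card.

```python
from itertools import combinations


def inversions(w):
    """Position pairs (i, j) with i < j and w(i) > w(j), 1-indexed."""
    return [(i + 1, j + 1)
            for i, j in combinations(range(len(w)), 2)
            if w[i] > w[j]]


w = [3, 2, 1]
witness = inversions(w)   # the intermediate evidence
length = len(witness)     # the final answer follows from the witness
print(witness, length)    # [(1, 2), (1, 3), (2, 3)] 3
```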

## Usage

Use deterministic (greedy) decoding for most evaluation-style tasks, and take the special token IDs from the tokenizer rather than hard-coding them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOUR_ORG/permuformer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Every token is separated by a space; generation continues after "=".
prompt = (
    "<|endoftext|> n3 "
    "1linebegin [ 3 , 1 , 2 ] 1lineend "
    "in cyclenotationmake ="
)

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=False,  # greedy decoding, so the output is deterministic
        # Special token IDs are read from the tokenizer, not hard-coded.
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Keep special tokens so the end-of-sequence marker stays visible.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```

### Prompt Format

Training data is represented as lists of token strings. When writing prompts as plain text, separate every token with spaces. Multi-digit integers, delimiters, and task names are individual tokens. A typical example starts with `<|endoftext|>`, then a size token such as `n7`, then the task expression, then `=`.

Translation example:

```text
<|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =
```

Property example:

```text
<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake =
```

Algebraic operation example:

```text
<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend inversemake =
```
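
The three prompts above share one template, which can be sketched as a small string builder. The helper names `one_line_tokens` and `build_prompt` are hypothetical, and only the three task suffixes shown in this card are used; other task names would need to be looked up in the tokenizer vocabulary.

```python
BOS = "<|endoftext|>"


def one_line_tokens(w):
    """[3, 1, 2] -> '1linebegin [ 3 , 1 , 2 ] 1lineend' (space-separated tokens)."""
    body = " , ".join(str(x) for x in w)
    return f"1linebegin [ {body} ] 1lineend"


def build_prompt(w, task_suffix):
    """Start token, size token, permutation, task expression, then '='."""
    return f"{BOS} n{len(w)} {one_line_tokens(w)} {task_suffix} ="


print(build_prompt([3, 1, 2], "in cyclenotationmake"))
print(build_prompt([3, 2, 1], "property lengthmake"))
print(build_prompt([2, 1, 3], "inversemake"))
```

Building prompts programmatically avoids the most common formatting mistake, which is a missing space around a comma or bracket.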

## Evaluation Notes

The training code evaluates by exact match on the generated right-hand side after `=`. The local training log for this repository reports, at step 2,522,000 on a 2,560-example stratified evaluation sample:

- Overall exact match: **98.44%**
- Translation: **97.78%**
- Property/statistic tasks: **99.17%**
- Algebraic tasks: **98.36%**

These figures come from the local log at a step shortly before the released `step_2600000` checkpoint. Treat them as indicative repository metadata, not as a full benchmark report for every downstream setting.

The paper also reports that PermuFormer is substantially more accurate than frontier general-purpose LLMs on a small held-out sample from the model's symbolic test distribution, while noting that the comparison is imperfect because PermuFormer was trained directly in this syntax.
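
The exact-match criterion described at the start of this section can be sketched as follows. This is an illustration, not the repository's evaluation code: it assumes the right-hand side is everything after the first `=` token up to the first `<|endoftext|>`, and the answer token `3` in the example strings is a placeholder, not the model's real target format.

```python
EOS = "<|endoftext|>"


def right_hand_side(text):
    """Tokens after the first '=', cut at the first end-of-text marker."""
    tokens = text.split()
    rhs = tokens[tokens.index("=") + 1:]
    if EOS in rhs:
        rhs = rhs[:rhs.index(EOS)]
    return rhs


def exact_match(generated, reference):
    """True only if the two right-hand sides agree token for token."""
    return right_hand_side(generated) == right_hand_side(reference)


def accuracy(pairs):
    """Fraction of (generated, reference) pairs that match exactly."""
    return sum(exact_match(g, r) for g, r in pairs) / len(pairs)


prompt = "<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake ="
reference = prompt + " 3 " + EOS   # placeholder target
good = prompt + " 3 " + EOS
bad = prompt + " 2 " + EOS
print(accuracy([(good, reference), (bad, reference)]))  # 0.5
```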

## Finetuning

PermuFormer is designed to be finetuned on specialized permutation tasks. Experiments in the paper include:

- 231-avoidance and 2143-avoidance
- mHeight
- Schubert polynomial structure constants
- Kazhdan-Lusztig polynomial degree prediction

The repository's finetuning scripts compare starting from this pretrained checkpoint with training the same architecture from scratch.

## Limitations

- This is a specialist symbolic model. It expects the exact whitespace-tokenized syntax used during training and is brittle to natural-language paraphrases or malformed prompts.
- The model was trained primarily on permutations in `S_2` through `S_11`; behavior outside that range is not guaranteed.
- Exact-match accuracy depends on canonical output formatting. Some mathematical tasks may have multiple valid answers, but evaluation expects the chosen canonical form.
- The model focuses on permutations. It does not natively handle broader combinatorial structures such as arbitrary graphs or partitions unless encoded through the supported task syntax.
- Outputs should be verified by exact combinatorial software for research-critical use.
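
The point about canonical formatting is easiest to see with cycle notation, where the same permutation can be written in several ways. The sketch below normalizes a cycle list before comparison. The normal form chosen here (each cycle rotated to start at its smallest element, cycles sorted by that element) is an assumption for illustration and is not necessarily the canonical form used in training.

```python
def canonical_cycles(cycles):
    """Rotate each cycle to start at its minimum, then sort cycles by first element."""
    rotated = []
    for c in cycles:
        k = c.index(min(c))
        rotated.append(c[k:] + c[:k])
    return sorted(rotated, key=lambda c: c[0])


a = [[3, 2, 1], [5, 4]]   # (3 2 1)(5 4)
b = [[4, 5], [1, 3, 2]]   # (4 5)(1 3 2), the same permutation
print(a == b)                                      # False
print(canonical_cycles(a) == canonical_cycles(b))  # True
print(canonical_cycles(a))                         # [[1, 3, 2], [4, 5]]
```

A string-level exact match would score `a` against `b` as an error even though they are mathematically equal, which is why verifying outputs with an independent tool is worthwhile.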

## Citation

If you use this model, please cite the accompanying PermuFormer paper once citation details are available.