	---
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- math
	- combinatorics
	- permutations
	- algebraic-combinatorics
	- llama
	- causal-lm
	---

	# PermuFormer

	PermuFormer is a small Llama-style causal language model trained on symbolic permutation tasks from algebraic combinatorics. It is intended as a specialist base model for permutation representation, reasoning, and finetuning experiments rather than as a general natural-language assistant.

	The model operates on a compact word-level vocabulary for permutation syntax. Training examples are stored as pre-tokenized lists of tokens; at inference time, the Hugging Face tokenizer can also consume equivalent whitespace-separated strings. Prompts are formulaic equations: the left side specifies a permutation task and generation begins after the `=` token.

	## Model Details

	- Architecture: `LlamaForCausalLM`
	- Parameters: about 75.7M
	- Layers: 12
	- Hidden size: 768
	- Attention heads: 12 query heads, 4 key/value heads
	- MLP intermediate size: 2048
	- Activation: SiLU/SwiGLU
	- Position encoding: RoPE, theta 10000
	- Vocabulary size: 186
	- Context length used by tokenizer: 1000 tokens
	- Checkpoint: `step_2600000`
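
As a rough cross-check of the parameter count, the figures above can be plugged into the standard Llama layer layout. This is a sketch under assumptions not stated in this card: no bias terms, RMSNorm weight vectors of size `hidden`, and (for the lower figure) tied input/output embeddings. The function name `llama_param_count` is illustrative.

```python
def llama_param_count(vocab, hidden, layers, heads, kv_heads, intermediate, tied=True):
    """Count weights in a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // heads
    # Grouped-query attention: Q and O are hidden x hidden,
    # K and V project to kv_heads * head_dim.
    attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    norms = 2 * hidden  # two RMSNorm weight vectors per layer
    embed = vocab * hidden
    total = layers * (attn + mlp + norms) + embed + hidden  # + final norm
    if not tied:
        total += embed  # separate lm_head matrix (assumption: same shape as embed)
    return total

tied = llama_param_count(186, 768, 12, 12, 4, 2048, tied=True)
untied = llama_param_count(186, 768, 12, 12, 4, 2048, tied=False)
print(f"{tied:,} tied, {untied:,} untied")
```

Under these assumptions the tied count comes to roughly 75.66M and the untied count to roughly 75.80M, both close to the "about 75.7M" listed above. Whether the released checkpoint ties its embeddings is not stated in this card.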

	## Training Data

	PermuFormer was trained autoregressively on synthetic permutation examples generated with exact combinatorial algorithms. The paper describes a dataset of 39.8M instances, approximately 2.66B tokens, over the symmetric groups `S_2` through `S_11`.

	Training tasks cover three broad families:

	- Translation between encodings: one-line notation, cycle notation, reduced Coxeter expressions, RSK tableaux, inversion vectors, and Lehmer codes.
	- Permutation statistics and properties: length, descents, fixed points, sign/parity, cycle type, RSK shape, pattern avoidance, longest increasing/decreasing subsequences, and related statistics.
	- Algebraic operations and comparisons: product/composition, inverse, powers, conjugation, commutator, relative products, multiplication by simple transpositions, complement, reverse, descent tests, and Bruhat order.

	Some targets include computational witnesses before the final answer, for example inversion lists before a length answer or pattern witnesses before an avoidance answer.
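
To make the idea of a witness concrete, the sketch below computes the inversion list of a permutation in one-line notation and reads the length off as its size. It is plain Python written for illustration: the function names are hypothetical, and how the training data actually serializes witnesses is not shown in this card.

```python
from itertools import combinations

def inversions(perm):
    """All position pairs (i, j), 1-indexed, with i < j and perm[i] > perm[j]."""
    return [(i + 1, j + 1)
            for i, j in combinations(range(len(perm)), 2)
            if perm[i] > perm[j]]

def length(perm):
    """Coxeter length, i.e. the number of inversions."""
    return len(inversions(perm))

def lehmer_code(perm):
    """Entry i counts the later entries that are smaller than perm[i]."""
    return [sum(1 for y in perm[i + 1:] if y < x) for i, x in enumerate(perm)]

def descents(perm):
    """1-indexed positions i with perm[i] > perm[i+1]."""
    return [i + 1 for i in range(len(perm) - 1) if perm[i] > perm[i + 1]]

w = [3, 1, 2]
print(inversions(w), length(w), lehmer_code(w), descents(w))
```

Here the inversion list plays the role of the witness and `length` is the final answer derived from it. Conventions for inversion vectors vary between references, so any comparison with model output should first confirm which convention the dataset uses.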

	## Usage

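	Use deterministic (greedy) decoding for most evaluation-style tasks, since evaluation is by exact match. Take the special token IDs (`eos_token_id`, `pad_token_id`) from the tokenizer rather than hard-coding them.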

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "YOUR_ORG/permuformer"  # placeholder repository ID

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)
	model.eval()

	# Tokens are whitespace-separated; generation starts after the "=" token.
	prompt = (
	    "<|endoftext|> n3 "
	    "1linebegin [ 3 , 1 , 2 ] 1lineend "
	    "in cyclenotationmake ="
	)

	inputs = tokenizer(prompt, return_tensors="pt")

	# Greedy decoding (do_sample=False) keeps the output deterministic.
	with torch.no_grad():
	    output_ids = model.generate(
	        **inputs,
	        max_new_tokens=80,
	        do_sample=False,
	        eos_token_id=tokenizer.eos_token_id,
	        pad_token_id=tokenizer.pad_token_id,
	    )

	print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
	```

	### Prompt Format

	Training data is represented as lists of token strings. When writing prompts as plain text, separate every token with spaces. Multi-digit integers, delimiters, and task names are individual tokens. A typical example starts with `<|endoftext|>`, then a size token such as `n7`, then the task expression, then `=`.

	Translation example:

	```text
	<|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =
	```

	Property example:

	```text
	<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake =
	```

	Algebraic operation example:

	```text
	<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend inversemake =
	```
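
Prompts in this format can also be assembled programmatically. The helper below is a hypothetical convenience function, not part of the repository. It only reproduces the token layout of the three examples above, and the task strings it accepts are limited to the ones shown in this card.

```python
def build_prompt(perm, task):
    """Build a whitespace-tokenized prompt for a permutation in one-line notation.

    `task` is the token sequence between `1lineend` and `=`,
    e.g. "in cyclenotationmake" or "property lengthmake".
    """
    entries = " , ".join(str(x) for x in perm)
    return f"<|endoftext|> n{len(perm)} 1linebegin [ {entries} ] 1lineend {task} ="

prompt = build_prompt([3, 1, 2], "in cyclenotationmake")
print(prompt)
print(prompt.split())  # the equivalent pre-tokenized list of tokens
```

Tasks that take more than one permutation, such as products, presumably use a different layout that is not documented here, so this helper should not be extended to them without checking the training data.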

	## Evaluation Notes

	The training code evaluates by exact match on the generated right-hand side after `=`. The local training log for this repository reports, at step 2,522,000 on a 2,560-example stratified evaluation sample:

	- Overall exact match: 98.44%
	- Translation: 97.78%
	- Property/statistic tasks: 99.17%
	- Algebraic tasks: 98.36%

	These figures come from the local training log at step 2,522,000, slightly before the released `step_2600000` checkpoint. Treat them as indicative repository metadata, not as a full benchmark report for every downstream setting.
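
Exact match on the right-hand side can be sketched as follows. This illustrates the metric as described here and is not the repository's evaluation code. It assumes that decoded text is whitespace-separated, that the first `=` token separates prompt from answer, and that an `<|endoftext|>` token after the answer marks its end. The example strings use placeholder tokens (`lhs`, `x`, `y`, `z`), not real model output.

```python
def extract_rhs(decoded, eos="<|endoftext|>"):
    """Return the tokens after the first '=' up to the next end-of-text token."""
    tokens = decoded.split()
    rhs = tokens[tokens.index("=") + 1:]
    if eos in rhs:
        rhs = rhs[:rhs.index(eos)]
    return rhs

def exact_match_rate(decoded_outputs, references):
    """Fraction of outputs whose right-hand side equals the reference token list."""
    hits = sum(extract_rhs(d) == r.split()
               for d, r in zip(decoded_outputs, references))
    return hits / len(references)

# Placeholder strings standing in for decoded generations.
outputs = [
    "<|endoftext|> n3 lhs = x y <|endoftext|>",
    "<|endoftext|> n3 lhs = x z",
]
references = ["x y", "x y"]
print(exact_match_rate(outputs, references))
```

Because the comparison is token-for-token, an answer that is mathematically correct but formatted differently from the reference counts as a miss.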

	The paper also reports that PermuFormer is substantially more accurate than frontier general-purpose LLMs on a small held-out sample from the model's symbolic test distribution, while noting that the comparison is imperfect because PermuFormer was trained directly in this syntax.

	## Finetuning

	PermuFormer is designed to be finetuned on specialized permutation tasks. Experiments in the paper include:

	- 231-avoidance and 2143-avoidance
	- mHeight
	- Schubert polynomial structure constants
	- Kazhdan-Lusztig polynomial degree prediction

	The repository's finetuning scripts compare starting from this pretrained checkpoint with training the same architecture from scratch.
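
For readers unfamiliar with the avoidance tasks: a permutation avoids a pattern when no subsequence of its one-line notation has the same relative order as the pattern. The brute-force check below is a sketch for generating or verifying labels on small permutations. It is not taken from the repository, and the function names are illustrative.

```python
from itertools import combinations, permutations

def standardize(seq):
    """Replace each entry by its rank, e.g. (5, 2, 7) -> (2, 1, 3)."""
    ranks = {v: r for r, v in enumerate(sorted(seq), start=1)}
    return tuple(ranks[v] for v in seq)

def avoids(perm, pattern):
    """True if no subsequence of perm is order-isomorphic to pattern."""
    pattern = tuple(pattern)
    # combinations() keeps entries in their original left-to-right order.
    return all(standardize(sub) != pattern
               for sub in combinations(perm, len(pattern)))

# Count avoiders in S_4.
s4 = list(permutations(range(1, 5)))
count_231 = sum(avoids(w, (2, 3, 1)) for w in s4)
count_2143 = sum(avoids(w, (2, 1, 4, 3)) for w in s4)
print(count_231, count_2143)
```

The check enumerates every subsequence of the pattern's length, which is fine for the sizes discussed in this card but grows quickly with `n`.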

	## Limitations

	- This is a specialist symbolic model. It expects the exact whitespace-tokenized syntax used during training and is brittle to natural-language paraphrases or malformed prompts.
	- The training data covers primarily `S_2` through `S_11`; behavior on permutation sizes outside that range is not guaranteed.
	- Exact-match accuracy depends on canonical output formatting. Some mathematical tasks may have multiple valid answers, but evaluation expects the chosen canonical form.
	- The model focuses on permutations. It does not natively handle broader combinatorial structures such as arbitrary graphs or partitions unless encoded through the supported task syntax.
	- Outputs should be verified by exact combinatorial software for research-critical use.
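
For small cases, the last point above can be followed by recomputing answers with a few lines of exact code. The sketch below covers two of the tasks from the prompt examples, cycle notation and inverse. The functions are illustrative, and the cycle convention used (each cycle starts at its smallest element, fixed points included) is an assumption that may differ from the model's canonical output format.

```python
def inverse(perm):
    """Inverse of a permutation in one-line notation (values 1..n)."""
    inv = [0] * len(perm)
    for i, v in enumerate(perm, start=1):
        inv[v - 1] = i
    return inv

def cycles(perm):
    """Cycle decomposition; each cycle starts at its smallest element."""
    seen, result = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]  # follow i -> perm(i)
        result.append(tuple(cycle))
    return result

print(cycles([3, 1, 2]))   # permutation from the translation example
print(inverse([2, 1, 3]))  # permutation from the inverse example
```

Comparing such reference values against the model requires first parsing the generated tokens back into the same data structure, which depends on output syntax not documented in this card.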

	## Citation

	If you use this model, please cite the accompanying PermuFormer paper once citation details are available.