---
library_name: transformers
pipeline_tag: text-generation
tags:
- math
- combinatorics
- permutations
- algebraic-combinatorics
- llama
- causal-lm
---
# PermuFormer

PermuFormer is a small Llama-style causal language model trained on symbolic permutation tasks from algebraic combinatorics. It is intended as a specialist base model for permutation representation, reasoning, and finetuning experiments rather than as a general natural-language assistant.

The model operates on a compact word-level vocabulary for permutation syntax. Training examples are stored as pre-tokenized lists of tokens; at inference time, the Hugging Face tokenizer can also consume equivalent whitespace-separated strings. Prompts are formulaic equations: the left side specifies a permutation task, and generation begins after the `=` token.
## Model Details

- **Architecture:** `LlamaForCausalLM`
- **Parameters:** about 75.7M
- **Layers:** 12
- **Hidden size:** 768
- **Attention heads:** 12 query heads, 4 key/value heads
- **MLP intermediate size:** 2048
- **Activation:** SiLU/SwiGLU
- **Position encoding:** RoPE, theta 10000
- **Vocabulary size:** 186
- **Context length used by tokenizer:** 1000 tokens
- **Checkpoint:** `step_2600000`
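
As a sanity check on the figures above, the parameter count can be reproduced by hand. The sketch below assumes a standard bias-free Llama layout (separate `q/k/v/o` projections, a gated three-matrix MLP, two RMSNorm weights per layer, one final norm). Whether the input embedding and output head share weights is not stated in this card, so both cases are computed; `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(vocab, hidden, layers, q_heads, kv_heads,
                      intermediate, tie_embeddings):
    """Count weights of a bias-free Llama-style decoder (assumed layout)."""
    head_dim = hidden // q_heads
    kv_dim = kv_heads * head_dim                 # grouped-query K/V width
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q, o + k, v projections
    mlp = 3 * hidden * intermediate              # gate, up, and down projections
    norms = 2 * hidden                           # two RMSNorm weights per layer
    embed = vocab * hidden                       # token embedding table
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return layers * (attn + mlp + norms) + embed + final_norm + lm_head


# Values taken from the Model Details list.
tied = llama_param_count(186, 768, 12, 12, 4, 2048, tie_embeddings=True)
untied = llama_param_count(186, 768, 12, 12, 4, 2048, tie_embeddings=False)
print(f"tied embeddings:   {tied:,}")    # 75,659,520
print(f"untied embeddings: {untied:,}")  # 75,802,368
```

Under these assumptions the tied count rounds to the stated 75.7M, while the untied count rounds to 75.8M. With a 186-token vocabulary the embedding table is tiny, so almost all parameters sit in the 12 transformer layers.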
## Training Data

PermuFormer was trained autoregressively on synthetic permutation examples generated with exact combinatorial algorithms. The paper describes a dataset of 39.8M instances, approximately 2.66B tokens, over the symmetric groups `S_2` through `S_11`.

Training tasks cover three broad families:

- **Translation between encodings:** one-line notation, cycle notation, reduced Coxeter expressions, RSK tableaux, inversion vectors, and Lehmer codes.
- **Permutation statistics and properties:** length, descents, fixed points, sign/parity, cycle type, RSK shape, pattern avoidance, longest increasing/decreasing subsequences, and related statistics.
- **Algebraic operations and comparisons:** product/composition, inverse, powers, conjugation, commutator, relative products, multiplication by simple transpositions, complement, reverse, descent tests, and Bruhat order.

Some targets include computational witnesses before the final answer, for example inversion lists before a length answer or pattern witnesses before an avoidance answer.
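
To make the "witness before answer" idea concrete, here is a minimal sketch of the mathematics behind the length task: the Coxeter length of a permutation equals its number of inversions, so an inversion list is a natural witness for a length answer. The function names and the choice to report inversions as 1-based position pairs are assumptions for illustration; the token format the model actually emits for witnesses is not documented in this card.

```python
def inversions(perm):
    """Return all 1-based position pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return [
        (i + 1, j + 1)
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    ]


def coxeter_length(perm):
    """Length of a permutation in one-line notation = number of inversions."""
    return len(inversions(perm))


witness = inversions([3, 2, 1])
print(witness)                     # [(1, 2), (1, 3), (2, 3)]
print(coxeter_length([3, 2, 1]))   # 3
```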
## Usage

Use deterministic (greedy) decoding for most evaluation-style tasks. Take the special token IDs (EOS and padding) from the tokenizer rather than hard-coding them.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Hub repository ID of this model.
model_id = "ACDRepo/PermuFormer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Tokens are whitespace-separated; generation starts after the "=" token.
prompt = (
    "<|endoftext|> n3 "
    "1linebegin [ 3 , 1 , 2 ] 1lineend "
    "in cyclenotationmake ="
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=False,  # greedy decoding, so the output is deterministic
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

# Keep special tokens so the prompt/answer boundary stays visible.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```
### Prompt Format

Training data is represented as lists of token strings. When writing prompts as plain text, separate every token with spaces. Multi-digit integers, delimiters, and task names are each a single token. A typical example starts with `<|endoftext|>`, then a size token such as `n7`, then the task expression, then `=`.

Translation example:

```text
<|endoftext|> n3 1linebegin [ 3 , 1 , 2 ] 1lineend in cyclenotationmake =
```

Property example:

```text
<|endoftext|> n3 1linebegin [ 3 , 2 , 1 ] 1lineend property lengthmake =
```

Algebraic operation example:

```text
<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend inversemake =
```
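
Because the model is sensitive to exact token spacing, it can help to build prompts programmatically instead of typing them. The sketch below reproduces the three examples above for tasks that take a single permutation in one-line notation. `build_prompt` is a hypothetical helper; the task token sequences are copied from the examples, and the layout of tasks with several operands (such as products) is not shown in this card, so it is not covered here.

```python
def build_prompt(perm, task_tokens, bos="<|endoftext|>"):
    """Build a whitespace-tokenized prompt for a single-permutation task."""
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        raise ValueError(f"not a permutation of 1..{n}: {perm}")
    # Every entry and every comma is its own token, so join with " , ".
    entries = " , ".join(str(v) for v in perm)
    tokens = [bos, f"n{n}", "1linebegin", "[", entries, "]", "1lineend",
              *task_tokens, "="]
    return " ".join(tokens)


print(build_prompt([3, 1, 2], ["in", "cyclenotationmake"]))
print(build_prompt([3, 2, 1], ["property", "lengthmake"]))
print(build_prompt([2, 1, 3], ["inversemake"]))
```

Validating the input up front avoids sending malformed prompts, which the model is not expected to handle gracefully.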
## Evaluation Notes

The training code evaluates by exact match on the generated right-hand side after `=`. The local training log for this repository reports, at step 2,522,000 on a 2,560-example stratified evaluation sample:

- Overall exact match: **98.44%**
- Translation: **97.78%**
- Property/statistic tasks: **99.17%**
- Algebraic tasks: **98.36%**

These figures come from the local log at a step slightly earlier than the released `step_2600000` checkpoint, so treat them as checkpoint-adjacent repository metadata, not a full benchmark report for every downstream setting.
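
The exact-match criterion described above can be sketched as follows. This is not the repository's evaluation code, which is not shown here; it is a minimal illustration that assumes the answer runs from the first `=` token to the next `<|endoftext|>` token (or to the end of the sequence), and the helper names are hypothetical.

```python
def right_hand_side(tokens, eos="<|endoftext|>"):
    """Return the tokens after the first '=' up to the next EOS, or None."""
    if "=" not in tokens:
        return None
    rhs = tokens[tokens.index("=") + 1:]
    if eos in rhs:
        rhs = rhs[:rhs.index(eos)]   # drop EOS and anything generated after it
    return rhs


def exact_match(decoded_text, reference_rhs):
    """True if the generated right-hand side equals the reference, token by token."""
    return right_hand_side(decoded_text.split()) == reference_rhs.split()


# "tokA tokB" are placeholder answer tokens, not real model output.
decoded = ("<|endoftext|> n3 1linebegin [ 2 , 1 , 3 ] 1lineend "
           "inversemake = tokA tokB <|endoftext|>")
print(exact_match(decoded, "tokA tokB"))  # True
print(exact_match(decoded, "tokA"))       # False
```

Comparing token lists rather than raw strings makes the check insensitive to incidental whitespace while still requiring the canonical token sequence.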

The paper also reports that PermuFormer is substantially more accurate than frontier general-purpose LLMs on a small held-out sample from the model's symbolic test distribution, while noting that the comparison is imperfect because PermuFormer was trained directly in this syntax.
## Finetuning

PermuFormer is designed to be finetuned on specialized permutation tasks. Experiments in the paper include:

- 231-avoidance and 2143-avoidance
- mHeight
- Schubert polynomial structure constants
- Kazhdan-Lusztig polynomial degree prediction

The repository's finetuning scripts compare starting from this pretrained checkpoint with training the same architecture from scratch.
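
For the pattern-avoidance tasks, ground-truth labels can be produced by brute force for small `n`. The sketch below is an independent illustration, not the repository's data-generation code: a permutation contains a pattern if some subsequence has the same relative order as the pattern, and it avoids the pattern otherwise. The function names are hypothetical.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for positions in combinations(range(len(perm)), k):
        values = [perm[i] for i in positions]
        # Order-isomorphic: every pair compares the same way as in the pattern.
        if all(
            (values[a] < values[b]) == (pattern[a] < pattern[b])
            for a in range(k)
            for b in range(a + 1, k)
        ):
            return True
    return False


def avoids(perm, pattern):
    return not contains_pattern(perm, pattern)


s4 = list(permutations(range(1, 5)))
print(sum(avoids(p, (2, 3, 1)) for p in s4))     # 14, the Catalan number C_4
print(sum(avoids(p, (2, 1, 4, 3)) for p in s4))  # 23, all of S_4 except 2143
```

The search is exponential in the pattern length and polynomial in `n` for a fixed pattern, which is adequate for the sizes `S_2` through `S_11` mentioned in this card but would need a smarter algorithm for much larger permutations.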
## Limitations

- This is a specialist symbolic model. It expects the exact whitespace-tokenized syntax used during training and is brittle to natural-language paraphrases or malformed prompts.
- The model was trained only on the permutation sizes present in the training data, primarily `S_2` through `S_11`; behavior outside that regime is not guaranteed.
- Exact-match accuracy depends on canonical output formatting. Some mathematical tasks may have multiple valid answers, but evaluation expects the chosen canonical form.
- The model focuses on permutations. It does not natively handle broader combinatorial structures such as arbitrary graphs or partitions unless they are encoded through the supported task syntax.
- Outputs should be verified with exact combinatorial software for research-critical use.
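
As a lightweight stand-in for the verification step recommended above, simple operations can be recomputed exactly in a few lines and compared with the model's parsed answer. The sketch below covers inverse and cycle decomposition for permutations in one-line notation. The function names are hypothetical, and the model's canonical cycle format (for example, whether fixed points are written and how cycles are ordered) is not documented in this card, so normalize both sides before comparing.

```python
def inverse(perm):
    """Inverse of a permutation given in one-line notation (values 1..n)."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm, start=1):
        inv[value - 1] = position
    return inv


def cycles(perm):
    """Cycle decomposition, fixed points included, each cycle starting at its minimum."""
    seen = set()
    result = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        x = start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]   # follow i -> perm(i)
        result.append(tuple(cycle))
    return result


print(inverse([3, 1, 2]))   # [2, 3, 1]
print(cycles([3, 1, 2]))    # [(1, 3, 2)]
print(cycles([2, 1, 3]))    # [(1, 2), (3,)]
```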
## Citation

If you use this model, please cite the accompanying PermuFormer paper once citation details are available.