# TensionLM-Phase2-TSNative
A 13.5M-parameter language model trained with TS-native objectives (constraint consistency loss and tension entropy loss) on top of standard cross-entropy. Part of the TensionLM project, an empirical implementation of the Thinking System (TS) theory of computation.
## What makes this different
Standard transformers use softmax attention: positions compete for a fixed budget summing to 1. TensionLM replaces this with independent sigmoid scores:
```
τ[t, w]   = sigmoid( dot(Q[t], K[t-w]) / √d )
output[t] = Σ_w τ[t, w] · V[t-w] / valid_count
```
No position suppresses any other. The tension field *is* the constraint graph, fully inspectable.
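The mechanism above can be sketched for a single head as follows. This is a minimal illustration of the formulas, not the repo's implementation: the function name, the loop-based form, and the assumption that `w` ranges over strictly earlier positions are all mine.

```python
import torch

def tension_attention(q, k, v, window):
    """Windowed sigmoid tension attention for one head (illustrative sketch).

    q, k, v: (T, d) tensors for one sequence. Each position t scores the
    previous `window` positions with independent sigmoids, so scores never
    compete for a softmax budget. `tau` is the inspectable tension field.
    """
    T, d = q.shape
    out = torch.zeros_like(v)
    tau = torch.zeros(T, window)
    for t in range(T):
        valid = 0
        for w in range(1, window + 1):            # look back w positions
            if t - w < 0:
                break
            tau[t, w - 1] = torch.sigmoid(q[t] @ k[t - w] / d ** 0.5)
            out[t] += tau[t, w - 1] * v[t - w]
            valid += 1
        if valid:
            out[t] /= valid                       # average, not softmax-normalise
    return out, tau
```

Because each score is an independent sigmoid, the full `tau` matrix can be read off directly as the constraint graph, which is what the visualisation tooling below inspects.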
TS-native objectives go further, training the graph directly:
- Constraint consistency loss: if A tensions B and B tensions C, A should tension C (transitivity)
- Tension entropy loss: penalises isolated nodes (τ → 0) and saturated nodes (τ → 1), pushing toward sparse but non-trivial graphs
Result: the model builds structurally coherent constraint graphs rather than just minimising next-token prediction. The transitivity chain is explicitly visible in the tension field.
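A toy version of the two objectives, operating on a dense per-head tension matrix. The exact loss forms here are illustrative assumptions, not the project's formulation:

```python
import torch

def consistency_loss(tau):
    """Transitivity penalty on an (n, n) tension matrix (sketch).

    If a tensions b and b tensions c, a should tension c: penalise the
    positive shortfall between averaged two-hop tension and direct tension.
    """
    implied = tau @ tau / tau.shape[0]      # mean two-hop tension over middle node b
    return torch.relu(implied - tau).mean()

def entropy_loss(tau, eps=1e-6):
    """Penalise isolated (tau -> 0) and saturated (tau -> 1) scores (sketch)
    by maximising per-edge binary entropy, pushing scores off the extremes."""
    tau = tau.clamp(eps, 1 - eps)
    ent = -(tau * tau.log() + (1 - tau) * (1 - tau).log())
    return -ent.mean()
```

Both terms are differentiable functions of the tension field itself, which is what lets them shape the constraint graph directly rather than only through next-token gradients.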
## Phase 2 results
| Model | Val PPL | Training objective |
|---|---|---|
| Baseline | 85.19 | Cross-entropy only |
| This model | 86.50 | Cross-entropy + consistency + entropy |
A 1.31 PPL cost for structurally coherent reasoning behaviour. On "If A then B. If B then C. Therefore", layer 5 head 2 produces:

```
A[1]:0.32 B[3]:0.43 B[6]:0.64 C[8]:0.59 then[7]:0.50
```

The full transitivity chain A→B→C is held simultaneously as active constraints. The baseline produces uniform diffuse noise at the same position.
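Readouts in that format can be produced from any tension row by simple thresholding. A hypothetical helper (the function name, input shapes, and threshold are illustrative, not part of the repo):

```python
def active_constraints(tokens, tau_row, threshold=0.3):
    """Render one query position's active constraints as token[index]:score
    strings, mirroring the layer-5/head-2 readout above.

    tokens: decoded tokens for the sequence; tau_row: that position's
    tension scores over the sequence (hypothetical inputs).
    """
    return [
        f"{tok}[{i}]:{score:.2f}"
        for i, (tok, score) in enumerate(zip(tokens, tau_row))
        if score >= threshold
    ]
```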
## Usage
```python
import torch
from model import TensionConfig, TensionLM, generate
from tokenizers import Tokenizer

# Load the checkpoint and strip torch.compile prefixes from state-dict keys
ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
model = TensionLM(TensionConfig(**ckpt["cfg"]))
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)

tokenizer = Tokenizer.from_file("tokenizer.json")
enc = tokenizer.encode("To prove that P implies Q, we assume")
ids = generate(model, enc.ids, max_new=150, temp=0.7, top_p=0.9)
print(tokenizer.decode(ids))
```
Or use the CLI from the GitHub repo:
```bash
# Standard generation
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "If all mammals are warm-blooded and all dogs are mammals then" \
    --max_new 150 --temp 0.7

# Anchored generation: keeps the prompt permanently in the tension window
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "Assume for contradiction that there are finitely many primes." \
    --anchor --max_new 200

# Inspect the constraint graph
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode token \
    --text "If A then B. If B then C. Therefore" \
    --layer 5

# Full tension heatmap across all layers and heads
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode heatmap \
    --text "To prove that P implies Q, we assume P is true and show Q must follow"
```
## Example prompt outputs
**Prompt:** "To prove that P implies Q, we assume P is true and show that Q must follow. Suppose P holds."
```
Let P is the probability of getting a contradiction.
Now let A = {{x,y} - 1} and B < {{P(B), P(A), P(B), R(C)}}.
If the system is true then there exists a cycle of time 1 (up to B),
and for every i = j ∈ A. Prove that in [4].
```
Note: the model reaches for proof-by-contradiction structure and maintains a proof obligation ("Prove that"); logical scaffolding is preserved even at 13.5M parameters with no logic warm-up.
**Prompt:** "If A then B. If B then C. Therefore"
```
A is true if AB or CD, then A is false.
If B is odd then B = B and E is false; In both cases: C = C.
```
Note: the model tracks truth values and reasons about both cases simultaneously.
## Model card
| Property | Value |
|---|---|
| Parameters | 13,573,894 |
| Architecture | TensionLM (sigmoid tension, windowed) |
| Dataset | open-web-math (1B tokens) |
| Val PPL | 86.50 |
| Context window | 32 tokens × 6 layers |
| Training objective | Cross-entropy + constraint consistency + tension entropy |
## Limitations
This is a research model: no instruction following, no RLHF, not production-ready. At 13.5M parameters trained on mathematical web text, it generates incoherent text at this scale; the value is in the constraint-graph structure, not the output quality. The 117M formal-curriculum run is the target for generation quality.