# TensionLM-Phase2-TSNative
A 13.5M-parameter language model trained with TS-native objectives (constraint consistency loss and tension entropy loss) on top of standard cross-entropy. Part of the TensionLM project, an empirical implementation of the Thinking System (TS) theory of computation.
## What makes this different
Standard transformers use softmax attention: positions compete for a fixed budget summing to 1. TensionLM replaces this with independent sigmoid scores:
```
τ[t, w]   = sigmoid( dot(Q[t], K[t-w]) / √d )
output[t] = Σ_w τ[t, w] · V[t-w] / valid_count
```
No position suppresses any other. The tension field *is* the constraint graph, fully inspectable.
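The mechanism above can be sketched for a single head as follows. This is a minimal illustration of the formulas, not the repo's implementation: the function name, the loop-based form, and the assumption that `w` ranges over strictly earlier positions are all mine.

```python
import torch

def tension_attention(q, k, v, window):
    """Windowed sigmoid tension attention for one head (illustrative sketch).

    q, k, v: (T, d) tensors for one sequence. Each position t scores the
    previous `window` positions with independent sigmoids, so scores never
    compete for a softmax budget. `tau` is the inspectable tension field.
    """
    T, d = q.shape
    out = torch.zeros_like(v)
    tau = torch.zeros(T, window)
    for t in range(T):
        valid = 0
        for w in range(1, window + 1):            # look back w positions
            if t - w < 0:
                break
            tau[t, w - 1] = torch.sigmoid(q[t] @ k[t - w] / d ** 0.5)
            out[t] += tau[t, w - 1] * v[t - w]
            valid += 1
        if valid:
            out[t] /= valid                       # average, not softmax-normalise
    return out, tau
```

Because each score is an independent sigmoid, the full `tau` matrix can be read off directly as the constraint graph, which is what the visualisation tooling below inspects.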
TS-native objectives go further, training the graph directly:
- Constraint consistency loss: if A tensions B and B tensions C, A should tension C (transitivity)
- Tension entropy loss: penalises isolated nodes (τ → 0) and saturated nodes (τ → 1), pushing toward sparse but non-trivial graphs
Result: the model builds structurally coherent constraint graphs rather than just minimising next-token prediction. The transitivity chain is explicitly visible in the tension field.
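A toy version of the two objectives, operating on a dense per-head tension matrix. The exact loss forms here are illustrative assumptions, not the project's formulation:

```python
import torch

def consistency_loss(tau):
    """Transitivity penalty on an (n, n) tension matrix (sketch).

    If a tensions b and b tensions c, a should tension c: penalise the
    positive shortfall between averaged two-hop tension and direct tension.
    """
    implied = tau @ tau / tau.shape[0]      # mean two-hop tension over middle node b
    return torch.relu(implied - tau).mean()

def entropy_loss(tau, eps=1e-6):
    """Penalise isolated (tau -> 0) and saturated (tau -> 1) scores (sketch)
    by maximising per-edge binary entropy, pushing scores off the extremes."""
    tau = tau.clamp(eps, 1 - eps)
    ent = -(tau * tau.log() + (1 - tau) * (1 - tau).log())
    return -ent.mean()
```

Both terms are differentiable functions of the tension field itself, which is what lets them shape the constraint graph directly rather than only through next-token gradients.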
## Phase 2 results
| Model | Val PPL | Training objective |
|---|---|---|
| Baseline | 85.19 | Cross-entropy only |
| This model | 86.50 | Cross-entropy + consistency + entropy |
A 1.31 PPL cost for structurally coherent reasoning behaviour. On "If A then B. If B then C. Therefore", layer 5 head 2 produces:

```
A[1]:0.32 B[3]:0.43 B[6]:0.64 C[8]:0.59 then[7]:0.50
```

The full transitivity chain A→B→C is held simultaneously as active constraints. The baseline produces uniform diffuse noise at the same position.
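Readouts in that format can be produced from any tension row by simple thresholding. A hypothetical helper (the function name, input shapes, and threshold are illustrative, not part of the repo):

```python
def active_constraints(tokens, tau_row, threshold=0.3):
    """Render one query position's active constraints as token[index]:score
    strings, mirroring the layer-5/head-2 readout above.

    tokens: decoded tokens for the sequence; tau_row: that position's
    tension scores over the sequence (hypothetical inputs).
    """
    return [
        f"{tok}[{i}]:{score:.2f}"
        for i, (tok, score) in enumerate(zip(tokens, tau_row))
        if score >= threshold
    ]
```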
## Usage
```python
import torch
from model import TensionConfig, TensionLM, generate
from tokenizers import Tokenizer

# Load the checkpoint and strip torch.compile prefixes from state-dict keys
ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
model = TensionLM(TensionConfig(**ckpt["cfg"]))
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)

tokenizer = Tokenizer.from_file("tokenizer.json")
enc = tokenizer.encode("To prove that P implies Q, we assume")
ids = generate(model, enc.ids, max_new=150, temp=0.7, top_p=0.9)
print(tokenizer.decode(ids))
```
Or use the CLI from the GitHub repo:
```bash
# Standard generation
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "If all mammals are warm-blooded and all dogs are mammals then" \
    --max_new 150 --temp 0.7

# Anchored generation: keeps the prompt permanently in the tension window
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "Assume for contradiction that there are finitely many primes." \
    --anchor --max_new 200

# Inspect the constraint graph
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode token \
    --text "If A then B. If B then C. Therefore" \
    --layer 5

# Full tension heatmap across all layers and heads
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode heatmap \
    --text "To prove that P implies Q, we assume P is true and show Q must follow"
```
## Example prompt outputs
**Prompt:** "To prove that P implies Q, we assume P is true and show that Q must follow. Suppose P holds."
```
Let P is the probability of getting a contradiction.
Now let A = {{x,y} - 1} and B < {{P(B), P(A), P(B), R(C)}}.
If the system is true then there exists a cycle of time 1 (up to B),
and for every i = j ∈ A. Prove that in [4].
```
Note: the model reaches for proof-by-contradiction structure and maintains a proof obligation ("Prove that"); logical scaffolding is preserved even at 13.5M parameters with no logic warm-up.
**Prompt:** "If A then B. If B then C. Therefore"
```
A is true if AB or CD, then A is false.
If B is odd then B = B and E is false; In both cases: C = C.
```
Note: the model tracks truth values and reasons about both cases simultaneously.
## Model card
| Property | Value |
|---|---|
| Parameters | 13,573,894 |
| Architecture | TensionLM (sigmoid tension, windowed) |
| Dataset | open-web-math (1B tokens) |
| Val PPL | 86.50 |
| Context window | 32 tokens × 6 layers |
| Training objective | Cross-entropy + constraint consistency + tension entropy |
## Limitations
This is a research model: no instruction following, no RLHF, not production-ready. At 13.5M parameters trained on mathematical web text, it generates incoherent text at this scale; the value is in the constraint-graph structure, not the output quality. The 117M formal-curriculum run is the target for generation quality.