TensionLM-Phase2-TSNative

A 13.5M-parameter language model trained with TS-native objectives (constraint consistency loss and tension entropy loss) on top of standard cross-entropy. Part of the TensionLM project, an empirical implementation of the Thinking System (TS) theory of computation.

GitHub | 117M model


What makes this different

Standard transformers use softmax attention: positions compete for a fixed attention budget that sums to 1. TensionLM replaces this with independent sigmoid scores:

τ[t, w] = sigmoid( dot(Q[t], K[t-w]) / √d )
output[t] = Σ_w  τ[t, w] · V[t-w]  /  valid_count

No position suppresses any other. The tension field IS the constraint graph, and it is fully inspectable.
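
For concreteness, here is a minimal PyTorch sketch of the two formulas above for a single head, written as an explicit loop over positions. The tensor shapes, the default window size, and whether a position is included in its own window are assumptions; the repo's actual implementation is vectorised across heads and positions.

import torch

def tension_attention(Q, K, V, window=32):
    # Sketch of windowed sigmoid tension attention for one head.
    # Q, K, V: (T, d) tensors. Each position t attends over up to `window`
    # preceding positions; scores are independent sigmoids, not a softmax,
    # so no position suppresses another.
    T, d = Q.shape
    out = torch.zeros_like(V)
    for t in range(T):
        lo = max(0, t - window + 1)
        # tau[t, w] = sigmoid(dot(Q[t], K[t-w]) / sqrt(d)) for each valid offset w
        tau = torch.sigmoid(K[lo:t + 1] @ Q[t] / d ** 0.5)          # (valid_count,)
        # output[t] = sum_w tau[t, w] * V[t-w] / valid_count
        out[t] = (tau.unsqueeze(-1) * V[lo:t + 1]).sum(dim=0) / tau.numel()
    return out

Because each τ[t, w] is an independent sigmoid, many positions can be strongly tensioned at once, which is what keeps the field readable as a constraint graph.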

TS-native objectives go further and train the graph directly:

  • Constraint consistency loss: if A tensions B and B tensions C, A should tension C (transitivity)
  • Tension entropy loss: penalises isolated nodes (τ≈0) and saturated nodes (τ≈1), pushing toward sparse but non-trivial graphs

Result: the model builds structurally coherent constraint graphs rather than just minimising next-token loss. The transitivity chain is explicitly visible in the tension field.
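
The exact form of these losses lives in the training code, not in this card, so the following is only a plausible sketch, assuming the per-head tension scores have been gathered into a dense matrix tau of shape (T, T) with zeros outside each window. The two-hop composition rule, the node-level entropy form, and the weighting comment are assumptions, not the repo's definitive implementation.

import torch

def consistency_loss(tau):
    # Transitivity sketch: if tau[a, b] and tau[b, c] are both high,
    # tau[a, c] should be high too. The strongest two-hop tension a->b->c
    # is max over b of min(tau[a, b], tau[b, c]); penalise the gap where
    # the direct edge a->c falls short of it. (Assumed form; O(T^3) here,
    # a real implementation would restrict this to the window.)
    two_hop = torch.minimum(tau.unsqueeze(2), tau.unsqueeze(0)).amax(dim=1)  # (T, T)
    return torch.relu(two_hop - tau).mean()

def entropy_loss(tau, eps=1e-6):
    # Node-level entropy sketch: m[t] is the mean tension of node t.
    # Binary entropy H(m) is zero both for isolated nodes (m ~ 0) and
    # saturated nodes (m ~ 1), so minimising -H(m) penalises both extremes.
    # (Assumed form.)
    m = tau.mean(dim=-1).clamp(eps, 1 - eps)
    h = -(m * m.log() + (1 - m) * (1 - m).log())
    return -h.mean()

# total = cross_entropy + lambda_c * consistency_loss(tau) + lambda_e * entropy_loss(tau)
# (lambda_c and lambda_e are hypothetical weighting coefficients)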


Phase 2 results

Model        Val PPL   Training objective
Baseline     85.19     Cross-entropy only
This model   86.50     Cross-entropy + consistency + entropy

That 1.31 PPL cost buys structurally coherent reasoning behaviour. On "If A then B. If B then C. Therefore", layer 5 head 2 produces:

A[1]:0.32   B[3]:0.43   B[6]:0.64   C[8]:0.59   then[7]:0.50

The full transitivity chain A→B→C is held simultaneously as active constraints. The baseline produces diffuse, near-uniform noise at the same position.


Usage

import torch
from model import TensionConfig, TensionLM, generate
from tokenizers import Tokenizer

ckpt      = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
model     = TensionLM(TensionConfig(**ckpt["cfg"]))
# Strip the torch.compile wrapper prefix from parameter names before loading
state     = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)
tokenizer = Tokenizer.from_file("tokenizer.json")

enc    = tokenizer.encode("To prove that P implies Q, we assume")
ids    = generate(model, enc.ids, max_new=150, temp=0.7, top_p=0.9)
print(tokenizer.decode(ids))

Or use the CLI from the GitHub repo:

# Standard generation
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "If all mammals are warm-blooded and all dogs are mammals then" \
    --max_new 150 --temp 0.7

# Anchored generation: keeps the prompt permanently in the tension window
python3 generate.py --checkpoint pytorch_model.pt \
    --prompt "Assume for contradiction that there are finitely many primes." \
    --anchor --max_new 200

# Inspect the constraint graph
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode token \
    --text "If A then B. If B then C. Therefore" \
    --layer 5

# Full tension heatmap across all layers and heads
python3 visualise.py --checkpoint pytorch_model.pt \
    --mode heatmap \
    --text "To prove that P implies Q, we assume P is true and show Q must follow"

Example prompt outputs

Prompt: "To prove that P implies Q, we assume P is true and show that Q must follow. Suppose P holds."

Let P is the probability of getting a contradiction.
Now let A = {{x,y} - 1} and B < {{P(B), P(A), P(B), R(C)}}.
If the system is true then there exists a cycle of time 1 (up to B),
and for every i = j ∈ A. Prove that in [4].

Note: the model reaches for proof-by-contradiction structure and maintains a proof obligation ("Prove that"); the logical scaffolding is preserved even at 13.5M params with no logic warmup.

Prompt: "If A then B. If B then C. Therefore"

A is true if AB or CD, then A is false.
If B is odd then B = B and E is false; In both cases: C = C.

Note: the model tracks truth values and reasons about both cases simultaneously.


Model card

Property             Value
Parameters           13,573,894
Architecture         TensionLM (sigmoid tension, windowed)
Dataset              open-web-math (1B tokens)
Val PPL              86.50
Context window       32 tokens × 6 layers
Training objective   Cross-entropy + constraint consistency + tension entropy

Limitations

Research model: no instruction following, no RLHF, not production-ready. It has 13.5M parameters trained on mathematical web text and generates incoherent text at this scale; the value is in the constraint graph structure, not the output quality. The 117M formal-curriculum run is the target for generation quality.
