Enoch Powell GPT

A small language model fine-tuned to speak as Enoch Powell in UK Parliamentary debate.

Model Details

Architecture: GPT (12-layer, 768-dim, 12-head) — ~125M parameters
Base model: Pretrained from scratch on the full UK Hansard corpus (~51K parliamentary documents)
Fine-tuning: Supervised fine-tuning on 2,452 Enoch Powell question-answer pairs extracted from Hansard
Tokenizer: Custom BPE tokenizer trained on Hansard text
Context length: 2048 tokens

Training Pipeline

Pretraining: Base language model trained on all UK Hansard records (parliamentary debates, questions, speeches). This gives the model fluency in parliamentary English.
SFT: Fine-tuned on Enoch Powell's speeches using chat-format supervision. Training pairs are either direct Q&A exchanges from Hansard, or Powell's speeches paired with the debate topic heading as a question prompt.

SFT Details

Metric	Value
Training conversations	2,452
Validation conversations	129
Supervised tokens	~355K
Best validation loss	2.40
Optimizer	Muon + AdamW
Weight decay	0.1
LR schedule	Cosine with 10% warmup
Epochs	2
Early stopping	Yes (best val loss checkpoint)

Usage

python -m scripts.chat_cli \
  --checkpoint-dir <path-to-checkpoint> \
  --device-type cuda \
  --dtype bfloat16 \
  --prompt "What is your view on immigration?"

Intended Use

This model is a research project and historical curiosity. It generates text in the style of Enoch Powell's parliamentary contributions as recorded in Hansard. It is not intended to accurately represent Powell's views, produce factually correct statements, or serve as a reference for his political positions.

Limitations

Small model: 125M parameters. Responses can drift off-topic or become repetitive, especially for longer generations.
Parliamentary register only: Trained exclusively on Hansard, so the model speaks in formal parliamentary English. It does not reproduce Powell's non-parliamentary writing or speeches (e.g. the "Rivers of Blood" speech is not in the training data).
Not factually grounded: The model generates plausible-sounding parliamentary text but may attribute incorrect statements, cite nonexistent debates, or confuse procedural details.
Historical bias: The training data reflects the language and attitudes of mid-20th century British parliamentary debate.

Dataset

Training data is sourced from common-pile/uk_hansard on HuggingFace. Powell's contributions are identified by speaker attribution regex matching and paired with either the preceding question or the debate topic heading.

Downloads last month: -; Downloads are not tracked for this model. How to track

LForster
/

epoch_powell_135M