Enoch Powell GPT

A small language model fine-tuned to speak as Enoch Powell in UK Parliamentary debate.

Model Details

  • Architecture: GPT (12-layer, 768-dim, 12-head) — ~125M parameters
  • Base model: Pretrained from scratch on the full UK Hansard corpus (~51K parliamentary documents)
  • Fine-tuning: Supervised fine-tuning on 2,452 Enoch Powell question-answer pairs extracted from Hansard
  • Tokenizer: Custom BPE tokenizer trained on Hansard text
  • Context length: 2048 tokens

Training Pipeline

  1. Pretraining: Base language model trained on all UK Hansard records (parliamentary debates, questions, speeches). This gives the model fluency in parliamentary English.
  2. SFT: Fine-tuned on Enoch Powell's speeches using chat-format supervision. Training pairs are either direct Q&A exchanges from Hansard, or Powell's speeches paired with the debate topic heading as a question prompt.

SFT Details

Metric Value
Training conversations 2,452
Validation conversations 129
Supervised tokens ~355K
Best validation loss 2.40
Optimizer Muon + AdamW
Weight decay 0.1
LR schedule Cosine with 10% warmup
Epochs 2
Early stopping Yes (best val loss checkpoint)

Usage

python -m scripts.chat_cli \
  --checkpoint-dir <path-to-checkpoint> \
  --device-type cuda \
  --dtype bfloat16 \
  --prompt "What is your view on immigration?"

Intended Use

This model is a research project and historical curiosity. It generates text in the style of Enoch Powell's parliamentary contributions as recorded in Hansard. It is not intended to accurately represent Powell's views, produce factually correct statements, or serve as a reference for his political positions.

Limitations

  • Small model: 125M parameters. Responses can drift off-topic or become repetitive, especially for longer generations.
  • Parliamentary register only: Trained exclusively on Hansard, so the model speaks in formal parliamentary English. It does not reproduce Powell's non-parliamentary writing or speeches (e.g. the "Rivers of Blood" speech is not in the training data).
  • Not factually grounded: The model generates plausible-sounding parliamentary text but may attribute incorrect statements, cite nonexistent debates, or confuse procedural details.
  • Historical bias: The training data reflects the language and attitudes of mid-20th century British parliamentary debate.

Dataset

Training data is sourced from common-pile/uk_hansard on HuggingFace. Powell's contributions are identified by speaker attribution regex matching and paired with either the preceding question or the debate topic heading.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train LForster/epoch_powell_135M