# Havelock Orality Regressor
ModernBERT-based regression model that scores text on the oral–literate spectrum (0–1), grounded in Walter Ong's Orality and Literacy (1982).
Given a passage of text, the model outputs a continuous score where higher values indicate greater orality (spoken, performative, additive discourse) and lower values indicate greater literacy (analytic, subordinative, abstract discourse).
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | HavelockOralityRegressor (custom, mean pooling → linear) |
| Task | Single-value regression (MSE loss) |
| Output range | Continuous (not clamped) |
| Max sequence length | 512 tokens |
| Best MAE | 0.0791 |
| R² (at best MAE) | 0.748 |
| Parameters | ~149M |
## Usage

```python
import os

# Disable torch.compile before importing torch (avoids compile overhead for one-off inference)
os.environ["TORCH_COMPILE_DISABLE"] = "1"

import warnings

warnings.filterwarnings("ignore", message="Flash Attention 2 only supports")

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "HavelockAI/bert-orality-regressor"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

text = "Tell me, O Muse, of that ingenious hero who travelled far and wide"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Autocast only on CUDA. The head is not clamped, so clip the score to [0, 1] downstream.
with torch.no_grad(), torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
    score = model(**inputs).logits.squeeze().item()

print(f"Orality score: {max(0.0, min(1.0, score)):.3f}")
```
## Score Interpretation
| Score | Register |
|---|---|
| 0.8–1.0 | Highly oral — epic poetry, sermons, rap, oral storytelling |
| 0.6–0.8 | Oral-dominant — speeches, podcasts, conversational prose |
| 0.4–0.6 | Mixed — journalism, blog posts, dialogue-heavy fiction |
| 0.2–0.4 | Literate-dominant — essays, expository prose |
| 0.0–0.2 | Highly literate — academic papers, legal texts, philosophy |
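The band boundaries above can be expressed as a small helper. This is an illustrative sketch, not part of the released package; the `register_for` name is our own.

```python
def register_for(score: float) -> str:
    """Map a raw model output to the register bands in the table above.

    Scores are clamped to [0, 1] first, since the regression head is unbounded.
    """
    s = max(0.0, min(1.0, score))
    if s >= 0.8:
        return "Highly oral"
    if s >= 0.6:
        return "Oral-dominant"
    if s >= 0.4:
        return "Mixed"
    if s >= 0.2:
        return "Literate-dominant"
    return "Highly literate"
```

For example, `register_for(0.87)` returns `"Highly oral"`, and out-of-range outputs such as `1.4` are clamped into the top band.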
## Training

### Data
The model was trained on a curated corpus of documents annotated with orality scores using a multi-pass scoring system. Scores were originally on a 0–100 scale and normalized to 0–1 for training. The corpus draws from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages, representing a range of registers from highly oral to highly literate.
An 80/20 train/test split was used (random seed 42).
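The normalization and split described above amount to the following; this is a minimal sketch with illustrative names (`normalize_and_split` is not the actual pipeline), assuming a plain shuffled split.

```python
import random

def normalize_and_split(docs, seed=42, test_frac=0.2):
    """Scale 0-100 annotator scores to 0-1 and make a seeded 80/20 split.

    `docs` is a list of (text, score_0_100) pairs.
    """
    scaled = [(text, score / 100.0) for text, score in docs]
    rng = random.Random(seed)
    rng.shuffle(scaled)
    cut = int(len(scaled) * (1 - test_frac))
    return scaled[:cut], scaled[cut:]
```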
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 20 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with warmup (10% of total steps) |
| Gradient clipping | 1.0 |
| Loss | MSE |
| Mixed precision | FP16 |
| Regularization | Mixout (p=0.1) |
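The LR schedule in the table has this shape: a linear ramp over the first 10% of steps, then cosine decay to zero. A plain-Python sketch of the curve (Hugging Face's `get_cosine_schedule_with_warmup` behaves similarly; the function name here is ours):

```python
import math

def lr_at(step, total_steps, base_lr=2e-5, warmup_frac=0.10):
    """Learning rate at a given step under cosine decay with linear warmup."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        # Linear ramp from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup)
    # Cosine decay from base_lr to 0 over the remaining steps
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```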
### Training Metrics
| Epoch | Loss | MAE | R² |
|---|---|---|---|
| 1 | 0.3496 | 0.1173 | 0.476 |
| 2 | 0.0286 | 0.0992 | 0.593 |
| 3 | 0.0215 | 0.0872 | 0.704 |
| 4 | 0.0144 | 0.0879 | 0.714 |
| 5 | 0.0169 | 0.0865 | 0.712 |
| 6 | 0.0117 | 0.0853 | 0.700 |
| 7 | 0.0096 | 0.0922 | 0.691 |
| 8 | 0.0094 | 0.0850 | 0.722 |
| 9 | 0.0086 | 0.0822 | 0.745 |
| 10 | 0.0064 | 0.0841 | 0.723 |
| 11 | 0.0054 | 0.0921 | 0.682 |
| 12 | 0.0050 | 0.0840 | 0.720 |
| 13 | 0.0044 | 0.0806 | 0.744 |
| 14 | 0.0037 | 0.0805 | 0.740 |
| 15 | 0.0034 | 0.0791 | 0.748 |
| 16 | 0.0033 | 0.0807 | 0.738 |
| 17 | 0.0031 | 0.0803 | 0.742 |
| 18 | 0.0026 | 0.0797 | 0.745 |
| 19 | 0.0027 | 0.0803 | 0.742 |
| 20 | 0.0029 | 0.0805 | 0.741 |
Best checkpoint selected at epoch 15 by lowest MAE.
## Architecture

Custom `HavelockOralityRegressor` with mean pooling (ModernBERT exposes no pooler output):

```
ModernBERT (answerdotai/ModernBERT-base)
└── Mean pooling over non-padded tokens
    └── Dropout (p=0.1)
        └── Linear (hidden_size → 1)
```
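The pooling step can be sketched as follows (in NumPy for clarity; the released module does the equivalent in PyTorch before dropout and the linear head, and its exact code ships with the checkpoint):

```python
import numpy as np

def masked_mean_pool(hidden, attention_mask):
    """Mean over non-padded token embeddings.

    hidden: (batch, seq, dim) last hidden states.
    attention_mask: (batch, seq), 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(hidden.dtype)  # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)          # avoid div-by-zero
    return summed / counts
```

Padding tokens contribute nothing to the sum and are excluded from the count, so a padded batch pools to the same vectors as unpadded single examples.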
### Regularization
- Mixout (p=0.1): During training, each backbone weight element has a 10% chance of being replaced by its pretrained value per forward pass, acting as a stochastic L2 anchor that prevents representation drift (Lee et al., 2020)
- Weight decay (0.01) via AdamW
- Gradient clipping (max norm 1.0)
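The Mixout step above can be sketched per weight tensor; this is an illustrative NumPy version (actual training applied it inside the PyTorch backbone's linear layers), including the inverted-dropout-style rescaling from Lee et al. (2020) that keeps the expected weight unbiased.

```python
import numpy as np

def mixout(weight, pretrained, p=0.1, rng=None):
    """Replace each weight element with its pretrained value with prob. p,
    then rescale so E[output] == weight (analogous to inverted dropout)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(weight.shape) < p          # True -> revert to pretrained
    mixed = np.where(mask, pretrained, weight)
    return (mixed - p * pretrained) / (1.0 - p)  # unbiased rescale
```

Note that if the current weights equal the pretrained weights, the output is unchanged for any p, and p=0 reduces to the identity.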
## Limitations
- No sigmoid clamping: The model can output values outside [0, 1]. Consumers should clamp if needed.
- Domain coverage: Training corpus skews historical/literary. Performance on modern social media, code-switched text, or non-English text is untested.
- Document length: Texts longer than 512 tokens are truncated. The model sees only the first ~400 words, which may not be representative of longer documents.
- Regression target subjectivity: Orality scores involve human judgment; inter-annotator agreement bounds the ceiling for model performance.
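One way to mitigate the truncation limitation is to score overlapping chunks and average. A hedged sketch with a stand-in scorer (`score_fn` represents the model call from the Usage section; the window sizes are heuristics chosen to stay under the 512-token limit, not values from the training setup):

```python
def score_long_text(text, score_fn, chunk_words=350, stride=300):
    """Score a long document by averaging scores of overlapping word windows.

    score_fn: a function from a text chunk to a float orality score.
    """
    words = text.split()
    if len(words) <= chunk_words:
        return score_fn(text)
    scores = []
    for start in range(0, len(words) - chunk_words + stride, stride):
        chunk = " ".join(words[start:start + chunk_words])
        scores.append(score_fn(chunk))
    return sum(scores) / len(scores)
```

Averaging treats the document as a mixture of registers; a max or per-chunk profile may be more informative for documents that switch register mid-way.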
## Theoretical Background
The oral–literate spectrum follows Ong's framework, which characterizes oral discourse as additive, aggregative, redundant, agonistic, empathetic, and situational, while literate discourse is subordinative, analytic, abstract, distanced, and context-free. The model learns to place text along this continuum from document-level annotations informed by 72 specific rhetorical markers (36 oral, 36 literate).
## Citation

```bibtex
@misc{havelock2026regressor,
  title={Havelock Orality Regressor},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-orality-regressor}
}
```
## References
- Ong, Walter J. Orality and Literacy: The Technologizing of the Word. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, A. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.
Trained: February 2026