Havelock Marker Category Classifier

ModernBERT-based binary classifier that determines whether a rhetorical span is oral or literate, grounded in Walter Ong's Orality and Literacy (1982).

This is the coarsest level of the Havelock span classification hierarchy. Given a text span that has been identified as a rhetorical marker, the model classifies it into one of two categories: oral (characteristic of spoken, performative discourse) or literate (characteristic of written, analytic discourse).

Model Details

| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Binary classification |
| Labels | 2 (oral, literate) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | 0.804 |
| Test accuracy | 0.825 |
| Missing labels | 0/2 |
| Parameters | ~149M |

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-category"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

span = "Tell me, O Muse, of that ingenious hero"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=-1).item()

label_map = {0: "oral", 1: "literate"}
print(f"Category: {label_map[pred]}")
```

Training

Data

22,367 span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Spans are drawn from documents sourced from Project Gutenberg, textfiles.com, Reddit, and Wikipedia talk pages. A stratified 80/10/10 train/val/test split was used with swap-based optimization. The test set contains 1,609 spans (1,162 oral, 447 literate).
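The swap-based refinement is specific to the corpus build and is not reproduced here, but the underlying stratified 80/10/10 split can be sketched in a few lines. The spans, labels, and the `stratified_split` helper below are synthetic illustrations, not the actual build script:

```python
import random

def stratified_split(items, labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split items into train/val/test, preserving each label's proportion."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    splits = ([], [], [])
    for group in by_label.values():
        rng.shuffle(group)
        n = len(group)
        cut1 = int(n * fractions[0])
        cut2 = cut1 + int(n * fractions[1])
        splits[0].extend(group[:cut1])
        splits[1].extend(group[cut1:cut2])
        splits[2].extend(group[cut2:])
    return splits

# Synthetic stand-ins, roughly matching the card's 72/28 oral/literate balance.
spans = [f"span {i}" for i in range(1000)]
labels = [0] * 720 + [1] * 280
train, val, test = stratified_split(spans, labels)
print(len(train), len(val), len(test))  # 800 100 100
```

Stratifying per label keeps the oral/literate ratio near-identical in all three splits, which matters here because the literate class is the minority.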

Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
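The exact loss implementation is not published with the card, but a standard focal-loss-with-class-weights formulation, FL(p_t) = -w_t · (1 - p_t)^γ · log(p_t), can be sketched framework-free. γ=2.0 matches the table above; the class weights are illustrative placeholders, not the values used in training:

```python
import math

def focal_loss(logits, target, gamma=2.0, class_weights=(1.0, 2.6)):
    """Focal loss for one example: -w_t * (1 - p_t)**gamma * log(p_t).

    class_weights are illustrative (e.g. inverse class frequency); the
    card does not publish the exact weights used.
    """
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    p_t = exps[target] / sum(exps)      # probability of the true class
    w_t = class_weights[target]
    return -w_t * (1.0 - p_t) ** gamma * math.log(p_t)

# gamma=2 down-weights well-classified examples, so confident correct
# predictions contribute almost nothing to the loss.
easy = focal_loss([4.0, -4.0], target=0)   # confident and correct
hard = focal_loss([0.0, 0.0], target=1)    # maximally uncertain
print(easy < hard)  # True
```

With γ=0 and unit weights this reduces to plain cross-entropy; raising γ shifts training signal toward the hard, ambiguous spans.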

Training Metrics

Best checkpoint selected at epoch 13, prioritizing zero missing labels and breaking ties by validation F1 (0 missing, F1 0.850).

Per-epoch metrics:

| Epoch | Loss | Val F1 | F1 range |
|---|---|---|---|
| 1 | 0.1231 | 0.815 | 0.786–0.843 |
| 2 | 0.0785 | 0.829 | 0.795–0.863 |
| 3 | 0.0599 | 0.835 | 0.804–0.866 |
| 4 | 0.0457 | 0.816 | 0.788–0.844 |
| 5 | 0.0356 | 0.826 | 0.794–0.857 |
| 6 | 0.0290 | 0.834 | 0.787–0.881 |
| 7 | 0.0235 | 0.836 | 0.802–0.869 |
| 8 | 0.0188 | 0.837 | 0.799–0.876 |
| 9 | 0.0175 | 0.840 | 0.805–0.875 |
| 10 | 0.0162 | 0.839 | 0.802–0.875 |
| 11 | 0.0115 | 0.834 | 0.796–0.872 |
| 12 | 0.0103 | 0.836 | 0.801–0.870 |
| **13** | 0.0097 | 0.850 | 0.812–0.887 |
| 14 | 0.0086 | 0.827 | 0.794–0.861 |
| 15 | 0.0075 | 0.835 | 0.799–0.871 |
| 16 | 0.0074 | 0.828 | 0.794–0.862 |
| 17 | 0.0071 | 0.830 | 0.796–0.863 |
| 18 | 0.0073 | 0.840 | 0.804–0.877 |
| 19 | 0.0068 | 0.843 | 0.806–0.880 |
| 20 | 0.0070 | 0.844 | 0.808–0.880 |

Test Set Classification Report

```text
              precision    recall  f1-score   support

        oral      0.953     0.798     0.868      1162
    literate      0.631     0.897     0.741       447

    accuracy                          0.825      1609
   macro avg      0.792     0.847     0.804      1609
weighted avg      0.863     0.825     0.833      1609
```

The model achieves high precision on oral spans (0.953) and high recall on literate spans (0.897). The lower literate precision (0.631) indicates that some oral spans are misclassified as literate, which is expected given the class imbalance (72% oral in test).
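The macro and weighted averages in the report follow directly from the per-class rows; a quick arithmetic check:

```python
# Per-class F1 and support from the test report above.
f1 = {"oral": 0.868, "literate": 0.741}
support = {"oral": 1162, "literate": 447}

# Macro = unweighted mean over classes; weighted = support-weighted mean.
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(round(macro_f1, 3), round(weighted_f1, 3))
```

Because the macro average treats both classes equally, the weaker literate F1 drags it well below the weighted average, which is dominated by the oral majority.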

Limitations

  • Class imbalance: The test set is 72% oral / 28% literate, reflecting the corpus distribution. Literate precision suffers as a result.
  • Span-level only: This model classifies pre-extracted spans. It does not detect span boundaries; pair it with a span-detection model (e.g., HavelockAI/bert-token-classifier) for end-to-end use.
  • 128-token context window: Longer spans are truncated.
  • Domain: Trained on historical/literary and web text. Performance on other domains is untested.

Theoretical Background

The oral–literate distinction follows Ong's framework. Oral markers include features like direct address, formulaic phrasing, parataxis, repetition, and sound patterning. Literate markers include features like subordination, abstraction, hedging, passive constructions, and textual apparatus (citations, cross-references). This binary classifier serves as the top level of a three-tier taxonomy: category → type → subtype.

Related Models

| Model | Task | Classes | Test metric |
|---|---|---|---|
| This model | Binary (oral/literate) | 2 | F1 0.804 |
| HavelockAI/bert-marker-type | Functional type | 18 | F1 0.573 |
| HavelockAI/bert-marker-subtype | Fine-grained subtype | 71 | F1 0.493 |
| HavelockAI/bert-orality-regressor | Document-level score | regression | MAE 0.079 |
| HavelockAI/bert-token-classifier | Span detection (BIO) | 145 | F1 0.500 |

Citation

```bibtex
@misc{havelock2026category,
  title={Havelock Marker Category Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-category}
}
```

References

  • Ong, Walter J. Orality and Literacy: The Technologizing of the Word. Routledge, 1982.
  • Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
  • Warner, B. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

Trained: February 2026
