# Havelock Marker Type Classifier

ModernBERT-based classifier for 18 rhetorical marker types on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).

This model is the mid-level of the Havelock span classification hierarchy: given a text span already identified as a rhetorical marker, it classifies the span into one of 18 functional types (e.g., `repetition`, `subordination`, `direct_address`, `hedging_qualification`).
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Multi-class classification (18 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | 0.573 |
| Test Accuracy | 0.584 |
| Missing labels | 0/18 |
| Parameters | ~149M |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-type"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "whether or not the underlying assumptions hold true"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=-1).item()
print(f"Marker type: {model.config.id2label[pred]}")
```
## Label Taxonomy (18 types)
The 18 types group fine-grained subtypes into functional families. Prior versions carried spurious label variants (e.g., hedging alongside hedging_qualification, passive alongside passive_agentless) introduced by inconsistent upstream annotation. These have been resolved via a canonical taxonomy with normalization and validation at build time.
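The build-time normalization described above can be sketched as a simple alias lookup plus validation. The alias map below is illustrative: only the `hedging` and `passive` variants are named in this card, and the function name is hypothetical.

```python
# Illustrative build-time normalization: collapse spurious label variants
# onto the canonical 18-type taxonomy and reject anything unknown.
CANONICAL_TYPES = {
    "direct_address", "repetition", "formulaic_phrases", "parallelism",
    "parataxis", "sound_patterns", "performance_markers",
    "concrete_situational", "agonistic_framing", "oral_feature",
    "subordination", "abstraction", "hedging_qualification",
    "analytical_distance", "logical_connectives", "textual_apparatus",
    "literate_feature", "passive_agentless",
}

# Hypothetical alias table; only these two pairs are named in the card.
ALIASES = {
    "hedging": "hedging_qualification",
    "passive": "passive_agentless",
}

def normalize_marker_type(label: str) -> str:
    """Map a raw annotation label to its canonical type, or fail loudly."""
    canonical = ALIASES.get(label, label)
    if canonical not in CANONICAL_TYPES:
        raise ValueError(f"Unknown marker type: {label!r}")
    return canonical
```

Failing loudly on unknown labels is what prevents spurious variants from silently re-entering the taxonomy.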
| Oral Types (10) | Literate Types (8) |
|---|---|
| `direct_address` | `subordination` |
| `repetition` | `abstraction` |
| `formulaic_phrases` | `hedging_qualification` |
| `parallelism` | `analytical_distance` |
| `parataxis` | `logical_connectives` |
| `sound_patterns` | `textual_apparatus` |
| `performance_markers` | `literate_feature` |
| `concrete_situational` | `passive_agentless` |
| `agonistic_framing` | |
| `oral_feature` | |
## Training

### Data

22,367 span-level annotations from the Havelock corpus. Each span carries a `marker_type` field normalized against a canonical taxonomy at build time. A stratified 80/10/10 train/val/test split was used, with swap-based optimization to balance label distributions across splits. The test set contains 2,178 spans.
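A minimal sketch of a stratified 80/10/10 split, assuming a simple per-label partition; the card's swap-based balancing step is omitted here, and the function and field names are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(examples, label_key="marker_type", seed=42):
    """Split examples 80/10/10 per label so each split roughly mirrors
    the overall label distribution (a simplified stand-in for the
    card's swap-optimized splitter)."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, group in sorted(by_label.items()):
        rng.shuffle(group)
        n_train = int(len(group) * 0.8)
        n_val = int(len(group) * 0.1)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

data = [{"marker_type": "repetition"}] * 100 + [{"marker_type": "parataxis"}] * 50
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # → 120 15 15
```

Splitting per label (rather than globally) is what keeps rare types like `parallelism` represented in all three splits.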
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Label smoothing | 0.0 |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 50 |
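The focal loss in the table down-weights well-classified examples by the factor (1 − p_t)^γ and scales each class by a weight. A minimal single-example sketch in plain Python (the actual training code is not shown in this card, and the uniform weights are illustrative):

```python
import math

def focal_loss(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one example:
    -w_c * (1 - p_t)**gamma * log(p_t), where p_t is the predicted
    probability of the true class. gamma=2.0 matches the card."""
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

weights = [1.0, 1.0]  # uniform weights for illustration
easy = focal_loss([0.05, 0.95], target=1, class_weights=weights)
hard = focal_loss([0.60, 0.40], target=1, class_weights=weights)
print(easy < hard)  # the confident (easy) example contributes far less loss
```

With γ = 0 and uniform weights this reduces to ordinary cross-entropy; raising γ shifts the gradient budget toward hard, often rare-class, examples.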
### Training Metrics
Best checkpoint selected at epoch 15 by missing-label-primary, F1-tiebreaker (0 missing, F1 0.590).
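The selection rule above (fewest missing labels first, macro F1 as tiebreaker) can be sketched as a single lexicographic `min`; the per-epoch dict shape and the history values below are hypothetical.

```python
def select_checkpoint(epochs):
    """Pick the best epoch: minimize missing labels first,
    then break ties by maximum macro F1 (the card's selection rule)."""
    return min(epochs, key=lambda e: (e["missing_labels"], -e["macro_f1"]))

history = [  # hypothetical training history
    {"epoch": 10, "missing_labels": 1, "macro_f1": 0.610},
    {"epoch": 15, "missing_labels": 0, "macro_f1": 0.590},
    {"epoch": 18, "missing_labels": 0, "macro_f1": 0.585},
]
best = select_checkpoint(history)
print(best["epoch"])  # → 15
```

Note that epoch 10 loses despite the higher F1: a checkpoint that never predicts some class is rejected outright.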
## Test Set Classification Report

<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

```
                        precision    recall  f1-score   support

           abstraction      0.368     0.658     0.472       117
     agonistic_framing      0.857     0.750     0.800        32
   analytical_distance      0.504     0.475     0.489       120
  concrete_situational      0.509     0.385     0.438       143
        direct_address      0.671     0.689     0.680       367
     formulaic_phrases      0.205     0.608     0.307        51
 hedging_qualification      0.600     0.500     0.545       114
      literate_feature      0.478     0.833     0.608        66
   logical_connectives      0.621     0.516     0.564       124
          oral_feature      0.784     0.365     0.498       159
           parallelism      0.688     0.579     0.629        19
             parataxis      0.655     0.387     0.486        93
     passive_agentless      0.721     0.500     0.590        62
   performance_markers      0.660     0.403     0.500        77
            repetition      0.738     0.705     0.721       156
        sound_patterns      0.672     0.623     0.647        69
         subordination      0.622     0.689     0.654       296
     textual_apparatus      0.718     0.655     0.685       113

              accuracy                          0.584      2178
             macro avg      0.615     0.573     0.573      2178
          weighted avg      0.624     0.584     0.587      2178
```

</details>
**Top performing types (F1 ≥ 0.65):** `agonistic_framing` (0.800), `repetition` (0.721), `textual_apparatus` (0.685), `direct_address` (0.680), `subordination` (0.654), `sound_patterns` (0.647), `parallelism` (0.629), `literate_feature` (0.608).
**Weakest types (F1 < 0.50):** `formulaic_phrases` (0.307), `concrete_situational` (0.438), `abstraction` (0.472), `parataxis` (0.486), `oral_feature` (0.498). `formulaic_phrases` suffers from severe precision collapse (P=0.205) despite reasonable recall, suggesting heavy confusion with other oral types. `oral_feature` shows the inverse pattern (P=0.784, R=0.365) — the model is confident but conservative.
## Class Distribution
| Support Range (full corpus) | Classes | Count |
|-----------------------------|---------|-------|
| >2500 | `direct_address`, `subordination`, `abstraction` | 3 |
| 1000–2500 | `repetition`, `formulaic_phrases`, `hedging_qualification`, `analytical_distance`, `concrete_situational`, `logical_connectives`, `textual_apparatus` | 7 |
| 500–1000 | `sound_patterns`, `passive_agentless`, `performance_markers`, `parataxis`, `literate_feature`, `oral_feature` | 6 |
| <500 | `agonistic_framing`, `parallelism` | 2 |
## Limitations
- **Class imbalance**: `direct_address` has 367 test examples while `parallelism` has 19. Weighted F1 (0.587) is close to macro F1 (0.573), indicating reasonably balanced performance, but rare types remain harder.
- **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
- **128-token context window**: Longer spans are truncated.
- **Abstraction underperforms**: At 0.472 F1 despite being a large class (117 test spans), suggesting the type may be too broad or overlapping with `analytical_distance` and `literate_feature`.
- **Precision-recall asymmetry**: Several types show strong precision–recall imbalance (`oral_feature` P=0.784/R=0.365; `formulaic_phrases` P=0.205/R=0.608), indicating the focal loss weighting could be further tuned.
## Theoretical Background
The type level captures functional groupings within the oral–literate framework. Oral types reflect Ong's characterization of oral discourse as additive (`parataxis`), aggregative (`formulaic_phrases`), redundant (`repetition`), agonistically toned (`agonistic_framing`), empathetic and participatory (`direct_address`), and close to the human lifeworld (`concrete_situational`). Literate types capture the analytic (`abstraction`, `subordination`), distanced (`analytical_distance`, `passive_agentless`), and self-referential (`textual_apparatus`) qualities of written discourse.
## Related Models
| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| **This model** | Functional type | 18 | 0.573 |
| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |
## Citation

```bibtex
@misc{havelock2026type,
  title={Havelock Marker Type Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-type}
}
```
## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, C., et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, B., et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

Trained: February 2026