# Havelock Marker Type Classifier

A ModernBERT-based classifier for 18 rhetorical marker types on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).

This is the mid-level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 18 functional types (e.g., repetition, subordination, direct_address, hedging_qualification).

## Model Details

| Property | Value |
|----------|-------|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-class classification (18 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | 0.573 |
| Test accuracy | 0.584 |
| Missing labels | 0/18 |
| Parameters | ~149M |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-type"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

span = "whether or not the underlying assumptions hold true"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()

print(f"Marker type: {model.config.id2label[pred]}")
```
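To rank candidate types rather than commit to a single argmax, the logits can be converted to a probability distribution over the 18 types. A minimal stdlib sketch (the logit values and the three-label subset below are hypothetical; in practice use `model(**inputs).logits` and `model.config.id2label`):

```python
import math

# Hypothetical logits for illustration -- real values come from
# model(**inputs).logits over all 18 marker types.
id2label = {0: "hedging_qualification", 1: "subordination", 2: "repetition"}
logits = [2.1, 0.4, -1.3]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
# Indices sorted by descending probability.
top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
for i in top:
    print(f"{id2label[i]}: {probs[i]:.3f}")
```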

## Label Taxonomy (18 types)

The 18 types group fine-grained subtypes into functional families. Prior versions carried spurious label variants (e.g., hedging alongside hedging_qualification, passive alongside passive_agentless) introduced by inconsistent upstream annotation. These have been resolved via a canonical taxonomy with normalization and validation at build time.

| Oral Types (10) | Literate Types (8) |
|-----------------|--------------------|
| `direct_address` | `subordination` |
| `repetition` | `abstraction` |
| `formulaic_phrases` | `hedging_qualification` |
| `parallelism` | `analytical_distance` |
| `parataxis` | `logical_connectives` |
| `sound_patterns` | `textual_apparatus` |
| `performance_markers` | `literate_feature` |
| `concrete_situational` | `passive_agentless` |
| `agonistic_framing` | |
| `oral_feature` | |
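Since every type belongs to exactly one side of the taxonomy, a predicted type can be mapped back to its oral/literate category with a simple lookup. The helper below is illustrative only, not part of the released package:

```python
# Oral/literate category lookup for the 18 marker types (from the table above).
ORAL_TYPES = {
    "direct_address", "repetition", "formulaic_phrases", "parallelism",
    "parataxis", "sound_patterns", "performance_markers",
    "concrete_situational", "agonistic_framing", "oral_feature",
}
LITERATE_TYPES = {
    "subordination", "abstraction", "hedging_qualification",
    "analytical_distance", "logical_connectives", "textual_apparatus",
    "literate_feature", "passive_agentless",
}

def category_of(marker_type: str) -> str:
    """Map a predicted marker type to its oral/literate category."""
    if marker_type in ORAL_TYPES:
        return "oral"
    if marker_type in LITERATE_TYPES:
        return "literate"
    raise ValueError(f"unknown marker type: {marker_type}")

print(category_of("hedging_qualification"))  # literate
```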

## Training

### Data

22,367 span-level annotations from the Havelock corpus. Each span carries a marker_type field normalized against a canonical taxonomy at build time. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,178 spans.
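The released splits were produced with the swap-based optimizer described above; as a rough approximation, plain per-class stratification already yields near-balanced label distributions. A stdlib-only sketch with synthetic labels (not the actual pipeline):

```python
import random
from collections import defaultdict

def stratified_split(labels, seed=42):
    """80/10/10 split of indices, stratified per class.

    Plain stratification only -- not the swap-based optimizer used for
    the released splits.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n = len(idxs)
        n_train, n_val = int(0.8 * n), int(0.1 * n)
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Synthetic illustration: 1000 spans over 4 imbalanced classes.
labels = ["a"] * 500 + ["b"] * 300 + ["c"] * 150 + ["d"] * 50
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 800 100 100
```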

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Label smoothing | 0.0 |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 50 |
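The focal loss used here, FL(p) = −(1 − p)^γ · log p with γ = 2, downweights examples the model already classifies confidently relative to plain cross-entropy. A stdlib sketch of that scaling (class weights omitted for brevity):

```python
import math

GAMMA = 2.0  # gamma from the hyperparameter table above

def cross_entropy(p):
    """Cross-entropy for the true class given its predicted probability p."""
    return -math.log(p)

def focal_loss(p, gamma=GAMMA):
    """Focal loss: cross-entropy scaled by (1 - p)**gamma."""
    return (1.0 - p) ** gamma * cross_entropy(p)

# A confident correct prediction (p=0.9) is downweighted ~100x relative
# to cross-entropy; a poorly classified one (p=0.1) is barely affected.
for p in (0.9, 0.5, 0.1):
    print(f"p={p}: CE={cross_entropy(p):.3f}  FL={focal_loss(p):.3f}")
```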

### Training Metrics

The best checkpoint was selected at epoch 15, using missing-label count as the primary criterion and macro F1 as the tiebreaker (0 missing labels, F1 0.590).

## Test Set Classification Report

<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

```
                       precision    recall  f1-score   support

          abstraction      0.368     0.658     0.472       117
    agonistic_framing      0.857     0.750     0.800        32
  analytical_distance      0.504     0.475     0.489       120
 concrete_situational      0.509     0.385     0.438       143
       direct_address      0.671     0.689     0.680       367
    formulaic_phrases      0.205     0.608     0.307        51
hedging_qualification      0.600     0.500     0.545       114
     literate_feature      0.478     0.833     0.608        66
  logical_connectives      0.621     0.516     0.564       124
         oral_feature      0.784     0.365     0.498       159
          parallelism      0.688     0.579     0.629        19
            parataxis      0.655     0.387     0.486        93
    passive_agentless      0.721     0.500     0.590        62
  performance_markers      0.660     0.403     0.500        77
           repetition      0.738     0.705     0.721       156
       sound_patterns      0.672     0.623     0.647        69
        subordination      0.622     0.689     0.654       296
    textual_apparatus      0.718     0.655     0.685       113

             accuracy                          0.584      2178
            macro avg      0.615     0.573     0.573      2178
         weighted avg      0.624     0.584     0.587      2178
```

</details>

**Top performing types (F1 ≥ 0.65):** `agonistic_framing` (0.800), `repetition` (0.721), `textual_apparatus` (0.685), `direct_address` (0.680), `subordination` (0.654), `sound_patterns` (0.647), `parallelism` (0.629), `literate_feature` (0.608).

**Weakest types (F1 < 0.50):** `formulaic_phrases` (0.307), `concrete_situational` (0.438), `abstraction` (0.472), `parataxis` (0.486), `oral_feature` (0.498). `formulaic_phrases` suffers from severe precision collapse (P=0.205) despite reasonable recall, suggesting heavy confusion with other oral types. `oral_feature` shows the inverse pattern (P=0.784, R=0.365) — the model is confident but conservative.
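Because F1 is the harmonic mean of precision and recall, these asymmetric profiles can land at similar F1 despite very different error behavior; the quoted values check out arithmetically:

```python
def f1(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Reproduce the per-class F1 scores quoted above.
print(round(f1(0.205, 0.608), 3))  # formulaic_phrases -> 0.307
print(round(f1(0.784, 0.365), 3))  # oral_feature -> 0.498
```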

## Class Distribution

| Support Range | Classes | Count |
|---------------|---------|-------|
| >2500 | `direct_address`, `subordination`, `abstraction` | 3 |
| 1000–2500 | `repetition`, `formulaic_phrases`, `hedging_qualification`, `analytical_distance`, `concrete_situational`, `logical_connectives`, `textual_apparatus` | 7 |
| 500–1000 | `sound_patterns`, `passive_agentless`, `performance_markers`, `parataxis`, `literate_feature`, `oral_feature` | 6 |
| <500 | `agonistic_framing`, `parallelism` | 2 |

## Limitations

- **Class imbalance**: `direct_address` has 367 test examples while `parallelism` has 19. Weighted F1 (0.587) is close to macro F1 (0.573), indicating reasonably balanced performance, but rare types remain harder.
- **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
- **128-token context window**: Longer spans are truncated.
- **Abstraction underperforms**: At 0.472 F1 despite being a large class (117 test spans), suggesting the type may be too broad or overlapping with `analytical_distance` and `literate_feature`.
- **Precision-recall asymmetry**: Several types show strong precision–recall imbalance (`oral_feature` P=0.784/R=0.365; `formulaic_phrases` P=0.205/R=0.608), indicating the focal loss weighting could be further tuned.

## Theoretical Background

The type level captures functional groupings within the oral–literate framework. Oral types reflect Ong's characterization of oral discourse as additive (`parataxis`), aggregative (`formulaic_phrases`), redundant (`repetition`), agonistically toned (`agonistic_framing`), empathetic and participatory (`direct_address`), and close to the human lifeworld (`concrete_situational`). Literate types capture the analytic (`abstraction`, `subordination`), distanced (`analytical_distance`, `passive_agentless`), and self-referential (`textual_apparatus`) qualities of written discourse.

## Related Models

| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| **This model** | Functional type | 18 | 0.573 |
| [`HavelockAI/bert-marker-subtype`](https://huggingface.co/HavelockAI/bert-marker-subtype) | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |

## Citation
```bibtex
@misc{havelock2026type,
  title={Havelock Marker Type Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-type}
}

```

## References

- Ong, Walter J. *Orality and Literacy: The Technologizing of the Word*. Routledge, 1982.
- Lee, Cheolhyoung, Kyunghyun Cho, and Wanmo Kang. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, Benjamin, et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.

Trained: February 2026
