# Havelock Marker Subtype Classifier

ModernBERT-based classifier for 71 fine-grained rhetorical marker subtypes on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).

This is the finest level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 71 specific rhetorical devices (e.g., `anaphora`, `epistemic_hedge`, `vocative`, `nested_clauses`).
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Multi-class classification (71 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | 0.493 |
| Test Accuracy | 0.500 |
| Missing labels (test) | 1/71 (proverb) |
| Parameters | ~149M |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-subtype"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "it seems likely that this would, in principle, be feasible"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print(f"Marker subtype: {model.config.id2label[pred]}")
```
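For error analysis it is often more useful to inspect a ranked list of predictions than the single argmax, since overlapping subtypes tend to share probability mass. A minimal sketch (the `topk_labels` helper is illustrative, not part of the model's API):

```python
import torch

def topk_labels(logits, id2label, k=5):
    """Return the k most probable labels with their softmax probabilities."""
    probs = torch.softmax(logits, dim=-1)
    values, indices = torch.topk(probs, k=k, dim=-1)
    # Assumes a single-example batch: take row 0
    return [(id2label[i.item()], v.item()) for i, v in zip(indices[0], values[0])]

# e.g. topk_labels(model(**inputs).logits, model.config.id2label, k=5)
```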
## Label Taxonomy (71 subtypes)

### Oral Subtypes (37)
| Category | Subtypes |
|---|---|
| Repetition & Pattern | anaphora, epistrophe, parallelism, tricolon, lexical_repetition, refrain |
| Sound & Rhythm | alliteration, assonance, rhyme, rhythm |
| Address & Interaction | vocative, imperative, second_person, inclusive_we, rhetorical_question, audience_response, phatic_check, phatic_filler |
| Conjunction | polysyndeton, asyndeton, simple_conjunction |
| Formulas | discourse_formula, proverb, religious_formula, epithet |
| Narrative | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
| Performance | dramatic_pause, self_correction, conflict_frame, us_them, intensifier_doubling, antithesis |
### Literate Subtypes (34)
| Category | Subtypes |
|---|---|
| Abstraction | nominalization, abstract_noun, conceptual_metaphor, categorical_statement |
| Syntax | nested_clauses, relative_chain, conditional, concessive, temporal_embedding, causal_chain |
| Hedging | epistemic_hedge, probability, evidential, qualified_assertion, concessive_connector |
| Impersonality | agentless_passive, agent_demoted, institutional_subject, objectifying_stance, third_person_reference |
| Scholarly Apparatus | citation, footnote_reference, cross_reference, metadiscourse, methodological_framing |
| Technical | technical_term, technical_abbreviation, enumeration, list_structure, definitional_move |
| Connectives | contrastive, causal_explicit, additive_formal, aside |
## Training

### Data
22,367 span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Each span carries a marker_subtype field. Only subtypes with ≥10 examples are included. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,357 spans.
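A simplified sketch of the per-label stratified split described above, omitting the swap-based rebalancing step (the `stratified_split` helper is illustrative; only the `marker_subtype` field name comes from the data description):

```python
import random
from collections import defaultdict

def stratified_split(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle each label's examples and cut them 80/10/10 independently,
    so every subtype is represented in all three splits."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["marker_subtype"]].append(ex)
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```

The production pipeline additionally swaps examples between splits to balance label distributions; this sketch only guarantees proportional counts per label.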
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 10 |
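The loss row above combines the standard focal-loss formulation with per-class weights. A minimal PyTorch sketch of that combination (illustrative, not the exact training code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights=None, gamma=2.0):
    """Focal loss: cross-entropy down-weighted by (1 - p_t)^gamma, so
    easy, high-confidence examples contribute little to the gradient.
    Optional per-class weights counteract label imbalance."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()  # probability of the true class
    loss = -((1.0 - pt) ** gamma) * log_pt
    if class_weights is not None:
        loss = loss * class_weights[targets]
    return loss.mean()
```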
### Training Metrics

The best checkpoint was selected at epoch 15, prioritizing the fewest missing labels with macro F1 as the tiebreaker (0 missing labels, F1 0.486).
### Test Set Classification Report

<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| `abstract_noun` | 0.408 | 0.330 | 0.365 | 88 |
| `additive_formal` | 0.286 | 0.167 | 0.211 | 12 |
| `agent_demoted` | 0.667 | 1.000 | 0.800 | 10 |
| `agentless_passive` | 0.583 | 0.491 | 0.533 | 57 |
| `alliteration` | 0.500 | 0.200 | 0.286 | 10 |
| `anaphora` | 0.500 | 0.537 | 0.518 | 41 |
| `antithesis` | 0.947 | 0.818 | 0.878 | 22 |
| `aside` | 0.615 | 0.216 | 0.320 | 37 |
| `assonance` | 1.000 | 0.960 | 0.980 | 25 |
| `asyndeton` | 0.636 | 0.500 | 0.560 | 14 |
| `audience_response` | 1.000 | 0.800 | 0.889 | 10 |
| `categorical_statement` | 0.103 | 0.200 | 0.136 | 20 |
| `causal_chain` | 0.442 | 0.452 | 0.447 | 42 |
| `causal_explicit` | 0.400 | 0.468 | 0.431 | 47 |
| `citation` | 0.743 | 0.565 | 0.642 | 46 |
| `conceptual_metaphor` | 0.065 | 0.051 | 0.057 | 39 |
| `concessive` | 0.595 | 0.556 | 0.575 | 45 |
| `concessive_connector` | 0.882 | 0.833 | 0.857 | 18 |
| `conditional` | 0.596 | 0.609 | 0.602 | 87 |
| `conflict_frame` | 0.733 | 0.733 | 0.733 | 15 |
| `contrastive` | 0.533 | 0.525 | 0.529 | 61 |
| `cross_reference` | 0.733 | 0.458 | 0.564 | 24 |
| `definitional_move` | 0.286 | 0.200 | 0.235 | 10 |
| `discourse_formula` | 0.405 | 0.508 | 0.451 | 118 |
| `dramatic_pause` | 0.833 | 0.500 | 0.625 | 10 |
| `embodied_action` | 0.375 | 0.214 | 0.273 | 42 |
| `enumeration` | 0.510 | 0.605 | 0.553 | 43 |
| `epistemic_hedge` | 0.102 | 0.357 | 0.159 | 14 |
| `epistrophe` | 0.824 | 0.875 | 0.848 | 16 |
| `epithet` | 0.333 | 0.250 | 0.286 | 12 |
| `everyday_example` | 0.312 | 0.179 | 0.227 | 28 |
| `evidential` | 0.667 | 0.432 | 0.525 | 37 |
| `footnote_reference` | 0.417 | 0.500 | 0.455 | 10 |
| `imperative` | 0.645 | 0.600 | 0.622 | 100 |
| `inclusive_we` | 0.630 | 0.576 | 0.602 | 59 |
| `institutional_subject` | 0.938 | 0.714 | 0.811 | 21 |
| `intensifier_doubling` | 0.944 | 0.773 | 0.850 | 22 |
| `lexical_repetition` | 0.417 | 0.556 | 0.476 | 45 |
| `list_structure` | 0.267 | 0.174 | 0.211 | 23 |
| `metadiscourse` | 0.085 | 0.182 | 0.116 | 22 |
| `methodological_framing` | 0.500 | 0.190 | 0.276 | 21 |
| `named_individual` | 0.500 | 0.300 | 0.375 | 30 |
| `nested_clauses` | 0.500 | 0.348 | 0.410 | 46 |
| `nominalization` | 0.288 | 0.304 | 0.296 | 56 |
| `objectifying_stance` | 0.267 | 0.400 | 0.320 | 10 |
| `parallelism` | 0.350 | 0.259 | 0.298 | 27 |
| `phatic_check` | 0.500 | 0.364 | 0.421 | 11 |
| `phatic_filler` | 0.333 | 0.800 | 0.471 | 10 |
| `polysyndeton` | 1.000 | 0.792 | 0.884 | 24 |
| `probability` | 0.500 | 0.455 | 0.476 | 22 |
| `proverb` | 0.000 | 0.000 | 0.000 | 10 |
| `qualified_assertion` | 0.250 | 0.241 | 0.246 | 29 |
| `refrain` | 0.944 | 0.708 | 0.810 | 24 |
| `relative_chain` | 0.350 | 0.509 | 0.415 | 55 |
| `religious_formula` | 0.857 | 0.750 | 0.800 | 16 |
| `rhetorical_question` | 0.688 | 0.762 | 0.723 | 84 |
| `rhyme` | 0.231 | 0.300 | 0.261 | 10 |
| `rhythm` | 0.909 | 0.625 | 0.741 | 16 |
| `second_person` | 0.571 | 0.586 | 0.579 | 116 |
| `self_correction` | 0.821 | 0.575 | 0.676 | 40 |
| `sensory_detail` | 0.364 | 0.200 | 0.258 | 20 |
| `simple_conjunction` | 0.167 | 0.300 | 0.214 | 10 |
| `specific_place` | 0.400 | 0.222 | 0.286 | 18 |
| `technical_abbreviation` | 0.900 | 0.321 | 0.474 | 28 |
| `technical_term` | 0.426 | 0.703 | 0.531 | 74 |
| `temporal_anchor` | 0.396 | 0.618 | 0.483 | 34 |
| `temporal_embedding` | 0.500 | 0.562 | 0.529 | 48 |
| `third_person_reference` | 0.700 | 0.700 | 0.700 | 10 |
| `tricolon` | 0.611 | 0.611 | 0.611 | 18 |
| `us_them` | 0.733 | 0.611 | 0.667 | 18 |
| `vocative` | 0.462 | 0.600 | 0.522 | 20 |
| **accuracy** | | | 0.500 | 2357 |
| **macro avg** | 0.535 | 0.484 | 0.493 | 2357 |
| **weighted avg** | 0.532 | 0.500 | 0.503 | 2357 |
</details>
**Top performing subtypes (F1 ≥ 0.75):** `assonance` (0.980), `audience_response` (0.889), `polysyndeton` (0.884), `antithesis` (0.878), `concessive_connector` (0.857), `intensifier_doubling` (0.850), `epistrophe` (0.848), `institutional_subject` (0.811), `refrain` (0.810), `agent_demoted` (0.800), `religious_formula` (0.800).
**Weakest subtypes (F1 < 0.20):** `proverb` (0.000), `conceptual_metaphor` (0.057), `metadiscourse` (0.116), `categorical_statement` (0.136), `epistemic_hedge` (0.159). These tend to be semantically diffuse classes that overlap heavily with neighboring subtypes or have very low test support.
## Class Distribution
The training set exhibits significant imbalance across 71 classes:
| Support Range | Example Classes | Count |
|---------------|-----------------|-------|
| >1000 | `discourse_formula`, `second_person` | 2 |
| 500–1000 | `conditional`, `rhetorical_question`, `technical_term`, `imperative` | 8 |
| 200–500 | `abstract_noun`, `contrastive`, `inclusive_we`, `nominalization` | 27 |
| 100–200 | `alliteration`, `antithesis`, `asyndeton`, `epistrophe`, `refrain` | 30 |
| <100 | `footnote_reference`, `phatic_check`, `technical_abbreviation` | 4 |
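The class weights listed under Hyperparameters are commonly derived from counts like these as balanced inverse-frequency weights. A sketch of that standard computation (illustrative; the model card does not specify the exact weighting scheme):

```python
def inverse_freq_weights(counts):
    """Balanced inverse-frequency weights: total / (n_classes * count),
    so rare classes get weights > 1 and frequent classes < 1."""
    total = sum(counts.values())
    n = len(counts)
    return {label: total / (n * count) for label, count in counts.items()}
```

These weights would then multiply the per-example loss according to each example's gold label, as in the focal-loss formulation above.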
## Limitations
- **71-way classification on ~22k spans**: The data budget per class is thin, particularly for classes near the minimum. More data or class consolidation would help.
- **Semantic overlap**: Some subtypes are difficult to distinguish from surface text alone (e.g., `parallelism` vs `anaphora` vs `tricolon`; `epistemic_hedge` vs `qualified_assertion` vs `probability`). The model may benefit from hierarchical classification that conditions on type-level predictions.
- **Recall-precision tradeoff on rare classes**: Many rare classes show high precision but lower recall (e.g., `self_correction`: P=0.821, R=0.575; `technical_abbreviation`: P=0.900, R=0.321), suggesting the model learns narrow prototypes but misses variation.
- **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
- **128-token context window**: Longer spans are truncated.
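The hierarchical approach mentioned above could also be approximated at inference time by masking subtype logits to the children of a predicted parent type. A sketch (the `allowed_ids` type-to-subtype mapping is hypothetical and would come from the taxonomy):

```python
import torch

def masked_subtype_pred(logits, allowed_ids):
    """Argmax over subtype logits restricted to allowed_ids: all other
    classes are pushed to -inf so they can never be predicted."""
    mask = torch.full_like(logits, float("-inf"))
    mask[:, allowed_ids] = 0.0
    return torch.argmax(logits + mask, dim=-1)
```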
## Theoretical Background
The 71 subtypes represent the full granularity of the Havelock taxonomy, operationalizing Ong's oral–literate framework into specific, annotatable rhetorical devices. Oral subtypes capture the textural signatures of spoken and performative discourse: repetitive structures (`anaphora`, `epistrophe`, `tricolon`), sound patterning (`alliteration`, `assonance`, `rhythm`), direct audience engagement (`vocative`, `imperative`, `rhetorical_question`), and formulas (`proverb`, `epithet`, `discourse_formula`). Literate subtypes capture the apparatus of analytic prose: complex syntax (`nested_clauses`, `relative_chain`, `conditional`), epistemic positioning (`epistemic_hedge`, `evidential`, `probability`), impersonal voice (`agentless_passive`, `institutional_subject`), and scholarly machinery (`citation`, `footnote_reference`, `metadiscourse`).
## Related Models
| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.583 |
| **This model** | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |
## Citation
```bibtex
@misc{havelock2026subtype,
  title={Havelock Marker Subtype Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-subtype}
}
```
## References
- Ong, Walter J. Orality and Literacy: The Technologizing of the Word. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, A. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.
Trained: February 2026