# Havelock Marker Subtype Classifier

ModernBERT-based classifier for 71 fine-grained rhetorical marker subtypes on the oral–literate spectrum, grounded in Walter Ong's *Orality and Literacy* (1982).

This is the finest level of the Havelock span classification hierarchy. Given a text span identified as a rhetorical marker, the model classifies it into one of 71 specific rhetorical devices (e.g., `anaphora`, `epistemic_hedge`, `vocative`, `nested_clauses`).
## Model Details
| Property | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Architecture | ModernBertForSequenceClassification |
| Task | Multi-class classification (71 classes) |
| Max sequence length | 128 tokens |
| Test F1 (macro) | 0.493 |
| Test Accuracy | 0.500 |
| Missing labels (test) | 1/71 (proverb) |
| Parameters | ~149M |
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "HavelockAI/bert-marker-subtype"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

span = "it seems likely that this would, in principle, be feasible"
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print(f"Marker subtype: {model.config.id2label[pred]}")
```
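For error analysis it is often more useful to inspect a ranked list of predictions than the single argmax, since overlapping subtypes tend to share probability mass. A minimal sketch (the `topk_labels` helper is illustrative, not part of the model's API):

```python
import torch

def topk_labels(logits, id2label, k=5):
    """Return the k most probable labels with their softmax probabilities."""
    probs = torch.softmax(logits, dim=-1)
    values, indices = torch.topk(probs, k=k, dim=-1)
    # Assumes a single-example batch: take row 0
    return [(id2label[i.item()], v.item()) for i, v in zip(indices[0], values[0])]

# e.g. topk_labels(model(**inputs).logits, model.config.id2label, k=5)
```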
## Label Taxonomy (71 subtypes)

### Oral Subtypes (37)
| Category | Subtypes |
|---|---|
| Repetition & Pattern | anaphora, epistrophe, parallelism, tricolon, lexical_repetition, refrain |
| Sound & Rhythm | alliteration, assonance, rhyme, rhythm |
| Address & Interaction | vocative, imperative, second_person, inclusive_we, rhetorical_question, audience_response, phatic_check, phatic_filler |
| Conjunction | polysyndeton, asyndeton, simple_conjunction |
| Formulas | discourse_formula, proverb, religious_formula, epithet |
| Narrative | named_individual, specific_place, temporal_anchor, sensory_detail, embodied_action, everyday_example |
| Performance | dramatic_pause, self_correction, conflict_frame, us_them, intensifier_doubling, antithesis |
### Literate Subtypes (34)
| Category | Subtypes |
|---|---|
| Abstraction | nominalization, abstract_noun, conceptual_metaphor, categorical_statement |
| Syntax | nested_clauses, relative_chain, conditional, concessive, temporal_embedding, causal_chain |
| Hedging | epistemic_hedge, probability, evidential, qualified_assertion, concessive_connector |
| Impersonality | agentless_passive, agent_demoted, institutional_subject, objectifying_stance, third_person_reference |
| Scholarly Apparatus | citation, footnote_reference, cross_reference, metadiscourse, methodological_framing |
| Technical | technical_term, technical_abbreviation, enumeration, list_structure, definitional_move |
| Connectives | contrastive, causal_explicit, additive_formal, aside |
## Training

### Data
22,367 span-level annotations from the Havelock corpus with marker types normalized against a canonical taxonomy at build time. Each span carries a marker_subtype field. Only subtypes with ≥10 examples are included. A stratified 80/10/10 train/val/test split was used with swap-based optimization to balance label distributions across splits. The test set contains 2,357 spans.
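A simplified sketch of the per-label stratified split described above, omitting the swap-based rebalancing step (the `stratified_split` helper is illustrative; only the `marker_subtype` field name comes from the data description):

```python
import random
from collections import defaultdict

def stratified_split(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle each label's examples and cut them 80/10/10 independently,
    so every subtype is represented in all three splits."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["marker_subtype"]].append(ex)
    train, val, test = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test
```

The production pipeline additionally swaps examples between splits to balance label distributions; this sketch only guarantees proportional counts per label.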
### Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 20 |
| Batch size | 16 |
| Learning rate | 3e-5 |
| Optimizer | AdamW (weight decay 0.01) |
| LR schedule | Cosine with 10% warmup |
| Gradient clipping | 1.0 |
| Loss | Focal loss (γ=2.0) + class weights |
| Mixout | 0.1 |
| Mixed precision | FP16 |
| Min examples per class | 10 |
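The loss row above combines the standard focal-loss formulation with per-class weights. A minimal PyTorch sketch of that combination (illustrative, not the exact training code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights=None, gamma=2.0):
    """Focal loss: cross-entropy down-weighted by (1 - p_t)^gamma, so
    easy, high-confidence examples contribute little to the gradient.
    Optional per-class weights counteract label imbalance."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()  # probability of the true class
    loss = -((1.0 - pt) ** gamma) * log_pt
    if class_weights is not None:
        loss = loss * class_weights[targets]
    return loss.mean()
```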
### Training Metrics

The best checkpoint was selected at epoch 15, prioritizing the fewest missing labels with macro F1 as the tiebreaker (0 missing labels, F1 0.486).
### Test Set Classification Report

<details>
<summary>Click to expand per-class precision/recall/F1/support</summary>

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| `abstract_noun` | 0.408 | 0.330 | 0.365 | 88 |
| `additive_formal` | 0.286 | 0.167 | 0.211 | 12 |
| `agent_demoted` | 0.667 | 1.000 | 0.800 | 10 |
| `agentless_passive` | 0.583 | 0.491 | 0.533 | 57 |
| `alliteration` | 0.500 | 0.200 | 0.286 | 10 |
| `anaphora` | 0.500 | 0.537 | 0.518 | 41 |
| `antithesis` | 0.947 | 0.818 | 0.878 | 22 |
| `aside` | 0.615 | 0.216 | 0.320 | 37 |
| `assonance` | 1.000 | 0.960 | 0.980 | 25 |
| `asyndeton` | 0.636 | 0.500 | 0.560 | 14 |
| `audience_response` | 1.000 | 0.800 | 0.889 | 10 |
| `categorical_statement` | 0.103 | 0.200 | 0.136 | 20 |
| `causal_chain` | 0.442 | 0.452 | 0.447 | 42 |
| `causal_explicit` | 0.400 | 0.468 | 0.431 | 47 |
| `citation` | 0.743 | 0.565 | 0.642 | 46 |
| `conceptual_metaphor` | 0.065 | 0.051 | 0.057 | 39 |
| `concessive` | 0.595 | 0.556 | 0.575 | 45 |
| `concessive_connector` | 0.882 | 0.833 | 0.857 | 18 |
| `conditional` | 0.596 | 0.609 | 0.602 | 87 |
| `conflict_frame` | 0.733 | 0.733 | 0.733 | 15 |
| `contrastive` | 0.533 | 0.525 | 0.529 | 61 |
| `cross_reference` | 0.733 | 0.458 | 0.564 | 24 |
| `definitional_move` | 0.286 | 0.200 | 0.235 | 10 |
| `discourse_formula` | 0.405 | 0.508 | 0.451 | 118 |
| `dramatic_pause` | 0.833 | 0.500 | 0.625 | 10 |
| `embodied_action` | 0.375 | 0.214 | 0.273 | 42 |
| `enumeration` | 0.510 | 0.605 | 0.553 | 43 |
| `epistemic_hedge` | 0.102 | 0.357 | 0.159 | 14 |
| `epistrophe` | 0.824 | 0.875 | 0.848 | 16 |
| `epithet` | 0.333 | 0.250 | 0.286 | 12 |
| `everyday_example` | 0.312 | 0.179 | 0.227 | 28 |
| `evidential` | 0.667 | 0.432 | 0.525 | 37 |
| `footnote_reference` | 0.417 | 0.500 | 0.455 | 10 |
| `imperative` | 0.645 | 0.600 | 0.622 | 100 |
| `inclusive_we` | 0.630 | 0.576 | 0.602 | 59 |
| `institutional_subject` | 0.938 | 0.714 | 0.811 | 21 |
| `intensifier_doubling` | 0.944 | 0.773 | 0.850 | 22 |
| `lexical_repetition` | 0.417 | 0.556 | 0.476 | 45 |
| `list_structure` | 0.267 | 0.174 | 0.211 | 23 |
| `metadiscourse` | 0.085 | 0.182 | 0.116 | 22 |
| `methodological_framing` | 0.500 | 0.190 | 0.276 | 21 |
| `named_individual` | 0.500 | 0.300 | 0.375 | 30 |
| `nested_clauses` | 0.500 | 0.348 | 0.410 | 46 |
| `nominalization` | 0.288 | 0.304 | 0.296 | 56 |
| `objectifying_stance` | 0.267 | 0.400 | 0.320 | 10 |
| `parallelism` | 0.350 | 0.259 | 0.298 | 27 |
| `phatic_check` | 0.500 | 0.364 | 0.421 | 11 |
| `phatic_filler` | 0.333 | 0.800 | 0.471 | 10 |
| `polysyndeton` | 1.000 | 0.792 | 0.884 | 24 |
| `probability` | 0.500 | 0.455 | 0.476 | 22 |
| `proverb` | 0.000 | 0.000 | 0.000 | 10 |
| `qualified_assertion` | 0.250 | 0.241 | 0.246 | 29 |
| `refrain` | 0.944 | 0.708 | 0.810 | 24 |
| `relative_chain` | 0.350 | 0.509 | 0.415 | 55 |
| `religious_formula` | 0.857 | 0.750 | 0.800 | 16 |
| `rhetorical_question` | 0.688 | 0.762 | 0.723 | 84 |
| `rhyme` | 0.231 | 0.300 | 0.261 | 10 |
| `rhythm` | 0.909 | 0.625 | 0.741 | 16 |
| `second_person` | 0.571 | 0.586 | 0.579 | 116 |
| `self_correction` | 0.821 | 0.575 | 0.676 | 40 |
| `sensory_detail` | 0.364 | 0.200 | 0.258 | 20 |
| `simple_conjunction` | 0.167 | 0.300 | 0.214 | 10 |
| `specific_place` | 0.400 | 0.222 | 0.286 | 18 |
| `technical_abbreviation` | 0.900 | 0.321 | 0.474 | 28 |
| `technical_term` | 0.426 | 0.703 | 0.531 | 74 |
| `temporal_anchor` | 0.396 | 0.618 | 0.483 | 34 |
| `temporal_embedding` | 0.500 | 0.562 | 0.529 | 48 |
| `third_person_reference` | 0.700 | 0.700 | 0.700 | 10 |
| `tricolon` | 0.611 | 0.611 | 0.611 | 18 |
| `us_them` | 0.733 | 0.611 | 0.667 | 18 |
| `vocative` | 0.462 | 0.600 | 0.522 | 20 |
| **accuracy** | | | 0.500 | 2357 |
| **macro avg** | 0.535 | 0.484 | 0.493 | 2357 |
| **weighted avg** | 0.532 | 0.500 | 0.503 | 2357 |
</details>
**Top performing subtypes (F1 ≥ 0.75):** `assonance` (0.980), `audience_response` (0.889), `polysyndeton` (0.884), `antithesis` (0.878), `concessive_connector` (0.857), `intensifier_doubling` (0.850), `epistrophe` (0.848), `institutional_subject` (0.811), `refrain` (0.810), `agent_demoted` (0.800), `religious_formula` (0.800).
**Weakest subtypes (F1 < 0.20):** `proverb` (0.000), `conceptual_metaphor` (0.057), `metadiscourse` (0.116), `categorical_statement` (0.136), `epistemic_hedge` (0.159). These tend to be semantically diffuse classes that overlap heavily with neighboring subtypes or have very low test support.
## Class Distribution
The training set exhibits significant imbalance across 71 classes:
| Support Range | Example Classes | Count |
|---------------|-----------------|-------|
| >1000 | `discourse_formula`, `second_person` | 2 |
| 500–1000 | `conditional`, `rhetorical_question`, `technical_term`, `imperative` | 8 |
| 200–500 | `abstract_noun`, `contrastive`, `inclusive_we`, `nominalization` | 27 |
| 100–200 | `alliteration`, `antithesis`, `asyndeton`, `epistrophe`, `refrain` | 30 |
| <100 | `footnote_reference`, `phatic_check`, `technical_abbreviation` | 4 |
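The class weights listed under Hyperparameters are commonly derived from counts like these as balanced inverse-frequency weights. A sketch of that standard computation (illustrative; the model card does not specify the exact weighting scheme):

```python
def inverse_freq_weights(counts):
    """Balanced inverse-frequency weights: total / (n_classes * count),
    so rare classes get weights > 1 and frequent classes < 1."""
    total = sum(counts.values())
    n = len(counts)
    return {label: total / (n * count) for label, count in counts.items()}
```

These weights would then multiply the per-example loss according to each example's gold label, as in the focal-loss formulation above.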
## Limitations
- **71-way classification on ~22k spans**: The data budget per class is thin, particularly for classes near the minimum. More data or class consolidation would help.
- **Semantic overlap**: Some subtypes are difficult to distinguish from surface text alone (e.g., `parallelism` vs `anaphora` vs `tricolon`; `epistemic_hedge` vs `qualified_assertion` vs `probability`). The model may benefit from hierarchical classification that conditions on type-level predictions.
- **Recall-precision tradeoff on rare classes**: Many rare classes show high precision but lower recall (e.g., `self_correction`: P=0.821, R=0.575; `technical_abbreviation`: P=0.900, R=0.321), suggesting the model learns narrow prototypes but misses variation.
- **Span-level only**: Requires pre-extracted spans. Does not detect boundaries.
- **128-token context window**: Longer spans are truncated.
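The hierarchical approach mentioned above could also be approximated at inference time by masking subtype logits to the children of a predicted parent type. A sketch (the `allowed_ids` type-to-subtype mapping is hypothetical and would come from the taxonomy):

```python
import torch

def masked_subtype_pred(logits, allowed_ids):
    """Argmax over subtype logits restricted to allowed_ids: all other
    classes are pushed to -inf so they can never be predicted."""
    mask = torch.full_like(logits, float("-inf"))
    mask[:, allowed_ids] = 0.0
    return torch.argmax(logits + mask, dim=-1)
```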
## Theoretical Background
The 71 subtypes represent the full granularity of the Havelock taxonomy, operationalizing Ong's oral–literate framework into specific, annotatable rhetorical devices. Oral subtypes capture the textural signatures of spoken and performative discourse: repetitive structures (`anaphora`, `epistrophe`, `tricolon`), sound patterning (`alliteration`, `assonance`, `rhythm`), direct audience engagement (`vocative`, `imperative`, `rhetorical_question`), and formulas (`proverb`, `epithet`, `discourse_formula`). Literate subtypes capture the apparatus of analytic prose: complex syntax (`nested_clauses`, `relative_chain`, `conditional`), epistemic positioning (`epistemic_hedge`, `evidential`, `probability`), impersonal voice (`agentless_passive`, `institutional_subject`), and scholarly machinery (`citation`, `footnote_reference`, `metadiscourse`).
## Related Models
| Model | Task | Classes | F1 |
|-------|------|---------|-----|
| [`HavelockAI/bert-marker-category`](https://huggingface.co/HavelockAI/bert-marker-category) | Binary (oral/literate) | 2 | 0.875 |
| [`HavelockAI/bert-marker-type`](https://huggingface.co/HavelockAI/bert-marker-type) | Functional type | 18 | 0.583 |
| **This model** | Fine-grained subtype | 71 | 0.493 |
| [`HavelockAI/bert-orality-regressor`](https://huggingface.co/HavelockAI/bert-orality-regressor) | Document-level score | Regression | MAE 0.079 |
| [`HavelockAI/bert-token-classifier`](https://huggingface.co/HavelockAI/bert-token-classifier) | Span detection (BIO) | 145 | 0.500 |
## Citation
```bibtex
@misc{havelock2026subtype,
  title={Havelock Marker Subtype Classifier},
  author={Havelock AI},
  year={2026},
  url={https://huggingface.co/HavelockAI/bert-marker-subtype}
}
```
## References
- Ong, Walter J. Orality and Literacy: The Technologizing of the Word. Routledge, 1982.
- Lee, C. et al. "Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models." ICLR 2020.
- Warner, A. et al. "Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference." 2024.
Trained: February 2026