# Model Card for ModernBERT-large fine-tuned on CoNLL-2003 (NER)
A Named Entity Recognition model based on answerdotai/ModernBERT-large, fine-tuned on the English CoNLL-2003 dataset. It identifies and classifies entities into four types: Person, Organization, Location, and Miscellaneous.
## Model Details
- Base model: answerdotai/ModernBERT-large
- Task: Token classification (NER)
- Dataset: lhoestq/conll2003 (CoNLL-2003 English)
- Number of labels: 9 (BIO format)
  - O (0)
  - B-PER (1), I-PER (2)
  - B-ORG (3), I-ORG (4)
  - B-LOC (5), I-LOC (6)
  - B-MISC (7), I-MISC (8)
- Training procedure: Fine-tuning with Optuna hyperparameter search (20 trials)
- Evaluation metric: seqeval (overall precision, recall, F1, accuracy)
## Label Mapping
| Label ID | Entity Tag |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
| 7 | B-MISC |
| 8 | I-MISC |
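The table above can also be expressed as the `id2label` / `label2id` dictionaries conventionally stored in a Hugging Face model config (a minimal sketch following the usual naming convention, not copied from this repo's files):

```python
# BIO label mapping from the table above, in id2label / label2id form.
id2label = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
    7: "B-MISC", 8: "I-MISC",
}
label2id = {label: i for i, label in id2label.items()}

# Decoding a sequence of predicted label IDs back to tags:
predicted_ids = [1, 2, 0, 3, 0]
tags = [id2label[i] for i in predicted_ids]
print(tags)  # ['B-PER', 'I-PER', 'O', 'B-ORG', 'O']
```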
## Training Procedure

### Hyperparameter Search
An Optuna study (20 trials) maximized validation F1 over the following search space:
- Learning rate: [1e-5, 5e-4] (log scale)
- Batch size per device: [8, 16, 32]
- Number of epochs: [2, 6]
- Weight decay: [0.0, 0.1]
- Warmup ratio: [0.0, 0.2]
- Gradient accumulation steps: [1, 4]
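The objective function itself is not included in this card; the sketch below shows how a search space like the one above could be declared with Optuna's `suggest_*` API. A tiny stand-in trial class is included so the snippet runs without Optuna installed; all parameter names are illustrative:

```python
import math
import random

class StubTrial:
    """Minimal stand-in mirroring optuna.trial.Trial's suggest_* methods."""
    def suggest_float(self, name, low, high, log=False):
        if log:
            return math.exp(random.uniform(math.log(low), math.log(high)))
        return random.uniform(low, high)

    def suggest_int(self, name, low, high):
        return random.randint(low, high)

    def suggest_categorical(self, name, choices):
        return random.choice(choices)

def sample_hyperparameters(trial):
    # Search space as described in the model card.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 6),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.2),
        "gradient_accumulation_steps": trial.suggest_int(
            "gradient_accumulation_steps", 1, 4),
    }

params = sample_hyperparameters(StubTrial())
print(params)
```

With real Optuna, `sample_hyperparameters` would be called inside the objective, and `study.optimize(objective, n_trials=20)` would run the 20 trials while maximizing validation F1.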
Other fixed training arguments:
- Evaluation batch size: 8
- Max sequence length: 256
- Evaluation strategy: epoch
- Save strategy: epoch
- Best model selection based on validation F1
- Seed: 42
## Training Data
- Training set: CoNLL-2003 `train` split
- Validation set: CoNLL-2003 `validation` split (used for early stopping / best model selection)
- Test set: CoNLL-2003 `test` split (final evaluation)
## Tokenizer Alignment
During tokenization, the original tokens are split into subwords. Subword tokens that are continuations of the same word are assigned the inside label of the corresponding entity class, if applicable. For example, if “Microsoft” is tokenized into ["Micro", "##soft"] and the original tag is B-ORG, the first subword gets B-ORG and the second gets I-ORG. This is implemented in the align_labels function.
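The source of `align_labels` is not included in this card; a minimal sketch of the alignment logic described above, assuming the `word_ids()` output of a Hugging Face fast tokenizer (a list mapping each subword to its source word index, with `None` for special tokens), could look like this:

```python
# Sketch of subword label alignment (details of the card's align_labels assumed).
B2I = {"B-PER": "I-PER", "B-ORG": "I-ORG", "B-LOC": "I-LOC", "B-MISC": "I-MISC"}

def align_labels(word_labels, word_ids):
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)  # special tokens are ignored by the loss
        elif word_id != previous_word:
            aligned.append(word_labels[word_id])  # first subword keeps the word's tag
        else:
            tag = word_labels[word_id]
            aligned.append(B2I.get(tag, tag))  # continuation subwords get the I- tag
        previous_word = word_id
    return aligned

# "John Smith works at Microsoft": "Microsoft" splits into two subwords (word id 4).
labels = ["B-PER", "I-PER", "O", "O", "B-ORG"]
word_ids = [None, 0, 1, 2, 3, 4, 4, None]  # [CLS] ... [SEP]
print(align_labels(labels, word_ids))
# [-100, 'B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG', -100]
```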
## Evaluation Results
After hyperparameter search, the best trial achieved the following results on the test set:
- Precision: 0.87
- Recall: 0.91
- F1: 0.89
- Accuracy: 0.97
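These numbers come from seqeval's entity-level scoring. As a rough illustration of what entity-level F1 means, the simplified sketch below extracts `(type, start, end)` spans from BIO tags and scores exact span matches; the actual evaluation used the seqeval library, which also handles edge cases (e.g. `I-` tags with no preceding `B-`) that this sketch ignores:

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence."""
    entities = set()
    etype, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if etype is not None:
                entities.add((etype, start, i))
            etype, start = tag[2:], i
        elif tag.startswith("I-") and tag[2:] == etype:
            continue  # same entity keeps extending
        else:  # "O" or a type-mismatched I- ends the current entity
            if etype is not None:
                entities.add((etype, start, i))
            etype, start = None, None
    if etype is not None:
        entities.add((etype, start, len(tags)))
    return entities

def entity_f1(true_tags, pred_tags):
    """Exact-match entity-level F1 (simplified seqeval-style scoring)."""
    gold, pred = extract_entities(true_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

gold = ["B-PER", "I-PER", "O", "B-ORG"]
pred = ["B-PER", "I-PER", "O", "O"]
print(entity_f1(gold, pred))  # one of two entities found exactly: F1 ≈ 0.667
```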
## How to Use

### Quick Pipeline
```python
from transformers import pipeline

ner = pipeline("token-classification", model="violetar/ner-model", aggregation_strategy="simple")

sentence = "John Smith works at Microsoft in New York."
results = ner(sentence)

for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} (score: {entity['score']:.2f})")
```