Energy Intelligence NER

Model ID: Quantbridge/energy-intelligence-multitask-ner

A fine-tuned DistilBERT model for Named Entity Recognition in the energy markets and geopolitical domain. The model identifies nine entity types relevant to energy intelligence: companies, commodities, infrastructure, markets, events, and more.


Entity Types

| Label | Description | Examples |
|-------|-------------|----------|
| COMPANY | Energy sector companies | ExxonMobil, BP, Saudi Aramco |
| COMMODITY | Energy commodities and resources | crude oil, natural gas, LNG, coal |
| COUNTRY | Nation states | United States, Russia, Saudi Arabia |
| LOCATION | Geographic locations and regions | Persian Gulf, North Sea, Permian Basin |
| INFRASTRUCTURE | Physical energy infrastructure | pipelines, refineries, LNG terminals |
| MARKET | Energy markets and trading hubs | Henry Hub, Brent, WTI, TTF |
| EVENT | Market and geopolitical events | sanctions, OPEC+ cut, supply disruption |
| ORGANIZATION | Non-company organizations and bodies | OPEC, IEA, G7, US Energy Department |
| PERSON | Named individuals | ministers, executives, analysts |
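Under the BIO tagging scheme used by this model, each of the nine types maps to a `B-` (begin) and `I-` (inside) label, plus a single `O` for non-entity tokens. The sketch below is a hypothetical reconstruction of that label space; the authoritative mapping lives in the model's config (`model.config.id2label`).

```python
# Hypothetical reconstruction of the label space implied by the nine
# entity types; the actual mapping is in model.config.id2label.
ENTITY_TYPES = [
    "COMPANY", "COMMODITY", "COUNTRY", "LOCATION", "INFRASTRUCTURE",
    "MARKET", "EVENT", "ORGANIZATION", "PERSON",
]

# BIO scheme: "O" for non-entity tokens, "B-X" opens an entity of type X,
# "I-X" continues it.
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]

print(len(labels))  # 9 types * 2 prefixes + "O" = 19
```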

Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/energy-intelligence-multitask-ner",
    aggregation_strategy="simple",
)

text = (
    "Saudi Aramco announced a production cut of 1 million barrels per day "
    "amid falling crude oil prices at the Brent benchmark market."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<30} {entity['entity_group']:<20} score={entity['score']:.3f}")
```

Example output:

```
Saudi Aramco                   COMPANY              score=0.981
crude oil                      COMMODITY            score=0.974
Brent                          MARKET               score=0.968
```

Load model directly

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/energy-intelligence-multitask-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("Brent crude fell below $70 as OPEC+ met in Vienna.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print every token with a non-O prediction; special tokens such as
# [CLS]/[SEP] are normally predicted as O and therefore filtered out here.
for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O":
        print(f"{token:<20} {label}")
```
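The loop above prints raw WordPiece tokens. To recover full entity spans without the pipeline's aggregation, you can merge consecutive `B-`/`I-` tokens and strip the `##` continuation prefix. A minimal decoder sketch, assuming BIO labels and DistilBERT's WordPiece tokenization; the sample tokens and labels are hardcoded for illustration, not actual model output:

```python
def decode_bio(tokens, labels):
    """Merge WordPiece tokens tagged with BIO labels into (text, type) spans."""
    spans, current_words, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if token.startswith("##") and current_words:
            # WordPiece continuation: glue onto the previous word.
            current_words[-1] += token[2:]
            continue
        if label.startswith("B-"):
            if current_type:
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_words.append(token)
        else:
            if current_type:
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_type:
        spans.append((" ".join(current_words), current_type))
    return spans

# Illustrative token/label sequence for the sentence above.
tokens = ["brent", "crude", "fell", "opec", "##+", "met", "in", "vienna"]
labels = ["B-MARKET", "B-COMMODITY", "O", "B-ORGANIZATION", "I-ORGANIZATION", "O", "O", "B-LOCATION"]
print(decode_bio(tokens, labels))
# [('brent', 'MARKET'), ('crude', 'COMMODITY'), ('opec+', 'ORGANIZATION'), ('vienna', 'LOCATION')]
```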

Model Details

| Property | Value |
|----------|-------|
| Base model | distilbert-base-uncased |
| Architecture | DistilBERT + token classification head |
| Parameters | ~67M |
| Max sequence length | 256 tokens |
| Training precision | FP16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Epochs | 5 |
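The hyperparameters above translate into a `transformers` training configuration along these lines. This is a sketch, not the authors' actual script: the batch size and output path are placeholders not stated in the card, and AdamW is simply the `Trainer` default optimizer.

```python
from transformers import TrainingArguments

# Sketch of the config implied by the table above; output_dir and batch
# size are assumptions not stated in the model card.
args = TrainingArguments(
    output_dir="energy-ner-checkpoints",  # hypothetical path
    learning_rate=2e-5,
    warmup_ratio=0.1,                     # 10% warmup
    weight_decay=0.01,
    num_train_epochs=5,
    fp16=True,                            # FP16 training precision
    per_device_train_batch_size=16,       # assumption: not stated in the card
)
```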

Training Data

The model was trained on a domain-specific dataset of English-language articles covering energy markets, commodities trading, geopolitics, and infrastructure. The dataset contains over 11,000 annotated examples with BIO (Beginning-Inside-Outside) tagging.
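As a concrete illustration of the BIO scheme (a hypothetical annotation, not an actual training record), each word carries `O`, `B-<TYPE>` for an entity start, or `I-<TYPE>` for a continuation:

```python
# One hypothetical training example under BIO tagging.
words = ["Saudi", "Aramco", "cut", "crude", "oil", "exports", "."]
tags  = ["B-COMPANY", "I-COMPANY", "O", "B-COMMODITY", "I-COMMODITY", "O", "O"]

for word, tag in zip(words, tags):
    print(f"{word:<10} {tag}")
```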

Dataset split:

| Split | Records |
|-------|---------|
| Train | ~9,200 |
| Validation | ~1,150 |
| Test | ~1,150 |

Evaluation

Evaluated on the held-out test set using seqeval (entity-level span matching).

| Metric | Score |
|--------|-------|
| Overall F1 | reported after training |
| Overall Precision | reported after training |
| Overall Recall | reported after training |

Per-entity F1 scores are available in label_map.json in the model repository.
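For reference, seqeval scores at the entity-span level: a prediction counts as correct only if both the span boundaries and the type match exactly. A minimal re-implementation of that idea for strict IOB2 tags (seqeval itself handles more schemes and edge cases):

```python
def extract_spans(tags):
    """Collect (start, end, type) entity spans from a strict-IOB2 tag sequence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if etype is not None and not inside:
            spans.add((start, i, etype))
            etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(true_tags, pred_tags):
    """Entity-level F1: exact span-and-type matches only."""
    gold, pred = extract_spans(true_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

true_tags = ["B-COMPANY", "I-COMPANY", "O", "B-MARKET"]
pred_tags = ["B-COMPANY", "I-COMPANY", "O", "O"]
print(round(entity_f1(true_tags, pred_tags), 3))  # precision 1.0, recall 0.5 -> 0.667
```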


Limitations

- Trained exclusively on English text.
- Best suited for formal news-style writing about energy markets and geopolitics.
- Performance may degrade on highly technical engineering documents or non-standard text formats.
- Entity boundaries follow a BIO scheme; overlapping or nested entities are not supported.
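Because the maximum sequence length is 256 tokens, longer documents should be chunked before inference. A simple sketch that greedily packs sentences into word-capped chunks; word count is only a rough proxy for subword-token count, so the cap leaves a margin below the 256-token limit (exact counts would come from the tokenizer):

```python
import re

def chunk_text(text, max_words=150):
    """Greedily pack sentences into chunks of at most max_words words.

    ~150 words keeps a comfortable margin under the 256-token limit once
    words are split into subword tokens.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Each chunk can then be passed to the pipeline independently, e.g.:
# for chunk in chunk_text(long_article):
#     results.extend(ner(chunk))
```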

Citation

If you use this model in your work, please cite:

```bibtex
@misc{quantbridge-energy-ner-2025,
  title  = {Energy Intelligence NER},
  author = {Quantbridge},
  year   = {2025},
  url    = {https://huggingface.co/Quantbridge/energy-intelligence-multitask-ner}
}
```

License

Apache 2.0; see LICENSE.
