DistilBERT Energy Intelligence Multitask NER (v2)

Model ID: Quantbridge/distilbert-energy-intelligence-multitask-v2

A domain-specific fine-tuned DistilBERT model for Named Entity Recognition across energy markets, financial instruments, geopolitics, corporate events, and technology. This is a broad-coverage multitask NER model designed for intelligence extraction from financial news and market commentary.

The model recognises 59 entity types, encoded as 119 BIO labels: a B- and an I- tag for each type, plus a single O tag for non-entity tokens. The types span multiple intelligence domains.
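
As a sanity check on the label count, the sketch below expands a list of entity types into BIO tags. `build_bio_labels` is a hypothetical helper and the tag ordering is an assumption; the model's real mapping lives in `model.config.id2label` and may be ordered differently.

```python
def build_bio_labels(entity_types):
    """Expand entity types into BIO tags: one O tag plus B-/I- per type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Three types from the taxonomy, used purely for illustration.
sample = build_bio_labels(["CENTRAL_BANK", "COMMODITY", "SANCTION"])
print(len(sample))  # 7  (3 types * 2 + 1)

# Placeholder names standing in for the full 59-type taxonomy.
full = build_bio_labels([f"TYPE_{i}" for i in range(59)])
print(len(full))  # 119  (59 types * 2 + 1)
```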


Entity Taxonomy

Financial Instruments & Markets

| Label | Description |
| --- | --- |
| EQUITY | Stocks and equity instruments |
| DERIVATIVE | Futures, options, swaps |
| CURRENCY | FX pairs and currencies |
| FIXED_INCOME | Bonds, treasuries, notes |
| ASSET_CLASS | Broad asset class references |
| INDEX | Market indices (S&P 500, FTSE, etc.) |
| COMMODITY | Physical commodities (oil, gas, metals) |
| TRADING_HUB | Price benchmarks and trading hubs |

Financial Institutions

| Label | Description |
| --- | --- |
| FINANCIAL_INSTITUTION | Banks, brokerages, investment firms |
| CENTRAL_BANK | Central banks (Fed, ECB, BoE) |
| HEDGE_FUND | Hedge funds and asset managers |
| RATING_AGENCY | Credit rating agencies |
| EXCHANGE | Stock and commodity exchanges |

Macro & Policy

| Label | Description |
| --- | --- |
| MACRO_INDICATOR | GDP, inflation, unemployment figures |
| MONETARY_POLICY | Interest rate decisions, QE programmes |
| FISCAL_POLICY | Government spending, tax policy |
| TRADE_POLICY | Tariffs, trade agreements, WTO actions |
| ECONOMIC_BLOC | G7, G20, EU, ASEAN, etc. |

Energy Domain

| Label | Description |
| --- | --- |
| ENERGY_COMPANY | Oil majors, utilities, renewable firms |
| ENERGY_SOURCE | Oil, gas, coal, solar, nuclear, etc. |
| PIPELINE | Energy pipelines and transmission lines |
| REFINERY | Oil refineries and processing plants |
| ENERGY_POLICY | OPEC decisions, energy legislation |
| ENERGY_TRANSITION | Decarbonisation, net-zero, EV, hydrogen |
| GRID | Power grids and electricity networks |

Geopolitical

| Label | Description |
| --- | --- |
| GEOPOLITICAL_EVENT | Summits, elections, geopolitical shifts |
| SANCTION | Economic sanctions and embargoes |
| TREATY | International agreements and accords |
| CONFLICT_ZONE | Active or historic conflict regions |
| DIPLOMATIC_ACTION | Diplomatic moves, expulsions, negotiations |
| COUNTRY | Nation states |
| REGION | Geographic regions (Middle East, EU, etc.) |
| CITY | Cities and urban locations |

Corporate Events

| Label | Description |
| --- | --- |
| COMPANY | General companies |
| M_AND_A | Mergers and acquisitions |
| IPO | Initial public offerings |
| EARNINGS_EVENT | Quarterly earnings, revenue reports |
| EXECUTIVE | Named C-suite executives |
| CORPORATE_ACTION | Dividends, buybacks, restructuring |

Infrastructure & Supply Chain

| Label | Description |
| --- | --- |
| INFRA | Physical infrastructure (general) |
| SUPPLY_CHAIN | Supply chain disruptions and logistics |
| SHIPPING_VESSEL | Named ships and tankers |
| PORT | Ports and maritime hubs |

Risk & Events

| Label | Description |
| --- | --- |
| EVENT | General newsworthy events |
| RISK_FACTOR | Risk factors and vulnerabilities |
| NATURAL_DISASTER | Hurricanes, earthquakes, floods |
| CYBER_EVENT | Cyber attacks and digital incidents |
| DISRUPTION | Supply or market disruptions |

Technology

| Label | Description |
| --- | --- |
| TECH_COMPANY | Technology companies |
| AI_MODEL | AI systems and models |
| SEMICONDUCTOR | Chips and semiconductor companies |
| TECH_REGULATION | Technology regulation and policy |

People & Organizations

| Label | Description |
| --- | --- |
| PERSON | Named individuals |
| THINK_TANK | Policy research organizations |
| NEWS_SOURCE | Media and news outlets |
| REGULATORY_BODY | Government regulators (SEC, FCA, etc.) |
| ORG | General organizations |

Usage

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Quantbridge/distilbert-energy-intelligence-multitask-v2",
    aggregation_strategy="simple",
)

text = (
    "The Federal Reserve held interest rates steady as Brent crude fell below $75 "
    "following OPEC+ production cuts and renewed sanctions on Russian energy exports."
)

results = ner(text)
for entity in results:
    print(f"{entity['word']:<35} {entity['entity_group']:<25} {entity['score']:.3f}")

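Example output (illustrative; with the uncased tokenizer the pipeline returns each `word` in lowercase, and the original casing can be recovered from the result's `start`/`end` character offsets):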

Federal Reserve                     CENTRAL_BANK              0.961
Brent                               TRADING_HUB               0.954
OPEC+                               REGULATORY_BODY           0.947
Russian energy exports              SANCTION                  0.932
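
For downstream use, pipeline results are often filtered by confidence and grouped by type. The sketch below works on a hand-written stand-in for the `ner(text)` output so it runs without downloading the model; `group_entities` and the score thresholds are assumptions for illustration, not values recommended by this model card.

```python
from collections import defaultdict


def group_entities(results, min_score=0.9):
    """Group pipeline results by entity type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for `ner(text)`, mirroring the example output above.
results = [
    {"word": "Federal Reserve", "entity_group": "CENTRAL_BANK", "score": 0.961},
    {"word": "Brent", "entity_group": "TRADING_HUB", "score": 0.954},
    {"word": "OPEC+", "entity_group": "REGULATORY_BODY", "score": 0.947},
    {"word": "Russian energy exports", "entity_group": "SANCTION", "score": 0.932},
]

# Only the two spans scoring at least 0.95 survive the filter.
print(group_entities(results, min_score=0.95))
```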

Load model directly

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "Quantbridge/distilbert-energy-intelligence-multitask-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

text = "Goldman Sachs cut its oil price forecast after OPEC+ agreed to extend output cuts."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

predicted_ids = outputs.logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

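# Print each wordpiece that carries an entity label, skipping special tokens.
for token, label_id in zip(tokens, predicted_ids):
    label = model.config.id2label[label_id.item()]
    if label != "O" and token not in tokenizer.all_special_tokens:
        # removeprefix (Python 3.9+) drops only the "##" continuation marker;
        # lstrip("##") would strip every leading "#" character instead.
        print(f"{token.removeprefix('##'):<25} {label}")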
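
The loop above prints one label per wordpiece. To get entity spans without the pipeline, the pieces and their BIO tags have to be merged. The following is a minimal sketch under stated assumptions: `merge_bio` is a hypothetical helper, special tokens are assumed to be removed beforehand, a continuation piece is glued to its word regardless of its own label, and the sample tokens and labels are hand-written rather than real model output.

```python
def merge_bio(tokens, labels):
    """Merge WordPiece tokens and BIO labels into (text, entity_type) spans."""
    spans = []               # finished (text, entity_type) pairs
    words, kind = [], None   # span currently being built

    for token, label in zip(tokens, labels):
        if token.startswith("##"):
            # Continuation piece: extend the last word of the open span, if any.
            if words:
                words[-1] += token[2:]
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and entity_type == kind:
            words.append(token)  # same entity continues with a new word
            continue
        # Anything else (O, B-, or a mismatched I-) closes the open span.
        if words:
            spans.append((" ".join(words), kind))
        words, kind = [], None
        if prefix in ("B", "I"):  # a stray I- tag is treated as a span start
            words, kind = [token], entity_type

    if words:
        spans.append((" ".join(words), kind))
    return spans


# Illustrative input only; the real tokenisation and predictions may differ.
tokens = ["goldman", "sachs", "cut", "its", "oil", "price", "op", "##ec", "+", "agreed"]
labels = [
    "B-FINANCIAL_INSTITUTION", "I-FINANCIAL_INSTITUTION", "O", "O",
    "B-COMMODITY", "O", "B-REGULATORY_BODY", "I-REGULATORY_BODY",
    "I-REGULATORY_BODY", "O",
]
print(merge_bio(tokens, labels))
```

Words are rejoined with single spaces, so the span text is only an approximation of the original string; character offsets from the tokenizer would be needed for exact recovery.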

Model Details

| Property | Value |
| --- | --- |
| Base architecture | distilbert-base-uncased |
| Architecture type | DistilBertForTokenClassification |
| Entity types | 59 types (119 BIO labels) |
| Hidden dimension | 768 |
| Attention heads | 12 |
| Layers | 6 |
| Vocabulary size | 30,522 |
| Max sequence length | 512 tokens |
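
Inputs longer than the 512-token limit have to be truncated or split before inference. The sketch below splits a list of token ids into overlapping windows. `sliding_windows`, the 510-token budget (512 minus the [CLS] and [SEP] tokens a BERT-style tokenizer adds) and the 128-token overlap are illustrative assumptions, not settings taken from this model card.

```python
def sliding_windows(token_ids, max_len=510, stride=128):
    """Split token ids into overlapping windows of at most max_len ids.

    Returns (start_index, window) pairs; consecutive windows share
    `stride` ids so entities cut at a boundary appear whole in one of them.
    """
    if stride >= max_len:
        raise ValueError("stride must be smaller than max_len")
    if not token_ids:
        return []

    windows = []
    start = 0
    step = max_len - stride
    while True:
        end = min(start + max_len, len(token_ids))
        windows.append((start, token_ids[start:end]))
        if end == len(token_ids):
            break
        start += step
    return windows


# 1,000 dummy ids stand in for a tokenised long document.
chunks = sliding_windows(list(range(1000)))
print([(start, len(window)) for start, window in chunks])
```

Hugging Face tokenizers can also produce overlapping chunks directly by passing `truncation=True`, `return_overflowing_tokens=True` and `stride` in the tokenizer call, which may be simpler in practice.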

Intended Use

This model is designed for financial and energy intelligence extraction: automated NER over news feeds, earnings transcripts, regulatory filings, and geopolitical reports. It is a base model suitable for:

  • Structured data extraction from unstructured financial news
  • Entity linking and knowledge graph population
  • Signal detection for trading and risk systems
  • Geopolitical risk monitoring

Out-of-scope use

  • General-purpose NER on non-financial text
  • Languages other than English
  • Documents with heavy technical jargon outside the financial/energy domain

Limitations

  • English-only
  • Optimised for news-style formal writing; may underperform on social media or informal text
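  • The 59-type taxonomy contains overlapping categories, so an entity that fits more than one type may be labelled inconsistently (e.g. an energy company tagged as COMPANY in one sentence and ENERGY_COMPANY in another)
  • The BIO scheme assigns exactly one label per token, so nested entities cannot be represented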
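
One possible mitigation for overlapping categories, when downstream code only needs coarse types, is to map specific labels onto a broader parent. This is a sketch of that idea, not something the model card prescribes: `PARENT_TYPE` and `coarsen` are hypothetical, and the mapping shown is only an example built from labels in the taxonomy.

```python
# Hypothetical mapping from specific types to a broader parent type.
PARENT_TYPE = {
    "ENERGY_COMPANY": "COMPANY",
    "TECH_COMPANY": "COMPANY",
}


def coarsen(entities, parent_type=PARENT_TYPE):
    """Return copies of the entities with specific types mapped to their parent."""
    return [
        {**e, "entity_group": parent_type.get(e["entity_group"], e["entity_group"])}
        for e in entities
    ]


# Hand-written example: the same company tagged two different ways.
entities = [
    {"word": "shell", "entity_group": "ENERGY_COMPANY", "score": 0.91},
    {"word": "shell", "entity_group": "COMPANY", "score": 0.88},
    {"word": "brent", "entity_group": "TRADING_HUB", "score": 0.95},
]
print([e["entity_group"] for e in coarsen(entities)])
# ['COMPANY', 'COMPANY', 'TRADING_HUB']
```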

License

Apache 2.0; see LICENSE.

Model size: 66.5M params (Safetensors, tensor type F32)