Cybersecurity NER Model v8

Named Entity Recognition model for cybersecurity domain text, trained with spaCy v3.8 on custom training data.

Model Description

Fine-tuned NER model for extracting 13 cybersecurity entity types from technical documentation, CVs, job descriptions, threat reports, and compliance documents.

Performance

Test Results (v8):

Pass Rate: 94% (62/66 tests)
Dev F1 Score: 98.58%
Precision: 98.71%
Recall: 98.46%
Training Steps: 11,500 (early stopping)
Training Data: 2,223 examples
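Dev F1 is the harmonic mean of precision and recall, so the three reported dev scores can be cross-checked against each other. A minimal sketch; the helper name `f1_score` is illustrative and not part of the model package:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported v8 dev-set precision and recall, in percent
print(round(f1_score(98.71, 98.46), 2))  # 98.58

# Reported test pass rate: 62 of 66 tests
print(round(62 / 66 * 100, 1))  # 93.9
```

The recomputed values agree with the reported 98.58% F1 and the rounded 94% pass rate.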

Entity Type Performance:

Entity Type	Test Pass Rate	Dev Set F1
CVE	100% (3/3)	100.00%
AUDIT_TERM	75% (3/4)	100.00%
SECURITY_TOOL	100% (4/4)	100.00%
CERTIFICATION	100% (4/4)	98.73%
SECURITY_ROLE	100% (4/4)	98.11%
FRAMEWORK	100% (4/4)	93.88%
TECHNICAL_SKILL	100% (4/4)	100.00%
ACRONYM	100% (4/4)	100.00%
SECURITY_DOMAIN	100% (4/4)	100.00%
ATTACK_TECHNIQUE	75% (3/4)	98.70%
THREAT_TYPE	75% (3/4)	95.24%
REGULATION	75% (3/4)	96.55%
CONTROL_ID	100% (4/4)	not reported
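Per-type dev F1 scores like those above are normally computed over exact span matches: a predicted entity counts as correct only if its start offset, end offset, and label all match a gold entity. spaCy's scorer reports these per label under `ents_per_type`; the stdlib sketch below only illustrates the idea and is not the scorer used to produce this table. The helper name `per_type_prf` and the sample spans are hypothetical.

```python
from collections import defaultdict

Span = tuple[int, int, str]  # (start_char, end_char, label)


def per_type_prf(gold: list[set[Span]], pred: list[set[Span]]) -> dict[str, dict[str, float]]:
    """Exact-match precision/recall/F1 per entity label, summed over documents."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, pred):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1  # correct boundaries and label
            else:
                fp[span[2]] += 1  # wrong boundaries or wrong label
        for span in gold_spans - pred_spans:
            fn[span[2]] += 1      # gold entity the model missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


# Text: "CISSP holder skilled in XSS and CSRF"
gold_docs = [{(0, 5, "CERTIFICATION"), (24, 27, "ATTACK_TECHNIQUE"), (32, 36, "ATTACK_TECHNIQUE")}]
# The prediction merges "XSS and CSRF" into one span, so it matches no gold span
pred_docs = [{(0, 5, "CERTIFICATION"), (24, 36, "ATTACK_TECHNIQUE")}]

for label, s in sorted(per_type_prf(gold_docs, pred_docs).items()):
    print(label, round(s["f"], 2))
# ATTACK_TECHNIQUE 0.0
# CERTIFICATION 1.0
```

Under exact matching a single merged span is scored as one false positive plus two false negatives, which is why boundary errors weigh heavily on per-type F1.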

Entity Types

CVE - CVE identifiers (e.g., CVE-2024-1234)
CERTIFICATION - Security certifications (CISSP, OSCP, CEH, CISM, Security+)
FRAMEWORK - Security frameworks (NIST CSF, ISO 27001, MITRE ATT&CK, CIS Controls)
ATTACK_TECHNIQUE - Attack methods (SQL injection, XSS, CSRF, buffer overflow)
TECHNICAL_SKILL - Technical skills (Incident Response, Forensics, Penetration Testing)
AUDIT_TERM - Audit/compliance terms (Risk assessment, Compliance audit, Security review)
SECURITY_ROLE - Job roles (CISO, SOC Analyst, Security Engineer, Pentester)
THREAT_TYPE - Threat types (APT, ransomware, phishing, DDoS, malware)
ACRONYM - Security acronyms (SIEM, EDR, SOAR, IDS/IPS, WAF, DLP)
SECURITY_DOMAIN - Security domains (Cloud Security, Network Security, Application Security)
REGULATION - Regulations (GDPR, HIPAA, PCI-DSS, SOX, CCPA)
SECURITY_TOOL - Security tools (Splunk, Metasploit, Burp Suite, Nmap, Wireshark)
CONTROL_ID - Control identifiers (ISO 27001 A.5.1, NIST CSF PR.AC-1, CIS Control 1.1)
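CVE identifiers follow a fixed syntax (`CVE-`, a four-digit year, a hyphen, and a sequence number of at least four digits), so predicted CVE spans can be sanity-checked with a regular expression. This is an optional post-processing step sketched under that assumption; the card does not say whether the model itself uses any such rule, and the helper names are hypothetical.

```python
import re

# CVE syntax: "CVE-" + 4-digit year + "-" + sequence number of 4 or more digits
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def is_valid_cve(span_text: str) -> bool:
    """Return True if the whole span is a syntactically valid CVE identifier."""
    return CVE_PATTERN.fullmatch(span_text.strip()) is not None


def find_cves(text: str) -> list[str]:
    """Regex-only extraction, useful as a cross-check against CVE predictions."""
    return CVE_PATTERN.findall(text)


print(is_valid_cve("CVE-2024-1234"))  # True
print(is_valid_cve("CVE-24-1234"))    # False
print(find_cves("Patched CVE-2021-44228 and CVE-2024-1234."))
# ['CVE-2021-44228', 'CVE-2024-1234']
```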

Usage

import spacy

# Load model
nlp = spacy.load("path/to/model")

# Extract entities
text = "CISSP certified professional with experience in Splunk and Metasploit"
doc = nlp(text)

for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

Output:

CISSP -> CERTIFICATION
Splunk -> SECURITY_TOOL
Metasploit -> SECURITY_TOOL
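For CV or job-description analysis it is often more convenient to collect entities per label than to print them one by one. A small helper, assuming only that each entity exposes spaCy's `text` and `label_` attributes (as the spans in `doc.ents` do); the name `group_by_label` is illustrative, and a namedtuple stands in for real spans so the sketch runs without the model.

```python
from collections import defaultdict, namedtuple


def group_by_label(ents) -> dict[str, list[str]]:
    """Map each label to its unique entity texts, in first-seen order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in for doc.ents so the helper runs without the model
Ent = namedtuple("Ent", ["text", "label_"])
sample = [
    Ent("CISSP", "CERTIFICATION"),
    Ent("Splunk", "SECURITY_TOOL"),
    Ent("Metasploit", "SECURITY_TOOL"),
    Ent("Splunk", "SECURITY_TOOL"),  # repeated mention is kept once
]
print(group_by_label(sample))
# {'CERTIFICATION': ['CISSP'], 'SECURITY_TOOL': ['Splunk', 'Metasploit']}
```

With the loaded pipeline, the same call is `group_by_label(nlp(text).ents)`.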

Training Data

Sources:

v7 merged data: 1,448 examples
v8 generated: 1,347 examples (multi-entity patterns and case variants)
Manually curated: 100 examples
Final dataset: 2,223 unique examples (after validation and deduplication)
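The card does not specify how validation and deduplication were done when merging the three sources (1,448 + 1,347 + 100 = 2,895 candidates, reduced to 2,223). A minimal sketch, assuming spaCy-style `(text, {"entities": [...]})` training tuples, rejecting examples with out-of-range or overlapping offsets, and treating two examples as duplicates when their text and entity annotations are identical. The helper names and sample data are hypothetical.

```python
Example = tuple[str, dict]  # (text, {"entities": [(start, end, label), ...]})


def is_valid(example: Example) -> bool:
    """Reject empty texts, out-of-range offsets and overlapping spans."""
    text, annotations = example
    if not text.strip():
        return False
    prev_end = 0
    for start, end, _label in sorted(annotations.get("entities", [])):
        if not (0 <= start < end <= len(text)) or start < prev_end:
            return False
        prev_end = end
    return True


def merge_sources(*sources: list[Example]) -> list[Example]:
    """Concatenate sources in order, keep valid examples, drop exact duplicates."""
    seen = set()
    merged = []
    for source in sources:
        for text, annotations in source:
            key = (text, tuple(sorted(annotations.get("entities", []))))
            if key in seen or not is_valid((text, annotations)):
                continue
            seen.add(key)
            merged.append((text, annotations))
    return merged


v7 = [("CISSP required", {"entities": [(0, 5, "CERTIFICATION")]})]
v8 = [
    ("CISSP required", {"entities": [(0, 5, "CERTIFICATION")]}),  # duplicate
    ("Nmap scan", {"entities": [(0, 40, "SECURITY_TOOL")]}),      # offset out of range
]
manual = [("Uses Nmap", {"entities": [(5, 9, "SECURITY_TOOL")]})]
print(len(merge_sources(v7, v8, manual)))  # 2
```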

v8 Improvements:

Multi-entity "X and Y" patterns (50 examples per entity type)
Case variants, including title case (CISSP, cissp, Cissp)
Comma-separated list patterns
AUDIT_TERM edge cases (Compliance audit)
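The v8 augmentations are templated text generation in which entity offsets have to be computed alongside the string. A sketch of how case variants and "X and Y" or comma-separated list examples could be produced in spaCy's `(text, {"entities": [...]})` format; the template wording and helper names are hypothetical, not the generator actually used for v8.

```python
def case_variants(term: str) -> list[str]:
    """Upper, lower and title case forms, deduplicated, order preserved."""
    return list(dict.fromkeys([term.upper(), term.lower(), term.title()]))


def make_list_example(terms: list[str], label: str,
                      prefix: str = "Experience with ", suffix: str = "."):
    """Build 'A, B and C' text with one (start, end, label) span per term."""
    parts, entities = [prefix], []
    cursor = len(prefix)
    for i, term in enumerate(terms):
        if i > 0:
            # Last item is joined with "and", earlier ones with commas
            sep = " and " if i == len(terms) - 1 else ", "
            parts.append(sep)
            cursor += len(sep)
        entities.append((cursor, cursor + len(term), label))
        parts.append(term)
        cursor += len(term)
    parts.append(suffix)
    return "".join(parts), {"entities": entities}


print(case_variants("CISSP"))  # ['CISSP', 'cissp', 'Cissp']

text, ann = make_list_example(["HIPAA", "PCI-DSS"], "REGULATION")
print(text)                                         # Experience with HIPAA and PCI-DSS.
print([text[s:e] for s, e, _ in ann["entities"]])   # ['HIPAA', 'PCI-DSS']
```

Computing offsets while building the string, rather than searching for terms afterwards, avoids mislabeling when a term occurs more than once in the text.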

Entity Distribution:

AUDIT_TERM: 326 (12.4%)
CERTIFICATION: 295 (11.2%)
SECURITY_TOOL: 293 (11.1%)
ATTACK_TECHNIQUE: 282 (10.7%)
THREAT_TYPE: 263 (10.0%)
TECHNICAL_SKILL: 228 (8.6%)
REGULATION: 222 (8.4%)
CVE: 182 (6.9%)
FRAMEWORK: 165 (6.3%)
SECURITY_ROLE: 153 (5.8%)
ACRONYM: 142 (5.4%)
SECURITY_DOMAIN: 85 (3.2%)
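The listed counts sum to 2,636, more than the 2,223 examples, because one example can contain several entities; the percentages are shares of that span total (for instance, 326 / 2,636 ≈ 12.4%). A sketch of computing such a distribution from spaCy-style training tuples (assumed format; the helper name and sample data are hypothetical):

```python
from collections import Counter


def label_distribution(examples):
    """Count entity spans per label; return {label: (count, percent)} by count."""
    counts = Counter(
        label for _text, ann in examples for _start, _end, label in ann["entities"]
    )
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.most_common()}


examples = [
    ("CISSP and OSCP holder", {"entities": [(0, 5, "CERTIFICATION"), (10, 14, "CERTIFICATION")]}),
    ("Ran Nmap", {"entities": [(4, 8, "SECURITY_TOOL")]}),
]
print(label_distribution(examples))
# {'CERTIFICATION': (2, 66.7), 'SECURITY_TOOL': (1, 33.3)}
```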

Training Configuration

Framework: spaCy 3.8
Architecture: tok2vec + TransitionBasedParser
GPU: NVIDIA RTX 4090
Training steps: 11,500 (early stopping)
Patience: 5,000 steps
Learning rate: 3e-05
Dropout: 0.25
Batch size: 1,000
Train/dev split: 85/15
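In spaCy's training loop, `patience` stops training once that many steps have passed since the best dev score so far. If training halted on that rule, stopping at step 11,500 with a patience of 5,000 places the best checkpoint at or shortly before step 6,500; the exact step depends on the evaluation frequency, which the card does not state. A stdlib sketch of the rule with a hypothetical score curve, not spaCy's own code:

```python
def early_stop_step(eval_history, patience):
    """Return the step at which a patience rule halts training, or None.

    eval_history: (step, dev_score) pairs in step order.
    Ties do not count as improvement, so the earliest best step is kept.
    """
    best_score = float("-inf")
    best_step = 0
    for step, score in eval_history:
        if score > best_score:
            best_score, best_step = score, step
        if patience and step - best_step >= patience:
            return step
    return None


# Hypothetical curve: dev score rises until step 6,500, then plateaus
history = [(step, min(step, 6500) / 10000) for step in range(500, 20001, 500)]
print(early_stop_step(history, patience=5000))  # 11500
```

A patience of 0 disables the rule, so the function returns None and training would run to its step limit.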

Version History

v8 (Current):

94% pass rate (62/66)
Multi-entity extraction improved
Title case support added
AUDIT_TERM edge cases fixed

v7:

86% pass rate (57/66)
CVE detection restored
SECURITY_ROLE improved to 100%
IDS/IPS and DDoS fixed

v6:

74% pass rate (49/66)
CVE regression (CVE entities no longer detected)
AUDIT_TERM and SECURITY_ROLE issues

Known Limitations

v8 has 4 remaining test failures:

Multi-entity extraction in specific contexts ("APT group using ransomware")
Span boundary issues with conjunctions ("XSS and CSRF mitigated")
Specific "X and Y" patterns ("HIPAA and PCI-DSS standards")
"Gap analysis" edge case

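Until the conjunction boundary failures are addressed in the training data, a post-processing heuristic can split a predicted span containing " and " into one span per conjunct. This sketch assumes both conjuncts carry the same label; it would wrongly split names that legitimately contain "and" (e.g. "Identity and Access Management"), so an allow-list may be needed. The helper name is hypothetical, and the resulting character offsets can be turned back into spaCy spans with `Doc.char_span`.

```python
def split_on_conjunction(text, span, conj=" and "):
    """Split a (start, end, label) char span at each conjunction, keeping the label."""
    start, end, label = span
    pieces = []
    cursor = start
    for part in text[start:end].split(conj):
        if part.strip():
            pieces.append((cursor, cursor + len(part), label))
        cursor += len(part) + len(conj)  # skip past this part and the conjunction
    return pieces or [span]


text = "XSS and CSRF mitigated"
merged = (0, 12, "ATTACK_TECHNIQUE")  # model returned one span for both
for s, e, label in split_on_conjunction(text, merged):
    print(text[s:e], label)
# XSS ATTACK_TECHNIQUE
# CSRF ATTACK_TECHNIQUE
```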
Use Cases

CV/resume skill extraction
Job description analysis
Threat intelligence reports
Compliance documentation
Security audit reports
Technical documentation
Security training materials

License

MIT

Citation

@misc{cybersecurity-ner,
  title={Cybersecurity NER Model},
  author={PKI},
  year={2026},
  url={https://huggingface.co/pki/cybersecurity-ner}
}

Contact

For issues or questions, please open an issue on GitHub.
