Cybersecurity NER Model v8
Named Entity Recognition (NER) model for cybersecurity text, trained with spaCy v3.8 on custom training data.
Model Description
Fine-tuned NER model for extracting 13 cybersecurity entity types from technical documentation, CVs, job descriptions, threat reports, and compliance documents.
Performance
Test Results (v8):
- Pass Rate: 94% (62/66 tests)
- Dev F1 Score: 98.58%
- Precision: 98.71%
- Recall: 98.46%
- Training Steps: 11,500 (early stopping)
- Training Data: 2,223 examples
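The headline figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that only uses the numbers reported in the list; the variable names are illustrative and not part of the model's tooling.

```python
# Reported dev-set metrics, as percentages (taken from the list above).
precision = 98.71
recall = 98.46

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Pass rate is the share of the 66 test cases that passed.
passed, total = 62, 66
pass_rate = 100 * passed / total

print(f"F1: {f1:.2f}%")                # F1: 98.58%
print(f"Pass rate: {pass_rate:.1f}%")  # Pass rate: 93.9%
```

The computed F1 agrees with the reported 98.58%, and 62/66 rounds to the reported 94%.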
Entity Type Performance:
| Entity Type | Test Pass Rate | Dev Set F1 |
|---|---|---|
| CVE | 100% (3/3) | 100.00% |
| AUDIT_TERM | 75% (3/4) | 100.00% |
| SECURITY_TOOL | 100% (4/4) | 100.00% |
| CERTIFICATION | 100% (4/4) | 98.73% |
| SECURITY_ROLE | 100% (4/4) | 98.11% |
| FRAMEWORK | 100% (4/4) | 93.88% |
| TECHNICAL_SKILL | 100% (4/4) | 100.00% |
| ACRONYM | 100% (4/4) | 100.00% |
| SECURITY_DOMAIN | 100% (4/4) | 100.00% |
| ATTACK_TECHNIQUE | 75% (3/4) | 98.70% |
| THREAT_TYPE | 75% (3/4) | 95.24% |
| REGULATION | 75% (3/4) | 96.55% |
| CONTROL_ID | 100% (4/4) | - |
Entity Types
- CVE - CVE identifiers (e.g., CVE-2024-1234)
- CERTIFICATION - Security certifications (CISSP, OSCP, CEH, CISM, Security+)
- FRAMEWORK - Security frameworks (NIST CSF, ISO 27001, MITRE ATT&CK, CIS Controls)
- ATTACK_TECHNIQUE - Attack methods (SQL injection, XSS, CSRF, buffer overflow)
- TECHNICAL_SKILL - Technical skills (Incident Response, Forensics, Penetration Testing)
- AUDIT_TERM - Audit/compliance terms (Risk assessment, Compliance audit, Security review)
- SECURITY_ROLE - Job roles (CISO, SOC Analyst, Security Engineer, Pentester)
- THREAT_TYPE - Threat types (APT, ransomware, phishing, DDoS, malware)
- ACRONYM - Security acronyms (SIEM, EDR, SOAR, IDS/IPS, WAF, DLP)
- SECURITY_DOMAIN - Security domains (Cloud Security, Network Security, Application Security)
- REGULATION - Regulations (GDPR, HIPAA, PCI-DSS, SOX, CCPA)
- SECURITY_TOOL - Security tools (Splunk, Metasploit, Burp Suite, Nmap, Wireshark)
- CONTROL_ID - Control identifiers (ISO 27001 A.5.1, NIST CSF PR.AC-1, CIS Control 1.1)
Usage
import spacy
# Load model
nlp = spacy.load("path/to/model")
# Extract entities
text = "CISSP certified professional with experience in Splunk and Metasploit"
doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
Output:
CISSP -> CERTIFICATION
Splunk -> SECURITY_TOOL
Metasploit -> SECURITY_TOOL
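For downstream use such as skill extraction, it is often handier to group the entities by label. The helper below is a hypothetical sketch, not part of the model package: `group_by_label` and `FakeEnt` are names introduced here, and `FakeEnt` only stands in for spaCy's `Span` objects so that the example runs without the model installed.

```python
from collections import defaultdict, namedtuple

def group_by_label(ents):
    """Collect entity texts under their label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)

# Stand-in for spaCy Span objects (assumption for illustration);
# with the real pipeline, pass doc.ents instead.
FakeEnt = namedtuple("FakeEnt", ["text", "label_"])
ents = [
    FakeEnt("CISSP", "CERTIFICATION"),
    FakeEnt("Splunk", "SECURITY_TOOL"),
    FakeEnt("Metasploit", "SECURITY_TOOL"),
]
print(group_by_label(ents))
# {'CERTIFICATION': ['CISSP'], 'SECURITY_TOOL': ['Splunk', 'Metasploit']}
```

With the loaded model, the same call would be `group_by_label(nlp(text).ents)`.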
Training Data
Sources:
- v7 merged data: 1,448 examples
- v8 generated: 1,347 examples with multi-entity patterns, case variants
- Manual curated: 100 examples
- Final dataset: 2,223 unique examples (after validation and deduplication)
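The three sources add up to 2,895 candidate examples, so validation and deduplication removed 672 of them. The exact procedure is not documented here; the sketch below shows one plausible approach, assuming examples are stored as `(text, {"entities": [(start, end, label), ...]})` tuples with character offsets. The functions `is_valid` and `deduplicate` are hypothetical names, and the comparison is deliberately case-sensitive so that intentional case variants survive.

```python
def is_valid(example):
    """Check that every span lies inside the text and that no two spans overlap."""
    text, annotations = example
    previous_end = 0
    for start, end, _label in sorted(annotations["entities"]):
        if not (0 <= start < end <= len(text)):
            return False
        if start < previous_end:
            return False
        previous_end = end
    return True

def deduplicate(examples):
    """Keep the first valid example for each distinct text (case-sensitive)."""
    seen = set()
    unique = []
    for example in examples:
        text = example[0]
        if text in seen or not is_valid(example):
            continue
        seen.add(text)
        unique.append(example)
    return unique

# Toy data for illustration only, not taken from the real training set.
examples = [
    ("Patched CVE-2024-1234 today", {"entities": [(8, 21, "CVE")]}),
    ("Patched CVE-2024-1234 today", {"entities": [(8, 21, "CVE")]}),  # exact duplicate
    ("Uses Nmap", {"entities": [(5, 20, "SECURITY_TOOL")]}),          # span runs past the text
]
print(len(deduplicate(examples)))  # 1
```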
v8 Improvements:
- Multi-entity "X and Y" patterns (50 examples per entity type)
- Case variants in upper, lower, and title case (CISSP, cissp, Cissp)
- Comma-separated list patterns
- AUDIT_TERM edge cases (Compliance audit)
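The generator used for the v8 data is not published in this card. As a rough illustration of how "X and Y" patterns and case variants could be produced with correct character offsets, here is a minimal sketch; the template string and the helper names `case_variants` and `pair_example` are assumptions made for this example.

```python
def case_variants(term):
    """Return upper, lower, and title-case forms of a term, without repeats."""
    variants = []
    for form in (term.upper(), term.lower(), term.title()):
        if form not in variants:
            variants.append(form)
    return variants

def pair_example(template, first, second, label):
    """Fill a two-slot template and compute character offsets for both entities.

    Assumes neither term also occurs in the fixed part of the template before its slot.
    """
    text = template.format(first=first, second=second)
    start_first = text.index(first)
    start_second = text.index(second, start_first + len(first))
    entities = [
        (start_first, start_first + len(first), label),
        (start_second, start_second + len(second), label),
    ]
    return text, {"entities": entities}

# Hypothetical template; the real patterns may differ.
template = "Holds {first} and {second} certifications"
examples = [
    pair_example(template, variant, "OSCP", "CERTIFICATION")
    for variant in case_variants("CISSP")
]
for text, annotations in examples:
    print(text, annotations["entities"])
```

Each generated example keeps the conjunction outside both spans, which is the boundary behavior the multi-entity patterns are meant to teach.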
Entity Distribution:
- AUDIT_TERM: 326 (12.4%)
- CERTIFICATION: 295 (11.2%)
- SECURITY_TOOL: 293 (11.1%)
- ATTACK_TECHNIQUE: 282 (10.7%)
- THREAT_TYPE: 263 (10.0%)
- TECHNICAL_SKILL: 228 (8.6%)
- REGULATION: 222 (8.4%)
- CVE: 182 (6.9%)
- FRAMEWORK: 165 (6.3%)
- SECURITY_ROLE: 153 (5.8%)
- ACRONYM: 142 (5.4%)
- SECURITY_DOMAIN: 85 (3.2%)
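The percentages above are relative to the sum of the listed counts (2,636), not to the 2,223 examples, since one example can carry several entities. CONTROL_ID does not appear in this list. The short sketch below recomputes the shares from the counts as given.

```python
# Counts copied from the distribution list above.
counts = {
    "AUDIT_TERM": 326, "CERTIFICATION": 295, "SECURITY_TOOL": 293,
    "ATTACK_TECHNIQUE": 282, "THREAT_TYPE": 263, "TECHNICAL_SKILL": 228,
    "REGULATION": 222, "CVE": 182, "FRAMEWORK": 165,
    "SECURITY_ROLE": 153, "ACRONYM": 142, "SECURITY_DOMAIN": 85,
}
total = sum(counts.values())
print(f"Total: {total}")  # Total: 2636
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```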
Training Configuration
- Framework: spaCy 3.8
- Architecture: tok2vec + TransitionBasedParser
- GPU: NVIDIA RTX 4090
- Training steps: 11,500 (early stopping)
- Patience: 5,000 steps
- Learning rate: 3e-05
- Dropout: 0.25
- Batch size: 1,000
- Train/dev split: 85/15
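How the 85/15 split was drawn (shuffling, seed, stratification) is not stated. The sketch below shows a simple shuffled split under those unknowns; `split_examples` and the seed value are assumptions, and the resulting sizes are what this particular method yields for 2,223 examples rather than confirmed figures.

```python
import random

def split_examples(examples, train_fraction=0.85, seed=0):
    """Shuffle a copy of the examples and cut it into train and dev parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 2,223 training examples.
train, dev = split_examples(range(2223))
print(len(train), len(dev))  # 1889 334
```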
Version History
v8 (Current):
- 94% pass rate (62/66)
- Multi-entity extraction improved
- Title case support added
- AUDIT_TERM edge cases fixed
v7:
- 86% pass rate (57/66)
- CVE detection restored
- SECURITY_ROLE improved to 100%
- IDS/IPS and DDoS fixed
v6:
- 74% pass rate (49/66)
- CVE regression (missing)
- AUDIT_TERM and SECURITY_ROLE issues
Known Limitations
v8 has 4 remaining test failures:
- Multi-entity extraction in specific contexts ("APT group using ransomware")
- Span boundary issues with conjunctions ("XSS and CSRF mitigated")
- Specific "X and Y" patterns ("HIPAA and PCI-DSS standards")
- "Gap analysis" edge case
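To track cases like these across versions, a small regression harness is useful. The one below is a sketch, not the test suite behind the reported pass rates: `run_cases` is a hypothetical helper, the expected annotations are assumptions, and `fake_extract` is an invented stand-in whose merged span only illustrates what a conjunction boundary error could look like.

```python
def run_cases(extract, cases):
    """Return (passed, failures); a case passes when the extracted
    (text, label) pairs match the expected set exactly."""
    failures = []
    for text, expected in cases:
        found = set(extract(text))
        if found != set(expected):
            failures.append((text, found))
    return len(cases) - len(failures), failures

# Expected annotations are assumptions made for illustration.
cases = [
    ("XSS and CSRF mitigated",
     {("XSS", "ATTACK_TECHNIQUE"), ("CSRF", "ATTACK_TECHNIQUE")}),
    ("HIPAA and PCI-DSS standards",
     {("HIPAA", "REGULATION"), ("PCI-DSS", "REGULATION")}),
]

def fake_extract(text):
    """Invented stand-in extractor: merges 'XSS and CSRF' into one span."""
    if text.startswith("XSS"):
        return [("XSS and CSRF", "ATTACK_TECHNIQUE")]
    return [("HIPAA", "REGULATION"), ("PCI-DSS", "REGULATION")]

passed, failures = run_cases(fake_extract, cases)
print(f"{passed}/{len(cases)} passed")  # 1/2 passed
```

With the loaded pipeline, `extract` could be `lambda text: [(ent.text, ent.label_) for ent in nlp(text).ents]`.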
Use Cases
- CV/resume skill extraction
- Job description analysis
- Threat intelligence reports
- Compliance documentation
- Security audit reports
- Technical documentation
- Security training materials
License
MIT
Citation
@misc{cybersecurity-ner,
title={Cybersecurity NER Model},
author={PKI},
year={2026},
url={https://huggingface.co/pki/cybersecurity-ner}
}
Contact
For issues or questions, please open an issue on GitHub.