---
library_name: transformers
tags:
- cybersecurity
- mpnet
- embeddings
- classification
language:
- en
base_model:
- microsoft/mpnet-base
---

# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification
|
This MPNet model was fine-tuned to classify cybersecurity threat actor groups from textual descriptions in cybersecurity reports.

## Model Details

### Model Description

This model is based on `microsoft/mpnet-base` and was fine-tuned with masked language modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily of known threat actor groups.
|
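Masked language modeling trains the encoder to recover tokens hidden in its input. The card does not state the masking settings used for this model; the minimal sketch below assumes the common BERT-style recipe (select 15% of positions; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged), which matches the defaults of `DataCollatorForLanguageModeling` in `transformers`. `MASK_ID`, `VOCAB_SIZE`, and the token IDs are hypothetical placeholders, and special tokens are not excluded here as a real collator would do.

```python
import random

# Hypothetical values for illustration; read the real ones from the tokenizer.
MASK_ID = 4
VOCAB_SIZE = 30000
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_tokens(token_ids, mlm_probability=0.15, seed=None):
    """Return (inputs, labels) for one MLM example using the BERT-style 80/10/10 rule.

    labels keeps the original id at selected positions and IGNORE_INDEX elsewhere,
    so the loss is computed only on tokens the model has to recover.
    Special tokens are NOT excluded here, unlike a real data collator.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    ids = list(range(100, 120))  # hypothetical token ids for one sentence
    masked, labels = mask_tokens(ids, seed=0)
    print(masked)
    print(labels)
```

Unselected positions always keep their original token and carry `IGNORE_INDEX` as the label, so only the selected positions contribute to the loss.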
### Model Information
- **Base Model:** microsoft/mpnet-base
- **Tasks:** Text classification, embedding generation
- **Language:** English

## Intended Use

### Primary Use

This model generates specialized embeddings that are useful for:
- Identifying cybersecurity threat actor groups from textual descriptions
- Cybersecurity threat intelligence analysis
- Embedding-based retrieval tasks in cybersecurity contexts
|
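For embedding-based retrieval, one common approach (an assumption here, not a procedure prescribed by this card) is to rank stored document embeddings by cosine similarity to a query embedding. The three-dimensional vectors below are made-up stand-ins for real model outputs, which for an MPNet-base encoder have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (doc_id, score) pairs from corpus most similar to query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from the model.
corpus = {
    "report-a": [0.9, 0.1, 0.0],
    "report-b": [0.1, 0.8, 0.2],
    "report-c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, corpus, k=2))  # report-a ranks first, then report-c
```

With real data, `corpus` would map report identifiers to embeddings computed with the model, and the query would be embedded the same way.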
### Out-of-Scope Use

This model is not intended for general language tasks outside cybersecurity contexts.

## Performance Evaluation

The model was benchmarked against the original MPNet base model and other cybersecurity-domain language models:
|
| Model          | Classification Accuracy | Embedding Variability |
|----------------|-------------------------|-----------------------|
| Original MPNet | 55.73%                  | 0.0798                |
| SecBERT        | 91.67%                  | 0.5911                |
| ATTACK-BERT    | 83.51%                  | 0.0960                |
| MPNet (Cyber)  | 72.74%                  | 0.1239                |
| SecureBERT     | 49.31%                  | 0.0071                |
|
### Downstream Tasks
- Attribution of cybersecurity incidents
- Automated analysis of threat intelligence reports
- Embeddings for cybersecurity threat detection
|
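As one illustration of attribution, nearest-centroid classification averages the embeddings of descriptions already labeled with each group and assigns a new description to the closest group centroid. This is a swapped-in technique for illustration, not the classification head behind the accuracy figures above; the group names and vectors are hypothetical, and Euclidean distance is an arbitrary choice (cosine similarity would also work).

```python
import math
from collections import defaultdict


def centroids(labeled):
    """Average the embeddings of each group. labeled: list of (group, vector) pairs."""
    grouped = defaultdict(list)
    for group, vec in labeled:
        grouped[group].append(vec)
    # Column-wise mean of all vectors belonging to each group
    return {g: [sum(col) / len(vecs) for col in zip(*vecs)] for g, vecs in grouped.items()}


def attribute(vec, group_centroids):
    """Return the group whose centroid is nearest to vec (Euclidean distance)."""
    return min(group_centroids, key=lambda g: math.dist(vec, group_centroids[g]))


# Hypothetical labeled embeddings (real ones would be 768-dimensional model outputs).
labeled = [
    ("GROUP_A", [0.9, 0.1]),
    ("GROUP_A", [0.8, 0.2]),
    ("GROUP_B", [0.1, 0.9]),
    ("GROUP_B", [0.2, 0.7]),
]
cents = centroids(labeled)
print(attribute([0.85, 0.2], cents))  # GROUP_A
```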
### Limitations
- Best suited for English-language cybersecurity contexts
- May require further fine-tuning for highly specific tasks

## Usage

To compute sentence embeddings with `transformers` (mean pooling over token embeddings):
|
```python
from transformers import AutoTokenizer, MPNetModel
import torch

# Load the tokenizer and encoder weights from the Hugging Face Hub
model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MPNetModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings, ignoring padding positions via the attention mask
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
```
|
Alternatively, with the `sentence-transformers` library:
|
```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("selfconstruct3d/mpnet-classification-finetuned-cyber-groups")
embeddings = model.encode(sentences)
print(embeddings)
```
|
## Training Details

### Training Data

Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

### Hyperparameters
- **Epochs:** 10 (MLM), 20 (classification)
- **Batch size:** 16
- **Learning rate:** 5e-6 (MLM), 2e-6 (classification)
- **Hardware:** GPU (CUDA-enabled)
|
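The two training stages listed above can be captured in a small configuration object, which also makes it easy to estimate how many optimizer steps a stage implies. The `StageConfig` structure and its field names are hypothetical (they are not taken from the author's training code); only the epoch, batch size, and learning-rate values come from the list. The step count assumes a single device and no gradient accumulation, and the dataset size in the example is made up because the card does not report it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one fine-tuning stage (hypothetical structure; values from the card)."""
    name: str
    epochs: int
    batch_size: int
    learning_rate: float

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for the stage, assuming one device and no gradient accumulation."""
        return math.ceil(num_examples / self.batch_size) * self.epochs


MLM_STAGE = StageConfig("mlm", epochs=10, batch_size=16, learning_rate=5e-6)
CLASSIFICATION_STAGE = StageConfig("classification", epochs=20, batch_size=16, learning_rate=2e-6)

if __name__ == "__main__":
    n = 1000  # hypothetical dataset size; the card does not state the real one
    for stage in (MLM_STAGE, CLASSIFICATION_STAGE):
        print(stage.name, stage.total_steps(n))  # mlm 630, classification 1260
```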
## Citation

If you use this model, please cite it as:

```bibtex
@misc{mpnet_cyber_finetune,
  author    = {Hamzic, D.},
  title     = {MPNet Fine-Tuned for Cybersecurity Group Classification},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
}
```
|
## Contact
- **Author:** Dženan Hamzić
- **Contact Information:** https://www.linkedin.com/in/dzenan-hamzic/
|
## Licence
This model is licensed for non-commercial use only (CC BY-NC 4.0).
For commercial inquiries, please contact dzenan.hamzic@ait.ac.at.