	---
	library_name: transformers
	tags:
	- cybersecurity
	- mpnet
	- embeddings
	- classification
	language:
	- en
	base_model:
	- microsoft/mpnet-base
	---

	# MPNet (Cyber) - MPNet Fine-Tuned for Cybersecurity Group Classification

	This MPNet model was fine-tuned specifically for classifying cybersecurity threat groups based on textual descriptions from cybersecurity reports.

	## Model Details

	### Model Description

	This model is based on `microsoft/mpnet-base` and fine-tuned using Masked Language Modeling (MLM) and supervised classification on cybersecurity threat intelligence descriptions, primarily focused on known threat actor groups.

	### Model Information
	- Base Model: microsoft/mpnet-base
	- Tasks: Text classification, embedding generation
	- Language: English

	## Intended Use

	### Primary Use

	This model generates specialized embeddings that are useful for:
	- Identifying cybersecurity threat actor groups from textual descriptions
	- Cybersecurity threat intelligence analysis
	- Embedding-based retrieval tasks in cybersecurity contexts
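As an illustration of the embedding-based retrieval use case, the following minimal sketch ranks candidate documents by cosine similarity to a query vector. The toy 3-dimensional vectors stand in for real model embeddings (768-dimensional for an MPNet base model), and the function names `cosine` and `rank_by_cosine` as well as the `report_*` labels are hypothetical, not part of this model's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, corpus):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for embeddings of three report descriptions.
corpus = {
    "report_a": [0.9, 0.1, 0.0],
    "report_b": [0.0, 1.0, 0.0],
    "report_c": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_cosine(query, corpus)
print([label for label, _ in ranking])
```

In practice, the vectors would come from encoding the query and the report texts with this model; the ranking logic stays the same.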

	### Out-of-Scope Use

	This model is not intended for general language tasks outside cybersecurity contexts.

	## Performance Evaluation

	The model was benchmarked against the base MPNet model and several cybersecurity-specific NLP models:

	| Model | Classification Accuracy | Embedding Variability |
	|------------------|-------------------------|-----------------------|
	| Original MPNet | 55.73% | 0.0798 |
	| SecBERT | 91.67% | 0.5911 |
	| ATTACK-BERT | 83.51% | 0.0960 |
	| MPNet (Cyber) | 72.74% | 0.1239 |
	| SecureBERT | 49.31% | 0.0071 |
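To read the accuracy column relative to the untuned baseline, the following sketch computes each model's difference from Original MPNet in percentage points. It uses only the figures reported in the table above; the variable names are illustrative.

```python
# Classification accuracy (%) copied from the table above.
accuracy = {
    "Original MPNet": 55.73,
    "SecBERT": 91.67,
    "ATTACK-BERT": 83.51,
    "MPNet (Cyber)": 72.74,
    "SecureBERT": 49.31,
}

baseline = accuracy["Original MPNet"]

# Difference from the base model in percentage points, rounded to 2 decimals.
delta_vs_base = {
    name: round(acc - baseline, 2)
    for name, acc in accuracy.items()
    if name != "Original MPNet"
}

# Print the models from largest to smallest difference.
for name, delta in sorted(delta_vs_base.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.2f} pp")
```

By this measure the fine-tuned model gains about 17 percentage points over its base model, while SecBERT and ATTACK-BERT score higher on this benchmark.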

	### Downstream Tasks
	- Attribution of cybersecurity incidents
	- Automated analysis of threat intelligence reports
	- Embeddings for cybersecurity threat detection

	### Limitations
	- Best suited for English language cybersecurity contexts
	- May require further fine-tuning for highly specific tasks

	## Usage

	To generate embeddings with `transformers`:

	```python
	from transformers import AutoTokenizer, MPNetModel
	import torch

	model_id = "selfconstruct3d/mpnet-classification-finetuned-cyber-groups"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = MPNetModel.from_pretrained(model_id)

	inputs = tokenizer("APT38 uses ransomware for financial gains.", return_tensors="pt")
	with torch.no_grad():  # inference only, so no gradients are needed
	    outputs = model(**inputs)
	# Mean-pool the token vectors into a single sentence embedding
	embeddings = outputs.last_hidden_state.mean(dim=1)
	```

	Alternatively, with the `sentence-transformers` library:

	```python
	from sentence_transformers import SentenceTransformer
	sentences = ["This is an example sentence", "Each sentence is converted"]

	model = SentenceTransformer('selfconstruct3d/mpnet-classification-finetuned-cyber-groups')
	embeddings = model.encode(sentences)
	print(embeddings)
	```
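The `transformers` example above averages over every token position, which is adequate for a single sentence. When several sentences are batched with padding, a common refinement is to weight the average by the attention mask so that padding tokens are ignored. The sketch below shows that pooling step on dummy tensors; `masked_mean_pool` is a hypothetical helper name, and it is an assumption that mask-aware pooling matches how this model was trained or evaluated.

```python
import torch

def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden) float tensor
    attention_mask:    (batch, seq_len) tensor of 1s (tokens) and 0s (padding)
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts

# Dummy batch: 2 sequences, 3 positions, hidden size 2.
hidden = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With a real batch, the inputs would be `outputs.last_hidden_state` and `inputs["attention_mask"]` from a tokenizer call made with `padding=True`.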

	## Training Details

	### Training Data

	Fine-tuned on descriptions of threat actor activities sourced from cybersecurity reports, including MITRE ATT&CK techniques.

	### Hyperparameters
	- Epochs: 10 (MLM), 20 (classification)
	- Batch size: 16
	- Learning rate: 5e-6 (MLM), 2e-6 (classification)
	- Hardware: GPU (CUDA-enabled)
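
For readers who want to set up a similar two-stage run, the hyperparameters above can be collected in a small configuration sketch. The `StageConfig` class is hypothetical (its field names mirror common `transformers.TrainingArguments` options), and two points are assumptions: that the batch size of 16 applies to both stages, since the list gives a single value, and that MLM runs before classification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one fine-tuning stage (illustrative container)."""
    name: str
    num_train_epochs: int
    learning_rate: float
    per_device_train_batch_size: int = 16  # assumed to apply to both stages

# Stage order (MLM first, then classification) is an assumption.
STAGES = (
    StageConfig(name="mlm", num_train_epochs=10, learning_rate=5e-6),
    StageConfig(name="classification", num_train_epochs=20, learning_rate=2e-6),
)

for stage in STAGES:
    print(stage)
```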

	## Citation

	If using this model, please cite as:

	```bibtex
	@misc{mpnet_cyber_finetune,
	author = {Hamzic, D.},
	title = {MPNet Fine-Tuned for Cybersecurity Group Classification},
	year = {2025},
	publisher = {Hugging Face},
	url = {https://huggingface.co/selfconstruct3d/mpnet-classification-finetuned-cyber-groups}
	}
	```

	## Contact
	- Author: Dženan Hamzić
	- Contact Information: https://www.linkedin.com/in/dzenan-hamzic/

	## Licence
	This model is licensed for non-commercial use only (CC BY-NC 4.0).
	For commercial inquiries, please contact dzenan.hamzic@ait.ac.at.