---
language:
- en
- de
---

# 🛡️ MLP Cybersecurity Classifier

This repository hosts a lightweight `scikit-learn`-based MLP classifier trained to distinguish cybersecurity-related content from other text, using sentence-transformer embeddings. It supports English and German input texts.

## 📊 Training Data

The model was trained on a multilingual dataset of cybersecurity and non-cybersecurity news articles. The dataset is publicly available on Zenodo:
🔗 [https://zenodo.org/records/16417939](https://zenodo.org/records/16417939)

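For orientation, fitting a classifier of this kind on embedded articles might look like the sketch below. It is not the authors' training script: the synthetic 16-dimensional vectors stand in for real `multilingual-e5-large` embeddings, and `max_iter` and `random_state` are assumed values. Only the hidden-layer sizes `(128, 64)` and the example label names are taken from the Model Details section.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins for article embeddings (assumption: real inputs would
# come from the sentence-transformer encoder, not from a random generator).
rng = np.random.default_rng(0)
dim = 16
cyber_vecs = rng.normal(loc=1.0, scale=0.5, size=(50, dim))
other_vecs = rng.normal(loc=-1.0, scale=0.5, size=(50, dim))

X_train = np.vstack([cyber_vecs, other_vecs])
y_train = ["Cybersecurity"] * 50 + ["Not Cybersecurity"] * 50

# Hidden layers (128, 64) as listed in the model card; other settings assumed.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# Label predicted for the first toy vector
print(clf.predict(X_train[:1]))
```

With real data, `X_train` would instead hold the encoder output for each article and `y_train` the corresponding dataset labels.
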
## 📦 Model Details

- **Architecture**: `MLPClassifier` with hidden layers `(128, 64)`
- **Embedding model**: [`intfloat/multilingual-e5-large`](https://huggingface.co/intfloat/multilingual-e5-large)
- **Input**: Cleaned article text (stopwords removed) or report text
- **Output**: Binary label (e.g., `Cybersecurity`, `Not Cybersecurity`)
- **Languages**: English, German

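The **Input** entry mentions stopword removal but does not say which tool or stopword list was used. A minimal cleaning step could look like this; the tiny `STOPWORDS` set and the `remove_stopwords` helper are illustrative placeholders, not the preprocessing used for training.

```python
import re

# Illustrative mini stopword list (assumption); a real pipeline would use a
# fuller English/German list.
STOPWORDS = {"a", "an", "the", "has", "in", "of",
             "der", "die", "das", "und", "ein", "eine"}


def remove_stopwords(text: str) -> str:
    """Keep word tokens not in STOPWORDS (case-insensitive), in order.

    Punctuation is discarded because only word characters are matched.
    """
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if t.lower() not in STOPWORDS)


print(remove_stopwords(
    "A new ransomware attack has affected critical infrastructure in Germany."
))
```

Whether aggressive cleaning helps with a transformer-based embedder is an open question; matching whatever was done at training time matters more than the specific list.
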
## 🔧 Usage

```python
from sentence_transformers import SentenceTransformer
from huggingface_hub import hf_hub_download
import joblib

# 1. Load the embedding model
embedder = SentenceTransformer("intfloat/multilingual-e5-large")

# 2. Load the pretrained MLP classifier from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="selfconstruct3d/cybersec_classifier",
    filename="cybersec_classifier.pkl",
)
model = joblib.load(model_path)

# 3. Example input texts (can be in English or German)
texts = [
    "A new ransomware attack has affected critical infrastructure in Germany.",
    "The local sports club hosted its annual summer festival this weekend.",
]

# 4. Generate embeddings
embeddings = embedder.encode(texts, convert_to_numpy=True, show_progress_bar=False)

# 5. Predict cybersecurity relevance
predictions = model.predict(embeddings)

# 6. Output results
for text, label in zip(texts, predictions):
    print(f"Text: {text}\nPrediction: {label}\n")
```

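If a confidence score is wanted alongside the label, `predict_proba` can be used instead of `predict`, assuming the downloaded pickle is a plain scikit-learn `MLPClassifier` as described under Model Details. The `classify_with_confidence` helper below is a hypothetical convenience wrapper, not part of the repository; it expects `embedder` and `model` objects like those loaded in the usage snippet.

```python
def classify_with_confidence(texts, embedder, model):
    """Return (text, label, probability) triples, one per input text.

    `embedder` needs an `encode` method; `model` needs `predict_proba`
    and `classes_`, as SentenceTransformer and MLPClassifier provide.
    """
    embeddings = embedder.encode(
        texts, convert_to_numpy=True, show_progress_bar=False
    )
    probabilities = model.predict_proba(embeddings)

    results = []
    for text, row in zip(texts, probabilities):
        # Index of the class with the highest predicted probability
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((text, model.classes_[best], float(row[best])))
    return results


# Example call (requires the objects from the usage snippet):
# for text, label, prob in classify_with_confidence(texts, embedder, model):
#     print(f"{label} ({prob:.2f}): {text}")
```

The probabilities of an MLP are not necessarily well calibrated, so they are better read as a rough ranking signal than as exact likelihoods.
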
## Licence

This model is licensed for non-commercial use only (CC BY-NC 4.0).
For commercial inquiries, please contact dzenan.hamzic@ait.ac.at.