---
datasets:
- priamai/AnnoCTR
base_model:
- urchade/gliner_small-v1
tags:
- Security
- NER
- CTI
language:
- en
---
# AITSecNER - Entity Recognition for Cybersecurity
|
This repository demonstrates how to use **AITSecNER**, a model hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text.
|
## Installation

Install GLiNER via pip:

```bash
pip install gliner
```
|
## Usage

### Import and Load Model

Load the pretrained AITSecNER model directly from Hugging Face:

```python
from gliner import GLiNER

# Fetch the fine-tuned checkpoint from the Hugging Face Hub (or the local cache) and load it
model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True)
```
|
### Predict Entities

Define the input text and the entity labels you want to extract:

```python
# Example input text
text = """
Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states
“THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros.
"""

# Entity labels
labels = [
    'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC',
    'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL'
]

# Predict entities; spans scoring below `threshold` are discarded
entities = model.predict_entities(text, labels, threshold=0.5)

# Display results
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
```
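
The `threshold` argument sets the minimum confidence a span needs in order to be returned. Each returned entity is a dict; in current GLiNER releases it carries `start`, `end`, `text`, `label`, and `score` keys. As a minimal sketch assuming that shape, the snippet below applies a stricter cut-off to selected labels after prediction, without re-running the model. The entity list and its scores are hand-written for illustration, and `filter_entities`, `PER_LABEL_THRESHOLDS`, and the 0.7 value for `ORG` are hypothetical, not part of GLiNER.

```python
from typing import Any

# Hand-written entities in the dict shape assumed for predict_entities output.
# Offsets match the example text above; the scores are invented, not real model output.
entities: list[dict[str, Any]] = [
    {"start": 14, "end": 20, "text": "Emotet", "label": "MALWARE", "score": 0.91},
    {"start": 60, "end": 69, "text": "Microsoft", "label": "ORG", "score": 0.62},
]

# Hypothetical stricter cut-off for ORG; every other label falls back to the default.
PER_LABEL_THRESHOLDS: dict[str, float] = {"ORG": 0.7}
DEFAULT_THRESHOLD = 0.5


def filter_entities(
    entities: list[dict[str, Any]],
    per_label: dict[str, float],
    default: float,
) -> list[dict[str, Any]]:
    """Keep entities whose score reaches the threshold for their label."""
    return [e for e in entities if e["score"] >= per_label.get(e["label"], default)]


for entity in filter_entities(entities, PER_LABEL_THRESHOLDS, DEFAULT_THRESHOLD):
    print(f"{entity['text']} => {entity['label']} ({entity['score']:.2f})")
# prints: Emotet => MALWARE (0.91)
```

With real predictions, pass the list returned by `model.predict_entities` instead. Post-hoc filtering can only remove spans, so the `threshold` given to `predict_entities` should be no higher than the smallest per-label value.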
|
### Sample Output

```text
Emotet => MALWARE
Microsoft => ORG
```
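
For threat-intelligence summaries it is often more useful to collapse the flat prediction list into unique mentions per label. A minimal sketch, assuming each entity is a dict with `text` and `label` keys, as the display loop above reads them; `group_by_label` is a hypothetical helper, and the input list is hand-written rather than real model output.

```python
from collections import defaultdict


def group_by_label(entities: list[dict]) -> dict[str, list[str]]:
    """Map each label to its unique mentions, preserving first-seen order.

    Mentions are compared case-insensitively, so "Emotet" and "EMOTET"
    count once; the first spelling seen is kept.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    seen: set[tuple[str, str]] = set()
    for entity in entities:
        key = (entity["label"], entity["text"].casefold())
        if key not in seen:
            seen.add(key)
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written predictions for illustration (not real model output).
entities = [
    {"text": "Emotet", "label": "MALWARE"},
    {"text": "Microsoft", "label": "ORG"},
    {"text": "EMOTET", "label": "MALWARE"},
]

print(group_by_label(entities))
# prints: {'MALWARE': ['Emotet'], 'ORG': ['Microsoft']}
```

Case-insensitive matching via `str.casefold` merges spelling variants such as `EMOTET`; drop the `casefold()` call if exact-string matching is preferred.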
|
## Model Details

**AITSecNER** was fine-tuned from the [urchade/gliner_small](https://huggingface.co/urchade/gliner_small) model on the [priamai/AnnoCTR dataset](https://huggingface.co/datasets/priamai/AnnoCTR). For more details about the dataset, see the paper ["AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports"](https://arxiv.org/abs/2305.10472).

GLiNER is described in detail in the paper ["GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer"](https://arxiv.org/abs/2311.08526).
|
## About

**AITSecNER** uses GLiNER to extract cybersecurity-specific entities, such as malware, threat groups, tools, tactics, and techniques, from unstructured text. This makes it suitable for tasks such as:

- Cyber threat intelligence analysis
- Incident response documentation
- Automated cybersecurity reporting
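
Threat reports analysed for these tasks are often much longer than the sample text above. GLiNER truncates input that exceeds the model's configured maximum length (check the loaded checkpoint's configuration for the exact value), so long documents are usually split before prediction. The sketch below assumes a simple approach: split the report into non-overlapping word windows, predict on each window, and shift each entity's character offsets back into document coordinates. `chunk_text`, `predict_long_text`, and the 200-word default are hypothetical choices, not GLiNER APIs or documented limits, and `stub_predict` stands in for `model.predict_entities` so the example runs without downloading the model.

```python
import re
from typing import Any, Callable

# Assumed entity shape: {"start", "end", "text", "label", "score"}
Entity = dict[str, Any]


def chunk_text(text: str, max_words: int = 200) -> list[tuple[int, str]]:
    """Split text into non-overlapping windows of at most max_words words.

    Returns (char_offset, chunk) pairs so entity offsets can be mapped back.
    """
    words = list(re.finditer(r"\S+", text))
    chunks = []
    for i in range(0, len(words), max_words):
        window = words[i : i + max_words]
        start, end = window[0].start(), window[-1].end()
        chunks.append((start, text[start:end]))
    return chunks


def predict_long_text(
    text: str,
    labels: list[str],
    predict: Callable[[str, list[str]], list[Entity]],
    max_words: int = 200,
) -> list[Entity]:
    """Run predict on each chunk and shift offsets into document coordinates."""
    results = []
    for offset, chunk in chunk_text(text, max_words):
        for entity in predict(chunk, labels):
            shifted = dict(entity)  # copy so the predictor's dicts are not mutated
            shifted["start"] += offset
            shifted["end"] += offset
            results.append(shifted)
    return results


def stub_predict(chunk: str, labels: list[str]) -> list[Entity]:
    """Hypothetical stand-in for model.predict_entities: tags every 'Emotet' as MALWARE."""
    if "MALWARE" not in labels:
        return []
    return [
        {"start": m.start(), "end": m.end(), "text": m.group(),
         "label": "MALWARE", "score": 1.0}
        for m in re.finditer(r"Emotet", chunk)
    ]


report = "Emotet spread widely. " * 3
for e in predict_long_text(report, ["MALWARE"], stub_predict, max_words=4):
    # Offsets now index into the full report, not the chunk
    print(e["start"], report[e["start"]:e["end"]])
# prints: 0 Emotet / 22 Emotet / 44 Emotet (one per line)
```

With the real model, pass `lambda chunk, labels: model.predict_entities(chunk, labels, threshold=0.5)` as `predict`. Because the windows do not overlap, an entity that straddles a window boundary can be split or missed; overlapping windows with de-duplication of repeated spans is a common refinement.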
|
## Licence

This model is licensed for non-commercial use only (CC BY-NC 4.0). For commercial inquiries, please contact dzenan.hamzic@ait.ac.at.