---
language: en
license: cc-by-nc-4.0
tags:
- cybersecurity
- malware-analysis
- att&ck
- threat-intelligence
- mixtral
- lora
- peft
- expert-adapters
- cape-sandbox
- digital-forensics
library_name: peft
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
inference: false
metrics:
- accuracy
---

# **Fathom** — Specialized Cybersecurity Analysis Model

**Mixtral-8x7B-Instruct-v0.1 + 10× LoRA adapters (rank=32, bf16)**
**Primary adapter:** `unified-v2` (general cybersecurity + malware analysis)
**9 expert adapters** for domain-specific routing (static/dynamic analysis, network, forensics, threat intel, etc.)

**Fathom** turns raw sandbox reports (CAPE, Joe Sandbox, etc.) into high-quality ATT&CK-mapped malware analysis. It outperforms general-purpose models on cybersecurity tasks while remaining fully open-source and runnable on a single AMD MI300X / A100 80GB.

---

## Model Overview

- **Base:** Mixtral-8x7B-Instruct-v0.1 (full bf16, no quantization)
- **Training:** Direct fine-tuning with PEFT + TRL
- **Adapters:** 1 unified + 9 expert LoRA adapters (all rank=32, α=16)
- **Hardware:** AMD MI300X (205.8 GB VRAM) — full bf16 training
- **Key innovation:** Evidence extraction layer + structured behavioral prompts → **9× improvement** in real ATT&CK mapping

**Designed for:**

- Malware analysts & threat hunters
- SOC / DFIR teams
- CAPE / sandbox report enrichment
- Automated ATT&CK technique extraction

---

## Benchmark Results

All results use the **real Fathom pipeline** (`[INST]` chat template + 8192-token context + structured evidence from CAPE extraction layer v3). Greedy decoding, bf16.

### 1. General Cybersecurity Knowledge (vs. Closed & Open Models)

| Benchmark | Fathom unified-v2 | GPT-4 (ref) | GPT-3.5 (ref) | Base Mixtral-8x7B | Llama-2-70B (ref) |
|----------------------------|-------------------|-------------|---------------|-------------------|-------------------|
| **CyberMetric-80** | **91.25%** | ~87% | ~67% | 82.5% | ~57% |
| MMLU Computer Security | **79.0%** | ~82% | ~65% | — | ~54% |
| MMLU Security Studies | **64.0%** | ~74% | ~60% | — | ~48% |
| TruthfulQA MC1 | **65.0%** | — | — | — | — |

**Visual bar comparison (CyberMetric-80):**

```
Fathom unified-v2 ████████████████████ 91.25%
GPT-4             ██████████████████   ~87%
Base Mixtral      █████████████████    82.5%
GPT-3.5           ██████████████       ~67%
Llama-2-70B       ████████████         ~57%
```

### 2. Expert Adapter Comparison (CyberMetric-80)

| Adapter | Score | Specialty |
|--------------------------|------------|------------------------------------|
| `unified-v2` | **91.25%** | All-domain baseline |
| `expert-e8-analyst` | **91.25%** | Analyst Q&A & reporting |
| `expert-e3-network` | 90.00% | Network traffic / C2 analysis |
| `expert-e4-forensics` | 90.00% | Memory & disk forensics |
| `expert-e6-detection` | 88.75% | Detection engineering |
| `expert-e7-reports` | 88.75% | Structured report generation |
| `expert-e9-cot` | 87.50% | Chain-of-thought reasoning |
| `expert-e2-dynamic` | 85.00% | Behavioral / sandbox analysis |
| `expert-e1-static` | 83.75% | Static PE + evasion detection |
| `expert-e5-threatintel` | 81.25% | Threat intel & actor profiling |

### 3. Core Contribution: Real ATT&CK Mapping Accuracy

**Progression table** (same model weights; only the input pipeline improved):

| Configuration | Exact F1 | Parent F1 | Δ Parent F1 vs. naive |
|----------------------------------------|----------|-----------|-----------------------|
| Raw API list (naive) | 0.083 | 0.095 | — |
| Structured prompt (manual) | 0.370 | 0.429 | +0.334 |
| Real Fathom evidence layer | 0.534 | 0.508 | +0.413 |
| **Real pipeline + full context fix** | **0.868**| **0.841** | **+0.746** |

**This shows that the architecture (evidence extraction + structured prompts) matters more than additional fine-tuning.**

### 4. Real Malware Analysis — CAPE Pipeline (malscore 10/10 samples)

| Sample | Family | GT T-codes | Predicted T-codes | Exact F1 | Parent F1 | Family ID |
|--------|----------|-----------------------------|--------------------------------------------|----------|-----------|-----------|
| 12 | Emotet | T1012, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083 | 0.889 | 0.857 | 100% conf |
| 15 | Formbook | T1012, T1055, T1071, T1071.004, T1083 | T1003, T1012, T1027.002, T1055, T1059, T1071, T1071.004, T1083, T1497 | 0.714 | 0.667 | 85% conf |
| 16 | Dridex | T1012, T1055, T1071, T1071.004, T1083 | T1012, T1055, T1071, T1071.004, T1083 | **1.000**| **1.000** | 68% conf |
| **Average** | | | | **0.868**| **0.841** | — |

### 5. Additional Benchmarks

- **ATT&CK Mapping MCQ (30 handcrafted questions):** 80%
- **MMLU Machine Learning:** 60%
- **MMLU Electrical Engineering:** 64%
- **Rigorous ground-truth F1 (23 test cases):** Exact = 0.184, Parent = 0.344 (synthetic); real CAPE = 0.841 after pipeline fixes

### 6. Key Discovery: Mal-API-2019 Analysis

We evaluated Fathom on the public **Mal-API-2019** dataset (Catak & Yazı, arXiv:1905.01999) — 7,107 API call sequences from Cuckoo Sandbox.
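The Exact vs. Parent F1 scores reported above can be reproduced with a short sketch: both are set-based F1 over predicted vs. ground-truth ATT&CK technique IDs, with the "parent" variant first collapsing sub-techniques (e.g. `T1071.004` → `T1071`). The helper names below are illustrative, not taken from the Fathom codebase; the sample values are the Emotet row from the table above.

```python
def parent(technique_id: str) -> str:
    """Collapse a sub-technique ID (e.g. 'T1071.004') to its parent ('T1071')."""
    return technique_id.split(".")[0]

def set_f1(predicted, ground_truth) -> float:
    """Set-based F1 between predicted and ground-truth technique IDs."""
    p, g = set(predicted), set(ground_truth)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)

# Sample 12 (Emotet) from the CAPE pipeline table:
gt   = ["T1012", "T1071", "T1071.004", "T1083"]
pred = ["T1012", "T1055", "T1071", "T1071.004", "T1083"]

exact     = set_f1(pred, gt)                                          # ≈ 0.889
parent_f1 = set_f1([parent(t) for t in pred], [parent(t) for t in gt])  # ≈ 0.857
```

The parent variant is more forgiving by design: a prediction of `T1071` still scores when the ground truth lists `T1071.004`, which is why Parent F1 is the headline production metric.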
| Variant | Accuracy | Macro F1 |
|-----------------------------|----------|----------|
| Raw API sequences | 12.6% | 0.030 |
| Filtered behavioral groups | 10.9% | 0.052 |

**Insight:** Raw API sequences alone are insufficient for reliable family classification: the dataset contains heavy loader noise, families share nearly identical behavioral APIs, and ground-truth labels come from static AV signatures rather than behavioral semantics.

> In contrast, Fathom's full evidence extraction pipeline achieves 0.841 Parent F1 on real CAPEv2 reports. This demonstrates that structured behavioral evidence + multi-source context (not raw API text) is the critical enabler for production-grade malware analysis.

---

## How to Use

### Loading the unified model (recommended for most users)

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
adapter = "umer07/fathom-mixtral"  # unified-v2 at repo root

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Full-precision bf16 base model (no quantization), sharded across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Attach the unified-v2 LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter, adapter_name="unified-v2")
model.eval()
```

---

## Limitations

- Sub-technique precision is lower than for parent techniques (a standard limitation across LLMs)
- Family identification improves significantly with KSPN enrichment
- Rare/exotic TTPs (UAC bypass, ICMP C2) have low recall
- Prompt injection / attribution hallucination remains a base-model weakness (mitigable with system-prompt hardening)

---

## Training & Datasets

- **Unified-v2:** 123,912 rows (1 epoch)
- **Experts:** 9 specialized datasets (> 200k rows total after augmentation)
- **Evasive dataset (new):** 25,160 obfuscated C++ samples (92 evasion combinations)
- **ThreatIntel upgrade:** 9,532 rows (URLhaus + GTFOBins + MITRE CTI)

---

## Citation

```bibtex
@misc{fathom2026,
  title={Fathom: Expert Cybersecurity
Analysis with Mixtral LoRA Adapters},
  author={Umer},
  year={2026},
  howpublished={\url{https://huggingface.co/umer07/fathom-mixtral}},
}
```
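As a closing illustration of the "structured behavioral evidence, not raw API text" idea behind the benchmark gains above, here is a minimal sketch of assembling CAPE-style report fields into an `[INST]` prompt. All field names, section headers, and the task wording are hypothetical assumptions for illustration — they are **not** the actual schema of Fathom's CAPE extraction layer v3.

```python
def build_evidence_prompt(report: dict) -> str:
    """Assemble selected sandbox-report fields into a structured [INST] prompt.

    The keys 'signatures', 'network', and 'registry' are illustrative placeholders,
    not the real extraction-layer schema.
    """
    sections = []
    if report.get("signatures"):
        sections.append("## Triggered signatures\n" +
                        "\n".join(f"- {s}" for s in report["signatures"]))
    if report.get("network"):
        sections.append("## Network indicators\n" +
                        "\n".join(f"- {n}" for n in report["network"]))
    if report.get("registry"):
        sections.append("## Registry activity\n" +
                        "\n".join(f"- {r}" for r in report["registry"]))
    evidence = "\n\n".join(sections)
    task = "Map the observed behavior to MITRE ATT&CK technique IDs and justify each mapping."
    return f"[INST] {evidence}\n\n{task} [/INST]"

prompt = build_evidence_prompt({
    "signatures": ["Allocates RWX memory in a remote process (possible injection)"],
    "network": ["HTTPS beacon to 203.0.113.7:443 every 60s"],
    "registry": ["Enumerates HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run"],
})
```

The point of the sketch is the grouping itself: presenting evidence as labeled behavioral sections (rather than a flat API-call dump) is what the progression table in section 3 credits with most of the F1 gain.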