| # Vulnerability Classification Framework | |
| This document defines a standardized system for classifying vulnerabilities identified during LLM security testing. The framework enables consistent categorization, facilitates trend analysis, and supports remediation prioritization. | |
| ## Classification Dimensions | |
| Each vulnerability is classified along eight dimensions, which together describe its mechanism, delivery channel, impact, difficulty, origin, and current state. | |
| ### 1. Vulnerability Class | |
| The primary categorization based on the fundamental mechanism of the vulnerability. | |
| #### Primary Classes | |
| - **PJV**: Prompt Injection Vulnerabilities | |
| - **BEF**: Boundary Enforcement Failures | |
| - **IEV**: Information Extraction Vulnerabilities | |
| - **CET**: Classifier Evasion Techniques | |
| - **MVV**: Multimodal Vulnerability Vectors | |
| - **TUV**: Tool Use Vulnerabilities | |
| - **ACF**: Authentication Control Failures | |
| - **RSV**: Response Synthesis Vulnerabilities | |
| ### 2. Subclass | |
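| The specific subcategory within the primary vulnerability class. Subclass codes are always written with their class prefix, because a suffix such as `PAI` appears under both IEV and TUV. | |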
| #### Example Subclasses (for PJV - Prompt Injection Vulnerabilities) | |
| - **PJV-DIR**: Direct Instruction Injection | |
| - **PJV-IND**: Indirect Instruction Manipulation | |
| - **PJV-CRX**: Cross-Context Injection | |
| #### Example Subclasses (for BEF - Boundary Enforcement Failures) | |
| - **BEF-CPC**: Content Policy Circumvention | |
| - **BEF-CRB**: Capability Restriction Bypass | |
| - **BEF-ABV**: Authorization Boundary Violations | |
| #### Example Subclasses (for IEV - Information Extraction Vulnerabilities) | |
| - **IEV-TDE**: Training Data Extraction | |
| - **IEV-SIL**: System Instruction Leakage | |
| - **IEV-PAI**: Parameter Inference | |
| #### Example Subclasses (for CET - Classifier Evasion Techniques) | |
| - **CET-LOB**: Linguistic Obfuscation | |
| - **CET-CTM**: Context Manipulation | |
| - **CET-TBM**: Technical Bypass Methods | |
| #### Example Subclasses (for MVV - Multimodal Vulnerability Vectors) | |
| - **MVV-CMI**: Cross-Modal Injection | |
| - **MVV-MIC**: Modal Interpretation Conflicts | |
| - **MVV-MTV**: Modal Translation Vulnerabilities | |
| #### Example Subclasses (for TUV - Tool Use Vulnerabilities) | |
| - **TUV-TSM**: Tool Selection Manipulation | |
| - **TUV-PAI**: Parameter Injection | |
| - **TUV-FCH**: Function Call Hijacking | |
| #### Example Subclasses (for ACF - Authentication Control Failures) | |
| - **ACF-ICE**: Identity Confusion Exploitation | |
| - **ACF-PIE**: Permission Inheritance Exploitation | |
| - **ACF-SBV**: Session Boundary Violations | |
| #### Example Subclasses (for RSV - Response Synthesis Vulnerabilities) | |
| - **RSV-MET**: Metadata Manipulation | |
| - **RSV-CMH**: Content Moderation Hallucination | |
| - **RSV-USP**: Unsafe Synthesis Patterns | |
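The class and subclass lists above can be held in a small lookup table so that codes are checked before they are recorded. The following is a minimal sketch, not part of the framework itself: the names `SUBCLASSES`, `is_known_subclass`, and `classes_sharing_suffix` are hypothetical, and the table only contains the example subclasses listed in this document.

```python
from typing import List

# Subclass suffixes per primary class, copied from the example lists above.
# The lists are examples, so this table is not assumed to be exhaustive.
SUBCLASSES = {
    "PJV": {"DIR", "IND", "CRX"},
    "BEF": {"CPC", "CRB", "ABV"},
    "IEV": {"TDE", "SIL", "PAI"},
    "CET": {"LOB", "CTM", "TBM"},
    "MVV": {"CMI", "MIC", "MTV"},
    "TUV": {"TSM", "PAI", "FCH"},
    "ACF": {"ICE", "PIE", "SBV"},
    "RSV": {"MET", "CMH", "USP"},
}


def is_known_subclass(code: str) -> bool:
    """Return True if code has the form 'PJV-DIR' and both parts are listed."""
    primary, sep, suffix = code.partition("-")
    return sep == "-" and suffix in SUBCLASSES.get(primary, set())


def classes_sharing_suffix(suffix: str) -> List[str]:
    """List the primary classes that define the given subclass suffix."""
    return sorted(c for c, subs in SUBCLASSES.items() if suffix in subs)


if __name__ == "__main__":
    print(is_known_subclass("PJV-DIR"))
    print(classes_sharing_suffix("PAI"))
```

The second helper shows why a suffix should not be used without its class prefix: `PAI` resolves to two different subclasses.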
| ### 3. Attack Vector | |
| The primary method or channel through which the vulnerability is exploited. | |
| #### Categories | |
| - **TXT**: Text-Based | |
| - **IMG**: Image-Based | |
| - **AUD**: Audio-Based | |
| - **COD**: Code-Based | |
| - **DOC**: Document-Based | |
| - **MUL**: Multi-Vector | |
| - **API**: API-Based | |
| - **TOL**: Tool-Based | |
| ### 4. Impact Type | |
| The primary negative impact resulting from successful exploitation. | |
| #### Categories | |
| - **DIS**: Disclosure of Sensitive Information | |
| - **POL**: Policy Violation | |
| - **BYP**: Security Bypass | |
| - **MAN**: System Manipulation | |
| - **ACC**: Unauthorized Access | |
| - **DEG**: Service Degradation | |
| - **HAL**: Harmful Output Generation | |
| - **PRV**: Privacy Violation | |
| ### 5. Exploitation Complexity | |
| The level of technical expertise required to successfully exploit the vulnerability. | |
| #### Categories | |
| - **ECL**: Low (simple, requires minimal expertise) | |
| - **ECM**: Medium (moderate complexity, requires some domain knowledge) | |
| - **ECH**: High (complex, requires specialized knowledge) | |
| - **ECX**: Very High (sophisticated, requires expert-level understanding) | |
| ### 6. Remediation Complexity | |
| The estimated complexity of implementing an effective remediation. | |
| #### Categories | |
| - **RCL**: Low (simple fix, localized change) | |
| - **RCM**: Medium (moderate complexity, potential side effects) | |
| - **RCH**: High (complex, requires significant architectural changes) | |
| - **RCX**: Very High (extremely difficult, may require fundamental redesign) | |
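Both complexity dimensions use the same four ordered levels, so they can be mapped to integers for sorting. The sketch below assumes one possible triage order (easiest to exploit first, then cheapest to fix); that ordering rule and the names `Level`, `EXPLOITATION`, `REMEDIATION`, and `triage_key` are illustrative assumptions, not something this framework prescribes.

```python
from enum import IntEnum


class Level(IntEnum):
    """Shared ordinal scale for the two complexity dimensions."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


EXPLOITATION = {"ECL": Level.LOW, "ECM": Level.MEDIUM,
                "ECH": Level.HIGH, "ECX": Level.VERY_HIGH}
REMEDIATION = {"RCL": Level.LOW, "RCM": Level.MEDIUM,
               "RCH": Level.HIGH, "RCX": Level.VERY_HIGH}


def triage_key(exploit_code: str, remediation_code: str):
    """Sort key (assumed policy): low exploitation complexity first,
    then low remediation complexity."""
    return (EXPLOITATION[exploit_code], REMEDIATION[remediation_code])


if __name__ == "__main__":
    findings = [("ECH", "RCL"), ("ECL", "RCM"), ("ECL", "RCL")]
    for pair in sorted(findings, key=lambda p: triage_key(*p)):
        print(pair)
```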
| ### 7. Discovery Method | |
| How the vulnerability was discovered. | |
| #### Categories | |
| - **AUT**: Automated Testing | |
| - **MAN**: Manual Testing | |
| - **HYB**: Hybrid Approach | |
| - **USR**: User Report | |
| - **RES**: Research Finding | |
| - **ANA**: Log Analysis | |
| - **INC**: Incident Response | |
| ### 8. Status | |
| The current state of the vulnerability. | |
| #### Categories | |
| - **NEW**: Newly Identified | |
| - **CNF**: Confirmed | |
| - **REJ**: Rejected (not a valid vulnerability) | |
| - **MIT**: Mitigated (temporary solution) | |
| - **FIX**: Fixed (permanent solution) | |
| - **DUP**: Duplicate of existing vulnerability | |
| - **DEF**: Deferred (not prioritized for immediate fix) | |
| ## Composite Classification | |
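| Each vulnerability is assigned a composite classification code that combines the above dimensions in a fixed order. Codes are read by position, so `MAN` means System Manipulation in the impact slot and Manual Testing in the discovery slot: | |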
| ``` | |
| [Vulnerability Class]-[Subclass]:[Attack Vector]/[Impact Type]-[Exploitation Complexity]-[Remediation Complexity]-[Discovery Method].[Status] | |
| ``` | |
| ### Example Classifications | |
| - `PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF`: A confirmed direct prompt injection vulnerability, text-based, leading to policy violations, low exploitation complexity, medium remediation complexity, discovered through manual testing. | |
| - `IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW`: A newly identified system instruction leakage vulnerability, code-based, leading to disclosure of sensitive information, medium exploitation complexity, high remediation complexity, discovered through automated testing. | |
| - `MVV-CMI:IMG/BYP-ECH-RCM-HYB.MIT`: A mitigated cross-modal injection vulnerability, image-based, leading to security bypass, high exploitation complexity, medium remediation complexity, discovered through a hybrid testing approach. | |
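Composite codes like the examples above can be split into their dimensions with a single regular expression. This is a minimal sketch that follows the hyphen-separated form used in the examples; the names `Classification`, `parse_classification`, and `format_classification` are hypothetical, and the pattern only checks the shape of each field, not whether the code is a listed category.

```python
import re
from typing import NamedTuple


class Classification(NamedTuple):
    vclass: str        # e.g. PJV
    subclass: str      # e.g. DIR
    vector: str        # e.g. TXT
    impact: str        # e.g. POL
    exploitation: str  # ECL, ECM, ECH or ECX
    remediation: str   # RCL, RCM, RCH or RCX
    discovery: str     # e.g. MAN
    status: str        # e.g. CNF


# Group names match the NamedTuple fields so groupdict() can be unpacked.
_PATTERN = re.compile(
    r"(?P<vclass>[A-Z]{3})-(?P<subclass>[A-Z]{3})"
    r":(?P<vector>[A-Z]{3})/(?P<impact>[A-Z]{3})"
    r"-(?P<exploitation>EC[LMHX])-(?P<remediation>RC[LMHX])"
    r"-(?P<discovery>[A-Z]{3})\.(?P<status>[A-Z]{3})"
)


def parse_classification(code: str) -> Classification:
    """Split a composite code into its eight fields or raise ValueError."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a composite classification code: {code!r}")
    return Classification(**match.groupdict())


def format_classification(c: Classification) -> str:
    """Inverse of parse_classification."""
    return (f"{c.vclass}-{c.subclass}:{c.vector}/{c.impact}"
            f"-{c.exploitation}-{c.remediation}-{c.discovery}.{c.status}")


if __name__ == "__main__":
    print(parse_classification("PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF"))
```

Because fields are read by position, the parser keeps `impact` and `discovery` separate even when both carry the code `MAN`.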
| ## Classification Workflow | |
| ### 1. Initial Classification | |
| When a potential vulnerability is first identified: | |
| 1. Assign primary vulnerability class and subclass | |
| 2. Document attack vector and impact type | |
| 3. Note discovery method | |
| 4. Set status to `NEW` | |
| 5. Record a preliminary estimate of exploitation complexity, to be refined during verification | |
| ### 2. Verification | |
| During the verification phase: | |
| 1. Confirm vulnerability through reproduction | |
| 2. Refine classification based on deeper understanding | |
| 3. Update exploitation complexity based on reproduction experience | |
| 4. Change status to `CNF` or `REJ` | |
| ### 3. Analysis | |
| During detailed analysis: | |
| 1. Assess remediation complexity | |
| 2. Document dependencies and affected components | |
| 3. Update classification with complete understanding | |
| 4. Link to related vulnerabilities if applicable | |
| ### 4. Remediation Tracking | |
| During the remediation process: | |
| 1. Update status to `MIT`, `FIX`, or `DEF` as remediation progresses | |
| 2. Document mitigation or fix approaches | |
| 3. Link to verification testing results | |
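The workflow phases above imply an order in which status codes change. The sketch below encodes one possible transition table. Only `NEW` to `CNF` or `REJ` is stated explicitly in the workflow; every other transition here (including `NEW` to `DUP` and `DEF` to `MIT` or `FIX`) is an assumption made for illustration, and `ALLOWED_TRANSITIONS` and `advance_status` are hypothetical names.

```python
# Assumed transition table. Terminal states map to an empty set.
ALLOWED_TRANSITIONS = {
    "NEW": {"CNF", "REJ", "DUP"},   # verification outcome (DUP is assumed)
    "CNF": {"MIT", "FIX", "DEF"},   # remediation tracking (assumed)
    "MIT": {"FIX"},                 # temporary fix replaced by permanent one
    "DEF": {"MIT", "FIX"},          # deferred work picked up later (assumed)
    "FIX": set(),
    "REJ": set(),
    "DUP": set(),
}


def advance_status(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"transition {current} -> {new} is not allowed")
    return new


if __name__ == "__main__":
    status = "NEW"
    for step in ("CNF", "MIT", "FIX"):
        status = advance_status(status, step)
        print(status)
```

A team that allows reopening fixed vulnerabilities would add entries for `FIX`; the table is meant to be edited, not treated as normative.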
| ## Taxonomic Evolution | |
| This classification system is designed to evolve over time as new vulnerability classes emerge. The process for extending the taxonomy includes: | |
| 1. **Identification**: Recognition of a new vulnerability pattern that doesn't fit existing classes | |
| 2. **Definition**: Clear description of the new vulnerability class or subclass | |
| 3. **Consultation**: Review with security experts to validate the new category | |
| 4. **Integration**: Addition to the formal taxonomy with appropriate documentation | |
| 5. **Retroactive Analysis**: Review of existing vulnerabilities to identify any that should be reclassified | |
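The integration and retroactive analysis steps can be supported by a small amount of tooling. The following sketch assumes the taxonomy is stored as a nested dictionary; `register_subclass`, `candidates_for_review`, and the `PJV-MTI` subclass used in the demonstration are all hypothetical and are not part of the current taxonomy.

```python
from typing import Dict, List

Taxonomy = Dict[str, Dict[str, str]]  # class -> {subclass suffix -> name}


def register_subclass(taxonomy: Taxonomy, primary: str,
                      suffix: str, name: str) -> None:
    """Integration: add a subclass, refusing to overwrite an existing code."""
    subclasses = taxonomy.setdefault(primary, {})
    if suffix in subclasses:
        raise ValueError(f"{primary}-{suffix} is already defined")
    subclasses[suffix] = name


def candidates_for_review(codes: List[str], primary: str) -> List[str]:
    """Retroactive analysis: existing codes in the affected primary class,
    which an analyst would then inspect for possible reclassification."""
    return [code for code in codes if code.startswith(primary + "-")]


if __name__ == "__main__":
    taxonomy: Taxonomy = {"PJV": {"DIR": "Direct Instruction Injection"}}
    # Hypothetical new subclass, used only to exercise the functions.
    register_subclass(taxonomy, "PJV", "MTI", "Multi-Turn Injection")
    existing = ["PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF",
                "IEV-SIL:COD/DIS-ECM-RCH-AUT.NEW"]
    print(candidates_for_review(existing, "PJV"))
```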
| ## Usage Guidelines | |
| ### For Testers | |
| - Assign preliminary classifications during testing | |
| - Document all observed behaviors clearly to enable accurate classification | |
| - Highlight unusual patterns that may indicate new vulnerability classes | |
| ### For Security Analysts | |
| - Verify and refine classifications | |
| - Ensure consistency across similar vulnerabilities | |
| - Identify patterns and trends within vulnerability classes | |
| ### For Developers | |
| - Use classification to understand vulnerability mechanisms | |
| - Reference similar vulnerabilities by class to inform remediation approaches | |
| - Track remediation effectiveness by vulnerability class | |
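Trend analysis and remediation tracking by class can start from nothing more than a list of composite codes. This is a rough sketch: the sample codes are made up for illustration, the function names are hypothetical, and using the share of `FIX` statuses as a proxy for remediation effectiveness is an assumption rather than a metric defined by this framework.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_class(codes: Iterable[str]) -> Counter:
    """Count vulnerabilities per primary class (text before the first '-')."""
    return Counter(code.split("-", 1)[0] for code in codes)


def fixed_share_by_class(codes: Iterable[str]) -> Dict[str, float]:
    """Fraction of vulnerabilities per class whose status is FIX."""
    totals: Counter = Counter()
    fixed: Counter = Counter()
    for code in codes:
        vclass = code.split("-", 1)[0]
        status = code.rsplit(".", 1)[-1]  # status follows the final '.'
        totals[vclass] += 1
        if status == "FIX":
            fixed[vclass] += 1
    return {vclass: fixed[vclass] / totals[vclass] for vclass in totals}


if __name__ == "__main__":
    # Illustrative codes only, not real findings.
    sample = [
        "PJV-DIR:TXT/POL-ECL-RCM-MAN.FIX",
        "PJV-IND:DOC/BYP-ECM-RCM-AUT.CNF",
        "IEV-SIL:COD/DIS-ECM-RCH-AUT.FIX",
    ]
    print(count_by_class(sample))
    print(fixed_share_by_class(sample))
```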
| ## Reporting Standards | |
| All vulnerability reports should include: | |
| 1. Full classification code | |
| 2. Detailed description of the vulnerability | |
| 3. Reproduction steps | |
| 4. Example exploitation (and its success rate) | |
| 5. Potential impact analysis | |
| 6. Suggested remediation approaches | |
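The six required report elements can be modeled as a record with a completeness check. The sketch below is one possible shape, not a prescribed schema: the `VulnerabilityReport` class, its field names, and `missing_sections` are hypothetical, and the success rate is assumed to be a fraction between 0 and 1.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class VulnerabilityReport:
    classification_code: str = ""
    description: str = ""
    reproduction_steps: List[str] = field(default_factory=list)
    example_exploitation: str = ""
    success_rate: float = 0.0  # assumed to be a fraction in [0, 1]
    impact_analysis: str = ""
    suggested_remediation: List[str] = field(default_factory=list)


def missing_sections(report: VulnerabilityReport) -> List[str]:
    """Names of text or list fields that are still empty.
    A success rate of 0.0 is treated as a valid value, not as missing."""
    return [
        f.name for f in fields(report)
        if isinstance(getattr(report, f.name), (str, list))
        and not getattr(report, f.name)
    ]


if __name__ == "__main__":
    draft = VulnerabilityReport(
        classification_code="PJV-DIR:TXT/POL-ECL-RCM-MAN.CNF",
        description="Direct instruction injection via user message.",
    )
    print(missing_sections(draft))
```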
| ## Conclusion | |
| This classification framework provides a standardized approach to categorizing LLM security vulnerabilities. By applying it consistently, the security community can build a shared understanding of vulnerability patterns, track trends over time, and develop more effective remediation strategies. | |
| For examples of classified vulnerabilities, refer to the [vulnerability catalog](../research/vulnerabilities/catalog.md). | |