| # Fine-tuned Models |
|
|
| This repository accompanies our paper: **[How Large Language Models Balance Internal Knowledge with User and Document Assertions](https://arxiv.org/abs/2604.22193)** |
|
|
| Our code is available at: [GitHub Repository](https://github.com/shuowl/llm-source-balancing) |
|
|
This directory contains the supervised fine-tuned (SFT) models for our three-source interaction experiments. The models are trained on data constructed from either CommonsenseQA (CSQA) or GSM8K, using the source-interaction variants described in the paper.
|
|
| ## Model Naming Convention |
|
|
| Each model folder follows the format: |
|
|
| `{dataset}__{base_model}__{training_variant}_r{rank}_bs{batch_size}_lr{learning_rate}_e{epochs}` |
|
|
| Example: |
|
|
| `csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
|
|
| ### Fields |
|
|
| - **dataset** |
| - `csqa`: training data constructed from the CommonsenseQA dataset. |
  - `gsm8k`: training data constructed from a multiple-choice version of the GSM8K dataset.
|
|
| - **base_model** |
| - `llama3_8b_instruct`: Llama 3 8B Instruct model. |
  - `qwen3_8b`: Qwen3 8B instruction model, run in non-thinking mode.
| |
| - **training_variant** |
| - `all_variants`: mixed SFT using all source-interaction probe variants, including bare, single-source, and double-source patterns. |
| - `bare100`: standard SFT using only bare prompts without external user or document assertions. |
|
|
| - **r** |
| - LoRA rank. For example, `r8` means LoRA rank 8. |
|
|
| - **bs** |
| - Training batch size. For example, `bs4` means batch size 4. |
|
|
| - **lr** |
| - Learning rate. For example, `lr1e5` means learning rate `1e-5`. |
|
|
| - **e** |
| - Number of training epochs. For example, `e3` means 3 epochs. |
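
Because every hyperparameter is encoded in the folder name, the fields can be recovered programmatically. The following snippet is a convenience sketch of ours (not part of the released code); the regular expression simply mirrors the convention above, including the `lr1e5` → `1e-5` encoding.

```python
import re

# Mirrors the naming convention described above, e.g.
# csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3
PATTERN = re.compile(
    r"^(?P<dataset>.+?)__"
    r"(?P<base_model>.+?)__"
    r"(?P<variant>.+)_r(?P<rank>\d+)_bs(?P<batch_size>\d+)"
    r"_lr(?P<lr>\d+e\d+)_e(?P<epochs>\d+)$"
)

def parse_model_name(name: str) -> dict:
    """Split a model folder name into its component fields."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    fields = match.groupdict()
    # `lr1e5` encodes the learning rate 1e-5 (see the field list above).
    mantissa, exponent = fields["lr"].split("e")
    fields["lr"] = float(mantissa) * 10.0 ** -int(exponent)
    for key in ("rank", "batch_size", "epochs"):
        fields[key] = int(fields[key])
    return fields

print(parse_model_name("csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3"))
# {'dataset': 'csqa', 'base_model': 'llama3_8b_instruct', 'variant': 'all_variants',
#  'rank': 8, 'batch_size': 4, 'lr': 1e-05, 'epochs': 3}
```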
|
|
| ## Training Variants |
|
|
| ### `bare100` |
|
|
| This setting fine-tunes the model only on standard question-answer examples without external assertions. It corresponds to the standard SFT baseline. |
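
For concreteness, a hypothetical `bare100` record might look like the following; the question, options, and chat formatting here are illustrative placeholders of ours, not taken from the released training data.

```python
# Hypothetical `bare100` training record (illustrative only): a plain
# question paired with its gold answer, with no user or document
# assertions. The exact wording/format used in the paper may differ.
bare_example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Question: Where is a bed most likely to be found?\n"
                "(A) hotel room  (B) hardware store  (C) office  "
                "(D) garage  (E) river"
            ),
        },
        {"role": "assistant", "content": "(A) hotel room"},
    ]
}
```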
|
|
| ### `all_variants` |
| |
| This setting fine-tunes the model on diverse source-interaction patterns. The training data includes the bare prompt, single-source prompts, and double-source prompts involving user and document assertions. These variants are designed to teach the model to better distinguish helpful external information from harmful or misleading information. |
| |
| For details about the probe construction and SFT setup, see the methodology and fine-tuning sections of the paper. |
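
To make the variant structure concrete, here is a minimal sketch of how a bare question can be combined with user and document assertions into single- and double-source prompts. The template wording is a placeholder of ours; the actual probe templates are specified in the paper.

```python
def build_prompt_variants(question: str,
                          user_assertion: str,
                          doc_assertion: str) -> dict:
    """Assemble bare, single-source, and double-source prompts.
    The template wording here is a placeholder, not the paper's
    actual probe templates."""
    user = f"User: I think the answer is {user_assertion}."
    doc = f"Document: {doc_assertion}"
    return {
        "bare": question,                                # no external source
        "user_only": f"{user}\n\n{question}",            # single source: user
        "doc_only": f"{doc}\n\n{question}",              # single source: document
        "user_and_doc": f"{user}\n{doc}\n\n{question}",  # double source
    }
```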
| |
| ## Available Models |
| |
| - `csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
| - `csqa__llama3_8b_instruct__bare100_r8_bs4_lr1e5_e3` |
| - `csqa__qwen3_8b__all_variants_r8_bs4_lr1e5_e3` |
| - `csqa__qwen3_8b__bare100_r8_bs4_lr1e5_e3` |
| - `gsm8k__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
| - `gsm8k__llama3_8b_instruct__bare100_r8_bs4_lr1e5_e3` |
| - `gsm8k__qwen3_8b__all_variants_r8_bs4_lr1e5_e3` |
| - `gsm8k__qwen3_8b__bare100_r8_bs4_lr1e5_e3` |
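
Assuming each folder contains a LoRA adapter in the standard PEFT layout (look for an `adapter_config.json`; this is our reading of the directory, not a guarantee of the naming convention), a model can be loaded roughly as follows. The base checkpoint ID below is likewise an assumption; substitute the one actually used for training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed identifiers: substitute the base checkpoint actually used
# for training and the adapter folder you want to evaluate.
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_dir = "csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
```

If inference speed matters, the adapter can be folded into the base weights with `model.merge_and_unload()`.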