| # Fine-tuned Models |
|
|
| This repository accompanies our paper: **[How Large Language Models Balance Internal Knowledge with User and Document Assertions](https://arxiv.org/abs/2604.22193)** |
|
|
| Our code is available at: [GitHub Repository](https://github.com/shuowl/llm-source-balancing) |
|
|
This directory contains the supervised fine-tuned (SFT) models for our three-source interaction experiments. The models are trained on data constructed from either CommonsenseQA (CSQA) or GSM8K, using the source-interaction variants described in the paper.
|
|
| ## Model Naming Convention |
|
|
| Each model folder follows the format: |
|
|
| `{dataset}__{base_model}__{training_variant}_r{rank}_bs{batch_size}_lr{learning_rate}_e{epochs}` |
|
|
| Example: |
|
|
| `csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
|
|
| ### Fields |
|
|
| - **dataset** |
| - `csqa`: training data constructed from the CommonsenseQA dataset. |
  - `gsm8k`: training data constructed from a multiple-choice version of the GSM8K dataset.
|
|
| - **base_model** |
| - `llama3_8b_instruct`: Llama 3 8B Instruct model. |
  - `qwen3_8b`: Qwen3 8B instruction model, run in non-thinking mode.
| |
| - **training_variant** |
| - `all_variants`: mixed SFT using all source-interaction probe variants, including bare, single-source, and double-source patterns. |
| - `bare100`: standard SFT using only bare prompts without external user or document assertions. |
|
|
| - **r** |
| - LoRA rank. For example, `r8` means LoRA rank 8. |
|
|
| - **bs** |
| - Training batch size. For example, `bs4` means batch size 4. |
|
|
| - **lr** |
| - Learning rate. For example, `lr1e5` means learning rate `1e-5`. |
|
|
| - **e** |
| - Number of training epochs. For example, `e3` means 3 epochs. |
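
Because every hyperparameter is encoded in the folder name, the fields can be recovered programmatically. The following snippet is a convenience sketch of ours (not part of the released code); the regular expression simply mirrors the convention above, including the `lr1e5` → `1e-5` encoding.

```python
import re

# Mirrors the naming convention described above, e.g.
# csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3
PATTERN = re.compile(
    r"^(?P<dataset>.+?)__"
    r"(?P<base_model>.+?)__"
    r"(?P<variant>.+)_r(?P<rank>\d+)_bs(?P<batch_size>\d+)"
    r"_lr(?P<lr>\d+e\d+)_e(?P<epochs>\d+)$"
)

def parse_model_name(name: str) -> dict:
    """Split a model folder name into its component fields."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    fields = match.groupdict()
    # `lr1e5` encodes the learning rate 1e-5 (see the field list above).
    mantissa, exponent = fields["lr"].split("e")
    fields["lr"] = float(mantissa) * 10.0 ** -int(exponent)
    for key in ("rank", "batch_size", "epochs"):
        fields[key] = int(fields[key])
    return fields

print(parse_model_name("csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3"))
# {'dataset': 'csqa', 'base_model': 'llama3_8b_instruct', 'variant': 'all_variants',
#  'rank': 8, 'batch_size': 4, 'lr': 1e-05, 'epochs': 3}
```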
|
|
| ## Training Variants |
|
|
| ### `bare100` |
|
|
| This setting fine-tunes the model only on standard question-answer examples without external assertions. It corresponds to the standard SFT baseline. |
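
For concreteness, a hypothetical `bare100` record might look like the following; the question, options, and chat formatting here are illustrative placeholders of ours, not taken from the released training data.

```python
# Hypothetical `bare100` training record (illustrative only): a plain
# question paired with its gold answer, with no user or document
# assertions. The exact wording/format used in the paper may differ.
bare_example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Question: Where is a bed most likely to be found?\n"
                "(A) hotel room  (B) hardware store  (C) office  "
                "(D) garage  (E) river"
            ),
        },
        {"role": "assistant", "content": "(A) hotel room"},
    ]
}
```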
|
|
| ### `all_variants` |
| |
| This setting fine-tunes the model on diverse source-interaction patterns. The training data includes the bare prompt, single-source prompts, and double-source prompts involving user and document assertions. These variants are designed to teach the model to better distinguish helpful external information from harmful or misleading information. |
| |
| For details about the probe construction and SFT setup, see the methodology and fine-tuning sections of the paper. |
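
To make the variant structure concrete, here is a minimal sketch of how a bare question can be combined with user and document assertions into single- and double-source prompts. The template wording is a placeholder of ours; the actual probe templates are specified in the paper.

```python
def build_prompt_variants(question: str,
                          user_assertion: str,
                          doc_assertion: str) -> dict:
    """Assemble bare, single-source, and double-source prompts.
    The template wording here is a placeholder, not the paper's
    actual probe templates."""
    user = f"User: I think the answer is {user_assertion}."
    doc = f"Document: {doc_assertion}"
    return {
        "bare": question,                                # no external source
        "user_only": f"{user}\n\n{question}",            # single source: user
        "doc_only": f"{doc}\n\n{question}",              # single source: document
        "user_and_doc": f"{user}\n{doc}\n\n{question}",  # double source
    }
```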
| |
| ## Available Models |
| |
| - `csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
| - `csqa__llama3_8b_instruct__bare100_r8_bs4_lr1e5_e3` |
| - `csqa__qwen3_8b__all_variants_r8_bs4_lr1e5_e3` |
| - `csqa__qwen3_8b__bare100_r8_bs4_lr1e5_e3` |
| - `gsm8k__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3` |
| - `gsm8k__llama3_8b_instruct__bare100_r8_bs4_lr1e5_e3` |
| - `gsm8k__qwen3_8b__all_variants_r8_bs4_lr1e5_e3` |
| - `gsm8k__qwen3_8b__bare100_r8_bs4_lr1e5_e3` |
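
Assuming each folder contains a LoRA adapter in the standard PEFT layout (look for an `adapter_config.json`; this is our reading of the directory, not a guarantee of the naming convention), a model can be loaded roughly as follows. The base checkpoint ID below is likewise an assumption; substitute the one actually used for training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed identifiers: substitute the base checkpoint actually used
# for training and the adapter folder you want to evaluate.
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_dir = "csqa__llama3_8b_instruct__all_variants_r8_bs4_lr1e5_e3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
```

If inference speed matters, the adapter can be folded into the base weights with `model.merge_and_unload()`.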