Qwen2-Audio + PCLM + DPO

ICML 2026 Paper | Project Page | Code | Dataset | AF3 + PCLM + DPO | License

Qwen2-Audio-7B-Instruct fine-tuned with PCLM and DPO, released with the paper "Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox" (ICML 2026).

The base model is augmented with the Prompt-Conditioned Layer Mixer (PCLM), a lightweight module that adaptively mixes representations from intermediate audio-encoder layers, conditioned on the user prompt. The model is then post-trained with Direct Preference Optimization (DPO) to prefer acoustically grounded answers over language-implied alternatives on paralinguistic multiple-choice questions (MCQs).
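This card does not spell out PCLM's internals or the DPO hyperparameters, so the following is only a minimal, hypothetical sketch of the two ideas in plain Python. For PCLM it assumes the gate is a single linear projection of a pooled prompt embedding followed by a softmax, giving one weight per exposed encoder layer; the mixed representation is the weighted sum of those layers' outputs. For DPO it uses the standard objective -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), where y_w would be the acoustically grounded answer and y_l the language-implied one. The names pclm_mix, gate_weights and dpo_loss, and beta = 0.1, are illustrative assumptions, not the release repo's code or the paper's settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pclm_mix(
    layer_states: Sequence[Sequence[float]],
    prompt_emb: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical prompt-conditioned layer mixing (not the release code).

    layer_states: one pooled vector per exposed encoder layer (L x D)
    prompt_emb:   pooled embedding of the user prompt (P)
    gate_weights: assumed linear gate, one row per exposed layer (L x P)
    """
    # One logit per layer, computed from the prompt only.
    logits = [sum(g * p for g, p in zip(row, prompt_emb)) for row in gate_weights]
    weights = softmax(logits)  # non-negative, sums to 1
    dim = len(layer_states[0])
    # Convex combination of the layer outputs, dimension by dimension.
    return [sum(w * h[d] for w, h in zip(weights, layer_states)) for d in range(dim)]


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair, from summed log-probs."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    layers = [[1.0, 0.0], [0.0, 1.0]]
    # Equal gate logits give equal layer weights.
    print(pclm_mix(layers, prompt_emb=[1.0], gate_weights=[[0.0], [0.0]]))  # [0.5, 0.5]
    # A pair the policy already prefers (relative to the reference) scores below log 2.
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0) < math.log(2))  # True
```

Because the layer weights come from a softmax, the mix stays a convex combination of encoder layers, so the prompt can shift emphasis between layers without rescaling the features.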

Usage

This checkpoint cannot be loaded with stock transformers: PCLM requires the custom modeling code shipped in the release repository.

git clone https://github.com/ihp-lab/VoxParadox
cd VoxParadox
conda create -n qwen2audio python=3.10 -y && conda activate qwen2audio
pip install torch torchaudio transformers accelerate librosa soundfile

Inference on VoxParadox (or any MCQ JSON in the same schema):

python -m qwen2audio.eval.run_eval \
    --model_path IHP-Lab/Qwen2-Audio_PCLM_DPO \
    --data_path  /path/to/voxparadox.json \
    --audio_base /path/to/audio_root \
    --output_dir runs/eval/qwen2audio_pclm_dpo

Score the predictions with the eval.py script shipped with the dataset:

python eval.py --predictions runs/eval/qwen2audio_pclm_dpo/predictions.jsonl
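For a quick look at the raw predictions before (not instead of) running eval.py, here is a minimal sketch of multiple-choice accuracy over a JSON Lines file. The field names prediction and answer are assumptions; inspect predictions.jsonl for the actual keys. The official numbers are whatever eval.py reports.

```python
import json
from pathlib import Path
from typing import Iterable


def accuracy_from_lines(
    lines: Iterable[str], pred_key: str = "prediction", gold_key: str = "answer"
) -> float:
    """Fraction of JSONL records whose predicted option matches the gold option.

    pred_key / gold_key are hypothetical field names; adjust to the real file.
    """
    correct = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        # Compare option labels case-insensitively, e.g. "b" vs "B".
        pred = str(record[pred_key]).strip().upper()
        gold = str(record[gold_key]).strip().upper()
        if pred == gold:
            correct += 1
    return correct / total if total else 0.0


def accuracy_from_file(path: str, **keys: str) -> float:
    """Read a .jsonl file from disk and score it with accuracy_from_lines."""
    with Path(path).open(encoding="utf-8") as f:
        return accuracy_from_lines(f, **keys)


# Usage, with the output directory from the inference command:
# accuracy_from_file("runs/eval/qwen2audio_pclm_dpo/predictions.jsonl")
```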

The loader auto-detects use_pclm=True from config.json and activates PCLM with expose_layers=[5, 15, 25, 30] over the audio encoder.
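To illustrate that auto-detection, the sketch below reads a parsed config.json, checks use_pclm, and selects the exposed layers from a list of per-layer encoder outputs. Only use_pclm and the indices [5, 15, 25, 30] come from this card; the expose_layers config key, the fallback behaviour, and the function names are hypothetical. Index conventions also matter: in many transformers encoders, output_hidden_states returns the embedding output at index 0 followed by one entry per layer, so check how the release code interprets these indices.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Sequence, Tuple

# Layer indices stated in this model card.
CARD_EXPOSE_LAYERS = [5, 15, 25, 30]


def pclm_settings(config: Dict[str, Any]) -> Tuple[bool, List[int]]:
    """Return (use_pclm, expose_layers) from a parsed config.json.

    Only `use_pclm` is named in the card; the `expose_layers` key and the
    fallback to the card's indices are assumptions.
    """
    use_pclm = bool(config.get("use_pclm", False))
    if not use_pclm:
        return False, []
    return True, list(config.get("expose_layers", CARD_EXPOSE_LAYERS))


def select_exposed(hidden_states: Sequence[Any], expose_layers: Sequence[int]) -> List[Any]:
    """Pick the per-layer encoder outputs that PCLM would mix."""
    n = len(hidden_states)
    for i in expose_layers:
        if not 0 <= i < n:
            raise IndexError(f"layer {i} out of range for {n} hidden states")
    return [hidden_states[i] for i in expose_layers]


def load_config(model_dir: str) -> Dict[str, Any]:
    """Read config.json from a local checkpoint directory."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    use_pclm, layers = pclm_settings({"use_pclm": True})
    # Stand-in for a hidden_states list; the length is chosen only for illustration.
    fake_states = [f"h{i}" for i in range(32)]
    print(use_pclm, select_exposed(fake_states, layers))
    # True ['h5', 'h15', 'h25', 'h30']
```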

Citation

@inproceedings{pang2026voxparadox,
  title     = {Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox},
  author    = {Pang, Jiacheng and Chaubey, Ashutosh and Soleymani, Mohammad},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2026}
}

License

USC Research License (research / non-profit only). See LICENSE.

The base model (Qwen/Qwen2-Audio-7B-Instruct) carries its own Tongyi Qianwen license terms, which continue to apply to the inherited weights.

Model details

Format: Safetensors | Model size: 8B params | Tensor type: BF16
