
Deep Spurious Regression


Real-world regression often exhibits shortcuts: attributes spuriously correlated with continuous targets during training that become unreliable under deployment shifts. Existing work on spurious correlations focuses primarily on classification, where labels are categorical and groups are naturally defined. However, many real-world tasks require continuous prediction, where hard label boundaries or discrete group-label pairs do not exist.

We define Deep Spurious Regression (DSR) as learning from regression data with attribute-label confounding, addressing continuous spurious correlations, and generalizing to all attribute-label combinations at test time. Motivated by the intrinsic difference between classification and regression shortcuts, we propose to exploit the similarity among spurious attributes in both label and feature spaces: accounting for nearby targets and related groups while calibrating both label and learned feature distributions across attributes. Extensive experiments spanning computer vision, environmental sensing, and LLM regression verify the superior performance of our strategies.

📰 News

💿 Installation

git clone https://github.com/yang-ai-lab/Deep-Spurious-Regression.git
cd Deep-Spurious-Regression
pip install -r requirements.txt

Dependencies

  • torch
  • torchvision
  • numpy
  • pandas
  • Pillow
  • huggingface_hub

📂 Data Preparation

Once installed, prepare your dataset as follows.

UTKFace: download the images from susanqq.github.io/UTKFace and place them under:

data/UTKFace/images/*.jpg

The train/val/test split CSVs are already included in data/; no additional setup is needed.
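Before running evaluation, it can help to confirm that the images landed in the expected place. The following is a minimal sketch, assuming the layout described above (data/UTKFace/images/*.jpg); the helper name find_utkface_images is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_utkface_images(data_root):
    """Return the sorted .jpg paths under <data_root>/UTKFace/images.

    Hypothetical helper for illustration only; it is not shipped with the repo.
    """
    image_dir = Path(data_root) / "UTKFace" / "images"
    if not image_dir.is_dir():
        # Fail early with the exact path that was expected.
        raise FileNotFoundError(f"Expected image directory not found: {image_dir}")
    # Only files ending in .jpg are matched, mirroring the documented layout.
    return sorted(image_dir.glob("*.jpg"))
```

Calling find_utkface_images("./data") and checking that the returned list is non-empty is a quick way to catch a misplaced download before launching an evaluation run.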

πŸ” Evaluation

Download Checkpoints

Checkpoints are hosted on HuggingFace at yang-ai-lab/Deep-Spurious-Regression.

| Dataset | Method    | File                  |
|---------|-----------|-----------------------|
| UTKFace | LMDS      | UTKFace/LMDS.pth      |
| UTKFace | FMDS      | UTKFace/FMDS.pth      |
| UTKFace | LMDS+FMDS | UTKFace/LMDS_FMDS.pth |

More checkpoints for other datasets will be released soon.

Download a checkpoint by specifying the dataset and method file:

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="yang-ai-lab/Deep-Spurious-Regression",
    filename="<DATASET>/<METHOD_FILE>"  # e.g. "UTKFace/FMDS.pth"
)

Or via CLI:

huggingface-cli download yang-ai-lab/Deep-Spurious-Regression <DATASET>/<METHOD_FILE>
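To avoid typing the <DATASET>/<METHOD_FILE> string by hand, the checkpoint table can be mirrored in a small lookup. This is only a sketch: the CHECKPOINTS dictionary and the checkpoint_filename helper are hypothetical names introduced here, and the entries are copied from the checkpoint table above.

```python
# Dataset -> method name -> checkpoint file, copied from the checkpoint table.
# Hypothetical lookup for illustration; the repo does not define it.
CHECKPOINTS = {
    "UTKFace": {
        "LMDS": "LMDS.pth",
        "FMDS": "FMDS.pth",
        "LMDS+FMDS": "LMDS_FMDS.pth",
    },
}


def checkpoint_filename(dataset, method):
    """Build the '<DATASET>/<METHOD_FILE>' string used as the Hub filename."""
    try:
        method_file = CHECKPOINTS[dataset][method]
    except KeyError:
        raise ValueError(f"No released checkpoint for {dataset!r} / {method!r}") from None
    return f"{dataset}/{method_file}"
```

The returned string, for example checkpoint_filename("UTKFace", "LMDS+FMDS"), can be passed as the filename argument of hf_hub_download.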

Run Evaluation

To reproduce the results in the original paper, use download_and_evaluate.py to automatically download and evaluate without manually specifying checkpoint paths:

# evaluate all methods on UTKFace
python download_and_evaluate.py --dataset UTKFace --data_folder ./data

# evaluate one specific method on UTKFace
python download_and_evaluate.py --dataset UTKFace --method FMDS.pth --data_folder ./data

Alternatively, after downloading a checkpoint manually (see Download Checkpoints), run:

python evaluate.py --dataset <DATASET> --ckpt <CKPT_PATH> --data_folder <DATA_ROOT>

For example:

python evaluate.py --dataset UTKFace --ckpt UTKFace/FMDS.pth --data_folder ./data
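When scripting several runs, the same command can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (--dataset, --ckpt, --data_folder); the evaluate_command helper is a hypothetical name, and nothing here is assumed about what evaluate.py does internally.

```python
import shlex


def evaluate_command(dataset, ckpt_path, data_root):
    """Build the evaluate.py invocation as an argument list.

    Hypothetical helper: it only mirrors the CLI flags documented above.
    """
    return [
        "python", "evaluate.py",
        "--dataset", dataset,
        "--ckpt", str(ckpt_path),
        "--data_folder", str(data_root),
    ]


cmd = evaluate_command("UTKFace", "UTKFace/FMDS.pth", "./data")
# Print the shell-quoted form for inspection; to actually run it from the
# repository root, one could call subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Note that hf_hub_download returns the local path of the cached file, so its return value can be passed directly as ckpt_path instead of a hand-written path.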

📊 Results

Test L1 errors (↓ lower is better) for our proposed methods (LMDS, FMDS, and LMDS+FMDS) on UTKFace.

| Dataset | LMDS  | FMDS  | LMDS+FMDS |
|---------|-------|-------|-----------|
| UTKFace | 7.039 | 6.961 | 7.032     |
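For readers unfamiliar with the metric, the sketch below shows how an L1 error is typically computed, assuming it denotes the mean absolute difference between predicted and true targets. The l1_error function is an illustrative stand-in and is not taken from the repository's evaluation code, which may aggregate differently (for example, per attribute group).

```python
def l1_error(predictions, targets):
    """Mean absolute error between two equal-length sequences of numbers.

    Illustrative only; assumes "L1 error" means mean |prediction - target|.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one prediction is required")
    total = sum(abs(p - t) for p, t in zip(predictions, targets))
    return total / len(predictions)


# Toy example with made-up values (not results from the paper):
# absolute errors are 2, 0, and 6, so the mean is 8 / 3.
example = l1_error([25.0, 40.0, 61.0], [27.0, 40.0, 55.0])
```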

πŸ” Reproducibility Notes

This repo is intentionally lightweight and focuses on inference for one dataset (UTKFace). Full training code and evaluation on additional datasets will be released upon the acceptance of the paper.

πŸ“ Citation

If you use this work in your research, please cite the paper:

@article{xu2026shortcut,
  title   = {Shortcut to Nowhere: Demystifying Deep Spurious Regression},
  author  = {Xu, Guanrong and Li, Jessica and Wang, Hao and Yang, Yuzhe},
  journal = {arXiv preprint arXiv:2606.01723},
  year    = {2026}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
