arxiv:2604.15950

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

Published on Apr 17 · Submitted by Tristan on Apr 20
Authors:

AI-generated summary

The TwinTrack framework addresses the ambiguity of pancreatic cancer segmentation through post-hoc calibration of ensemble probabilities to the empirical mean human response, improving calibration metrics on multi-rater benchmarks.

Abstract

Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR), i.e., the fraction of expert annotators who label a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.
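The quantities in the abstract can be sketched in a few lines. The page does not say which calibration map TwinTrack fits, so this sketch swaps in plain histogram binning regressed onto the MHR as an illustrative stand-in. It also assumes that ensemble member probabilities and binary rater masks are stacked along axis 0; all function names (`ensemble_probability`, `mean_human_response`, `fit_binned_calibrator`, `apply_calibrator`) are hypothetical.

```python
import numpy as np


def ensemble_probability(member_probs: np.ndarray) -> np.ndarray:
    """Average the tumor probabilities of the ensemble members.

    Assumed layout: member_probs has shape (n_members, *volume_shape).
    """
    return member_probs.astype(np.float64).mean(axis=0)


def mean_human_response(rater_masks: np.ndarray) -> np.ndarray:
    """MHR: fraction of annotators labeling each voxel as tumor.

    Assumed layout: rater_masks has shape (n_raters, *volume_shape), values in {0, 1}.
    """
    return rater_masks.astype(np.float64).mean(axis=0)


def _bin_index(probs: np.ndarray, n_bins: int) -> np.ndarray:
    # Equal-width bins on [0, 1]; digitizing against the interior edges gives
    # indices 0..n_bins-1, and p == 1.0 lands in the last bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    return np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)


def fit_binned_calibrator(probs: np.ndarray, mhr: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Hypothetical calibrator: histogram binning regressed onto the MHR.

    Illustrative stand-in only, not necessarily the map TwinTrack fits.
    Each bin stores the mean MHR of the calibration-set voxels whose ensemble
    probability falls in it; empty bins keep their bin centre.
    """
    p, y = probs.ravel(), mhr.ravel()
    idx = _bin_index(p, n_bins)
    table = (np.arange(n_bins) + 0.5) / n_bins  # bin centres as a fallback
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            table[b] = y[in_bin].mean()
    return table


def apply_calibrator(probs: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Map each voxel's ensemble probability to its bin's calibrated value."""
    return table[_bin_index(probs, table.size)]
```

Under this sketch, the table is fit once on the small multi-rater calibration set and then applied voxel-wise to new volumes, e.g. `apply_calibrator(ensemble_probability(member_probs), table)`. Each output value estimates the fraction of raters who would mark that voxel as tumor, so multiplying by the number of raters gives an expected count. Histogram binning was chosen here only because it is the simplest monotone-free regression onto a soft target; isotonic regression or temperature scaling would be equally plausible stand-ins.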

Community

Comment by the paper author and submitter:

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and its segmentation on contrast-enhanced CT is fundamentally ambiguous: when experts disagree, that disagreement often reflects real uncertainty rather than annotation noise. TwinTrack is a simple post-hoc multi-rater calibration method that transforms ensemble segmentation probabilities into predictions aligned with the Mean Human Response, better capturing expert disagreement. In other words: not just better segmentation, but better-calibrated uncertainty for genuinely ambiguous clinical images.
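The page does not name the calibration metrics used on CURVAS-PDACVI. As an assumption, one simple way to score how well probabilities are "aligned with the Mean Human Response" is to treat the MHR as a soft target: a Brier-style squared error and a binned calibration error computed against it. The function names `soft_brier` and `soft_ece` are hypothetical and are not the benchmark's official evaluation code.

```python
import numpy as np


def soft_brier(probs: np.ndarray, mhr: np.ndarray) -> float:
    """Mean squared gap between predicted probabilities and the MHR (assumed metric)."""
    p = np.asarray(probs, dtype=np.float64)
    y = np.asarray(mhr, dtype=np.float64)
    return float(np.mean((p - y) ** 2))


def soft_ece(probs: np.ndarray, mhr: np.ndarray, n_bins: int = 10) -> float:
    """Binned calibration error against a soft MHR target (assumed metric).

    Voxels are grouped into equal-width probability bins; each bin contributes
    |mean prediction - mean MHR| weighted by its share of all voxels.
    """
    p = np.asarray(probs, dtype=np.float64).ravel()
    y = np.asarray(mhr, dtype=np.float64).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    error = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            # in_bin.mean() is this bin's fraction of all voxels
            error += in_bin.mean() * abs(p[in_bin].mean() - y[in_bin].mean())
    return float(error)
```

Both scores are zero when predictions equal the MHR and grow as predictions drift from the raters' consensus, so lower is better. Comparing them for raw ensemble probabilities and for post-hoc calibrated ones on a held-out multi-rater set is one way to check the kind of improvement the comment describes.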
