arxiv:2605.17602

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Published on May 20 · Submitted by Kuei-Chun Kao on May 22

Abstract

AutoRubric-T2I automatically generates and selects explicit rubrics to guide Vision-Language Model judges for text-to-image generation, achieving high-quality reward signals with minimal human annotation while improving generation quality in downstream tasks.

AI-generated summary

Aligning Text-to-Image (T2I) generation models with human preferences increasingly relies on image reward models that score or rank generated images according to prompt alignment and perceptual quality. Existing reward models are commonly trained as Bradley-Terry (BT) preference models on large-scale human preference corpora, making them costly to train, difficult to adapt, and opaque in their evaluation criteria. Meanwhile, Vision-Language Model (VLM) judges can provide more fine-grained assessments through textual rubrics, but their manually designed or heuristically generated scoring rules may fail to reliably reflect human preferences. In this paper, we propose AutoRubric-T2I, the first rubric learning framework in T2I that automatically synthesizes and selects explicit rubrics for guiding VLM judges. AutoRubric-T2I first synthesizes reasoning traces from preference pairs into candidate rubrics, then uses a VLM judge to score paired images under each rubric, producing pairwise rubric-score differences for preference learning. To remove noisy and redundant rules, we further employ an $\ell_1$-Regularized Logistic Regression Refiner, which selects the Top-N most discriminative rubrics. Extensive evaluations show that AutoRubric-T2I produces high-quality, interpretable reward signals using less than 0.01% of the annotated preference data, substantially reducing the need for large-scale reward-model training. On image reward benchmarks such as MMRB2, AutoRubric-T2I outperforms strong reward model baselines. We further validate AutoRubric-T2I as an RL reward on downstream T2I tasks, including TIIF and UniGenBench++, where it improves generation quality over scalar reward models using the Flow-GRPO pipeline on diffusion models.
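The rubric-selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the data is synthetic, the solver is a plain proximal-gradient (ISTA) loop chosen only to keep the example dependency-light, and the names `fit_l1_logistic`, `select_top_n`, `lam`, `lr`, and `steps` are assumptions made for illustration. Each row of `diffs` stands in for the per-rubric score difference between the two images of one preference pair, and each label records which image the annotator preferred.

```python
import numpy as np

def fit_l1_logistic(X, y, lam=0.05, lr=0.1, steps=2000):
    """L1-regularized logistic regression (no intercept) via proximal gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted P(first image preferred)
        grad = X.T @ (p - y) / n                  # gradient of the mean logistic loss
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty, which zeroes weak weights.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

def select_top_n(w, n):
    """Indices of the n largest-magnitude nonzero weights."""
    order = np.argsort(-np.abs(w))
    return [int(i) for i in order[:n] if w[i] != 0.0]

# Synthetic stand-in for judge outputs: 400 preference pairs, 8 candidate rubrics.
rng = np.random.default_rng(0)
num_pairs, num_rubrics = 400, 8
diffs = rng.normal(size=(num_pairs, num_rubrics))  # score(image A) - score(image B) per rubric

# Assumption for the demo: only rubrics 0 and 3 actually track human preference.
true_w = np.array([2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0])
prob_a = 1.0 / (1.0 + np.exp(-(diffs @ true_w)))
labels = (rng.random(num_pairs) < prob_a).astype(float)  # 1.0 if image A was preferred

weights = fit_l1_logistic(diffs, labels)
selected = select_top_n(weights, 2)
print("learned weights:", np.round(weights, 3))
print("selected rubrics:", selected)
```

The L1 penalty pushes the weights of uninformative rubrics toward exactly zero, so ranking the surviving weights by magnitude gives a Top-N selection. In practice a library solver could replace the hand-written loop.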

Community

Paper submitter

AutoRubric-T2I learns a compact set of weighted natural-language rubrics from image preference data, enabling interpretable VLM-based reward modeling without fine-tuning—using less than 0.01% of the annotated preference data that standard reward models require.
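As a rough illustration of how a compact set of weighted rubrics could be turned into a scalar reward, the sketch below sums judge scores weighted per rubric. Everything in it is hypothetical: the rubric texts, the weights, the `judge` callable signature, and the stub scores are placeholders, and the paper's actual scoring and aggregation may differ.

```python
from typing import Callable, Dict

# A judge maps (prompt, image identifier, rubric text) to a numeric score.
Judge = Callable[[str, str, str], float]

def rubric_reward(prompt: str, image: str,
                  weighted_rubrics: Dict[str, float], judge: Judge) -> float:
    """Weighted sum of per-rubric judge scores for one image."""
    return sum(weight * judge(prompt, image, rubric)
               for rubric, weight in weighted_rubrics.items())

# Hypothetical rubrics and weights, for illustration only.
RUBRICS = {
    "All objects named in the prompt are present": 1.2,
    "No visible artifacts or distortions": 0.8,
}

# Stub standing in for a VLM judge; a real judge would query a model.
STUB_SCORES = {
    ("img_a", "All objects named in the prompt are present"): 1.0,
    ("img_a", "No visible artifacts or distortions"): 0.5,
    ("img_b", "All objects named in the prompt are present"): 0.0,
    ("img_b", "No visible artifacts or distortions"): 1.0,
}

def stub_judge(prompt: str, image: str, rubric: str) -> float:
    return STUB_SCORES[(image, rubric)]

prompt = "a red cube on a blue sphere"
reward_a = rubric_reward(prompt, "img_a", RUBRICS, stub_judge)
reward_b = rubric_reward(prompt, "img_b", RUBRICS, stub_judge)
print(reward_a, reward_b)
```

Because each term in the sum is tied to a readable rubric, the contribution of every criterion to the final reward can be inspected directly.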
