arxiv:2602.01845

No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation

Published on Feb 2

Abstract

AI-generated summary: Proust is a 309M-parameter causal protein language model that bridges the gap between fitness prediction and generation by adopting architectural innovations from large language models, achieving competitive substitution fitness prediction and state-of-the-art indel performance while retaining native generative abilities.
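
The title and summary center on zero-shot fitness estimation with a causal model, but this page does not spell out the scoring rule. The sketch below is a minimal illustration of the standard recipe for causal PLMs, not Proust's confirmed implementation: score a variant by its sequence log-likelihood under teacher forcing (optionally relative to the wild type), and probe per-position predictive entropy in the spirit of the interpretability analysis mentioned in the abstract. The Hugging-Face-style `model(...).logits` interface, the `tokenize` helper, and all function names are assumptions for illustration.

```python
# Hedged sketch of zero-shot fitness scoring with a causal protein LM.
# The model/tokenizer interface and the scoring rule are assumptions,
# not the paper's confirmed method.
import torch
import torch.nn.functional as F

def sequence_log_likelihood(model, token_ids: torch.Tensor) -> float:
    """Sum of per-residue log-probabilities of a tokenized sequence."""
    with torch.no_grad():
        logits = model(token_ids.unsqueeze(0)).logits         # (1, L, vocab)
    # Causal-LM convention: position i predicts token i + 1.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)          # (L - 1, vocab)
    targets = token_ids[1:].unsqueeze(-1)                      # (L - 1, 1)
    return log_probs.gather(-1, targets).sum().item()

def zero_shot_fitness(model, tokenize, wild_type: str, variant: str) -> float:
    """Score a variant as its log-likelihood ratio against the wild type."""
    return (sequence_log_likelihood(model, tokenize(variant))
            - sequence_log_likelihood(model, tokenize(wild_type)))

def per_position_entropy_variance(model, token_ids: torch.Tensor) -> float:
    """Variance of per-position predictive entropy, the kind of signal the
    abstract relates to whether retrieval augmentation helps or hurts."""
    with torch.no_grad():
        logits = model(token_ids.unsqueeze(0)).logits
    probs = F.softmax(logits[0], dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # (L,)
    return entropy.var().item()
```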

Protein language models (PLMs) face a fundamental divide: masked language models (MLMs) excel at fitness prediction while causal models enable generation, forcing practitioners to maintain separate architectures. We introduce Proust, a 309M-parameter causal PLM that bridges this gap through architectural innovations adapted from recent LLM research, including grouped-query attention with shared K/V projections, cross-layer value residuals, and depthwise causal convolutions. Trained on 33B tokens in 40 B200 GPU-hours, Proust achieves Spearman ρ = 0.390 on ProteinGym substitutions, competitive with MLMs requiring 50–200× the compute. On indels, Proust sets a new state of the art, outperforming models up to 20× larger. On EVEREST viral fitness benchmarks, it approaches structure-aware methods using sequence alone. These strong representations, combined with native generative capabilities that MLMs lack by design, place Proust in a sweet spot between the two model families. Interpretability analysis reveals that per-position entropy variance partially predicts when retrieval augmentation helps and when it hurts; such insights should grow in both quantity and quality with scale and can inform capabilities such as test-time scaling. Code and weights are available at https://github.com/Furkan9015/proust-inference.
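
For readers who want a concrete picture of the components named above, the following is a minimal PyTorch sketch, under placeholder dimensions, of two of them: grouped-query attention with a shared, smaller set of K/V projections, and a depthwise causal convolution. It is an illustrative sketch under these assumptions, not the released Proust architecture; cross-layer value residuals are omitted.

```python
# Illustrative sketch (assumed dimensions, not the released Proust code) of
# grouped-query attention with shared K/V projections and a depthwise causal
# convolution, two components named in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseCausalConv(nn.Module):
    """Per-channel 1D convolution that only sees current and past positions."""
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              groups=dim, padding=kernel_size - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, L, D)
        y = self.conv(x.transpose(1, 2))                      # (B, D, L + k - 1)
        y = y[..., : x.shape[1]]                              # drop future positions
        return y.transpose(1, 2)

class GroupedQueryAttention(nn.Module):
    """Many query heads attend over a smaller set of shared K/V heads."""
    def __init__(self, dim: int, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert dim % n_heads == 0 and n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        # Shared, smaller K/V projection: n_kv_heads instead of n_heads.
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, L, D)
        B, L, _ = x.shape
        q = self.q_proj(x).view(B, L, self.n_heads, self.head_dim).transpose(1, 2)
        kv = self.kv_proj(x).view(B, L, 2, self.n_kv_heads, self.head_dim)
        k, v = kv.unbind(dim=2)                               # (B, L, n_kv, hd) each
        groups = self.n_heads // self.n_kv_heads
        k = k.transpose(1, 2).repeat_interleave(groups, dim=1)
        v = v.transpose(1, 2).repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(B, L, -1))
```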
