---
license: apache-2.0
tags:
- chemistry
- biology
pipeline_tag: other
---

<p align="center">
  <img src="assets/disco.png" alt="DISCO: Diffusion for Sequence-Structure Co-design" width="900"/>
</p>

<p align="center">
    <img src="assets/carbene.gif" width="700"/>
</p>

<p align="center">
  <a href="https://arxiv.org/abs/2604.05181"><img src="https://img.shields.io/badge/arXiv-94133F?style=for-the-badge&logo=arxiv" alt="arXiv"/></a>
  <a href="https://disco-design.github.io/"><img src="https://img.shields.io/badge/📝%20Blog-007A87?style=for-the-badge&logoColor=white" alt="Blog"/></a>
  <a href="https://github.com/DISCO-design/DISCO"><img src="https://img.shields.io/badge/GitHub-747474.svg?style=for-the-badge&logo=GitHub&logoColor=white" alt="GitHub"/></a>
</p>

DISCO (DIffusion for Sequence-structure CO-design) is a multimodal generative model that co-designs protein sequences and 3D structures simultaneously, conditioned on and co-folded with arbitrary biomolecules, including small-molecule ligands, DNA, and RNA. Unlike sequential pipelines that first generate a backbone and then apply inverse folding, DISCO generates both modalities jointly, so sequence-based objectives can inform structure generation and vice versa.

The model was introduced in the paper [General Multimodal Protein Design Enables DNA-Encoding of Chemistry](https://huggingface.co/papers/2604.05181).

## Sample Usage

To run inference, first follow the installation instructions in the [official GitHub repository](https://github.com/DISCO-design/DISCO). You can then run generation using the provided runner:

```bash
python runner/inference.py \
  experiment=designable \
  input_json_path=input_jsons/unconditional_config.json \
  seeds=\[0,1,2,3,4\]
```

### Key Parameters
- `experiment`: Use `designable` (steers toward samples more likely to refold correctly) or `diverse` (produces greater structural variety).
- `input_json_path`: Path to the JSON file describing the generation target (masked sequences, ligands, etc.).
- `effort`: Use `max` for full quality (200 diffusion steps, 4 recycling cycles) or `fast` for prototyping.
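The parameters above can be combined in a single invocation. The following is a minimal dry-run sketch for a quick prototyping run; it assumes that `effort` is passed as a `key=value` override in the same way as `experiment` and `input_json_path`, which the repository documentation should confirm. The variable names and the two-seed list are illustrative only.

```bash
#!/usr/bin/env bash
# Dry-run sketch: assemble the inference command for a fast prototyping run.
# Assumption: `effort` is accepted as a key=value override like the other
# parameters; check the official repository before relying on this.
experiment="diverse"   # or "designable"
effort="fast"          # or "max" for full quality
input_json="input_jsons/unconditional_config.json"
seeds="[0,1]"          # illustrative seed list

# Storing the command in an array keeps each override as one argument,
# so the brackets in the seed list need no backslash escapes.
cmd=(python runner/inference.py
  "experiment=${experiment}"
  "input_json_path=${input_json}"
  "effort=${effort}"
  "seeds=${seeds}")

# Print the command for inspection instead of running it.
echo "${cmd[*]}"
```

To launch the run from the repository root rather than just print it, replace the final `echo` line with `"${cmd[@]}"`.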

## Abstract

Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries that catalyze new-to-nature carbene-transfer reactions with high activities exceeding those of engineered enzymes.

## Citation

```bibtex
@Article{disco2026,
      title={General Multimodal Protein Design Enables DNA-Encoding of Chemistry},
      author={Jarrid Rector-Brooks and Théophile Lambert and Marta Skreta and Daniel Roth and Yueming Long and Zi-Qi Li and Xi Zhang and Miruna Cretu and Francesca-Zhoufan Li and Tanvi Ganapathy and Emily Jin and Avishek Joey Bose and Jason Yang and Kirill Neklyudov and Yoshua Bengio and Alexander Tong and Frances H. Arnold and Cheng-Hao Liu},
      year={2026},
      eprint={2604.05181},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.05181},
}
```