---
license: apache-2.0
pipeline_tag: graph-ml
tags:
- biology
- protein
- molecule
- dna
- rna
- graph-neural-network
---

# Cuttlefish-Encoder

Graph encoder component of **Cuttlefish**, a unified all-atom LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity.

This model was presented in the paper [Scaling-Aware Adapter for Structure-Grounded LLM Reasoning](https://arxiv.org/abs/2602.02780).

- **Code:** [GitHub - zihao-jing/Cuttlefish](https://github.com/zihao-jing/Cuttlefish)
- **Pretrained with:** Masked reconstruction on all-atom structures.

## Usage

You can download the encoder using the `huggingface_hub` library:

```python
from huggingface_hub import snapshot_download

# Download the encoder files to the local Hugging Face cache; returns the snapshot directory path
encoder_dir = snapshot_download("zihaojing/Cuttlefish-Encoder")

# Load via the Cuttlefish codebase
# See https://github.com/zihao-jing/Cuttlefish for full usage
```

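The file layout of the downloaded snapshot is not described in this card. As a minimal sketch, the hypothetical helper `list_snapshot_files` below (an illustration only, not part of `huggingface_hub` or the Cuttlefish codebase) walks a local directory such as `encoder_dir` and reports every file with its size, which can help locate checkpoints and configs before loading them through the Cuttlefish code.

```python
from pathlib import Path


def list_snapshot_files(root):
    """Return (relative_path, size_in_bytes) for every file under root, sorted by path.

    Hypothetical helper for illustration; not part of huggingface_hub or Cuttlefish.
    """
    root = Path(root)
    files = []
    for path in root.rglob("*"):
        # is_file() and stat() follow symlinks, which the Hugging Face cache uses for files
        if path.is_file():
            files.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return sorted(files)
```

Calling `list_snapshot_files(encoder_dir)` after the download step returns a plain list of tuples that can be printed or filtered by file extension.
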
## Pretraining data

Pretrained on **[Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data)**, covering:
- Molecules (SMILES → 3D graph)
- Proteins (PDB/CIF → all-atom graph)
- DNA and RNA sequences

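For readers unfamiliar with the term, a 3D or all-atom graph can be pictured as atoms (nodes) that carry an element type and coordinates, joined by edges. The sketch below is purely illustrative: the function name `build_distance_graph`, the distance-cutoff rule for edges, the 1.2 Å value, and the approximate water coordinates are all assumptions made here. The actual featurization is defined by the Cuttlefish codebase and dataset, not by this snippet.

```python
import math


def build_distance_graph(elements, coords, cutoff):
    """Build an undirected graph whose edges join atoms closer than `cutoff` (angstroms).

    Illustrative only: the cutoff rule is an assumption, not the Cuttlefish preprocessing.
    """
    if len(elements) != len(coords):
        raise ValueError("elements and coords must have the same length")
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return {
        "nodes": list(elements),
        "coords": [tuple(c) for c in coords],
        "edges": edges,
    }


# Approximate water geometry in angstroms: one oxygen and two hydrogens.
water = build_distance_graph(
    ["O", "H", "H"],
    [(0.000, 0.000, 0.000), (0.957, 0.000, 0.000), (-0.240, 0.927, 0.000)],
    cutoff=1.2,
)
print(water["edges"])  # [(0, 1), (0, 2)]
```

With this cutoff the two O-H pairs become edges while the H-H pair (about 1.5 Å apart) does not. A fixed cutoff is a crude stand-in for real bond perception and is used here only to keep the example dependency-free.
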
## Model details

- **Architecture**: All-atom graph encoder with Scaling-Aware Patching.
- **Encoder hidden dim**: 256
- **Modalities**: molecule, protein, dna, rna

## Related resources

| Resource | Link |
|---|---|
| Full Cuttlefish LLM | [zihaojing/Cuttlefish](https://huggingface.co/zihaojing/Cuttlefish) |
| SFT instruction data | [zihaojing/Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data) |
| Encoder pretraining data | [zihaojing/Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data) |

## Citation

```bibtex
@inproceedings{jing2026cuttlefish,
  title = {Cuttlefish: Scaling-Aware Adapter for Structure-Grounded LLM Reasoning},
  author = {Jing, Zihao and Zeng, Qiuhao and Fang, Ruiyi and Li, Yan Yi and Sun, Yan and Wang, Boyu and Hu, Pingzhao},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year = {2026},
  url = {https://arxiv.org/abs/2602.02780}
}
```