metadata
license: apache-2.0
pipeline_tag: graph-ml
tags:
  - biology
  - protein
  - molecule
  - dna
  - rna
  - graph-neural-network

Cuttlefish-Encoder

Graph encoder component of Cuttlefish, a unified all-atom LLM that grounds language reasoning in geometric cues while scaling modality tokens with structural complexity.

This model was presented in the paper Scaling-Aware Adapter for Structure-Grounded LLM Reasoning.

Usage

You can download the encoder using the huggingface_hub library:

from huggingface_hub import snapshot_download
encoder_dir = snapshot_download("zihaojing/Cuttlefish-Encoder")

# Load via the Cuttlefish codebase
# See https://github.com/zihao-jing/Cuttlefish for full usage
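Before wiring the download into the Cuttlefish codebase, it can help to check what the snapshot actually contains. The sketch below is an assumption-laden convenience, not part of the Cuttlefish API: `summarize_snapshot` is a hypothetical helper name, and it only relies on the fact that `snapshot_download` returns a local directory path.

```python
from pathlib import Path


def summarize_snapshot(snapshot_dir):
    """Return (relative_path, size_in_bytes) pairs for every file under snapshot_dir."""
    root = Path(snapshot_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        # Skip directories; symlinked files in the Hub cache are followed.
        if path.is_file():
            entries.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return entries


# Example (requires network access and the huggingface_hub package):
#   from huggingface_hub import snapshot_download
#   encoder_dir = snapshot_download("zihaojing/Cuttlefish-Encoder")
#   for name, size in summarize_snapshot(encoder_dir):
#       print(f"{size:>12d}  {name}")
```

The file names in the snapshot are not listed in this card, so the helper makes no assumption about them and simply reports whatever is present.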

Pretraining data

Pretrained on Cuttlefish-Encoder-Data, covering:

  • Molecules (SMILES → 3D graph)
  • Proteins (PDB/CIF → all-atom graph)
  • DNA and RNA sequences
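To make the idea of a "3D graph" concrete, here is a minimal sketch that turns atoms with 3D coordinates into nodes and distance-based edges. This card does not describe how Cuttlefish builds its graphs, so everything here is an assumption for illustration: the function name `build_radius_graph`, the distance-cutoff rule, and the cutoff value are hypothetical, and the real pipeline may use bond information or a different neighborhood definition.

```python
import math
from itertools import combinations


def build_radius_graph(atoms, cutoff=1.8):
    """Build a toy graph from atoms given as (element, (x, y, z)) tuples.

    Returns (nodes, edges): nodes is a list of element symbols, edges is a
    list of (i, j, distance) for every atom pair within `cutoff` angstroms.
    """
    nodes = [element for element, _ in atoms]
    edges = []
    for i, j in combinations(range(len(atoms)), 2):
        distance = math.dist(atoms[i][1], atoms[j][1])
        if distance <= cutoff:
            edges.append((i, j, distance))
    return nodes, edges


# Water molecule with approximate coordinates in angstroms (illustrative values).
water = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.957, 0.000, 0.000)),
    ("H", (-0.240, 0.927, 0.000)),
]
nodes, edges = build_radius_graph(water, cutoff=1.2)
```

With this cutoff the two O-H pairs become edges while the H-H pair does not, which mirrors the covalent bonds of water in this toy case.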

Model details

  • Architecture: All-atom graph encoder with Scaling-Aware Patching.
  • Encoder hidden dim: 256
  • Modalities: molecule, protein, dna, rna
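When scripting around the encoder, the details listed above can be kept in one place. The sketch below is a hypothetical container, not a class from the Cuttlefish codebase: `EncoderSpec` and `check_modality` are illustrative names, and only the hidden dimension (256) and the four modality strings come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Hypothetical record of the details stated in this model card."""

    hidden_dim: int = 256
    modalities: tuple = ("molecule", "protein", "dna", "rna")

    def check_modality(self, name):
        """Return the normalized modality name, or raise ValueError if unsupported."""
        normalized = name.strip().lower()
        if normalized not in self.modalities:
            raise ValueError(
                f"unsupported modality {name!r}; expected one of {self.modalities}"
            )
        return normalized


spec = EncoderSpec()
```

Calling `spec.check_modality("DNA")` returns `"dna"`, while an unlisted modality raises `ValueError` early, before any data reaches the encoder.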

Related resources

| Resource | Link |
|---|---|
| Full Cuttlefish LLM | zihaojing/Cuttlefish |
| SFT instruction data | zihaojing/Cuttlefish-SFT-Data |
| Encoder pretraining data | zihaojing/Cuttlefish-Encoder-Data |

Citation

@inproceedings{jing2026cuttlefish,
  title     = {Cuttlefish: Scaling-Aware Adapter for Structure-Grounded LLM Reasoning},
  author    = {Jing, Zihao and Zeng, Qiuhao and Fang, Ruiyi and Li, Yan Yi and Sun, Yan Table, Boyu and Hu, Pingzhao},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026},
  url       = {https://arxiv.org/abs/2602.02780}
}