---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# sCellTransformer

sCellTransformer (sCT) is a long-range foundation model designed for zero-shot
prediction tasks on single-cell RNA-seq and spatial transcriptomics data. It processes
raw gene expression profiles across multiple cells to predict discretized gene
expression levels for unseen cells without retraining. The model handles up to 20,000
protein-coding genes and a bag of 50 cells from the same sample, a context of around
one million gene-expression tokens. This capacity allows it to learn cross-cell
relationships, capture long-range dependencies in gene expression data, and mitigate
the sparsity typical of single-cell datasets.

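The quoted context length follows from quick arithmetic, using the 19,968-gene input width that appears in the usage snippet further down:

```python
# Gene-expression tokens per sample: genes per cell times cells per bag.
# 19,968 is the input width used in the usage snippet; 50 cells per bag.
genes_per_cell = 19_968
num_cells = 50
tokens_per_sample = genes_per_cell * num_cells
print(tokens_per_sample)  # 998400, i.e. close to one million tokens
```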
sCT is trained on a large dataset of single-cell RNA-seq data and finetuned on spatial
transcriptomics data. Evaluation tasks include zero-shot imputation of masked gene
expression and zero-shot prediction of cell types.

**Developed by:** [InstaDeep](https://huggingface.co/InstaDeepAI)

### Model Sources

- **Repository:** [Nucleotide Transformer](https://github.com/instadeepai/nucleotide-transformer)
- **Paper:** [A long range foundation model for zero-shot predictions in single-cell and spatial transcriptomics data](https://openreview.net/pdf?id=VdX9tL3VXH)

### How to use

Until its next release, the `transformers` library needs to be installed from source
with the following commands in order to use the model. PyTorch is also required.

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```

A small code snippet is provided below to run inference with the model on random
input.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "InstaDeepAI/sCellTransformer",
    trust_remote_code=True,
)

# The model expects a flat sequence of binned gene expressions:
# 19,968 genes per cell, concatenated across the bag of cells.
num_cells = model.config.num_cells
dummy_gene_expressions = torch.randint(0, 5, (1, 19968 * num_cells))
torch_output = model(dummy_gene_expressions)
```

A more complete example, on one of the downstream evaluation datasets, is provided in
the example notebook.

#### Training data

The model was trained following a two-step procedure:
pre-training on single-cell data, then finetuning on spatial transcriptomics data.
The single-cell data used for pre-training comes from the
[Cellxgene Census collection datasets](https://cellxgene.cziscience.com/)
used to train the scGPT models; it consists of around 50 million
cells and approximately 60,000 genes. The spatial data comes from both the [human
breast cell atlas](https://cellxgene.cziscience.com/collections/4195ab4c-20bd-4cd3-8b3d-65601277e731)
and [the human heart atlas](https://www.heartcellatlas.org/).

#### Training procedure

As detailed in the paper, the gene expressions are first binned into a pre-defined
number of bins. This helps the model learn the distribution of gene expressions by
mitigating sparsity, reducing noise, and handling extreme values. The training
objective is then to predict the masked gene expressions in a cell, following a
BERT-style masked-language-modeling setup.

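As a rough illustration of the binning step, the sketch below maps raw counts to a small number of discrete bins. The bin count, the log transform, and the equal-width scheme are all assumptions made for illustration; see the paper for sCT's exact procedure.

```python
import numpy as np

def bin_expressions(counts, num_bins=5):
    """Illustrative binning: zeros stay in bin 0 (preserving sparsity);
    non-zero counts are log-transformed and split into equal-width bins,
    which tames extreme values. Hypothetical scheme, not sCT's exact one."""
    counts = np.asarray(counts, dtype=float)
    binned = np.zeros(counts.shape, dtype=int)
    nz = counts > 0
    if nz.any():
        logged = np.log1p(counts[nz])
        # num_bins points -> num_bins - 1 equal-width intervals over log space
        edges = np.linspace(0.0, logged.max(), num_bins)
        binned[nz] = np.minimum(np.digitize(logged, edges), num_bins - 1)
    return binned

print(bin_expressions([0, 1, 3, 10, 250]).tolist())  # [0, 1, 2, 2, 4]
```

The resulting integer bins play the role of the discrete token values fed to the model (compare the `torch.randint(0, 5, ...)` dummy input in the usage snippet).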
### BibTeX entry and citation info

```bibtex
@misc{joshi2025a,
  title={A long range foundation model for zero-shot predictions in single-cell and spatial transcriptomics data},
  author={Ameya Joshi and Raphael Boige and Lee Zamparo and Ugo Tanielian and Juan Jose Garau-Luis and Michail Chatzianastasis and Priyanka Pandey and Janik Sielemann and Alexander Seifert and Martin Brand and Maren Lang and Karim Beguir and Thomas PIERROT},
  year={2025},
  url={https://openreview.net/forum?id=VdX9tL3VXH}
}
```