zjunlp/InstructCell-instruct


	---
	license: mit
	---

	## 🗞️ Model description
	InstructCell is a multi-modal AI copilot that combines natural language with single-cell RNA sequencing data, so researchers can run tasks such as cell type annotation, pseudo-cell generation, and drug sensitivity prediction through plain text commands.
	It builds on a specialized multi-modal architecture and our multi-modal single-cell instruction dataset to lower the technical barrier to single-cell analysis.

	Instruct version: generates only the answer, without additional explanatory text, for concise, task-specific outputs.


	### 🚀 How to use

	The example below walks through a basic cell type annotation workflow.

	Before running it, set `H5AD_PATH` and `GENE_VOCAB_PATH`:
	- `H5AD_PATH`: Path to your `.h5ad` single-cell data file (e.g., `H5AD_PATH = "path/to/your/data.h5ad"`).
	- `GENE_VOCAB_PATH`: Path to your gene vocabulary file (e.g., `GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"`).
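Before loading anything, it can help to confirm that both paths point at real files. The following is a minimal sketch, not part of InstructCell: `check_input_paths` is a hypothetical helper, and the `.h5ad` and `.npy` suffix checks simply mirror the example paths in the list above.

```python
from pathlib import Path


def check_input_paths(h5ad_path, gene_vocab_path):
    """Return a list of readable problems; an empty list means both paths look usable."""
    problems = []
    # Expected suffixes are an assumption taken from the example paths above.
    expected = [
        ("H5AD_PATH", h5ad_path, ".h5ad"),
        ("GENE_VOCAB_PATH", gene_vocab_path, ".npy"),
    ]
    for name, raw_path, suffix in expected:
        path = Path(raw_path)
        if not path.is_file():
            problems.append(f"{name}: no file at {path}")
        elif path.suffix != suffix:
            problems.append(f"{name}: expected a {suffix} file, got {path.name}")
    return problems


# Placeholder paths copied from the list above; replace them with real files.
for problem in check_input_paths("path/to/your/data.h5ad", "path/to/your/gene_vocab.npy"):
    print(problem)
```

An empty result means both files exist and carry the expected extensions; it says nothing about whether their contents are valid.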

	```python
	from mmllm.module import InstructCell
	import anndata
	import numpy as np
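	from utils import unify_gene_features

	# Placeholder paths; point these at your own files
	H5AD_PATH = "path/to/your/data.h5ad"
	GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"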

	# Load the pre-trained InstructCell model from HuggingFace
	model = InstructCell.from_pretrained("zjunlp/InstructCell-instruct")

	# Load the single-cell data (H5AD format) and gene vocabulary file (numpy format)
	adata = anndata.read_h5ad(H5AD_PATH)
	gene_vocab = np.load(GENE_VOCAB_PATH)
	adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)

	# Select a random single-cell sample and extract its gene counts and metadata
	k = np.random.randint(0, len(adata))
	# Note: .toarray() assumes adata.X is a SciPy sparse matrix
	gene_counts = adata[k, :].X.toarray()
	sc_metadata = adata[k, :].obs.iloc[0].to_dict()

	# Define the model prompt with placeholders for metadata and gene expression profile
	prompt = (
	    "Can you help me annotate this single cell from a {species}? "
	    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
	    "The gene expression profile is {input}. Thanks!"
	)

	# Use the model to generate predictions
	for key, value in model.predict(
	    prompt,
	    gene_counts=gene_counts,
	    sc_metadata=sc_metadata,
	    do_sample=True,
	    top_p=0.95,
	    top_k=50,
	    max_new_tokens=256,
	).items():
	    # Print each key-value pair
	    print(f"{key}: {value}")
	```
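The prompt above contains named placeholders: `{input}` stands for the gene expression profile, and the others correspond to metadata fields. The sketch below checks that a metadata dictionary covers every placeholder before calling the model. It assumes each placeholder other than `{input}` is looked up by name in `sc_metadata`; how `model.predict` itself handles a missing key is not covered here. `prompt_fields` and `missing_metadata` are hypothetical helpers, and the metadata values are invented for illustration.

```python
from string import Formatter


def prompt_fields(prompt):
    """Return the placeholder names in a str.format-style template, in order of appearance."""
    return [field for _, field, _, _ in Formatter().parse(prompt) if field]


def missing_metadata(prompt, sc_metadata, reserved=("input",)):
    """List placeholders that sc_metadata cannot fill, skipping reserved names such as {input}."""
    return [
        field
        for field in prompt_fields(prompt)
        if field not in reserved and field not in sc_metadata
    ]


prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)

# Hypothetical metadata row that lacks a "tissue" entry.
sc_metadata = {"species": "human", "sequencing_method": "10xGenomics"}
print(missing_metadata(prompt, sc_metadata))  # ['tissue']
```

Running a check like this on `adata.obs` columns before prediction surfaces a mismatch between the prompt wording and the dataset's column names early.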

	For more detailed explanations and additional examples, please refer to the Jupyter notebook [demo.ipynb](https://github.com/zjunlp/InstructCell/blob/main/demo.ipynb).


	### 🔖 Citation

	If you use the code or data, please cite the following paper:

	```bibtex
	@article{fang2025instructcell,
	  title={A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following},
	  author={Fang, Yin and Deng, Xinle and Liu, Kangwei and Zhang, Ningyu and Qian, Jingyang and Yang, Penghui and Fan, Xiaohui and Chen, Huajun},
	  journal={arXiv preprint arXiv:2501.08187},
	  year={2025}
	}
	```