	---
	license: mit
	language:
	- en
	base_model:
	- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
	tags:
	- biomedical
	- relation-extraction
	- text-classification
	---

	# cell-cell-BERT

	Configuration: R-pretrained

	This model includes learned embeddings for special tokens (e.g., `[CELL0]`, `[CELL1]`), acquired through continued pre-training on biomedical text.

	## Model Description

	This is a specific configuration of the cell-cell-BERT model for extracting cell-cell interactions from biomedical text. It determines whether a sentence describes a direct biological relationship between two target cell types.

	For full details, see our paper: "Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints" (bioRxiv, 2025).

	* Repository: [https://github.com/mizuno-group/cell-cell-bert](https://github.com/mizuno-group/cell-cell-bert)
	* Paper: [https://doi.org/10.64898/2025.12.01.691726](https://doi.org/10.64898/2025.12.01.691726)

	## Model Configuration
	This model corresponds to the following experimental setting in the paper:

	* Entity Indication: [Replacement (e.g., `[CELL0]`) / Boundary Marking (e.g., `<E0>...`)]
	* Architecture: [Entity-aware (R-BERT style) / CLS-only]
	* Pre-training: [Continued Pre-training (CPT) / Base (Fine-tuning only)]

	Note: Ensure that your input preprocessing matches the Entity Indication method specified above.

	## How to Get Started

	Preprocessing Requirement:
	Depending on the configuration above, insert the corresponding special tokens into your input text before passing it to the model.

	* For Replacement models: Replace cell names with `[CELL0]` and `[CELL1]`.
	* For Boundary models: Wrap cell names with `<E0>...</E0>` and `<E1>...</E1>`.
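
The two preprocessing schemes above can be sketched as a small helper. This is a minimal illustration, not the authors' preprocessing code: `find_span` and `mark_entities` are hypothetical names, the two mentions are assumed to be given as non-overlapping character spans, and the spaces inside the boundary tags follow the usage example on this card.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def find_span(sentence: str, mention: str) -> Span:
    """Locate the first occurrence of a mention (raises ValueError if absent)."""
    start = sentence.index(mention)
    return start, start + len(mention)


def mark_entities(sentence: str, span0: Span, span1: Span,
                  mode: str = "replacement") -> str:
    """Insert entity indicators for two non-overlapping cell mentions.

    mode="replacement": mentions become [CELL0] / [CELL1].
    mode="boundary":    mentions are wrapped in <E0>...</E0> / <E1>...</E1>.
    """
    if mode not in ("replacement", "boundary"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Sort by start offset while remembering which entity index each span has.
    ordered = sorted([(span0, 0), (span1, 1)], key=lambda item: item[0][0])
    (_, left_end), _ = ordered[0]
    (right_start, _), _ = ordered[1]
    if left_end > right_start:
        raise ValueError("entity spans must not overlap")

    # Edit right-to-left so the left span's offsets stay valid.
    for (start, end), idx in reversed(ordered):
        mention = sentence[start:end]
        if mode == "replacement":
            marked = f"[CELL{idx}]"
        else:
            marked = f"<E{idx}> {mention} </E{idx}>"
        sentence = sentence[:start] + marked + sentence[end:]
    return sentence


sentence = "Macrophages activate T cells."
cell0 = find_span(sentence, "Macrophages")
cell1 = find_span(sentence, "T cells")
print(mark_entities(sentence, cell0, cell1, mode="replacement"))
# [CELL0] activate [CELL1].
print(mark_entities(sentence, cell0, cell1, mode="boundary"))
# <E0> Macrophages </E0> activate <E1> T cells </E1>.
```

Working from character spans rather than string replacement avoids accidentally marking a second, unrelated occurrence of the same cell name in the sentence.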

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# 1. Load the model
	model_name = "mizuno-group/ccbert-R-pretrained"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# 2. Prepare Input
	# CHANGE THIS LINE based on the Entity Indication method of this model:
	# text = "The [CELL0] activate [CELL1]." # If Replacement
	text = "The <E0> Macrophages </E0> activate <E1> T cells </E1>." # If Boundary Marking

	# 3. Inference
	inputs = tokenizer(text, return_tensors="pt")

	with torch.no_grad():
	    logits = model(**inputs).logits

	# Pick the highest-scoring class for the single input sentence
	predicted_class_id = logits.argmax(dim=-1).item()

	# 0 = No Relation, 1 = Relation Exists
	print(f"Predicted Class: {predicted_class_id}")

	```
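
To report a confidence score rather than only the argmax class, the two logits can be passed through a softmax. The helper below is a sketch in plain Python; `relation_probability` is a hypothetical name, and it assumes the label order stated above (index 0 = No Relation, index 1 = Relation Exists).

```python
import math
from typing import Sequence


def relation_probability(logits: Sequence[float]) -> float:
    """Return the softmax probability of class 1 (Relation Exists)."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits: [no_relation, relation]")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


# Example with made-up logits (not real model output).
print(f"P(relation) = {relation_probability([-1.2, 2.3]):.3f}")
```

With the inference snippet above, the input would be `logits[0].tolist()`.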

	## Citation

	```bibtex
	@article{Yoshikawa2025CCBERT,
	title = {Defining and Evaluating Cell–Cell Relation Extraction from Biomedical Literature under Realistic Annotation Constraints},
	author = {Yoshikawa, Mei and Mizuno, Tadahaya and Ohto, Yohei and Fujimoto, Hiromi and Kusuhara, Hiroyuki},
	journal = {bioRxiv},
	year = {2025},
	doi = {10.64898/2025.12.01.691726},
	url = {https://doi.org/10.64898/2025.12.01.691726}
	}

	```