	---
	license: apache-2.0
	library_name: timm
	tags:
	- histopathology
	- pathology
	- dino
	- vision-transformer
	- prostate
	- feature-extraction
	pipeline_tag: image-feature-extraction
	---

	# Prost40M

	Prost40M is a prostatectomy-specific foundation model pretrained with DINO on a large corpus of H&E prostatectomy slides.
	It is designed as a strong feature extractor for computational pathology tasks where subtle prostate-specific morphology matters.


	## Model At a Glance

	| Field | Value |
	| --- | --- |
	| Model name | Prost40M |
	| Backbone architecture | `vit_small` |
	| Input size | `224 x 224` |
	| Patch size | `14` |
	| Embedding dimension | `384` |
	| Released weights | Teacher backbone encoder |
	| Domain | H&E prostatectomy histopathology |
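
The input size and patch size above determine how many patch tokens the backbone processes per tile. A minimal sketch of that arithmetic follows. The helper name `patch_grid` is hypothetical, and the count excludes any class or register tokens, which this card does not specify.

```python
def patch_grid(input_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square input."""
    if input_size % patch_size != 0:
        raise ValueError("input size must be a multiple of the patch size")
    per_side = input_size // patch_size
    return per_side, per_side * per_side


# Values from the table: 224 x 224 input, patch size 14.
per_side, num_patches = patch_grid(224, 14)
print(per_side, num_patches)  # 16 256
```

Each of these patch tokens, like the pooled output in the quickstart, has the embedding dimension of 384 listed in the table.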

	## Quickstart

	```python
	import torch
	import timm
	from PIL import Image
	from timm.data import resolve_data_config
	from timm.data.transforms_factory import create_transform

	model = timm.create_model("hf-hub:waticlems/Prost40M", pretrained=True)
	model.eval()

	transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

	img = Image.open("tile.png").convert("RGB")
	x = transform(img).unsqueeze(0)
	with torch.inference_mode():
	    embedding = model(x)  # shape: [1, 384]
	print(embedding.shape)
	```

	## Motivation

	Large pathology foundation models are typically trained on broad, multi-organ
	data. Their generic features transfer well across many settings, but can be less
	sensitive to fine-grained morphology of a specific organ. Prost40M was developed
	to evaluate the value of organ-specific pretraining in prostate histopathology.

	## Training Data

	- Approx. 40 million image tiles at `0.50` microns per pixel
	- 1888 H&E-stained prostatectomy slides
	- 449 slides from 403 patients in the TCGA-PRAD cohort
	- 1439 slides from 508 patients in the LEOPARD cohort
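
A few derived quantities follow from the figures above and can help when preparing tiles for inference. The sketch below only does arithmetic on numbers from this card. It assumes the `224 x 224` input size also applies to the pretraining tiles, which the card does not state explicitly, and the variable names are illustrative.

```python
# Numbers taken from this model card.
MPP = 0.50                 # microns per pixel
TILE_PX = 224              # assumption: pretraining tiles match the 224 x 224 input size
SLIDES = {"TCGA-PRAD": 449, "LEOPARD": 1439}
APPROX_TILES = 40_000_000  # "approx. 40 million", so derived values are rough

tile_side_um = TILE_PX * MPP                   # physical side length of one tile, in microns
total_slides = sum(SLIDES.values())            # should agree with the 1888 slides reported
tiles_per_slide = APPROX_TILES / total_slides  # rough average across both cohorts

print(f"tile side: {tile_side_um:.0f} um")
print(f"total slides: {total_slides}")
print(f"average tiles per slide: ~{tiles_per_slide:.0f}")
```

Under that assumption, a tile covers roughly 112 x 112 microns of tissue, so tiles extracted at a different resolution would show the model structures at a scale it was not pretrained on.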

	## Intended Use

	- Tile-level feature extraction for downstream prostate histopathology tasks
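
As one illustration of a downstream use, frozen tile embeddings can feed a very simple classifier. The sketch below is a nearest-centroid classifier over cosine similarity in plain Python. It is not a method described by this card. The function names, the class labels, and the toy 3-dimensional vectors (standing in for real 384-dimensional embeddings) are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fit_centroids(embeddings, labels):
    """Average the normalized embeddings of each class into one unit-length centroid."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        emb = l2_normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def predict(centroids, embedding):
    """Return the label whose centroid has the highest cosine similarity."""
    emb = l2_normalize(embedding)
    return max(
        centroids,
        key=lambda label: sum(c * e for c, e in zip(centroids[label], emb)),
    )


# Toy vectors and hypothetical labels; real inputs would be model outputs.
train = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels = ["benign", "benign", "tumor", "tumor"]
centroids = fit_centroids(train, labels)
print(predict(centroids, [0.8, 0.2, 0.0]))
```

In practice a linear probe or a slide-level aggregator would usually replace this, but the pattern is the same: the backbone stays frozen and only the lightweight head is fit on the embeddings.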

	## Limitations

	- Performance can degrade under domain shift (scanner, stain protocol, center)
	- Learned representations reflect dataset composition and preprocessing choices

	## License

	Apache-2.0

	## Citation

	If you use Prost40M, cite:

	```bibtex
	@misc{grisi2026bcr,
	  title={Deep Learning From Routine Histology Improves Risk Stratification for Biochemical Recurrence in Prostate Cancer},
	  author={Clément Grisi and Khrystyna Faryna and Nefise Uysal and Vittorio Agosti and Enrico Munari and Solène-Florence Kammerer-Jacquet and Paulo Guilherme de Oliveira Salles and Yuri Tolkach and Reinhard Büttner and Sofiya Semko and Maksym Pikul and Axel Heidenreich and Jeroen van der Laak and Geert Litjens},
	  year={2026},
	  eprint={2603.14187},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV},
	  url={https://arxiv.org/abs/2603.14187},
	}
	```