---
license: apache-2.0
library_name: timm
tags:
- histopathology
- pathology
- dino
- vision-transformer
- prostate
- feature-extraction
pipeline_tag: image-feature-extraction
---

# Prost40M

**Prost40M** is a prostatectomy-specific foundation model: a Vision Transformer (`vit_small`) pretrained with DINO on approximately 40 million tiles from 1,888 H&E-stained prostatectomy slides.
It is intended as a tile-level feature extractor for computational pathology tasks in which subtle, prostate-specific morphology matters.

## Model at a Glance

| Field | Value |
| --- | --- |
| Model name | Prost40M |
| Backbone architecture | `vit_small` (ViT-S) |
| Pretraining method | DINO (self-supervised) |
| Pretraining resolution | `0.50` microns per pixel |
| Input size | `224 x 224` pixels |
| Patch size | `14 x 14` pixels |
| Embedding dimension | `384` (one vector per tile) |
| Released weights | DINO teacher backbone (encoder only) |
| Domain | H&E prostatectomy histopathology |

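With a `224 x 224` input and `14 x 14` patches, the encoder operates on a 16 x 16 grid of patch tokens. The short sketch below makes that arithmetic explicit. It assumes a standard ViT layout with one prepended class token and no register tokens; the card states neither, so check `model.num_prefix_tokens` or the output of `model.forward_features` if you need the exact sequence length. The helper name `vit_token_layout` is hypothetical.

```python
def vit_token_layout(input_size: int, patch_size: int, num_prefix_tokens: int = 1) -> dict:
    """Token layout of a ViT applied to a square input.

    num_prefix_tokens=1 assumes a single class token and no register tokens
    (an assumption; this card does not state it).
    """
    if input_size % patch_size != 0:
        raise ValueError("input_size must be divisible by patch_size")
    grid = input_size // patch_size  # patches per side
    num_patches = grid * grid        # total patch tokens
    return {
        "grid": grid,
        "num_patches": num_patches,
        "sequence_length": num_patches + num_prefix_tokens,
    }


# Values from the table above: 224 x 224 input, 14 x 14 patches.
layout = vit_token_layout(224, 14)
print(layout)  # {'grid': 16, 'num_patches': 256, 'sequence_length': 257}
```

The embedding returned by `model(x)` is a single 384-dimensional vector per tile; the per-patch tokens only matter if you need spatially resolved features.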

## Quickstart

```python
import torch
import timm
from PIL import Image

# Load the released teacher encoder from the Hugging Face Hub.
model = timm.create_model("hf-hub:waticlems/Prost40M", pretrained=True)
model.eval()

# Build the evaluation preprocessing (resize, crop, normalization) from the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

# An H&E tile; Prost40M was pretrained on tiles extracted at 0.50 microns per pixel.
img = Image.open("tile.png").convert("RGB")
x = transform(img).unsqueeze(0)  # add a batch dimension: [1, 3, 224, 224]

with torch.inference_mode():
    embedding = model(x)  # shape: [1, 384]
print(embedding.shape)
```

## Motivation

Large pathology foundation models are typically trained on broad, multi-organ
data. Their generic features transfer well across many settings, but can be less
sensitive to the fine-grained morphology of a specific organ. Prost40M was developed
to evaluate the value of organ-specific pretraining in prostate histopathology.

## Training Data

- Approximately 40 million image tiles at `0.50` microns per pixel
- 1,888 H&E-stained prostatectomy slides from two cohorts:
  - 449 slides from 403 patients in the TCGA-PRAD cohort
  - 1,439 slides from 508 patients in the LEOPARD cohort

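Because the pretraining tiles were extracted at `0.50` microns per pixel, inputs at a similar resolution are likely to give the most representative features. The sketch below computes the physical field of view of one model input and the downsampling factor needed for a slide scanned at a different base resolution, and checks that the two cohorts add up to the stated slide total. It assumes each `224 x 224` input is taken directly at the pretraining resolution (the card does not state the pretraining tile size); the helper names and the `0.25` microns-per-pixel example value are hypothetical.

```python
PRETRAIN_MPP = 0.50  # microns per pixel of the pretraining tiles (from this card)
INPUT_SIZE_PX = 224  # model input size in pixels (from the table above)

# Slide counts per cohort, as listed above.
COHORT_SLIDES = {"TCGA-PRAD": 449, "LEOPARD": 1439}


def field_of_view_um(size_px: int, mpp: float) -> float:
    """Physical width in microns of a square tile of size_px pixels at mpp microns per pixel."""
    return size_px * mpp


def downsample_factor(base_mpp: float, target_mpp: float = PRETRAIN_MPP) -> float:
    """Factor by which a slide scanned at base_mpp must be downsampled to reach target_mpp.

    Hypothetical helper: values > 1 mean downsampling, values < 1 mean upsampling.
    """
    if base_mpp <= 0:
        raise ValueError("base_mpp must be positive")
    return target_mpp / base_mpp


total_slides = sum(COHORT_SLIDES.values())
print(total_slides)                                   # 1888
print(field_of_view_um(INPUT_SIZE_PX, PRETRAIN_MPP))  # 112.0
print(downsample_factor(0.25))                        # 2.0 (example base resolution)
```

Under that assumption, each input covers roughly 112 x 112 microns of tissue. The exact tiling and tissue-filtering steps used for Prost40M are not documented on this card.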

## Intended Use

- Tile-level feature extraction for downstream prostate histopathology tasks

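Downstream tasks defined at the slide or patient level need the tile embeddings combined into one representation. As one simple baseline, and not necessarily the aggregation used in the cited work (which this card does not describe), the sketch below mean-pools a list of 384-dimensional tile embeddings into a single slide-level vector. It uses plain Python lists so it runs anywhere; with a PyTorch tensor of shape `[num_tiles, 384]` the same operation is `embeddings.mean(dim=0)`. The function name `mean_pool` is hypothetical.

```python
from typing import Sequence

EMBED_DIM = 384  # embedding dimension from the table above


def mean_pool(tile_embeddings: Sequence[Sequence[float]], dim: int = EMBED_DIM) -> list[float]:
    """Average tile embeddings element-wise into one slide-level vector.

    Hypothetical helper illustrating a mean-pooling baseline.
    """
    if not tile_embeddings:
        raise ValueError("need at least one tile embedding")
    for emb in tile_embeddings:
        if len(emb) != dim:
            raise ValueError(f"expected {dim}-dimensional embeddings, got {len(emb)}")
    n = len(tile_embeddings)
    # zip(*...) walks the embeddings feature by feature.
    return [sum(values) / n for values in zip(*tile_embeddings)]


# Toy example with three tiles (made-up values, not model outputs).
tiles = [[1.0] * EMBED_DIM, [2.0] * EMBED_DIM, [3.0] * EMBED_DIM]
slide_vector = mean_pool(tiles)
print(len(slide_vector), slide_vector[0])  # 384 2.0
```

Mean pooling discards the spatial arrangement of tiles and weights every tile equally, so learned aggregators are often preferred for slide-level prediction.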

## Limitations

- Performance may degrade under domain shift, for example across scanners, staining protocols, or centers
- Learned representations reflect the composition of the pretraining data (TCGA-PRAD and LEOPARD prostatectomy slides) and the preprocessing choices made

## License

Apache-2.0

## Citation

If you use **Prost40M**, please cite:

```bibtex
@misc{grisi2026bcr,
  title={Deep Learning From Routine Histology Improves Risk Stratification for Biochemical Recurrence in Prostate Cancer},
  author={Clément Grisi and Khrystyna Faryna and Nefise Uysal and Vittorio Agosti and Enrico Munari and Solène-Florence Kammerer-Jacquet and Paulo Guilherme de Oliveira Salles and Yuri Tolkach and Reinhard Büttner and Sofiya Semko and Maksym Pikul and Axel Heidenreich and Jeroen van der Laak and Geert Litjens},
  year={2026},
  eprint={2603.14187},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.14187},
}
```