MeFEm: Medical Face Embedding model
Paper: arXiv 2602.14672
Vision Transformers pre-trained on face data for potential medical applications. Available in Small (MeFEm-S) and Base (MeFEm-B) sizes.
import torch
import timm

# Load model (MeFEm-S example)
model = timm.create_model(
    'vit_small_patch16_224',
    pretrained=False,
    num_classes=0,        # No classification head
    global_pool='token'   # Use CLS token (default)
)
model.load_state_dict(torch.load('mefem-s.pt', map_location='cpu'))
model.eval()

# Forward pass
x = torch.randn(1, 3, 224, 224)  # Your face image
embeddings = model(x)  # [1, 384] CLS token embeddings

# For all tokens (CLS + patches):
model = timm.create_model('vit_small_patch16_224', num_classes=0, global_pool='')
tokens = model(x)  # [1, 197, 384]

# For patch embeddings only:
tokens = model.forward_features(x)
patch_embeddings = tokens[:, 1:]  # [1, 196, 384]
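A common downstream use of such embeddings is comparing two faces by cosine similarity. The sketch below uses random vectors in place of real MeFEm-S outputs (the 384-dim CLS embeddings produced above); the similarity threshold and use case are illustrative assumptions, not part of the model release.

```python
import torch
import torch.nn.functional as F

# Random vectors stand in for MeFEm-S CLS embeddings of two face crops.
# In practice these would come from model(x) as shown above.
emb_a = torch.randn(1, 384)
emb_b = torch.randn(1, 384)

# Cosine similarity in [-1, 1]; higher means more similar embeddings.
sim = F.cosine_similarity(emb_a, emb_b, dim=-1)
print(sim.shape)  # torch.Size([1])
```

For retrieval over many faces, L2-normalising the embeddings once (`F.normalize(emb, dim=-1)`) lets similarity be computed as a plain matrix product.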
Face images from FaceCaption-15M, AVSpeech, and SHFQ datasets (~6.5M total). Images were cropped with expanded (2×) face bounding boxes.
Licensed under CC BY 4.0. Please cite the paper if you use this model:
@misc{borets2026mefemmedicalfaceembedding,
    title={MeFEm: Medical Face Embedding model},
    author={Yury Borets and Stepan Botman},
    year={2026},
    eprint={2602.14672},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2602.14672},
}