StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
Paper • 2604.21689 • Published
StyleID is a CLIP-based image encoder trained to produce identity embeddings that are robust to stylization.
It can be used for identity similarity, retrieval, evaluation, and conditioning in generative models.
pip install torch transformers pillow
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("kwanY/styleid").to(device)
processor = CLIPProcessor.from_pretrained("kwanY/styleid")
img_path = "face.jpg"  # path to your input image
img = Image.open(img_path).convert("RGB")
inputs = processor(images=img, return_tensors="pt").to(device)
with torch.no_grad():
    emb = model.get_image_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize; optional but recommended
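Once embeddings are L2-normalized, identity similarity reduces to a cosine (dot-product) score, and retrieval to a nearest-neighbor search over a gallery of embeddings. The sketch below illustrates both with random unit vectors standing in for StyleID embeddings; the helper names and the 512-dim size are illustrative assumptions, not part of the StyleID API.

```python
import torch
import torch.nn.functional as F

def identity_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    # Cosine similarity between two L2-normalized identity embeddings.
    # For unit vectors this equals their dot product, in [-1, 1].
    return F.cosine_similarity(emb_a, emb_b, dim=-1).item()

def retrieve(query: torch.Tensor, gallery: torch.Tensor) -> int:
    # Nearest-neighbor retrieval: index of the gallery embedding
    # with the highest cosine similarity to the query.
    scores = gallery @ query.squeeze(0)  # (N,) dot products
    return int(scores.argmax())

# Toy data: random unit vectors in place of real StyleID embeddings
# (assumed 512-dim here; use the model's actual embedding size).
torch.manual_seed(0)
a = F.normalize(torch.randn(1, 512), dim=-1)
gallery = F.normalize(torch.randn(10, 512), dim=-1)
gallery[3] = a  # plant the query identity at index 3

score = identity_similarity(a, a)   # identical embeddings score 1.0
best = retrieve(a, gallery)         # recovers index 3
```

In practice `a` and each gallery row would come from `model.get_image_features` as above; a similarity threshold for same/different-identity decisions should be calibrated on a validation set.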