	---
	language:
	- en
	base_model:
	- openai/clip-vit-large-patch14
	tags:
	- memorability
	- computer_vision
	- perceptual_tasks
	- CLIP
	- LaMem
	- THINGS
	---
	PerceptCLIP-Memorability predicts image memorability, the likelihood that an image will be remembered. It is the official model from the paper:
	📄 ["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260).
	The model applies LoRA adaptation to the CLIP visual encoder and adds an MLP head for memorability prediction. The paper reports state-of-the-art results on this task.
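
The adaptation described above can be sketched in plain PyTorch. This is a minimal illustration, not the released `modeling.py`: the `LoRALinear` and `MemorabilityHead` classes, the rank, the scaling, and the layer sizes are all assumptions chosen for readability, and a small linear layer stands in for one projection of the CLIP visual encoder.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (hypothetical helper)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


class MemorabilityHead(nn.Module):
    """Small MLP mapping an embedding to one score (sizes are assumptions)."""

    def __init__(self, embed_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, x):
        return self.mlp(x).squeeze(-1)


torch.manual_seed(0)
base = nn.Linear(16, 16)           # stand-in for one encoder projection
adapted = LoRALinear(base)
head = MemorabilityHead(embed_dim=16)

x = torch.randn(2, 16)
scores = head(adapted(x))          # one score per input, shape (2,)
trainable = [n for n, p in adapted.named_parameters() if p.requires_grad]
```

Because the second low-rank matrix is initialized to zero, the adapted layer reproduces the frozen layer exactly before training, and only the two low-rank matrices receive gradients.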

	## Training Details

	- Dataset: [LaMem](http://memorability.csail.mit.edu/download.html) (Large-Scale Image Memorability)
	- Architecture: CLIP Vision Encoder (ViT-L/14) with LoRA adaptation
	- Loss Function: Mean Squared Error (MSE) Loss for memorability prediction
	- Optimizer: AdamW
	- Learning Rate: 5e-05
	- Batch Size: 32
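
The settings listed above map onto a standard PyTorch training step. The following sketch only illustrates how the loss, optimizer, learning rate, and batch size fit together; the tiny `nn.Sequential` model and the random tensors are placeholders for the LoRA-adapted encoder and LaMem batches, and the actual training script may differ.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder regressor; the real model is the LoRA-adapted CLIP encoder + MLP head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))

criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch_size = 32
images = torch.randn(batch_size, 3, 8, 8)   # placeholder for preprocessed images
targets = torch.rand(batch_size)            # placeholder memorability targets

model.train()
optimizer.zero_grad()
preds = model(images).squeeze(-1)           # one prediction per image, shape (32,)
loss = criterion(preds, targets)            # mean squared error
loss.backward()
optimizer.step()
```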

	## Installation & Requirements

	You can set up the environment with `environment.yml`, or install the dependencies manually:
	- python=3.9.15
	- cudatoolkit=11.7
	- torchvision=0.14.0
	- transformers=4.45.2
	- peft=0.14.0
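
For a manual install, a short script can confirm that the Python packages match the pins above. This is a sketch using only the standard library; the `PINS` dictionary, `installed_version`, and `check_pins` are hypothetical names, and the check covers only the pip-visible packages from the list (not `python` or `cudatoolkit`).

```python
from importlib import metadata

# Pins copied from the list above (Python packages only).
PINS = {"torchvision": "0.14.0", "transformers": "4.45.2", "peft": "0.14.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package to a (pinned, installed, matches) tuple."""
    report = {}
    for package, pinned in pins.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu117" when comparing.
        matches = found is not None and found.split("+")[0] == pinned
        report[package] = (pinned, found, matches)
    return report


if __name__ == "__main__":
    for package, (pinned, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: pinned {pinned}, installed {found} [{status}]")
```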

	## Usage

	To use the model for inference:

	```python
	from torchvision import transforms
	import torch
	from PIL import Image
	from huggingface_hub import hf_hub_download
	import importlib.util

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# Load the model class definition dynamically
	class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Memorability", filename="modeling.py")
	spec = importlib.util.spec_from_file_location("modeling", class_path)
	modeling = importlib.util.module_from_spec(spec)
	spec.loader.exec_module(modeling)

	# initialize a model
	ModelClass = modeling.clip_lora_model
	model = ModelClass().to(device)

	# Load pretrained model
	model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Memorability", filename="perceptCLIP_Memorability.pth")
	model.load_state_dict(torch.load(model_path, map_location=device))
	model.eval()

	# Load an image
	image = Image.open("image_path.jpg").convert("RGB")

	# Preprocess and predict
	def Mem_preprocess():
	    # Resize, center-crop to 224x224, and normalize with the CLIP mean/std
	    transform = transforms.Compose([
	        transforms.Resize(224),
	        transforms.CenterCrop(size=(224, 224)),
	        transforms.ToTensor(),
	        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
	                             std=(0.26862954, 0.26130258, 0.27577711))
	    ])
	    return transform

	# Add a batch dimension and move the tensor to the model's device
	image = Mem_preprocess()(image).unsqueeze(0).to(device)

	with torch.no_grad():
	    mem_score = model(image).item()

	print(f"Predicted Memorability Score: {mem_score:.4f}")
	```

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@article{zalcher2025don,
	  title={Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks},
	  author={Zalcher, Amit and Wasserman, Navve and Beliy, Roman and Heinimann, Oliver and Irani, Michal},
	  journal={arXiv preprint arXiv:2503.13260},
	  year={2025}
	}
	```