t0m-R/vit-sem-scale-classifier

	---
	license: apache-2.0
	language: en
	tags:
	- image-classification
	- vision-transformer
	- pytorch
	- sem
	- materials-science
	- nffa-di
	base_model: timm/vit_base_patch8_224.augreg2_in21k_ft_in1k
	pipeline_tag: image-classification
	---

	# Vision Transformer for SEM Image Scale Classification

	This is a fine-tuned Vision Transformer (ViT-B/8) that classifies the magnification scale of Scanning Electron Microscopy (SEM) images as pico, nano, or micro, directly from pixel data.

	The model addresses the problem of unreliable scale information in large SEM archives, where scale metadata is often locked in proprietary file formats or has to be recovered through error-prone Optical Character Recognition (OCR).

	This model was developed as part of the NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure) project, funded by the European Union's NextGenerationEU program.

	## Model Description

	The model is based on the `timm/vit_base_patch8_224.augreg2_in21k_ft_in1k` checkpoint and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories are:

	1. Pico: Images where the pixel size is at the atomic or sub-nanometer scale (less than 1 nm).
	2. Nano: Images where the pixel size is in the nanometer range (1 nm to 1,000 nm, i.e. up to 1 µm).
	3. Micro: Images where the pixel size is at the micrometer scale (greater than 1 µm).
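
The category definitions above can be written as a small lookup, which may help when deriving ground-truth labels from known pixel sizes. This is a minimal sketch, not the project's labelling code: the function name `scale_class` is hypothetical, and keeping exactly 1,000 nm in the nano class is an assumption read off the wording of the list.

```python
def scale_class(pixel_size_nm: float) -> str:
    """Map a pixel size in nanometres to one of the three scale categories."""
    if pixel_size_nm <= 0:
        raise ValueError("pixel size must be positive")
    if pixel_size_nm < 1.0:
        # Atomic or sub-nanometre pixel size
        return "pico"
    if pixel_size_nm <= 1000.0:
        # Assumption: exactly 1,000 nm (1 µm) stays in the nano class
        return "nano"
    # Anything above 1 µm
    return "micro"


for size in (0.2, 35.0, 2500.0):
    print(size, "nm ->", scale_class(size))
```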

	## Model Performance

	The model achieves 91.7% accuracy on a held-out test set. Most misclassifications occur at the transitional nano-micro boundary, which suggests that the model is learning physically meaningful feature representations related to the magnification level rather than making arbitrary errors.
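
To check where errors concentrate on your own labelled images, a confusion table is usually enough. The sketch below uses only the standard library; the helper names are hypothetical and the label lists are made-up placeholders, not results from this model.

```python
from collections import Counter

LABELS = ["pico", "nano", "micro"]  # class names as used in this card


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; these are NOT the model's results.
y_true = ["pico", "nano", "nano", "micro", "micro", "nano"]
y_pred = ["pico", "nano", "micro", "micro", "nano", "nano"]

counts = confusion_counts(y_true, y_pred)
for true_label in LABELS:
    # One row per true class, one column per predicted class
    print(true_label, [counts[(true_label, pred)] for pred in LABELS])
print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
```

Off-diagonal counts between `nano` and `micro` are the ones to watch when reproducing the boundary behaviour described above.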

	## How to Use

	The following Python code shows how to load the model and its processor from the Hub and use it to classify a local SEM image.

	```python
	from transformers import AutoImageProcessor, AutoModelForImageClassification
	from PIL import Image
	import torch

	# Load the model and image processor from the Hub
	model_name = "t0m-R/vit-sem-scale-classifier"
	image_processor = AutoImageProcessor.from_pretrained(model_name)
	model = AutoModelForImageClassification.from_pretrained(model_name)

	# Load and preprocess the image
	image_path = "path/to/your/sem_image.png"
	try:
	    # Open the file and convert it to three-channel RGB
	    image = Image.open(image_path).convert("RGB")

	    # Prepare the image for the model
	    inputs = image_processor(images=image, return_tensors="pt")

	    # Run inference without tracking gradients
	    with torch.no_grad():
	        logits = model(**inputs).logits

	    # Pick the highest-scoring class and look up its name
	    predicted_label_id = logits.argmax(-1).item()
	    predicted_label = model.config.id2label[predicted_label_id]

	    print(f"Predicted Scale: {predicted_label}")

	except FileNotFoundError:
	    print(f"Error: The file at {image_path} was not found.")
	```
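
The snippet above prints only the top class. Because most errors reportedly fall near the nano-micro boundary, it can be useful to look at per-class probabilities as well. The following is a minimal standard-library sketch: `softmax` and `rank_labels` are hypothetical helpers, and the example logits and index-to-name mapping are invented for illustration (the real mapping lives in `model.config.id2label`).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical values: this mapping and these logits are made up.
example_id2label = {0: "micro", 1: "nano", 2: "pico"}
example_logits = [0.3, 2.1, -1.0]

for name, prob in rank_labels(example_logits, example_id2label):
    print(f"{name}: {prob:.3f}")
```

With the inference code above, one would pass `logits[0].tolist()` and `model.config.id2label` instead of the example values.
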
	## Training Data

	This model was fine-tuned on a custom dataset of 17,700 Scanning Electron Microscopy (SEM) images, curated specifically for this project.
	The images were selected to create a balanced dataset for the task of scale classification. This set contains an equal one-third split of images corresponding to the pico, nano, and micro scales (5,900 images per class).

	The 17,700 images were then divided into:

	- Training set: 12,000 images
	- Validation set: 3,000 images
	- Test set: 2,700 images
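
The card does not state whether the splits were stratified by class. If they were, the numbers above would correspond to 4,000 / 1,000 / 900 images per class. The sketch below shows one way such a split could be produced; the function name, the seed, and the placeholder file names are all assumptions, not the project's actual procedure.

```python
import random

CLASSES = ["pico", "nano", "micro"]
PER_CLASS = 5_900  # from the card: 17,700 images / 3 classes
# Assumption: each split keeps the classes balanced (stratified split)
SPLIT_PER_CLASS = {"train": 4_000, "val": 1_000, "test": 900}


def stratified_split(items_by_class, split_sizes, seed=0):
    """Shuffle each class separately and cut it into fixed-size splits."""
    rng = random.Random(seed)
    splits = {name: [] for name in split_sizes}
    for label, items in items_by_class.items():
        if len(items) != sum(split_sizes.values()):
            raise ValueError(f"class {label!r} has an unexpected item count")
        shuffled = list(items)
        rng.shuffle(shuffled)
        start = 0
        for name, size in split_sizes.items():
            chunk = shuffled[start:start + size]
            splits[name].extend((item, label) for item in chunk)
            start += size
    return splits


# Placeholder file names standing in for the unpublished dataset
items_by_class = {
    c: [f"{c}_{i:05d}.png" for i in range(PER_CLASS)] for c in CLASSES
}
splits = stratified_split(items_by_class, SPLIT_PER_CLASS)
print({name: len(rows) for name, rows in splits.items()})
# prints {'train': 12000, 'val': 3000, 'test': 2700}
```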

	Note on Availability: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access.