---
license: apache-2.0
language: en
tags:
- image-classification
- vision-transformer
- pytorch
- sem
- materials-science
- nffa-di
base_model: timm/vit_base_patch8_224.augreg2_in21k_ft_in1k
pipeline_tag: image-classification
---

# Vision Transformer for SEM Image Scale Classification

This is a fine-tuned **Vision Transformer (ViT-B/8)** model that classifies the magnification scale of Scanning Electron Microscopy (SEM) images as **pico, nano, or micro** directly from pixel data.

Scale information in large SEM archives is often unreliable: it may be locked in proprietary file formats or recovered through error-prone Optical Character Recognition (OCR). Predicting the scale from the image itself avoids both dependencies.

This model was developed as part of the **NFFA-DI (Nano Foundries and Fine Analysis Digital Infrastructure)** project, funded by the European Union's NextGenerationEU program.

## Model Description

The model is based on the `timm/vit_base_patch8_224.augreg2_in21k_ft_in1k` checkpoint and has been fine-tuned for a 3-class image classification task on SEM images. The three scale categories are:

1. **Pico**: Images where the pixel size is at the atomic or sub-nanometer scale (less than 1 nm).
2. **Nano**: Images where the pixel size is in the nanometer range (1 nm to 1,000 nm, i.e. up to 1 µm).
3. **Micro**: Images where the pixel size is at the micrometer scale (greater than 1 µm).
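
The class definitions above can be written as a small labelling function. This is only a sketch: the function name `scale_class` is hypothetical, the card does not show how the training labels were actually assigned, and placing a pixel size of exactly 1,000 nm in the nano class is an assumption read off the ranges listed above.

```python
def scale_class(pixel_size_nm: float) -> str:
    """Map a pixel size in nanometres to one of the three scale classes.

    Hypothetical helper: thresholds follow the ranges stated in the card,
    with exactly 1,000 nm (1 µm) assumed to fall in the nano class.
    """
    if pixel_size_nm <= 0:
        raise ValueError("pixel size must be positive")
    if pixel_size_nm < 1.0:
        return "pico"   # sub-nanometre pixel size
    if pixel_size_nm <= 1000.0:
        return "nano"   # 1 nm up to and including 1 µm
    return "micro"      # larger than 1 µm


for size_nm in (0.5, 25.0, 2500.0):
    print(f"{size_nm} nm -> {scale_class(size_nm)}")
```

A helper like this is only useful when trustworthy pixel-size metadata exists, for example when building labels for evaluation; the model itself is meant for the cases where that metadata is missing.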

## Model Performance

The model achieves **91.7% accuracy** on a held-out test set. Most misclassifications occur at the transitional nano-micro boundary, which suggests that the model is learning physically meaningful feature representations related to the magnification level rather than making arbitrary errors.
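
One way to check where errors concentrate is a confusion matrix over the three classes. The sketch below uses plain Python and a handful of made-up labels purely for illustration; they are not predictions from this model, and the helper names are hypothetical.

```python
from collections import Counter

LABELS = ["pico", "nano", "micro"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels, invented for illustration; NOT results from this model
y_true = ["pico", "pico", "nano", "nano", "nano", "micro", "micro", "micro"]
y_pred = ["pico", "pico", "nano", "micro", "nano", "micro", "nano", "micro"]

matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, matrix):
    print(f"{label:>5}: {row}")
print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
```

Off-diagonal counts in the nano and micro rows and columns correspond to the boundary confusions described above.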

## How to Use

The following Python code shows how to load the model and its processor from the Hub and use it to classify a local SEM image.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the model and image processor from the Hub
model_name = "t0m-R/vit-sem-scale-classifier"
image_processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess the image
image_path = "path/to/your/sem_image.png"
try:
    # Convert to RGB because the ViT backbone expects three channels
    image = Image.open(image_path).convert("RGB")

    # Prepare the image for the model
    inputs = image_processor(images=image, return_tensors="pt")

    # Run inference without tracking gradients
    with torch.no_grad():
        logits = model(**inputs).logits

    # Map the highest-scoring class index to its label
    predicted_label_id = logits.argmax(-1).item()
    predicted_label = model.config.id2label[predicted_label_id]

    print(f"Predicted Scale: {predicted_label}")

except FileNotFoundError:
    print(f"Error: The file at {image_path} was not found.")
```
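
The snippet above reports only the top label. If a confidence score per class is also useful, the logits can be passed through a softmax; in the torch workflow that would be `torch.softmax(logits, dim=-1)`. The sketch below shows the same computation in plain Python so it runs without downloading the model. The logit values and the label order are hypothetical; the real mapping should be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits and label order, for illustration only
example_logits = [2.0, 0.5, -1.0]
labels = ["micro", "nano", "pico"]

probs = softmax(example_logits)

# Rank classes from most to least probable
ranked = sorted(zip(probs, labels), reverse=True)
for p, label in ranked:
    print(f"{label}: {p:.3f}")
```

Images near a class boundary would be expected to show two classes with comparable probabilities, which can be a useful signal for flagging uncertain predictions.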

## Training Data

This model was fine-tuned on a custom dataset of 17,700 SEM images curated specifically for this project. The dataset is balanced across the three classes: pico, nano, and micro each contribute one third of the images (5,900 per class).

The 17,700 images were divided into:

- Training set: 12,000 images
- Validation set: 3,000 images
- Test set: 2,700 images
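
The split sizes above are consistent with dividing each class of 5,900 images into 4,000 / 1,000 / 900. The card does not say whether the split was stratified by class, so the sketch below is an assumption about how such a split could be produced, not a description of the authors' procedure. The file names are placeholders and `stratified_split` is a hypothetical helper.

```python
import random

CLASSES = ["pico", "nano", "micro"]
PER_CLASS = 5_900
# Per-class sizes, ASSUMING the published totals are divided evenly by class
SPLIT_SIZES = {"train": 4_000, "val": 1_000, "test": 900}


def stratified_split(items_by_class, split_sizes, seed=0):
    """Shuffle each class separately, then slice it into the named splits."""
    rng = random.Random(seed)
    splits = {name: [] for name in split_sizes}
    for label, items in items_by_class.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        start = 0
        for name, size in split_sizes.items():
            chunk = shuffled[start:start + size]
            splits[name].extend((item, label) for item in chunk)
            start += size
    return splits


# Placeholder file names standing in for the real (unpublished) images
items_by_class = {
    c: [f"{c}_{i:04d}.png" for i in range(PER_CLASS)] for c in CLASSES
}
splits = stratified_split(items_by_class, SPLIT_SIZES)
print({name: len(rows) for name, rows in splits.items()})
```

Splitting per class keeps every subset balanced, so accuracy on the test set is not skewed toward any one scale.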

**Note on Availability**: This dataset is not publicly available at the moment but is planned for publication at a later stage. Please check this model card for future updates on data access.