---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---

# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned checkpoints that classify images of Tibetan manuscript pages into 18 script categories. The work was carried out to build automated paleographic identification tools for historical archives.

## Project Information

- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)

## Evaluation Results

| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |

*Note: The **whole_page** model is recommended for general use. It has the highest accuracy (57.11%), its macro F1 (0.512) is within 0.017 of the best variant, and it needs a single forward pass per page instead of patch extraction and aggregation.*
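
The table ranks the variants differently depending on the metric: `patches_clahe` leads on macro F1 while `whole_page` leads on accuracy. The sketch below illustrates how that can happen under class imbalance. It is a plain-Python illustration on made-up toy labels, not the evaluation script used for this model; the function names and the toy data are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (invented): 8 pages of a common class, 2 pages of a rare one.
y_true = ["uchen_sugthung"] * 8 + ["khyuyig"] * 2

# Predictor 1 always answers with the majority class.
always_majority = ["uchen_sugthung"] * 10

# Predictor 2 finds both rare pages but mislabels 3 common ones.
finds_rare = ["uchen_sugthung"] * 5 + ["khyuyig"] * 3 + ["khyuyig"] * 2

for name, pred in [("always_majority", always_majority), ("finds_rare", finds_rare)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f}, macro_f1={macro_f1(y_true, pred):.3f}")
```

On this toy data the majority-class predictor has the higher accuracy, while the predictor that recovers the rare class has the higher macro F1, because macro F1 gives every class the same weight regardless of how many pages it has.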

## Label Set (18 Classes)

The model predicts one of the following 18 labels:

`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing Variants

- **whole_page**: Short-edge resize to 224px followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap.
- **patches_clahe**: Same patch layout as above, but with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility.
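
The patch layout above can be sketched as follows. With 224px patches and 25% overlap the stride is 168px. How the original pipeline treats page edges is not stated here, so this sketch assumes the last window on each axis is shifted back to end exactly at the image border; `window_starts` and `patch_boxes` are hypothetical helper names, not functions from this repository.

```python
def window_starts(length, patch=224, overlap=0.25):
    """Start offsets of sliding windows along one image axis.

    Assumption: the final window is shifted back so that it ends at the
    image edge. The actual edge handling of the training pipeline is unknown.
    """
    stride = int(patch * (1 - overlap))  # 224 * 0.75 = 168
    if length <= patch:
        # Axis shorter than one patch: the caller has to pad or resize.
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def patch_boxes(width, height, patch=224, overlap=0.25):
    """Patch boxes as (left, top, right, bottom) tuples in row-major order."""
    return [
        (x, y, x + patch, y + patch)
        for y in window_starts(height, patch, overlap)
        for x in window_starts(width, patch, overlap)
    ]


if __name__ == "__main__":
    # A 600x400 page yields 4 columns and 3 rows of patches.
    boxes = patch_boxes(600, 400)
    print(len(boxes), boxes[0], boxes[-1])
```

Each box has the `(left, upper, right, lower)` layout accepted by Pillow's `Image.crop`. For the CLAHE variant, OpenCV's `cv2.createCLAHE` could be applied to the grayscale page before cropping; the clip limit and tile grid size used for this model are not given here.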

## Training Recipe

Training used a 3-stage progressive unfreezing strategy:

1. **Stage A (Head Only):** 20 epochs, backbone frozen (LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).
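
The three stages can be written down as a small schedule. This is a minimal sketch, not the repository's training code: the `Stage` dataclass and `trainable_block_indices` are hypothetical names, and the head learning rate for Stages B and C is left as `None` because it is not stated above. The block count of 12 is the depth of a ViT-S backbone.

```python
from dataclasses import dataclass
from typing import Optional

NUM_BLOCKS = 12  # Transformer depth of a ViT-S/16 backbone


@dataclass(frozen=True)
class Stage:
    name: str
    epochs: int
    unfrozen_blocks: int          # how many of the last blocks are trainable
    backbone_lr: Optional[float]  # None while the backbone is fully frozen
    head_lr: Optional[float]      # None where the value is not stated


STAGES = [
    Stage("A", epochs=20, unfrozen_blocks=0, backbone_lr=None, head_lr=1e-3),
    Stage("B", epochs=10, unfrozen_blocks=2, backbone_lr=1e-5, head_lr=None),
    Stage("C", epochs=10, unfrozen_blocks=4, backbone_lr=5e-6, head_lr=None),
]


def trainable_block_indices(stage, num_blocks=NUM_BLOCKS):
    """Indices of the backbone blocks that receive gradients in this stage."""
    return list(range(num_blocks - stage.unfrozen_blocks, num_blocks))


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, stage.epochs, trainable_block_indices(stage))
```

In a PyTorch training loop, one would typically set `requires_grad` to `True` only on the parameters of the listed blocks and give the optimizer separate parameter groups for the backbone and the head.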

Class-weighted cross-entropy loss was used to counter the strong class imbalance across script types.
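
The exact weighting formula is not given here. One common choice, shown below as an assumption, is inverse-frequency weighting, `weight_c = N / (K * n_c)`, where `N` is the number of samples, `K` the number of classes, and `n_c` the count of class `c`. The helper name and the toy label counts are invented for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels, classes):
    """Per-class weights N / (K * n_c); classes with no samples get 0.0.

    Assumption: this "balanced" heuristic stands in for whatever formula
    the original training script used.
    """
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    return [n / (k * counts[c]) if counts[c] else 0.0 for c in classes]


# Toy, made-up label distribution over three of the 18 classes.
classes = ["uchen_sugthung", "khyuyig", "peri"]
labels = ["uchen_sugthung"] * 6 + ["khyuyig"] * 3 + ["peri"] * 1

weights = inverse_frequency_weights(labels, classes)
for name, weight in zip(classes, weights):
    print(f"{name}: {weight:.3f}")
```

Rare classes receive larger weights, so their errors contribute more to the loss. The resulting list can be passed to PyTorch as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`.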

## How to Use

### Loading the Model

```python
import torch

# DINOv3Classifier is imported from finetune_dinov3, so that module must be importable.
from finetune_dinov3 import DINOv3Classifier

# Load Stage B Whole Page Checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")

# Rebuild the classifier on the same backbone, with one output per label (18).
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])
model.eval()  # switch to inference mode (disables dropout and similar layers)