---
library_name: transformers
language:
- nep
- hi
- sa
- mr
base_model: RoBERTa
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: RoBERTa-devangari-script-classification
  results: []
---

# RoBERTa-devangari-script-classification

## How to use with Transformers

Use luluw/Roberta-devangari-script-classification either through the high-level `pipeline` helper or by loading the tokenizer and model directly:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="luluw/Roberta-devangari-script-classification")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("luluw/Roberta-devangari-script-classification")
model = AutoModelForSequenceClassification.from_pretrained("luluw/Roberta-devangari-script-classification")
```
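As a quick smoke test, the `pipe` object created above can be called on a short Devanagari sentence. The printed label is hypothetical: the actual label strings depend on the id2label mapping stored in this checkpoint's config.

```python
# Hypothetical smoke test using the `pipe` object created above.
# The example sentence is Hindi for "This is an example sentence."
result = pipe("यह एक उदाहरण वाक्य है।")
print(result)
# e.g. [{'label': 'hi', 'score': 0.99}] -- actual label strings come from
# the checkpoint's id2label mapping and may differ.
```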
This model is a fine-tuned version of [RoBERTa](https://huggingface.co/RoBERTa) on a custom Devanagari-script dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0329
- Accuracy: 0.9935
- F1: 0.9935
- Precision: 0.9935
- Recall: 0.9935
## Model description

This model is a fine-tuned version of RoBERTa, optimized for multiclass text classification of text written in Devanagari script across multiple languages, including Nepali, Marathi, Sanskrit, Bhojpuri, and Hindi. Building on the robust RoBERTa architecture, it has been fine-tuned to recognize intricate patterns and contextual cues within Devanagari text, achieving high accuracy and F1 on multiclass classification tasks.
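The exact class set ships with the checkpoint's configuration; a minimal way to inspect it (this assumes the uploaded config defines an id2label mapping, which the Trainer fills in automatically when label names are provided):

```python
from transformers import AutoConfig

# The config carries the id -> label mapping used at training time.
config = AutoConfig.from_pretrained("luluw/Roberta-devangari-script-classification")
print(config.id2label)    # e.g. {0: 'LABEL_0', ...}, or language names if they were set
print(config.num_labels)  # number of script/language classes
```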
## Intended uses & limitations

### Intended uses

- Multiclass text classification for Nepali, Marathi, Sanskrit, Bhojpuri, and Hindi text written in Devanagari script.
- Suitable for sentiment analysis, topic categorization, and public opinion monitoring.

### Limitations

- Limited to Devanagari script; accuracy may drop on text in other scripts.
- Fine-tuned for this multiclass classification task; may not generalize well to other tasks, including binary classification.
- Language-specific nuances absent from the training data may hurt performance on certain dialects.
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
- mixed_precision_training: Native AMP
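The settings above map directly onto `TrainingArguments`. A minimal sketch of that mapping follows; the `output_dir` name is an assumption, and the sketch presumes a single GPU (so per-device batch 16 with 2 accumulation steps gives the listed total of 32):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is an assumption.
training_args = TrainingArguments(
    output_dir="roberta-devangari-script-classification",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 32 on one device
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
    fp16=True,  # native AMP mixed precision
)
```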
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.2337        | 1.0   | 1638 | 0.0603          | 0.9874   | 0.9874 | 0.9875    | 0.9874 |
| 0.0513        | 2.0   | 3277 | 0.0387          | 0.9919   | 0.9919 | 0.9919    | 0.9919 |
| 0.0252        | 3.0   | 4914 | 0.0329          | 0.9935   | 0.9935 | 0.9935    | 0.9935 |
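Accuracy, F1, precision, and recall coincide at every epoch, which is what weighted averaging produces (weighted-average recall is algebraically equal to accuracy). A `compute_metrics` function in that style could look like the sketch below; the averaging mode and the use of scikit-learn are assumptions, not stated in this card:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Weighted-average metrics; with weighted averaging, recall equals
    accuracy, matching the identical columns in the table above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```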
### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.1+cu121
- Datasets 3.0.2
- Tokenizers 0.19.1