---
base_model: readerbench/RoBERT-base
language:
- ro
tags:
- sentiment
- classification
- romanian
- nlp
- bert
datasets:
- decathlon_reviews
- cinemagia_reviews
metrics:
- accuracy
- precision
- recall
- f1
- f1 weighted
model-index:
- name: ro-sentiment
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: ro_sent
      name: Romanian Sentiment Dataset
      config: default
      split: all
    metrics:
    - type: accuracy
      value: 0.85
      name: Accuracy
    - type: precision
      value: 0.85
      name: Precision
    - type: recall
      value: 0.85
      name: Recall
    - type: f1_weighted
      value: 0.85
      name: Weighted F1
    - type: f1_macro
      value: 0.84
      name: Macro F1
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: laroseda
      name: A Large Romanian Sentiment Data Set
      config: default
      split: all
    metrics:
    - type: accuracy
      value: 0.85
      name: Accuracy
    - type: precision
      value: 0.86
      name: Precision
    - type: recall
      value: 0.85
      name: Recall
    - type: f1_weighted
      value: 0.84
      name: Weighted F1
    - type: f1_macro
      value: 0.84
      name: Macro F1
---

# RO-Sentiment

This model is a fine-tuned version of [readerbench/RoBERT-base](https://huggingface.co/readerbench/RoBERT-base) on the Decathlon reviews and Cinemagia reviews datasets.
It achieves the following results on the holdout evaluation set:
- Loss: 0.3923
- Accuracy: 0.8307
- Precision: 0.8366
- Recall: 0.8959
- F1: 0.8652
- F1 Weighted: 0.8287
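
The F1 value above matches the standard binary F1, i.e. the harmonic mean of the listed precision and recall. This suggests (the card does not state it explicitly) that Precision, Recall and F1 are reported for the positive class, while F1 Weighted averages the per-class F1 scores by support. A minimal check, using only the numbers from this card:

```python
def binary_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard binary F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Holdout precision and recall reported in this card
f1 = binary_f1(0.8366, 0.8959)
print(f"{f1:.4f}")  # 0.8652
```

The same relation holds, up to rounding of the inputs, for every epoch row in the training results table.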

Output labels:
- LABEL_0 = Negative Sentiment
- LABEL_1 = Positive Sentiment
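
When the model is run through a Transformers `pipeline("text-classification", ...)`, each prediction is a dict with `label` and `score` keys, and the labels are the raw `LABEL_0`/`LABEL_1` names listed above. A small helper can translate them. This is a hypothetical sketch: the helper name and the example dict are illustrative, and since the Hub repository id of this model is not given here, no model is loaded.

```python
# Mapping taken from the "Output labels" list in this card
LABEL_NAMES = {
    "LABEL_0": "Negative",
    "LABEL_1": "Positive",
}


def to_sentiment(prediction: dict) -> tuple[str, float]:
    """Convert one pipeline-style prediction ({"label": ..., "score": ...})
    into a (sentiment, score) pair. Raises KeyError on an unknown label."""
    return LABEL_NAMES[prediction["label"]], float(prediction["score"])


# Illustrative input shaped like a text-classification pipeline result
print(to_sentiment({"label": "LABEL_1", "score": 0.97}))  # ('Positive', 0.97)
```

Because there is no neutral class, every input maps to one of these two names.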

### Evaluation on other datasets

#### RO_SENT

| | Precision | Recall | F1-score | Support |
|:-------------|:---------:|:------:|:--------:|--------:|
| Negative (0) | 0.79 | 0.83 | 0.81 | 11,675 |
| Positive (1) | 0.88 | 0.85 | 0.87 | 17,271 |
| Accuracy | | | 0.85 | 28,946 |
| Macro Avg | 0.84 | 0.84 | 0.84 | 28,946 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 28,946 |

#### LaRoSeDa

| | Precision | Recall | F1-score | Support |
|:-------------|:---------:|:------:|:--------:|--------:|
| Negative (0) | 0.79 | 0.94 | 0.86 | 7,500 |
| Positive (1) | 0.93 | 0.75 | 0.83 | 7,500 |
| Accuracy | | | 0.85 | 15,000 |
| Macro Avg | 0.86 | 0.85 | 0.84 | 15,000 |
| Weighted Avg | 0.86 | 0.85 | 0.84 | 15,000 |
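
The Macro Avg and Weighted Avg rows follow the usual classification-report convention: the macro average is the unweighted mean of the per-class scores, and the weighted average weights each class by its support. Recomputing the F1 averages from the rounded per-class values, as a sketch, reproduces the reported numbers up to rounding, and shows why the two rows coincide for LaRoSeDa, whose classes are balanced (7,500 each):

```python
def macro_avg(scores: list[float]) -> float:
    """Unweighted mean of per-class scores."""
    return sum(scores) / len(scores)


def weighted_avg(scores: list[float], supports: list[int]) -> float:
    """Mean of per-class scores weighted by each class's support."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Per-class F1 scores and supports (Negative, Positive) from the tables
ro_sent_f1, ro_sent_support = [0.81, 0.87], [11_675, 17_271]
laroseda_f1, laroseda_support = [0.86, 0.83], [7_500, 7_500]

print(f"RO_SENT  macro={macro_avg(ro_sent_f1):.4f} "
      f"weighted={weighted_avg(ro_sent_f1, ro_sent_support):.4f}")
print(f"LaRoSeDa macro={macro_avg(laroseda_f1):.4f} "
      f"weighted={weighted_avg(laroseda_f1, laroseda_support):.4f}")
```

On RO_SENT the positive class has more support, so the weighted F1 (about 0.846) sits above the macro F1 (0.84).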


## Model description

A Romanian BERT model fine-tuned for binary sentiment classification.

It was trained on a mix of product reviews from the Decathlon retailer website and movie reviews from Cinemagia.

## Intended uses & limitations

Sentiment classification for Romanian-language text.

The model is biased towards product reviews.

There is no "neutral" sentiment label: every input is classified as either negative or positive.

## Training and evaluation data

Trained on:
- Decathlon product reviews (dataset available on request)
- Cinemagia movie reviews, publicly available on Kaggle: [Romanian sentiment movie reviews](https://www.kaggle.com/datasets/gringoandy/romanian-sentiment-movie-reviews)

Evaluated on:
- Holdout data from the training datasets
- RO_SENT dataset
- LaRoSeDa dataset


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10 (training stopped early after epoch 3; best epoch: 2)
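
The warmup ratio is relative to the total number of planned optimizer steps. The sketch below assumes 1,629 steps per epoch (taken from the Step column in the training results table) over the 10 planned epochs, a warmup length rounded up as `ceil(ratio * total_steps)`, and a linear decay to zero after warmup; this mirrors our understanding of the Transformers linear schedule but is not the author's training code.

```python
import math


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


STEPS_PER_EPOCH = 1629  # assumption: read from the Step column of the results table
NUM_EPOCHS = 10
BASE_LR = 6e-05
WARMUP_RATIO = 0.2

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS            # 16,290 planned steps
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 3,258 warmup steps

# Learning rate at the end of each epoch that was actually trained
for epoch_end in (1629, 3258, 4887):
    print(epoch_end, linear_warmup_lr(epoch_end, BASE_LR, warmup_steps, total_steps))
```

Under these assumptions warmup covers exactly the first two epochs: the learning rate peaks at 6e-05 at step 3,258 and has decayed only to 5.25e-05 when training stops after epoch 3 (step 4,887).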

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 | F1 Weighted |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|:-----------:|
| 0.4198 | 1.0 | 1629 | 0.3983 | 0.8377 | 0.8791 | 0.8721 | 0.8756 | 0.8380 |
| 0.3861 | 2.0 | 3258 | 0.4312 | 0.8429 | 0.8963 | 0.8665 | 0.8812 | 0.8442 |
| 0.3189 | 3.0 | 4887 | 0.3923 | 0.8307 | 0.8366 | 0.8959 | 0.8652 | 0.8287 |
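
The card does not state which metric or patience drove early stopping. As a hypothetical reconstruction, a simple patience rule (in the spirit of Transformers' `EarlyStoppingCallback`, but written from scratch here) that tracks validation F1 or accuracy with a patience of one epoch picks epoch 2 as best and stops after epoch 3, matching the note in the hyperparameters. Tracking validation loss with the same patience would have stopped one epoch earlier. The function name and the patience logic are illustrative assumptions:

```python
# Validation metrics per epoch, copied from the training results table
HISTORY = [
    {"epoch": 1, "val_loss": 0.3983, "accuracy": 0.8377, "f1": 0.8756},
    {"epoch": 2, "val_loss": 0.4312, "accuracy": 0.8429, "f1": 0.8812},
    {"epoch": 3, "val_loss": 0.3923, "accuracy": 0.8307, "f1": 0.8652},
]


def early_stopping(history, metric, greater_is_better=True, patience=1):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once `metric` has failed to improve on the best value
    for `patience` consecutive epochs; stop_epoch is None if it never stops.
    """
    best_value, best_epoch, bad_epochs = None, None, 0
    for row in history:
        value = row[metric]
        improved = best_value is None or (
            value > best_value if greater_is_better else value < best_value
        )
        if improved:
            best_value, best_epoch, bad_epochs = value, row["epoch"], 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, row["epoch"]
    return best_epoch, None


print(early_stopping(HISTORY, "f1"))                                 # (2, 3)
print(early_stopping(HISTORY, "val_loss", greater_is_better=False))  # (1, 2)
```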


### Framework versions

- Transformers 4.31.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3
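
To reproduce results in a comparable environment, the installed versions can be compared against the list above using only the standard library. This is a minimal sketch: the helper name is hypothetical, it assumes PyTorch is installed under the distribution name `torch`, and it ignores local build tags such as `+cu118`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyTorch is distributed as `torch`)
EXPECTED = {
    "transformers": "4.31.0",
    "torch": "2.0.1",
    "datasets": "2.14.3",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected, lookup=version):
    """Return {package: installed_version_or_None} for packages that differ.

    Local build tags such as "+cu118" are stripped before comparing.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches


print(version_mismatches(EXPECTED))
```

An empty dict means the environment matches the versions used for training.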