BluebrainAI/parallel-gpt2-medium-wikitext


	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- bleu
	model-index:
	- name: parallel-gpt2-medium-wikitext
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# parallel-gpt2-medium-wikitext

	This model is a fine-tuned version of an unspecified base model on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.2350
	- Accuracy: 0.4161
	- Perplexity: 25.4075
	- Bleu: 0.1473
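
The reported perplexity appears to be the exponential of the evaluation loss. A minimal sketch of that relationship, assuming the loss is a mean per-token cross-entropy in nats (the helper name `perplexity` is illustrative and not part of this repository):

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Return the perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


# Final evaluation loss as reported in this card (rounded to four decimals).
final_eval_loss = 3.2350
print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

Because the card rounds the loss to four decimals, the recomputed value matches the reported 25.4075 only to about two decimal places.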

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
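
To illustrate what `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` implies, here is a plain-Python sketch of a linear warmup followed by linear decay. The step counts are assumptions, not values recorded in this card: `TOTAL_STEPS = 8910` is inferred from the step-to-epoch ratio in the training log (about 1782 steps per epoch over 5 epochs), and `WARMUP_STEPS = 891` is 10% of that. The function name `linear_lr` is hypothetical.

```python
BASE_LR = 1e-4        # learning_rate from the hyperparameter list
TOTAL_STEPS = 8910    # assumption: ~1782 steps per epoch * 5 epochs
WARMUP_STEPS = 891    # assumption: 10% of TOTAL_STEPS (warmup ratio 0.1)


def linear_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS,
              base_lr: float = BASE_LR) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 891, 5000, 8910):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at 1e-4 once warmup ends and reaches zero at the final step.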

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | Perplexity | Bleu |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:----------:|:------:|
	| 6.077 | 0.2806 | 500 | 5.9554 | 0.1870 | 385.8189 | 0.0352 |
	| 5.1123 | 0.5612 | 1000 | 4.9836 | 0.2568 | 145.9931 | 0.0625 |
	| 4.4123 | 0.8418 | 1500 | 4.3035 | 0.3159 | 73.9588 | 0.0843 |
	| 4.0245 | 1.1223 | 2000 | 3.9678 | 0.3470 | 52.8693 | 0.1076 |
	| 3.8298 | 1.4029 | 2500 | 3.7842 | 0.3630 | 44.0014 | 0.1166 |
	| 3.7181 | 1.6835 | 3000 | 3.6620 | 0.3733 | 38.9404 | 0.1272 |
	| 3.6123 | 1.9641 | 3500 | 3.5694 | 0.3818 | 35.4958 | 0.1311 |
	| 3.4993 | 2.2447 | 4000 | 3.5029 | 0.3877 | 33.2118 | 0.1384 |
	| 3.4358 | 2.5253 | 4500 | 3.4484 | 0.3930 | 31.4506 | 0.1358 |
	| 3.4039 | 2.8058 | 5000 | 3.3989 | 0.3979 | 29.9323 | 0.1403 |
	| 3.2908 | 3.0864 | 5500 | 3.3633 | 0.4018 | 28.8837 | 0.1409 |
	| 3.2828 | 3.3670 | 6000 | 3.3326 | 0.4051 | 28.0103 | 0.1446 |
	| 3.2606 | 3.6476 | 6500 | 3.3031 | 0.4081 | 27.1958 | 0.1457 |
	| 3.234 | 3.9282 | 7000 | 3.2796 | 0.4106 | 26.5655 | 0.1433 |
	| 3.1713 | 4.2088 | 7500 | 3.2621 | 0.4126 | 26.1045 | 0.1461 |
	| 3.1314 | 4.4893 | 8000 | 3.2476 | 0.4145 | 25.7281 | 0.1455 |
	| 3.1412 | 4.7699 | 8500 | 3.2350 | 0.4161 | 25.4075 | 0.1473 |
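
The Epoch and Step columns together give a rough idea of the run length. The sketch below estimates the optimizer steps per epoch from three logged rows and scales to the 5 configured epochs. The sample count is a loud assumption: it holds only if training used a single device with no gradient accumulation, which this card does not state.

```python
# (epoch, step) pairs copied from three rows of the training results table.
logged = [(0.2806, 500), (1.9641, 3500), (4.7699, 8500)]

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

num_epochs = 5          # num_epochs from the hyperparameter list
train_batch_size = 64   # train_batch_size from the hyperparameter list

total_steps = steps_per_epoch * num_epochs
# Assumption: one device and no gradient accumulation, so one step = 64 sequences.
approx_sequences_per_epoch = steps_per_epoch * train_batch_size

print(steps_per_epoch, total_steps, approx_sequences_per_epoch)
```

The last logged evaluation (step 8500) therefore falls a few hundred steps before the estimated end of training.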


	### Framework versions

	- Transformers 4.49.0
	- PyTorch 2.6.0+cu124
	- Datasets 3.3.2
	- Tokenizers 0.21.0