mazesmazes committed on
Commit 648911e · verified · 1 Parent(s): ad2c339

Model save

Files changed (1)
  1. README.md +89 -71
README.md CHANGED
@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2044

  ## Model description

@@ -33,87 +33,105 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.001
  - train_batch_size: 32
  - eval_batch_size: 32
- - seed: 42
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 64
  - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 2000
  - num_epochs: 1

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:-----:|:---------------:|
- | 0.9934 | 0.0153 | 1000 | 0.3840 |
- | 0.9974 | 0.0306 | 2000 | 0.4156 |
- | 1.0350 | 0.0459 | 3000 | 0.3944 |
- | 0.9922 | 0.0612 | 4000 | 0.3625 |
- | 1.0129 | 0.0765 | 5000 | 0.3386 |
- | 0.8650 | 0.0918 | 6000 | 0.3348 |
- | 0.9696 | 0.1071 | 7000 | 0.3241 |
- | 0.9879 | 0.1224 | 8000 | 0.3174 |
- | 0.9225 | 0.1377 | 9000 | 0.3154 |
- | 0.8560 | 0.1530 | 10000 | 0.3139 |
- | 0.8554 | 0.1683 | 11000 | 0.3062 |
- | 0.9126 | 0.1836 | 12000 | 0.3000 |
- | 0.9142 | 0.1989 | 13000 | 0.2994 |
- | 0.8358 | 0.2142 | 14000 | 0.2943 |
- | 0.8452 | 0.2295 | 15000 | 0.2916 |
- | 0.8372 | 0.2449 | 16000 | 0.2822 |
- | 0.8776 | 0.2602 | 17000 | 0.2783 |
- | 0.8697 | 0.2755 | 18000 | 0.2809 |
- | 0.8541 | 0.2908 | 19000 | 0.2765 |
- | 0.8511 | 0.3061 | 20000 | 0.2728 |
- | 0.8440 | 0.3214 | 21000 | 0.2739 |
- | 0.7897 | 0.3367 | 22000 | 0.2648 |
- | 0.8196 | 0.3520 | 23000 | 0.2608 |
- | 0.8320 | 0.3673 | 24000 | 0.2614 |
- | 0.8043 | 0.3826 | 25000 | 0.2636 |
- | 0.7875 | 0.3979 | 26000 | 0.2551 |
- | 0.8257 | 0.4132 | 27000 | 0.2501 |
- | 0.7276 | 0.4285 | 28000 | 0.2519 |
- | 0.8196 | 0.4438 | 29000 | 0.2482 |
- | 0.7727 | 0.4591 | 30000 | 0.2497 |
- | 0.8316 | 0.4744 | 31000 | 0.2467 |
- | 0.7738 | 0.4897 | 32000 | 0.2404 |
- | 0.8146 | 0.5050 | 33000 | 0.2410 |
- | 0.7571 | 0.5203 | 34000 | 0.2370 |
- | 0.7921 | 0.5356 | 35000 | 0.2344 |
- | 0.7792 | 0.5509 | 36000 | 0.2319 |
- | 0.7014 | 0.5662 | 37000 | 0.2322 |
- | 0.7425 | 0.5815 | 38000 | 0.2281 |
- | 0.7644 | 0.5968 | 39000 | 0.2265 |
- | 0.7048 | 0.6121 | 40000 | 0.2251 |
- | 0.6970 | 0.6274 | 41000 | 0.2229 |
- | 0.7856 | 0.6427 | 42000 | 0.2214 |
- | 0.7114 | 0.6580 | 43000 | 0.2194 |
- | 0.7751 | 0.6733 | 44000 | 0.2183 |
- | 0.6482 | 0.6886 | 45000 | 0.2169 |
- | 0.6889 | 0.7040 | 46000 | 0.2154 |
- | 0.7554 | 0.7193 | 47000 | 0.2147 |
- | 0.7050 | 0.7346 | 48000 | 0.2124 |
- | 0.7927 | 0.7499 | 49000 | 0.2118 |
- | 0.7309 | 0.7652 | 50000 | 0.2108 |
- | 0.7264 | 0.7805 | 51000 | 0.2108 |
- | 0.7256 | 0.7958 | 52000 | 0.2087 |
- | 0.7605 | 0.8111 | 53000 | 0.2078 |
- | 0.7391 | 0.8264 | 54000 | 0.2082 |
- | 0.6781 | 0.8417 | 55000 | 0.2065 |
- | 0.7206 | 0.8570 | 56000 | 0.2060 |
- | 0.7342 | 0.8723 | 57000 | 0.2051 |
- | 0.7519 | 0.8876 | 58000 | 0.2055 |
- | 0.7258 | 0.9029 | 59000 | 0.2051 |
- | 0.7932 | 0.9182 | 60000 | 0.2047 |
- | 0.7391 | 0.9335 | 61000 | 0.2047 |
- | 0.7416 | 0.9488 | 62000 | 0.2046 |
- | 0.7249 | 0.9641 | 63000 | 0.2045 |
- | 0.7000 | 0.9794 | 64000 | 0.2044 |
- | 0.6958 | 0.9947 | 65000 | 0.2044 |
- | 0.6692 | 1.0 | 65346 | 0.2044 |


  ### Framework versions
 

  This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.1981

  ## Model description

 
  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 0.0001
  - train_batch_size: 32
  - eval_batch_size: 32
+ - seed: 43
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 64
  - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: constant_with_warmup
+ - lr_scheduler_warmup_steps: 500
  - num_epochs: 1

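To make the scheduler change in this commit concrete, here is a minimal sketch of the two learning-rate curves, using the commonly cited definitions of `cosine` and `constant_with_warmup` (linear warmup from zero, then either a half-cosine decay to zero or a flat rate). The function names are hypothetical and this is not the trainer's own code. The total step count for the earlier run (65346) is taken from the last row of its results table; everything else comes from the hyperparameter lists.

```python
import math


def constant_with_warmup(step, base_lr, warmup_steps):
    """Linear warmup from 0 to base_lr, then hold base_lr (illustrative helper)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


def cosine_with_warmup(step, base_lr, warmup_steps, total_steps):
    """Linear warmup, then half-cosine decay to 0 at total_steps (illustrative helper)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Previous run: lr=1e-3, cosine, 2000 warmup steps, 65346 total steps (assumed from the table).
    # This run:     lr=1e-4, constant_with_warmup, 500 warmup steps.
    for step in (0, 250, 500, 2000, 40000, 65346):
        old = cosine_with_warmup(step, 1e-3, 2000, 65346)
        new = constant_with_warmup(step, 1e-4, 500)
        print(f"step {step:>6}: cosine={old:.2e}  constant_with_warmup={new:.2e}")
```

Under these assumptions the earlier run ends with a learning rate near zero, while this run holds 1e-4 from step 500 to the end of the epoch.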
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:-----:|:---------------:|
+ | 0.6455 | 0.0119 | 1000 | 0.2053 |
+ | 0.6832 | 0.0238 | 2000 | 0.2058 |
+ | 0.6383 | 0.0357 | 3000 | 0.2058 |
+ | 0.6507 | 0.0476 | 4000 | 0.2069 |
+ | 0.6877 | 0.0596 | 5000 | 0.2060 |
+ | 0.6479 | 0.0715 | 6000 | 0.2054 |
+ | 0.7227 | 0.0834 | 7000 | 0.2056 |
+ | 0.7055 | 0.0953 | 8000 | 0.2057 |
+ | 0.6465 | 0.1072 | 9000 | 0.2052 |
+ | 0.7416 | 0.1191 | 10000 | 0.2046 |
+ | 0.7090 | 0.1310 | 11000 | 0.2048 |
+ | 0.6912 | 0.1429 | 12000 | 0.2060 |
+ | 0.5886 | 0.1549 | 13000 | 0.2056 |
+ | 0.7237 | 0.1668 | 14000 | 0.2045 |
+ | 0.6725 | 0.1787 | 15000 | 0.2046 |
+ | 0.6518 | 0.1906 | 16000 | 0.2038 |
+ | 0.6546 | 0.2025 | 17000 | 0.2042 |
+ | 0.6793 | 0.2144 | 18000 | 0.2032 |
+ | 0.6697 | 0.2263 | 19000 | 0.2035 |
+ | 0.7108 | 0.2382 | 20000 | 0.2042 |
+ | 0.7447 | 0.2502 | 21000 | 0.2038 |
+ | 0.6575 | 0.2621 | 22000 | 0.2039 |
+ | 0.7154 | 0.2740 | 23000 | 0.2034 |
+ | 0.6833 | 0.2859 | 24000 | 0.2024 |
+ | 0.6613 | 0.2978 | 25000 | 0.2028 |
+ | 0.6906 | 0.3097 | 26000 | 0.2025 |
+ | 0.6843 | 0.3216 | 27000 | 0.2027 |
+ | 0.6966 | 0.3335 | 28000 | 0.2023 |
+ | 0.6801 | 0.3454 | 29000 | 0.2027 |
+ | 0.7171 | 0.3574 | 30000 | 0.2027 |
+ | 0.7029 | 0.3693 | 31000 | 0.2017 |
+ | 0.6876 | 0.3812 | 32000 | 0.2019 |
+ | 0.6646 | 0.3931 | 33000 | 0.2022 |
+ | 0.6834 | 0.4050 | 34000 | 0.2022 |
+ | 0.6868 | 0.4169 | 35000 | 0.2014 |
+ | 0.6831 | 0.4288 | 36000 | 0.2019 |
+ | 0.6309 | 0.4407 | 37000 | 0.2009 |
+ | 0.6603 | 0.4527 | 38000 | 0.2007 |
+ | 0.6818 | 0.4646 | 39000 | 0.2006 |
+ | 0.6539 | 0.4765 | 40000 | 0.2001 |
+ | 0.6999 | 0.4884 | 41000 | 0.2001 |
+ | 0.6870 | 0.5003 | 42000 | 0.1997 |
+ | 0.5977 | 0.5122 | 43000 | 0.2000 |
+ | 0.6747 | 0.5241 | 44000 | 0.2002 |
+ | 0.6695 | 0.5360 | 45000 | 0.2005 |
+ | 0.6763 | 0.5479 | 46000 | 0.1992 |
+ | 0.6656 | 0.5599 | 47000 | 0.2006 |
+ | 0.6674 | 0.5718 | 48000 | 0.2000 |
+ | 0.7177 | 0.5837 | 49000 | 0.1995 |
+ | 0.6904 | 0.5956 | 50000 | 0.1999 |
+ | 0.6421 | 0.6075 | 51000 | 0.2003 |
+ | 0.6555 | 0.6194 | 52000 | 0.2004 |
+ | 0.7010 | 0.6313 | 53000 | 0.2003 |
+ | 0.6520 | 0.6432 | 54000 | 0.1993 |
+ | 0.6284 | 0.6552 | 55000 | 0.1999 |
+ | 0.6770 | 0.6671 | 56000 | 0.1994 |
+ | 0.7453 | 0.6790 | 57000 | 0.1993 |
+ | 0.6441 | 0.6909 | 58000 | 0.1978 |
+ | 0.6670 | 0.7028 | 59000 | 0.1980 |
+ | 0.6380 | 0.7147 | 60000 | 0.1979 |
+ | 0.7013 | 0.7266 | 61000 | 0.1984 |
+ | 0.6442 | 0.7385 | 62000 | 0.1988 |
+ | 0.6750 | 0.7505 | 63000 | 0.1981 |
+ | 0.6776 | 0.7624 | 64000 | 0.1985 |
+ | 0.6316 | 0.7743 | 65000 | 0.1992 |
+ | 0.6929 | 0.7862 | 66000 | 0.1988 |
+ | 0.6887 | 0.7981 | 67000 | 0.1982 |
+ | 0.6502 | 0.8100 | 68000 | 0.1975 |
+ | 0.7152 | 0.8219 | 69000 | 0.1983 |
+ | 0.6906 | 0.8338 | 70000 | 0.1985 |
+ | 0.6128 | 0.8457 | 71000 | 0.1978 |
+ | 0.5966 | 0.8577 | 72000 | 0.1973 |
+ | 0.6726 | 0.8696 | 73000 | 0.1983 |
+ | 0.6668 | 0.8815 | 74000 | 0.1984 |
+ | 0.6337 | 0.8934 | 75000 | 0.1982 |
+ | 0.6272 | 0.9053 | 76000 | 0.1973 |
+ | 0.7112 | 0.9172 | 77000 | 0.1978 |
+ | 0.5871 | 0.9291 | 78000 | 0.1989 |
+ | 0.6428 | 0.9410 | 79000 | 0.1972 |
+ | 0.6740 | 0.9530 | 80000 | 0.1966 |
+ | 0.6933 | 0.9649 | 81000 | 0.1976 |
+ | 0.6668 | 0.9768 | 82000 | 0.1975 |
+ | 0.5919 | 0.9887 | 83000 | 0.1977 |
+ | 0.7215 | 1.0 | 83950 | 0.1981 |

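The Epoch and Step columns of the results table can be cross-checked with a little arithmetic. The sketch below assumes a single training device, which is consistent with 32 × 2 = 64 but is not stated in the card, and assumes the Epoch column is simply step divided by total steps, rounded to four places. The helper names are hypothetical, and the sample count is only an upper-bound estimate because the final batch may be partial.

```python
PER_DEVICE_BATCH = 32      # train_batch_size from the card
GRAD_ACCUM_STEPS = 2       # gradient_accumulation_steps from the card
TOTAL_STEPS_NEW = 83950    # final Step of this run's table
TOTAL_STEPS_OLD = 65346    # final Step of the previous run's table


def effective_batch(per_device, grad_accum, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def epoch_fraction(step, total_steps):
    """Fraction of the single epoch completed at a given step."""
    return round(step / total_steps, 4)


def approx_samples(total_steps, batch):
    """Rough upper bound on samples seen in one epoch."""
    return total_steps * batch


if __name__ == "__main__":
    batch = effective_batch(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS)
    print("effective batch:", batch)
    print("epoch at step 1000 (new run):", epoch_fraction(1000, TOTAL_STEPS_NEW))
    print("epoch at step 1000 (old run):", epoch_fraction(1000, TOTAL_STEPS_OLD))
    print("approx samples (new run):", approx_samples(TOTAL_STEPS_NEW, batch))
    print("approx samples (old run):", approx_samples(TOTAL_STEPS_OLD, batch))
```

If these assumptions hold, the longer epoch in this run (83950 steps versus 65346) suggests a larger training set than in the previous commit, which is worth keeping in mind when comparing the two loss tables.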

  ### Framework versions