---
library_name: peft
license: gemma
base_model: google/gemma-3-1b-it
tags:
- llama-factory
- prompt-tuning
- generated_from_trainer
model-index:
- name: train_sst2_1744902618
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# train_sst2_1744902618

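This model is a fine-tuned version of [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) on the sst2 dataset, trained with PEFT prompt tuning (per the `library_name` and tags in this card's metadata).
It achieves the following results on the evaluation set:
- Loss: 0.0835 (the lowest validation loss in the training results table, logged at step 26200; the final step, 40000, logged 0.1121)
- Num Input Tokens Seen: 36181120 (cumulative total after all 40000 training steps)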

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.3
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- training_steps: 40000
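
As a rough illustration of how these values relate, the sketch below recomputes the total train batch size and evaluates a plain cosine decay of the learning rate over the 40000 steps. It rests on two assumptions that this card does not state: a single training device (which is what 4 × 4 = 16 implies) and no warmup phase. The helper names `effective_batch_size` and `cosine_lr` are illustrative only and are not part of any library; the schedule the Trainer actually applied may differ in detail.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.3
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 40_000

# Assumption: one device. The card does not list a device count,
# but 4 * 4 = 16 matches total_train_batch_size only for one device.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples consumed per optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr down to 0, assuming no warmup (hypothetical helper)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print("total_train_batch_size:", total_batch)

# Learning rate at the start, midpoint, and end of training under this sketch.
for step in (0, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(step, round(cosine_lr(step, TRAINING_STEPS, LEARNING_RATE), 6))
```

Under these assumptions the learning rate starts at 0.3, passes through roughly 0.15 at step 20000, and reaches 0 at step 40000.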

### Training results

| Training Loss | Epoch   | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:-----:|:---------------:|:-----------------:|
| 0.2762        | 0.0528  | 200   | 0.2764          | 180224            |
| 0.2356        | 0.1056  | 400   | 0.2239          | 361024            |
| 0.224         | 0.1584  | 600   | 0.2098          | 541408            |
| 0.202         | 0.2112  | 800   | 0.1897          | 722496            |
| 0.149         | 0.2640  | 1000  | 0.1768          | 903200            |
| 0.1626        | 0.3167  | 1200  | 0.1708          | 1084928           |
| 0.1431        | 0.3695  | 1400  | 0.1564          | 1265312           |
| 0.1372        | 0.4223  | 1600  | 0.2022          | 1447200           |
| 0.1373        | 0.4751  | 1800  | 0.1439          | 1628352           |
| 0.1464        | 0.5279  | 2000  | 0.2129          | 1809312           |
| 0.1684        | 0.5807  | 2200  | 0.1422          | 1992416           |
| 0.1122        | 0.6335  | 2400  | 0.1412          | 2171744           |
| 0.1436        | 0.6863  | 2600  | 0.1599          | 2352352           |
| 0.1558        | 0.7391  | 2800  | 0.1569          | 2532128           |
| 0.1363        | 0.7919  | 3000  | 0.1301          | 2713888           |
| 0.0991        | 0.8447  | 3200  | 0.1553          | 2894304           |
| 0.1103        | 0.8975  | 3400  | 0.1312          | 3076640           |
| 0.1415        | 0.9502  | 3600  | 0.1360          | 3257632           |
| 0.107         | 1.0029  | 3800  | 0.1303          | 3436976           |
| 0.1665        | 1.0557  | 4000  | 0.1465          | 3618672           |
| 0.178         | 1.1085  | 4200  | 0.1431          | 3800592           |
| 0.1298        | 1.1613  | 4400  | 0.1502          | 3980592           |
| 0.1262        | 1.2141  | 4600  | 0.1594          | 4161936           |
| 0.1062        | 1.2669  | 4800  | 0.1274          | 4343440           |
| 0.1005        | 1.3197  | 5000  | 0.1231          | 4526576           |
| 0.1116        | 1.3724  | 5200  | 0.1335          | 4707952           |
| 0.1202        | 1.4252  | 5400  | 0.1231          | 4887824           |
| 0.1279        | 1.4780  | 5600  | 0.1259          | 5068368           |
| 0.1274        | 1.5308  | 5800  | 0.1187          | 5250800           |
| 0.1038        | 1.5836  | 6000  | 0.1196          | 5431440           |
| 0.1364        | 1.6364  | 6200  | 0.1236          | 5611184           |
| 0.132         | 1.6892  | 6400  | 0.1157          | 5792144           |
| 0.1404        | 1.7420  | 6600  | 0.1149          | 5974000           |
| 0.1209        | 1.7948  | 6800  | 0.1158          | 6154320           |
| 0.1249        | 1.8476  | 7000  | 0.1182          | 6334608           |
| 0.0722        | 1.9004  | 7200  | 0.1163          | 6515152           |
| 0.1209        | 1.9531  | 7400  | 0.1158          | 6695472           |
| 0.1409        | 2.0058  | 7600  | 0.1144          | 6875184           |
| 0.1443        | 2.0586  | 7800  | 0.1148          | 7057584           |
| 0.1163        | 2.1114  | 8000  | 0.1174          | 7236880           |
| 0.0961        | 2.1642  | 8200  | 0.1296          | 7418160           |
| 0.1211        | 2.2170  | 8400  | 0.1127          | 7598544           |
| 0.0878        | 2.2698  | 8600  | 0.1183          | 7777744           |
| 0.1417        | 2.3226  | 8800  | 0.1131          | 7957552           |
| 0.0817        | 2.3753  | 9000  | 0.1093          | 8138448           |
| 0.0916        | 2.4281  | 9200  | 0.1110          | 8321584           |
| 0.092         | 2.4809  | 9400  | 0.1101          | 8502288           |
| 0.0983        | 2.5337  | 9600  | 0.1103          | 8684240           |
| 0.0718        | 2.5865  | 9800  | 0.1090          | 8866128           |
| 0.1192        | 2.6393  | 10000 | 0.1216          | 9045584           |
| 0.1236        | 2.6921  | 10200 | 0.1096          | 9225744           |
| 0.134         | 2.7449  | 10400 | 0.1126          | 9409616           |
| 0.1187        | 2.7977  | 10600 | 0.1050          | 9590384           |
| 0.0766        | 2.8505  | 10800 | 0.1155          | 9772624           |
| 0.0848        | 2.9033  | 11000 | 0.1032          | 9953712           |
| 0.1019        | 2.9561  | 11200 | 0.1102          | 10132784          |
| 0.115         | 3.0087  | 11400 | 0.1022          | 10312800          |
| 0.1159        | 3.0615  | 11600 | 0.1158          | 10492768          |
| 0.1052        | 3.1143  | 11800 | 0.1048          | 10673088          |
| 0.0716        | 3.1671  | 12000 | 0.1080          | 10854592          |
| 0.0559        | 3.2199  | 12200 | 0.1174          | 11035424          |
| 0.0517        | 3.2727  | 12400 | 0.1050          | 11217728          |
| 0.1134        | 3.3255  | 12600 | 0.1029          | 11400032          |
| 0.1257        | 3.3782  | 12800 | 0.1116          | 11580736          |
| 0.0788        | 3.4310  | 13000 | 0.1078          | 11761344          |
| 0.0954        | 3.4838  | 13200 | 0.1008          | 11940800          |
| 0.0898        | 3.5366  | 13400 | 0.1065          | 12121216          |
| 0.0888        | 3.5894  | 13600 | 0.1123          | 12302080          |
| 0.0668        | 3.6422  | 13800 | 0.1123          | 12482400          |
| 0.1347        | 3.6950  | 14000 | 0.1007          | 12664768          |
| 0.1077        | 3.7478  | 14200 | 0.0982          | 12845696          |
| 0.1079        | 3.8006  | 14400 | 0.0974          | 13026720          |
| 0.0745        | 3.8534  | 14600 | 0.0975          | 13207904          |
| 0.0452        | 3.9062  | 14800 | 0.0968          | 13389408          |
| 0.0793        | 3.9590  | 15000 | 0.0985          | 13569120          |
| 0.0547        | 4.0116  | 15200 | 0.0954          | 13749232          |
| 0.0511        | 4.0644  | 15400 | 0.1017          | 13929232          |
| 0.0941        | 4.1172  | 15600 | 0.0977          | 14111056          |
| 0.0524        | 4.1700  | 15800 | 0.0961          | 14290896          |
| 0.0615        | 4.2228  | 16000 | 0.1098          | 14470384          |
| 0.0992        | 4.2756  | 16200 | 0.0995          | 14650736          |
| 0.1054        | 4.3284  | 16400 | 0.0926          | 14834416          |
| 0.0435        | 4.3812  | 16600 | 0.0995          | 15014800          |
| 0.0581        | 4.4339  | 16800 | 0.0973          | 15194064          |
| 0.0435        | 4.4867  | 17000 | 0.0960          | 15376368          |
| 0.0862        | 4.5395  | 17200 | 0.0944          | 15556464          |
| 0.0601        | 4.5923  | 17400 | 0.0999          | 15738448          |
| 0.0972        | 4.6451  | 17600 | 0.0918          | 15919856          |
| 0.1025        | 4.6979  | 17800 | 0.0972          | 16100016          |
| 0.1033        | 4.7507  | 18000 | 0.0927          | 16282288          |
| 0.0942        | 4.8035  | 18200 | 0.0894          | 16461520          |
| 0.0679        | 4.8563  | 18400 | 0.0895          | 16642640          |
| 0.0635        | 4.9091  | 18600 | 0.0911          | 16825040          |
| 0.0637        | 4.9619  | 18800 | 0.0913          | 17006160          |
| 0.066         | 5.0145  | 19000 | 0.0909          | 17188336          |
| 0.0575        | 5.0673  | 19200 | 0.0934          | 17369104          |
| 0.0478        | 5.1201  | 19400 | 0.0901          | 17549968          |
| 0.0693        | 5.1729  | 19600 | 0.0919          | 17730032          |
| 0.1146        | 5.2257  | 19800 | 0.0902          | 17910128          |
| 0.0553        | 5.2785  | 20000 | 0.0890          | 18091056          |
| 0.0548        | 5.3313  | 20200 | 0.0906          | 18271440          |
| 0.1047        | 5.3841  | 20400 | 0.0868          | 18450832          |
| 0.0931        | 5.4368  | 20600 | 0.0866          | 18632304          |
| 0.0593        | 5.4896  | 20800 | 0.0887          | 18813264          |
| 0.0695        | 5.5424  | 21000 | 0.0907          | 18994928          |
| 0.0715        | 5.5952  | 21200 | 0.0906          | 19174960          |
| 0.0352        | 5.6480  | 21400 | 0.0894          | 19356976          |
| 0.0478        | 5.7008  | 21600 | 0.0897          | 19538672          |
| 0.0496        | 5.7536  | 21800 | 0.0846          | 19719600          |
| 0.0872        | 5.8064  | 22000 | 0.0862          | 19900592          |
| 0.0757        | 5.8592  | 22200 | 0.0880          | 20080976          |
| 0.0351        | 5.9120  | 22400 | 0.0839          | 20262096          |
| 0.0361        | 5.9648  | 22600 | 0.0880          | 20442992          |
| 0.0276        | 6.0174  | 22800 | 0.0912          | 20624000          |
| 0.0642        | 6.0702  | 23000 | 0.0901          | 20805728          |
| 0.0368        | 6.1230  | 23200 | 0.0905          | 20986976          |
| 0.0453        | 6.1758  | 23400 | 0.0891          | 21167808          |
| 0.0535        | 6.2286  | 23600 | 0.0918          | 21349184          |
| 0.0355        | 6.2814  | 23800 | 0.0945          | 21529376          |
| 0.018         | 6.3342  | 24000 | 0.0920          | 21710304          |
| 0.0335        | 6.3870  | 24200 | 0.0906          | 21889920          |
| 0.0718        | 6.4398  | 24400 | 0.0887          | 22070176          |
| 0.0411        | 6.4925  | 24600 | 0.0860          | 22249984          |
| 0.0506        | 6.5453  | 24800 | 0.0908          | 22432352          |
| 0.0504        | 6.5981  | 25000 | 0.0893          | 22612672          |
| 0.0514        | 6.6509  | 25200 | 0.0894          | 22793920          |
| 0.1031        | 6.7037  | 25400 | 0.0864          | 22974976          |
| 0.0644        | 6.7565  | 25600 | 0.0873          | 23155872          |
| 0.0625        | 6.8093  | 25800 | 0.0860          | 23338048          |
| 0.0601        | 6.8621  | 26000 | 0.0853          | 23518912          |
| 0.0691        | 6.9149  | 26200 | 0.0835          | 23700448          |
| 0.0419        | 6.9677  | 26400 | 0.0841          | 23880704          |
| 0.0413        | 7.0203  | 26600 | 0.0900          | 24061744          |
| 0.0474        | 7.0731  | 26800 | 0.0972          | 24240720          |
| 0.0218        | 7.1259  | 27000 | 0.0954          | 24423344          |
| 0.0721        | 7.1787  | 27200 | 0.0926          | 24603344          |
| 0.0302        | 7.2315  | 27400 | 0.0966          | 24784688          |
| 0.0196        | 7.2843  | 27600 | 0.0925          | 24965104          |
| 0.0322        | 7.3371  | 27800 | 0.0931          | 25146416          |
| 0.0148        | 7.3899  | 28000 | 0.0937          | 25327344          |
| 0.0333        | 7.4427  | 28200 | 0.0908          | 25507504          |
| 0.0388        | 7.4954  | 28400 | 0.0940          | 25688464          |
| 0.0489        | 7.5482  | 28600 | 0.0934          | 25870064          |
| 0.0296        | 7.6010  | 28800 | 0.0934          | 26051856          |
| 0.0221        | 7.6538  | 29000 | 0.0914          | 26232080          |
| 0.037         | 7.7066  | 29200 | 0.0880          | 26415344          |
| 0.0199        | 7.7594  | 29400 | 0.0934          | 26597616          |
| 0.0197        | 7.8122  | 29600 | 0.0916          | 26779344          |
| 0.0293        | 7.8650  | 29800 | 0.0936          | 26960208          |
| 0.012         | 7.9178  | 30000 | 0.0933          | 27142320          |
| 0.0393        | 7.9706  | 30200 | 0.0930          | 27322864          |
| 0.0263        | 8.0232  | 30400 | 0.0981          | 27502208          |
| 0.0355        | 8.0760  | 30600 | 0.1016          | 27683072          |
| 0.0093        | 8.1288  | 30800 | 0.1000          | 27864736          |
| 0.0177        | 8.1816  | 31000 | 0.1028          | 28044640          |
| 0.0156        | 8.2344  | 31200 | 0.1045          | 28226016          |
| 0.0373        | 8.2872  | 31400 | 0.1024          | 28406464          |
| 0.0261        | 8.3400  | 31600 | 0.1033          | 28587264          |
| 0.0439        | 8.3928  | 31800 | 0.1067          | 28767840          |
| 0.0105        | 8.4456  | 32000 | 0.1014          | 28948576          |
| 0.0402        | 8.4984  | 32200 | 0.1027          | 29130656          |
| 0.0163        | 8.5511  | 32400 | 0.1047          | 29312288          |
| 0.0351        | 8.6039  | 32600 | 0.1015          | 29492416          |
| 0.0599        | 8.6567  | 32800 | 0.1023          | 29672864          |
| 0.0217        | 8.7095  | 33000 | 0.0995          | 29854336          |
| 0.0126        | 8.7623  | 33200 | 0.1003          | 30036448          |
| 0.0088        | 8.8151  | 33400 | 0.0993          | 30216896          |
| 0.0306        | 8.8679  | 33600 | 0.1010          | 30396960          |
| 0.0189        | 8.9207  | 33800 | 0.1013          | 30576960          |
| 0.0232        | 8.9735  | 34000 | 0.1004          | 30759200          |
| 0.0119        | 9.0261  | 34200 | 0.1030          | 30938880          |
| 0.013         | 9.0789  | 34400 | 0.1044          | 31120480          |
| 0.0235        | 9.1317  | 34600 | 0.1055          | 31301056          |
| 0.0111        | 9.1845  | 34800 | 0.1084          | 31481824          |
| 0.0262        | 9.2373  | 35000 | 0.1098          | 31661536          |
| 0.0061        | 9.2901  | 35200 | 0.1102          | 31842016          |
| 0.0098        | 9.3429  | 35400 | 0.1106          | 32021408          |
| 0.0155        | 9.3957  | 35600 | 0.1111          | 32202368          |
| 0.0077        | 9.4485  | 35800 | 0.1096          | 32381184          |
| 0.008         | 9.5013  | 36000 | 0.1101          | 32562688          |
| 0.0194        | 9.5540  | 36200 | 0.1113          | 32743456          |
| 0.0371        | 9.6068  | 36400 | 0.1116          | 32926592          |
| 0.0185        | 9.6596  | 36600 | 0.1117          | 33105696          |
| 0.0174        | 9.7124  | 36800 | 0.1115          | 33286560          |
| 0.0269        | 9.7652  | 37000 | 0.1112          | 33468000          |
| 0.0089        | 9.8180  | 37200 | 0.1113          | 33650176          |
| 0.0098        | 9.8708  | 37400 | 0.1114          | 33831008          |
| 0.0086        | 9.9236  | 37600 | 0.1113          | 34012992          |
| 0.0102        | 9.9764  | 37800 | 0.1114          | 34195168          |
| 0.0149        | 10.0290 | 38000 | 0.1117          | 34373792          |
| 0.014         | 10.0818 | 38200 | 0.1113          | 34553856          |
| 0.0127        | 10.1346 | 38400 | 0.1121          | 34734976          |
| 0.0143        | 10.1874 | 38600 | 0.1117          | 34915936          |
| 0.0073        | 10.2402 | 38800 | 0.1116          | 35096960          |
| 0.0124        | 10.2930 | 39000 | 0.1121          | 35276448          |
| 0.0143        | 10.3458 | 39200 | 0.1120          | 35457088          |
| 0.0212        | 10.3986 | 39400 | 0.1123          | 35637600          |
| 0.0083        | 10.4514 | 39600 | 0.1119          | 35817824          |
| 0.0159        | 10.5042 | 39800 | 0.1114          | 35999840          |
| 0.0205        | 10.5569 | 40000 | 0.1121          | 36181120          |
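
The table shows validation loss bottoming out well before the final step while training loss keeps falling. A minimal sketch of how one might locate the best evaluation point and measure the gap to the final one is given below. It uses only a handful of `(step, validation_loss)` pairs copied from the table; `EVAL_POINTS` and `best_checkpoint` are illustrative names, and whether a checkpoint was actually saved at the best step is not stated in this card.

```python
# A few (step, validation_loss) pairs copied from the table above.
# This is a subset for illustration, not the full log.
EVAL_POINTS = [
    (200, 0.2764),
    (10000, 0.1216),
    (20000, 0.0890),
    (22400, 0.0839),
    (26200, 0.0835),
    (26400, 0.0841),
    (30000, 0.0933),
    (40000, 0.1121),
]


def best_checkpoint(points):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(points, key=lambda pair: pair[1])


best_step, best_loss = best_checkpoint(EVAL_POINTS)
final_step, final_loss = EVAL_POINTS[-1]

# Absolute and relative increase in validation loss from the best
# evaluation point to the end of training.
gap = final_loss - best_loss
relative_gap = gap / best_loss

print(f"best: step {best_step}, loss {best_loss:.4f}")
print(f"final: step {final_step}, loss {final_loss:.4f}")
print(f"gap: {gap:.4f} ({relative_gap:.1%} above the best)")
```

On these rows the minimum falls at step 26200 (loss 0.0835), and the final validation loss is about 0.0286 higher, which is consistent with overfitting in the later epochs.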


### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1