gpt2moe_het2_1000mb

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7218
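
The loss above can be turned into a perplexity figure, which some readers find easier to compare across models. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, the usual convention for causal language models trained with Transformers; this card does not state it explicitly, and the function name is only illustrative.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(loss)


eval_loss = 3.7218  # final evaluation loss reported in this card
print(f"perplexity ~ {loss_to_perplexity(eval_loss):.1f}")
```

If the loss were instead measured in bits, the base of the exponent would be 2 rather than e.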

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 37035
  • training_steps: 370358
  • mixed_precision_training: Native AMP
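
Two of the values above follow from the others, and a short sketch may make the relationship concrete. It assumes a single training device (the card does not say, but 8 × 8 = 64 is consistent with that) and assumes the linear scheduler has the usual shape of a linear warmup from zero followed by a linear decay to zero. The function names are hypothetical and not part of any library.

```python
LEARNING_RATE = 1e-4      # learning_rate
WARMUP_STEPS = 37035      # lr_scheduler_warmup_steps
TRAINING_STEPS = 370358   # training_steps


def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int,
              peak_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


print(total_train_batch_size(8, 8))  # matches total_train_batch_size: 64
for s in (0, WARMUP_STEPS, 200000, TRAINING_STEPS):
    print(s, linear_lr(s))
```

Under these assumptions the warmup covers roughly the first 10% of training (37035 of 370358 steps), after which the rate falls linearly until the final step.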

Training results

Training Loss Epoch Step Validation Loss
No log 0 0 11.0404
8.6089 0.0540 2000 7.9850
7.5891 0.1080 4000 6.9885
7.021 0.1620 6000 6.4072
6.601 0.2160 8000 5.9924
6.3052 0.2700 10000 5.6787
6.0684 0.3240 12000 5.4355
5.8674 0.3780 14000 5.2383
5.7119 0.4320 16000 5.0759
5.5771 0.4860 18000 4.9257
5.4478 0.5400 20000 4.8131
5.3629 0.5940 22000 4.7186
5.2861 0.6480 24000 4.6508
5.2247 0.7020 26000 4.5947
5.1728 0.7560 28000 4.5430
5.1195 0.8100 30000 4.5001
5.0843 0.8640 32000 4.4586
5.0484 0.9180 34000 4.4238
5.0174 0.9720 36000 4.3903
4.9677 1.0260 38000 4.3583
4.9396 1.0800 40000 4.3316
4.9187 1.1340 42000 4.3021
4.8834 1.1880 44000 4.2751
4.8697 1.2420 46000 4.2509
4.8421 1.2960 48000 4.2305
4.831 1.3500 50000 4.2102
4.812 1.4040 52000 4.1905
4.789 1.4580 54000 4.1759
4.7778 1.5120 56000 4.1602
4.765 1.5660 58000 4.1463
4.7513 1.6200 60000 4.1310
4.7314 1.6740 62000 4.1172
4.7206 1.7280 64000 4.1064
4.7216 1.7820 66000 4.0936
4.709 1.8361 68000 4.0826
4.6926 1.8901 70000 4.0741
4.6861 1.9441 72000 4.0628
4.6841 1.9981 74000 4.0539
4.6384 2.0521 76000 4.0461
4.635 2.1061 78000 4.0370
4.6324 2.1601 80000 4.0301
4.6261 2.2141 82000 4.0234
4.6182 2.2681 84000 4.0163
4.6178 2.3221 86000 4.0089
4.6005 2.3761 88000 4.0028
4.6002 2.4301 90000 3.9960
4.6008 2.4841 92000 3.9891
4.5939 2.5381 94000 3.9844
4.5945 2.5921 96000 3.9790
4.5836 2.6461 98000 3.9717
4.5798 2.7001 100000 3.9657
4.5736 2.7541 102000 3.9619
4.5673 2.8081 104000 3.9566
4.568 2.8621 106000 3.9517
4.5528 2.9161 108000 3.9466
4.5535 2.9701 110000 3.9410
4.5124 3.0241 112000 3.9394
4.5236 3.0781 114000 3.9346
4.5183 3.1321 116000 3.9305
4.5173 3.1861 118000 3.9269
4.5177 3.2401 120000 3.9213
4.5127 3.2941 122000 3.9193
4.5143 3.3481 124000 3.9153
4.5073 3.4021 126000 3.9115
4.5079 3.4561 128000 3.9072
4.5062 3.5101 130000 3.9031
4.5012 3.5641 132000 3.9004
4.5043 3.6181 134000 3.8961
4.496 3.6721 136000 3.8935
4.4957 3.7261 138000 3.8898
4.4946 3.7801 140000 3.8871
4.4902 3.8341 142000 3.8854
4.4888 3.8881 144000 3.8803
4.4893 3.9421 146000 3.8768
4.4828 3.9961 148000 3.8741
4.4497 4.0501 150000 3.8737
4.454 4.1041 152000 3.8716
4.4552 4.1581 154000 3.8688
4.452 4.2121 156000 3.8662
4.4561 4.2661 158000 3.8633
4.4511 4.3201 160000 3.8612
4.4481 4.3741 162000 3.8574
4.4442 4.4281 164000 3.8554
4.449 4.4821 166000 3.8528
4.4401 4.5361 168000 3.8508
4.4439 4.5901 170000 3.8482
4.4422 4.6441 172000 3.8460
4.4414 4.6981 174000 3.8429
4.4374 4.7521 176000 3.8406
4.4391 4.8061 178000 3.8383
4.4355 4.8601 180000 3.8360
4.4375 4.9141 182000 3.8344
4.4319 4.9681 184000 3.8311
4.4008 5.0221 186000 3.8310
4.3976 5.0761 188000 3.8296
4.4069 5.1301 190000 3.8282
4.4045 5.1841 192000 3.8265
4.4073 5.2381 194000 3.8240
4.4018 5.2921 196000 3.8221
4.4043 5.3461 198000 3.8203
4.4059 5.4002 200000 3.8173
4.4053 5.4542 202000 3.8157
4.4035 5.5082 204000 3.8145
4.4002 5.5622 206000 3.8124
4.3997 5.6162 208000 3.8113
4.3893 5.6702 210000 3.8085
4.3951 5.7242 212000 3.8063
4.4003 5.7782 214000 3.8042
4.4008 5.8322 216000 3.8029
4.3973 5.8862 218000 3.8011
4.3942 5.9402 220000 3.7991
4.3916 5.9942 222000 3.7972
4.3605 6.0482 224000 3.7986
4.3661 6.1022 226000 3.7968
4.3624 6.1562 228000 3.7953
4.3667 6.2102 230000 3.7943
4.371 6.2642 232000 3.7933
4.3705 6.3182 234000 3.7902
4.3704 6.3722 236000 3.7885
4.3714 6.4262 238000 3.7880
4.3646 6.4802 240000 3.7869
4.3686 6.5342 242000 3.7839
4.3632 6.5882 244000 3.7828
4.3679 6.6422 246000 3.7814
4.3678 6.6962 248000 3.7798
4.3646 6.7502 250000 3.7778
4.3628 6.8042 252000 3.7761
4.3636 6.8582 254000 3.7749
4.3608 6.9122 256000 3.7730
4.3595 6.9662 258000 3.7717
4.3304 7.0202 260000 3.7723
4.3374 7.0742 262000 3.7719
4.3326 7.1282 264000 3.7703
4.3401 7.1822 266000 3.7696
4.3437 7.2362 268000 3.7676
4.3332 7.2902 270000 3.7667
4.3424 7.3442 272000 3.7651
4.3331 7.3982 274000 3.7639
4.3385 7.4522 276000 3.7621
4.3303 7.5062 278000 3.7605
4.3409 7.5602 280000 3.7602
4.3344 7.6142 282000 3.7580
4.3371 7.6682 284000 3.7569
4.338 7.7222 286000 3.7557
4.3347 7.7762 288000 3.7544
4.335 7.8302 290000 3.7530
4.3308 7.8842 292000 3.7526
4.3293 7.9382 294000 3.7510
4.3318 7.9922 296000 3.7499
4.3089 8.0462 298000 3.7501
4.308 8.1002 300000 3.7489
4.3088 8.1542 302000 3.7485
4.3076 8.2082 304000 3.7470
4.3023 8.2622 306000 3.7467
4.3057 8.3162 308000 3.7447
4.3107 8.3702 310000 3.7444
4.3086 8.4242 312000 3.7432
4.3074 8.4782 314000 3.7421
4.3182 8.5322 316000 3.7409
4.3085 8.5862 318000 3.7398
4.3085 8.6402 320000 3.7390
4.3086 8.6942 322000 3.7377
4.3069 8.7482 324000 3.7367
4.309 8.8022 326000 3.7352
4.3075 8.8562 328000 3.7350
4.3071 8.9102 330000 3.7329
4.3074 8.9643 332000 3.7330
4.2814 9.0183 334000 3.7325
4.2861 9.0723 336000 3.7323
4.2886 9.1263 338000 3.7316
4.2859 9.1803 340000 3.7312
4.2897 9.2343 342000 3.7299
4.287 9.2883 344000 3.7296
4.2855 9.3423 346000 3.7286
4.2817 9.3963 348000 3.7278
4.2816 9.4503 350000 3.7271
4.2857 9.5043 352000 3.7265
4.2845 9.5583 354000 3.7255
4.279 9.6123 356000 3.7250
4.2798 9.6663 358000 3.7243
4.2833 9.7203 360000 3.7236
4.2837 9.7743 362000 3.7233
4.2842 9.8283 364000 3.7224
4.2922 9.8823 366000 3.7222
4.2801 9.9363 368000 3.7220
4.2817 9.9903 370000 3.7218
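
The Epoch and Step columns together give a rough idea of how long one epoch is. The sketch below estimates this from the last row of the table; because the epoch values are rounded to four decimals the result is approximate, and the sequences-per-epoch figure additionally assumes that each optimizer step consumes 64 sequences, as listed under the training hyperparameters. The helper name is illustrative.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one (epoch, step) table row."""
    if epoch <= 0:
        raise ValueError("epoch must be positive to estimate steps per epoch")
    return step / epoch


# Last row of the training results table.
last_step, last_epoch = 370000, 9.9903

est = steps_per_epoch(last_step, last_epoch)
print(f"approx. steps per epoch: {est:,.0f}")
print(f"approx. sequences per epoch: {est * 64:,.0f}")  # assumes 64 sequences per step
```

By this estimate the run covers about ten epochs, and the warmup of 37035 steps is close to one full epoch.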

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Safetensors

  • Model size: 0.2B params
  • Tensor type: F32