2025-11-23 00:39:18,735 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_base_lm_20251123_003859', objective='lm', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251123_003859', batch_size=8, gradient_accumulation_steps=6, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_vision_base_lm_20251123_003859', resume_from_checkpoint=None, resume=None, init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=4, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True)
2025-11-23 00:39:18,735 - INFO - Using preset vision prompt 'free_ocr': '\nFree OCR.'
2025-11-23 00:39:18,735 - INFO - Setting random seed: 42
2025-11-23 00:39:20,423 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_lm_20251123_003859 (ID: 7aj57hve)
2025-11-23 00:39:20,423 - INFO - Loading model and tokenizer...
2025-11-23 00:39:30,234 - INFO - Enabling decoder gradient checkpointing...
2025-11-23 00:39:30,242 - INFO -   ✓ Decoder checkpointing enabled for 12 transformer layers
2025-11-23 00:39:30,242 - INFO -   Expected: ~30-50% activation memory reduction, ~15-20% compute overhead
2025-11-23 00:39:30,270 - INFO - Created Vision Compression trainer (mode: base)
2025-11-23 00:39:30,270 - INFO - Training objective: lm
2025-11-23 00:39:30,303 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640
2025-11-23 00:39:30,303 - INFO - Loading training data from data/training/splits_510k/train.jsonl
2025-11-23 00:42:11,102 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl
2025-11-23 00:42:11,103 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-23 00:42:11,143 - INFO - Loading validation data from data/training/splits_510k/val.jsonl
2025-11-23 00:42:13,810 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl
2025-11-23 00:42:13,811 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-23 00:42:13,843 - INFO - Created AdamW optimizer with differential LR:
  Encoder: 474 param tensors @ lr=1e-05
  Decoder: 2236 param tensors @ lr=0.0001
  Fused kernels: True
2025-11-23 00:42:13,843 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
2025-11-23 00:42:13,843 - INFO - Starting training loop...
2025-11-23 00:42:13,844 - INFO - 
======================================================================
2025-11-23 00:42:13,844 - INFO - Running initial validation (before any training)...
2025-11-23 00:42:13,844 - INFO - ======================================================================
2025-11-23 00:52:15,914 - INFO - Validation loss: 2.0308, perplexity: 7.62
2025-11-23 00:52:15,914 - INFO - Qualitative metrics (n=5):
2025-11-23 00:52:15,914 - INFO -   BLEU: 0.0539
2025-11-23 00:52:15,914 - INFO -   METEOR: 0.2296
2025-11-23 00:52:15,914 - INFO -   Edit Distance: 0.7797
2025-11-23 00:52:15,915 - INFO -   F-measure: 0.2083
2025-11-23 00:52:15,915 - INFO - 
======================================================================
2025-11-23 00:52:15,915 - INFO - Qualitative Evaluation Samples:
2025-11-23 00:52:15,915 - INFO - ======================================================================
2025-11-23 00:52:15,915 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-23 00:52:15,915 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-23 00:52:15,915 - INFO - Generated:    'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-23 00:52:15,916 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-23 00:52:15,916 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,916 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-23 00:52:15,916 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-23 00:52:15,916 - INFO - Generated:    'was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROTC;...'
2025-11-23 00:52:15,916 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-23 00:52:15,916 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,916 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-23 00:52:15,916 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-23 00:52:15,917 - INFO - Generated:    'at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and bo...'
2025-11-23 00:52:15,917 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-23 00:52:15,917 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,917 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-23 00:52:15,917 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-23 00:52:15,917 - INFO - Generated:    '# Oriya (Unicode block) Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-23 00:52:15,918 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-23 00:52:15,918 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,918 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-23 00:52:15,919 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-23 00:52:15,919 - INFO - Generated:    '| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores |\n|-----------------------|------------|---------|---------------------|\n| [ 132 ] | Ultima Underworld: The Stygian Abyss and ...'
2025-11-23 00:52:15,919 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-23 00:52:15,919 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,920 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_0.jsonl
2025-11-23 00:52:17,178 - INFO - Initial validation - Loss: 2.0308, Perplexity: 7.62
2025-11-23 00:52:17,178 - INFO - ======================================================================

2025-11-23 00:52:17,178 - INFO - 
======================================================================
2025-11-23 00:52:17,179 - INFO - Epoch 1/1
2025-11-23 00:52:17,179 - INFO - ======================================================================
2025-11-23 00:52:43,666 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
2025-11-23 00:52:45,490 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x
2025-11-23 00:52:45,491 - INFO - Target tokens per sample: 1000
2025-11-23 00:57:24,015 - INFO - Epoch 1 Step 10 (Global: 10): loss=1.7593, ppl=5.81, grad_norm=1.97, lr=1.09e-06, throughput=1564 tok/s
2025-11-23 01:01:57,820 - INFO - Epoch 1 Step 20 (Global: 20): loss=1.8714, ppl=6.50, grad_norm=2.02, lr=1.17e-06, throughput=1753 tok/s
2025-11-23 01:06:41,116 - INFO - Epoch 1 Step 30 (Global: 30): loss=1.9781, ppl=7.23, grad_norm=1.91, lr=1.26e-06, throughput=1694 tok/s
2025-11-23 01:11:05,187 - INFO - Epoch 1 Step 40 (Global: 40): loss=1.9456, ppl=7.00, grad_norm=1.62, lr=1.35e-06, throughput=1818 tok/s
2025-11-23 01:15:37,139 - INFO - Epoch 1 Step 50 (Global: 50): loss=1.8491, ppl=6.35, grad_norm=1.58, lr=1.43e-06, throughput=1765 tok/s
2025-11-23 01:20:04,821 - INFO - Epoch 1 Step 60 (Global: 60): loss=1.9780, ppl=7.23, grad_norm=1.67, lr=1.52e-06, throughput=1793 tok/s
2025-11-23 01:24:37,195 - INFO - Epoch 1 Step 70 (Global: 70): loss=1.6874, ppl=5.41, grad_norm=1.50, lr=1.61e-06, throughput=1762 tok/s
2025-11-23 01:29:04,834 - INFO - Epoch 1 Step 80 (Global: 80): loss=1.9371, ppl=6.94, grad_norm=1.38, lr=1.69e-06, throughput=1793 tok/s
2025-11-23 01:33:37,286 - INFO - Epoch 1 Step 90 (Global: 90): loss=1.9360, ppl=6.93, grad_norm=1.56, lr=1.78e-06, throughput=1762 tok/s
2025-11-23 01:38:03,630 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.8629, ppl=6.44, grad_norm=1.52, lr=1.86e-06, throughput=1802 tok/s
2025-11-23 01:42:29,466 - INFO - Epoch 1 Step 110 (Global: 110): loss=1.6511, ppl=5.21, grad_norm=1.53, lr=1.95e-06, throughput=1806 tok/s
2025-11-23 01:47:02,233 - INFO - Epoch 1 Step 120 (Global: 120): loss=1.9235, ppl=6.84, grad_norm=1.64, lr=2.04e-06, throughput=1760 tok/s
2025-11-23 01:51:33,410 - INFO - Epoch 1 Step 130 (Global: 130): loss=1.9405, ppl=6.96, grad_norm=1.54, lr=2.12e-06, throughput=1770 tok/s
2025-11-23 01:56:07,108 - INFO - Epoch 1 Step 140 (Global: 140): loss=1.8248, ppl=6.20, grad_norm=1.50, lr=2.21e-06, throughput=1754 tok/s
2025-11-23 02:00:35,146 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.8491, ppl=6.35, grad_norm=1.62, lr=2.30e-06, throughput=1791 tok/s
2025-11-23 02:05:05,505 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.8049, ppl=6.08, grad_norm=1.58, lr=2.38e-06, throughput=1775 tok/s
2025-11-23 02:09:30,912 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.6813, ppl=5.37, grad_norm=1.81, lr=2.47e-06, throughput=1809 tok/s
2025-11-23 02:14:01,439 - INFO - Epoch 1 Step 180 (Global: 180): loss=2.0785, ppl=7.99, grad_norm=2.08, lr=2.56e-06, throughput=1774 tok/s
2025-11-23 02:18:25,224 - INFO - Epoch 1 Step 190 (Global: 190): loss=1.9836, ppl=7.27, grad_norm=1.52, lr=2.64e-06, throughput=1820 tok/s
2025-11-23 02:22:58,056 - INFO - Epoch 1 Step 200 (Global: 200): loss=1.9136, ppl=6.78, grad_norm=1.78, lr=2.73e-06, throughput=1759 tok/s
2025-11-23 02:27:24,835 - INFO - Epoch 1 Step 210 (Global: 210): loss=1.8671, ppl=6.47, grad_norm=1.83, lr=2.82e-06, throughput=1799 tok/s
2025-11-23 02:32:01,169 - INFO - Epoch 1 Step 220 (Global: 220): loss=1.7657, ppl=5.85, grad_norm=1.38, lr=2.90e-06, throughput=1737 tok/s
2025-11-23 02:36:25,089 - INFO - Epoch 1 Step 230 (Global: 230): loss=1.7978, ppl=6.04, grad_norm=1.97, lr=2.99e-06, throughput=1819 tok/s
2025-11-23 02:40:56,278 - INFO - Epoch 1 Step 240 (Global: 240): loss=1.7170, ppl=5.57, grad_norm=1.44, lr=3.07e-06, throughput=1770 tok/s
2025-11-23 02:45:22,171 - INFO - Epoch 1 Step 250 (Global: 250): loss=1.5863, ppl=4.89, grad_norm=3.17, lr=3.16e-06, throughput=1805 tok/s
2025-11-23 02:49:47,545 - INFO - Epoch 1 Step 260 (Global: 260): loss=1.6857, ppl=5.40, grad_norm=1.66, lr=3.25e-06, throughput=1809 tok/s
2025-11-23 02:54:21,931 - INFO - Epoch 1 Step 270 (Global: 270): loss=1.6470, ppl=5.19, grad_norm=1.64, lr=3.33e-06, throughput=1749 tok/s
2025-11-23 02:58:46,065 - INFO - Epoch 1 Step 280 (Global: 280): loss=1.7513, ppl=5.76, grad_norm=1.70, lr=3.42e-06, throughput=1817 tok/s
2025-11-23 03:03:17,628 - INFO - Epoch 1 Step 290 (Global: 290): loss=1.7990, ppl=6.04, grad_norm=1.77, lr=3.51e-06, throughput=1768 tok/s
2025-11-23 03:07:39,529 - INFO - Epoch 1 Step 300 (Global: 300): loss=1.8349, ppl=6.26, grad_norm=1.75, lr=3.59e-06, throughput=1833 tok/s
2025-11-23 03:12:10,072 - INFO - Epoch 1 Step 310 (Global: 310): loss=1.7573, ppl=5.80, grad_norm=1.47, lr=3.68e-06, throughput=1774 tok/s
2025-11-23 03:16:33,201 - INFO - Epoch 1 Step 320 (Global: 320): loss=1.7261, ppl=5.62, grad_norm=1.39, lr=3.77e-06, throughput=1824 tok/s
2025-11-23 03:21:06,264 - INFO - Epoch 1 Step 330 (Global: 330): loss=1.8710, ppl=6.49, grad_norm=1.66, lr=3.85e-06, throughput=1758 tok/s
2025-11-23 03:25:30,199 - INFO - Epoch 1 Step 340 (Global: 340): loss=1.8300, ppl=6.23, grad_norm=1.73, lr=3.94e-06, throughput=1819 tok/s
2025-11-23 03:30:01,991 - INFO - Epoch 1 Step 350 (Global: 350): loss=1.8026, ppl=6.07, grad_norm=1.97, lr=4.03e-06, throughput=1766 tok/s
2025-11-23 03:34:23,484 - INFO - Epoch 1 Step 360 (Global: 360): loss=1.7302, ppl=5.64, grad_norm=2.05, lr=4.11e-06, throughput=1836 tok/s
2025-11-23 03:38:55,945 - INFO - Epoch 1 Step 370 (Global: 370): loss=1.9307, ppl=6.89, grad_norm=1.64, lr=4.20e-06, throughput=1762 tok/s
2025-11-23 03:43:13,440 - INFO - Epoch 1 Step 380 (Global: 380): loss=1.8742, ppl=6.52, grad_norm=1.98, lr=4.29e-06, throughput=1864 tok/s
2025-11-23 03:47:40,900 - INFO - Epoch 1 Step 390 (Global: 390): loss=1.8013, ppl=6.06, grad_norm=2.14, lr=4.37e-06, throughput=1795 tok/s
2025-11-23 03:52:06,421 - INFO - Epoch 1 Step 400 (Global: 400): loss=1.8168, ppl=6.15, grad_norm=3.77, lr=4.46e-06, throughput=1808 tok/s
2025-11-23 03:56:41,183 - INFO - Epoch 1 Step 410 (Global: 410): loss=1.7967, ppl=6.03, grad_norm=1.97, lr=4.54e-06, throughput=1747 tok/s
2025-11-23 04:01:07,390 - INFO - Epoch 1 Step 420 (Global: 420): loss=1.9419, ppl=6.97, grad_norm=1.66, lr=4.63e-06, throughput=1803 tok/s
2025-11-23 04:05:26,674 - INFO - Epoch 1 Step 430 (Global: 430): loss=2.0034, ppl=7.41, grad_norm=1.66, lr=4.72e-06, throughput=1851 tok/s
2025-11-23 04:09:57,532 - INFO - Epoch 1 Step 440 (Global: 440): loss=1.8251, ppl=6.20, grad_norm=1.52, lr=4.80e-06, throughput=1772 tok/s
2025-11-23 04:14:22,077 - INFO - Epoch 1 Step 450 (Global: 450): loss=1.7433, ppl=5.72, grad_norm=1.47, lr=4.89e-06, throughput=1814 tok/s
2025-11-23 04:18:52,978 - INFO - Epoch 1 Step 460 (Global: 460): loss=1.7405, ppl=5.70, grad_norm=1.48, lr=4.98e-06, throughput=1772 tok/s
2025-11-23 04:23:13,662 - INFO - Epoch 1 Step 470 (Global: 470): loss=2.0292, ppl=7.61, grad_norm=1.52, lr=5.06e-06, throughput=1841 tok/s
2025-11-23 04:27:52,546 - INFO - Epoch 1 Step 480 (Global: 480): loss=1.7619, ppl=5.82, grad_norm=1.92, lr=5.15e-06, throughput=1721 tok/s
2025-11-23 04:32:22,310 - INFO - Epoch 1 Step 490 (Global: 490): loss=1.6590, ppl=5.25, grad_norm=1.68, lr=5.24e-06, throughput=1779 tok/s
2025-11-23 04:36:57,430 - INFO - Epoch 1 Step 500 (Global: 500): loss=1.9220, ppl=6.83, grad_norm=1.53, lr=5.32e-06, throughput=1745 tok/s
2025-11-23 04:41:18,878 - INFO - Epoch 1 Step 510 (Global: 510): loss=1.9820, ppl=7.26, grad_norm=1.72, lr=5.41e-06, throughput=1836 tok/s
2025-11-23 04:45:47,594 - INFO - Epoch 1 Step 520 (Global: 520): loss=1.7555, ppl=5.79, grad_norm=1.45, lr=5.50e-06, throughput=1786 tok/s
2025-11-23 04:50:31,443 - INFO - Epoch 1 Step 530 (Global: 530): loss=1.8434, ppl=6.32, grad_norm=1.84, lr=5.58e-06, throughput=1691 tok/s
2025-11-23 04:55:13,819 - INFO - Epoch 1 Step 540 (Global: 540): loss=1.8080, ppl=6.10, grad_norm=1.96, lr=5.67e-06, throughput=1700 tok/s
2025-11-23 04:59:41,762 - INFO - Epoch 1 Step 550 (Global: 550): loss=1.7525, ppl=5.77, grad_norm=1.88, lr=5.76e-06, throughput=1791 tok/s
2025-11-23 05:04:19,053 - INFO - Epoch 1 Step 560 (Global: 560): loss=1.6107, ppl=5.01, grad_norm=2.19, lr=5.84e-06, throughput=1731 tok/s
2025-11-23 05:08:48,835 - INFO - Epoch 1 Step 570 (Global: 570): loss=1.7103, ppl=5.53, grad_norm=1.76, lr=5.93e-06, throughput=1779 tok/s
2025-11-23 05:13:24,176 - INFO - Epoch 1 Step 580 (Global: 580): loss=1.6759, ppl=5.34, grad_norm=1.56, lr=6.01e-06, throughput=1743 tok/s
2025-11-23 05:17:49,658 - INFO - Epoch 1 Step 590 (Global: 590): loss=1.7054, ppl=5.50, grad_norm=1.86, lr=6.10e-06, throughput=1808 tok/s
2025-11-23 05:22:16,893 - INFO - Epoch 1 Step 600 (Global: 600): loss=1.8302, ppl=6.24, grad_norm=2.12, lr=6.19e-06, throughput=1796 tok/s
2025-11-23 05:26:56,819 - INFO - Epoch 1 Step 610 (Global: 610): loss=1.7796, ppl=5.93, grad_norm=1.88, lr=6.27e-06, throughput=1715 tok/s
2025-11-23 05:31:23,370 - INFO - Epoch 1 Step 620 (Global: 620): loss=1.8601, ppl=6.42, grad_norm=1.50, lr=6.36e-06, throughput=1801 tok/s
2025-11-23 05:35:54,665 - INFO - Epoch 1 Step 630 (Global: 630): loss=1.9594, ppl=7.10, grad_norm=1.54, lr=6.45e-06, throughput=1769 tok/s
2025-11-23 05:40:18,241 - INFO - Epoch 1 Step 640 (Global: 640): loss=1.7596, ppl=5.81, grad_norm=1.97, lr=6.53e-06, throughput=1821 tok/s
2025-11-23 05:44:57,561 - INFO - Epoch 1 Step 650 (Global: 650): loss=1.8469, ppl=6.34, grad_norm=1.32, lr=6.62e-06, throughput=1718 tok/s
2025-11-23 05:49:29,856 - INFO - Epoch 1 Step 660 (Global: 660): loss=1.4370, ppl=4.21, grad_norm=1.66, lr=6.71e-06, throughput=1763 tok/s
2025-11-23 05:54:04,170 - INFO - Epoch 1 Step 670 (Global: 670): loss=1.8127, ppl=6.13, grad_norm=1.54, lr=6.79e-06, throughput=1750 tok/s
2025-11-23 05:58:34,903 - INFO - Epoch 1 Step 680 (Global: 680): loss=1.8977, ppl=6.67, grad_norm=1.72, lr=6.88e-06, throughput=1773 tok/s
2025-11-23 06:03:13,360 - INFO - Epoch 1 Step 690 (Global: 690): loss=1.6731, ppl=5.33, grad_norm=2.06, lr=6.97e-06, throughput=1724 tok/s
2025-11-23 06:07:43,687 - INFO - Epoch 1 Step 700 (Global: 700): loss=1.9364, ppl=6.93, grad_norm=1.72, lr=7.05e-06, throughput=1776 tok/s
2025-11-23 06:12:22,812 - INFO - Epoch 1 Step 710 (Global: 710): loss=1.7367, ppl=5.68, grad_norm=1.79, lr=7.14e-06, throughput=1720 tok/s
2025-11-23 06:16:54,320 - INFO - Epoch 1 Step 720 (Global: 720): loss=1.9686, ppl=7.16, grad_norm=1.56, lr=7.22e-06, throughput=1768 tok/s
2025-11-23 06:21:37,629 - INFO - Epoch 1 Step 730 (Global: 730): loss=1.9925, ppl=7.33, grad_norm=1.66, lr=7.31e-06, throughput=1694 tok/s
2025-11-23 06:26:12,390 - INFO - Epoch 1 Step 740 (Global: 740): loss=1.7124, ppl=5.54, grad_norm=1.61, lr=7.40e-06, throughput=1747 tok/s
2025-11-23 06:30:42,732 - INFO - Epoch 1 Step 750 (Global: 750): loss=1.6240, ppl=5.07, grad_norm=1.38, lr=7.48e-06, throughput=1776 tok/s
2025-11-23 06:35:19,563 - INFO - Epoch 1 Step 760 (Global: 760): loss=1.8446, ppl=6.33, grad_norm=1.66, lr=7.57e-06, throughput=1734 tok/s
2025-11-23 06:39:49,085 - INFO - Epoch 1 Step 770 (Global: 770): loss=1.7033, ppl=5.49, grad_norm=1.75, lr=7.66e-06, throughput=1781 tok/s
2025-11-23 06:44:24,611 - INFO - Epoch 1 Step 780 (Global: 780): loss=1.6641, ppl=5.28, grad_norm=1.59, lr=7.74e-06, throughput=1742 tok/s
2025-11-23 06:48:49,166 - INFO - Epoch 1 Step 790 (Global: 790): loss=1.5717, ppl=4.81, grad_norm=2.38, lr=7.83e-06, throughput=1814 tok/s
2025-11-23 06:53:27,207 - INFO - Epoch 1 Step 800 (Global: 800): loss=1.8194, ppl=6.17, grad_norm=1.51, lr=7.92e-06, throughput=1726 tok/s
2025-11-23 06:57:59,735 - INFO - Epoch 1 Step 810 (Global: 810): loss=1.8363, ppl=6.27, grad_norm=1.62, lr=8.00e-06, throughput=1761 tok/s
2025-11-23 07:02:39,560 - INFO - Epoch 1 Step 820 (Global: 820): loss=1.6740, ppl=5.33, grad_norm=1.87, lr=8.09e-06, throughput=1715 tok/s
2025-11-23 07:07:10,158 - INFO - Epoch 1 Step 830 (Global: 830): loss=1.8374, ppl=6.28, grad_norm=1.66, lr=8.18e-06, throughput=1774 tok/s
2025-11-23 07:11:55,310 - INFO - Epoch 1 Step 840 (Global: 840): loss=1.8046, ppl=6.08, grad_norm=1.80, lr=8.26e-06, throughput=1683 tok/s
2025-11-23 07:16:27,710 - INFO - Epoch 1 Step 850 (Global: 850): loss=2.1125, ppl=8.27, grad_norm=1.51, lr=8.35e-06, throughput=1762 tok/s
2025-11-23 07:21:05,529 - INFO - Epoch 1 Step 860 (Global: 860): loss=1.8099, ppl=6.11, grad_norm=1.74, lr=8.44e-06, throughput=1728 tok/s
2025-11-23 07:25:34,528 - INFO - Epoch 1 Step 870 (Global: 870): loss=1.5395, ppl=4.66, grad_norm=1.65, lr=8.52e-06, throughput=1784 tok/s
2025-11-23 07:30:20,213 - INFO - Epoch 1 Step 880 (Global: 880): loss=1.8677, ppl=6.47, grad_norm=1.95, lr=8.61e-06, throughput=1680 tok/s
2025-11-23 07:34:52,863 - INFO - Epoch 1 Step 890 (Global: 890): loss=1.6371, ppl=5.14, grad_norm=2.11, lr=8.69e-06, throughput=1761 tok/s
2025-11-23 07:39:32,779 - INFO - Epoch 1 Step 900 (Global: 900): loss=1.6667, ppl=5.29, grad_norm=1.55, lr=8.78e-06, throughput=1715 tok/s
2025-11-23 07:44:00,274 - INFO - Epoch 1 Step 910 (Global: 910): loss=1.7994, ppl=6.05, grad_norm=1.45, lr=8.87e-06, throughput=1794 tok/s
2025-11-23 07:48:30,835 - INFO - Epoch 1 Step 920 (Global: 920): loss=1.7632, ppl=5.83, grad_norm=1.41, lr=8.95e-06, throughput=1774 tok/s
2025-11-23 07:53:09,484 - INFO - Epoch 1 Step 930 (Global: 930): loss=1.9497, ppl=7.03, grad_norm=2.33, lr=9.04e-06, throughput=1723 tok/s
2025-11-23 07:57:36,327 - INFO - Epoch 1 Step 940 (Global: 940): loss=1.6791, ppl=5.36, grad_norm=2.25, lr=9.13e-06, throughput=1799 tok/s
2025-11-23 08:02:11,875 - INFO - Epoch 1 Step 950 (Global: 950): loss=1.7534, ppl=5.77, grad_norm=1.56, lr=9.21e-06, throughput=1742 tok/s
2025-11-23 08:06:40,795 - INFO - Epoch 1 Step 960 (Global: 960): loss=1.7496, ppl=5.75, grad_norm=1.69, lr=9.30e-06, throughput=1785 tok/s
2025-11-23 08:11:22,399 - INFO - Epoch 1 Step 970 (Global: 970): loss=1.8325, ppl=6.25, grad_norm=1.59, lr=9.39e-06, throughput=1705 tok/s
2025-11-23 08:15:57,763 - INFO - Epoch 1 Step 980 (Global: 980): loss=1.7496, ppl=5.75, grad_norm=1.56, lr=9.47e-06, throughput=1743 tok/s
2025-11-23 08:20:38,394 - INFO - Epoch 1 Step 990 (Global: 990): loss=1.7299, ppl=5.64, grad_norm=1.63, lr=9.56e-06, throughput=1710 tok/s
2025-11-23 08:25:11,235 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=1.8158, ppl=6.15, grad_norm=2.00, lr=9.65e-06, throughput=1759 tok/s
2025-11-23 08:29:51,571 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=2.0486, ppl=7.76, grad_norm=1.88, lr=9.73e-06, throughput=1712 tok/s
2025-11-23 08:34:18,589 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=2.0579, ppl=7.83, grad_norm=1.54, lr=9.82e-06, throughput=1798 tok/s
2025-11-23 08:38:57,262 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=1.8081, ppl=6.10, grad_norm=1.70, lr=9.90e-06, throughput=1722 tok/s
2025-11-23 08:43:27,900 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=1.6845, ppl=5.39, grad_norm=1.29, lr=9.99e-06, throughput=1774 tok/s
2025-11-23 08:48:04,489 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=1.7266, ppl=5.62, grad_norm=1.70, lr=1.00e-05, throughput=1735 tok/s
2025-11-23 08:52:39,228 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=1.7992, ppl=6.04, grad_norm=1.59, lr=1.00e-05, throughput=1747 tok/s
2025-11-23 08:57:09,915 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=1.7306, ppl=5.64, grad_norm=1.70, lr=1.00e-05, throughput=1773 tok/s
2025-11-23 09:01:50,384 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=1.9913, ppl=7.33, grad_norm=1.55, lr=1.00e-05, throughput=1711 tok/s
2025-11-23 09:06:30,346 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=1.7767, ppl=5.91, grad_norm=1.89, lr=1.00e-05, throughput=1715 tok/s
2025-11-23 09:11:52,775 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=1.7927, ppl=6.01, grad_norm=1.58, lr=1.00e-05, throughput=1489 tok/s
2025-11-23 09:17:02,533 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=1.6752, ppl=5.34, grad_norm=1.80, lr=1.00e-05, throughput=1550 tok/s
2025-11-23 09:22:25,918 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=1.9292, ppl=6.88, grad_norm=1.73, lr=1.00e-05, throughput=1484 tok/s
2025-11-23 09:27:38,164 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=1.7571, ppl=5.80, grad_norm=1.81, lr=1.00e-05, throughput=1537 tok/s
2025-11-23 09:32:55,186 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=1.7812, ppl=5.94, grad_norm=1.58, lr=1.00e-05, throughput=1514 tok/s
2025-11-23 09:38:00,286 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=1.8185, ppl=6.16, grad_norm=1.55, lr=1.00e-05, throughput=1573 tok/s
2025-11-23 09:43:21,131 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=1.8385, ppl=6.29, grad_norm=1.74, lr=1.00e-05, throughput=1496 tok/s
2025-11-23 09:48:16,144 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=1.6718, ppl=5.32, grad_norm=1.78, lr=1.00e-05, throughput=1627 tok/s
2025-11-23 09:53:03,079 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=1.7708, ppl=5.88, grad_norm=1.36, lr=9.99e-06, throughput=1673 tok/s
2025-11-23 09:57:38,189 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=1.9106, ppl=6.76, grad_norm=1.41, lr=9.99e-06, throughput=1745 tok/s
2025-11-23 10:02:19,571 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=1.5958, ppl=4.93, grad_norm=1.42, lr=9.99e-06, throughput=1706 tok/s
2025-11-23 10:06:57,515 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=1.6352, ppl=5.13, grad_norm=2.16, lr=9.99e-06, throughput=1727 tok/s
2025-11-23 10:11:33,689 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=1.8879, ppl=6.61, grad_norm=1.84, lr=9.99e-06, throughput=1738 tok/s
2025-11-23 10:16:20,558 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=1.7523, ppl=5.77, grad_norm=1.75, lr=9.99e-06, throughput=1673 tok/s
2025-11-23 10:20:56,565 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=1.5437, ppl=4.68, grad_norm=2.09, lr=9.99e-06, throughput=1739 tok/s
2025-11-23 10:25:41,681 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=1.7735, ppl=5.89, grad_norm=3.05, lr=9.99e-06, throughput=1684 tok/s
2025-11-23 10:30:18,488 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=1.8289, ppl=6.23, grad_norm=1.38, lr=9.99e-06, throughput=1734 tok/s
2025-11-23 10:34:59,212 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=1.8474, ppl=6.34, grad_norm=1.41, lr=9.99e-06, throughput=1710 tok/s
2025-11-23 10:39:33,590 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=1.7974, ppl=6.03, grad_norm=1.38, lr=9.98e-06, throughput=1749 tok/s
2025-11-23 10:44:16,615 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=1.7240, ppl=5.61, grad_norm=1.48, lr=9.98e-06, throughput=1696 tok/s
2025-11-23 10:48:52,263 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=1.6631, ppl=5.28, grad_norm=1.70, lr=9.98e-06, throughput=1741 tok/s
2025-11-23 10:53:36,619 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=1.5080, ppl=4.52, grad_norm=1.40, lr=9.98e-06, throughput=1688 tok/s
2025-11-23 10:58:09,923 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=1.9739, ppl=7.20, grad_norm=1.45, lr=9.98e-06, throughput=1756 tok/s
2025-11-23 11:02:53,499 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=2.0231, ppl=7.56, grad_norm=1.62, lr=9.98e-06, throughput=1693 tok/s
2025-11-23 11:07:26,633 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=1.7390, ppl=5.69, grad_norm=2.58, lr=9.97e-06, throughput=1757 tok/s
2025-11-23 11:12:12,535 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=1.9460, ppl=7.00, grad_norm=1.43, lr=9.97e-06, throughput=1679 tok/s
2025-11-23 11:16:46,129 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=1.9002, ppl=6.69, grad_norm=1.53, lr=9.97e-06, throughput=1754 tok/s
2025-11-23 11:21:25,242 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=1.9676, ppl=7.15, grad_norm=1.77, lr=9.97e-06, throughput=1720 tok/s
2025-11-23 11:26:14,529 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=1.6781, ppl=5.36, grad_norm=1.28, lr=9.97e-06, throughput=1659 tok/s
2025-11-23 11:30:47,107 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=1.8783, ppl=6.54, grad_norm=1.48, lr=9.97e-06, throughput=1761 tok/s
2025-11-23 11:35:34,554 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=1.9413, ppl=6.97, grad_norm=1.79, lr=9.96e-06, throughput=1671 tok/s
2025-11-23 11:40:12,920 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=1.7207, ppl=5.59, grad_norm=1.60, lr=9.96e-06, throughput=1724 tok/s
2025-11-23 11:44:53,839 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=1.7518, ppl=5.76, grad_norm=1.48, lr=9.96e-06, throughput=1709 tok/s
2025-11-23 11:49:29,268 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=1.6266, ppl=5.09, grad_norm=1.66, lr=9.96e-06, throughput=1743 tok/s
2025-11-23 11:54:10,230 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=1.8610, ppl=6.43, grad_norm=1.78, lr=9.96e-06, throughput=1708 tok/s
2025-11-23 11:58:35,404 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=2.0201, ppl=7.54, grad_norm=2.25, lr=9.95e-06, throughput=1810 tok/s
2025-11-23 12:02:57,173 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=1.6170, ppl=5.04, grad_norm=1.30, lr=9.95e-06, throughput=1834 tok/s
2025-11-23 12:07:14,296 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=1.8996, ppl=6.68, grad_norm=1.90, lr=9.95e-06, throughput=1867 tok/s
2025-11-23 12:11:41,080 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=1.8149, ppl=6.14, grad_norm=1.54, lr=9.95e-06, throughput=1799 tok/s
2025-11-23 12:15:58,189 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=1.7161, ppl=5.56, grad_norm=1.42, lr=9.94e-06, throughput=1867 tok/s
2025-11-23 12:20:23,729 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=1.8604, ppl=6.43, grad_norm=1.43, lr=9.94e-06, throughput=1808 tok/s
2025-11-23 12:24:40,943 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=1.8503, ppl=6.36, grad_norm=1.25, lr=9.94e-06, throughput=1866 tok/s
2025-11-23 12:28:57,145 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=1.7975, ppl=6.03, grad_norm=1.33, lr=9.94e-06, throughput=1874 tok/s
2025-11-23 12:33:19,906 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=1.6239, ppl=5.07, grad_norm=1.44, lr=9.93e-06, throughput=1827 tok/s
2025-11-23 12:37:33,860 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=1.3864, ppl=4.00, grad_norm=1.95, lr=9.93e-06, throughput=1890 tok/s
2025-11-23 12:41:56,450 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=1.5598, ppl=4.76, grad_norm=1.52, lr=9.93e-06, throughput=1828 tok/s
2025-11-23 12:46:11,614 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=2.0108, ppl=7.47, grad_norm=1.69, lr=9.92e-06, throughput=1881 tok/s
2025-11-23 12:50:37,762 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=1.8453, ppl=6.33, grad_norm=1.24, lr=9.92e-06, throughput=1804 tok/s
2025-11-23 12:55:03,335 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=1.8278, ppl=6.22, grad_norm=1.59, lr=9.92e-06, throughput=1807 tok/s
2025-11-23 12:59:53,884 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=1.7180, ppl=5.57, grad_norm=1.38, lr=9.92e-06, throughput=1652 tok/s
2025-11-23 13:04:32,368 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=1.7084, ppl=5.52, grad_norm=2.12, lr=9.91e-06, throughput=1724 tok/s
2025-11-23 13:09:14,567 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=1.8012, ppl=6.06, grad_norm=1.38, lr=9.91e-06, throughput=1701 tok/s
2025-11-23 13:13:49,979 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=1.5019, ppl=4.49, grad_norm=1.57, lr=9.91e-06, throughput=1743 tok/s
2025-11-23 13:18:35,141 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=1.8531, ppl=6.38, grad_norm=2.42, lr=9.90e-06, throughput=1683 tok/s
2025-11-23 13:23:04,933 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=1.6424, ppl=5.17, grad_norm=1.34, lr=9.90e-06, throughput=1779 tok/s
2025-11-23 13:27:38,269 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=1.8881, ppl=6.61, grad_norm=1.42, lr=9.90e-06, throughput=1756 tok/s
2025-11-23 13:32:07,003 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=1.9101, ppl=6.75, grad_norm=1.31, lr=9.89e-06, throughput=1786 tok/s
2025-11-23 13:36:24,095 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=1.9131, ppl=6.77, grad_norm=1.22, lr=9.89e-06, throughput=1867 tok/s
2025-11-23 13:40:44,944 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=1.6259, ppl=5.08, grad_norm=1.61, lr=9.89e-06, throughput=1840 tok/s
2025-11-23 13:45:03,642 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=1.7649, ppl=5.84, grad_norm=1.59, lr=9.88e-06, throughput=1855 tok/s
2025-11-23 13:49:27,295 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=1.8114, ppl=6.12, grad_norm=1.99, lr=9.88e-06, throughput=1821 tok/s
2025-11-23 13:53:45,537 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=1.8832, ppl=6.57, grad_norm=1.64, lr=9.87e-06, throughput=1859 tok/s
2025-11-23 13:58:11,232 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=1.8805, ppl=6.56, grad_norm=1.81, lr=9.87e-06, throughput=1807 tok/s
2025-11-23 14:02:25,345 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=1.5991, ppl=4.95, grad_norm=1.86, lr=9.87e-06, throughput=1889 tok/s
2025-11-23 14:07:02,287 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=1.8932, ppl=6.64, grad_norm=1.60, lr=9.86e-06, throughput=1733 tok/s
2025-11-23 14:11:19,478 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=1.6949, ppl=5.45, grad_norm=1.55, lr=9.86e-06, throughput=1866 tok/s
2025-11-23 14:16:01,709 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=1.5372, ppl=4.65, grad_norm=1.60, lr=9.86e-06, throughput=1701 tok/s
2025-11-23 14:20:41,772 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=1.7446, ppl=5.72, grad_norm=1.45, lr=9.85e-06, throughput=1714 tok/s
2025-11-23 14:25:42,085 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=1.6518, ppl=5.22, grad_norm=1.41, lr=9.85e-06, throughput=1598 tok/s
2025-11-23 14:30:19,646 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=1.7556, ppl=5.79, grad_norm=1.40, lr=9.84e-06, throughput=1729 tok/s
2025-11-23 14:35:10,893 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=1.6196, ppl=5.05, grad_norm=1.24, lr=9.84e-06, throughput=1648 tok/s
2025-11-23 14:39:46,086 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=1.7911, ppl=6.00, grad_norm=1.59, lr=9.83e-06, throughput=1744 tok/s
2025-11-23 14:44:27,445 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=1.6682, ppl=5.30, grad_norm=1.61, lr=9.83e-06, throughput=1706 tok/s
2025-11-23 14:48:58,949 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=1.5032, ppl=4.50, grad_norm=1.55, lr=9.83e-06, throughput=1768 tok/s
2025-11-23 14:53:26,038 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=1.8502, ppl=6.36, grad_norm=1.35, lr=9.82e-06, throughput=1797 tok/s
2025-11-23 14:57:54,220 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=1.8354, ppl=6.27, grad_norm=1.79, lr=9.82e-06, throughput=1790 tok/s
2025-11-23 15:02:09,241 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=1.7477, ppl=5.74, grad_norm=1.36, lr=9.81e-06, throughput=1882 tok/s
2025-11-23 15:06:54,167 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=1.7156, ppl=5.56, grad_norm=1.34, lr=9.81e-06, throughput=1685 tok/s
2025-11-23 15:11:30,550 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=1.4651, ppl=4.33, grad_norm=1.43, lr=9.80e-06, throughput=1737 tok/s
2025-11-23 15:16:29,033 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=1.7502, ppl=5.76, grad_norm=1.38, lr=9.80e-06, throughput=1608 tok/s
2025-11-23 15:21:32,586 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=1.6253, ppl=5.08, grad_norm=1.41, lr=9.79e-06, throughput=1581 tok/s
2025-11-23 15:26:42,126 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=1.8108, ppl=6.12, grad_norm=1.51, lr=9.79e-06, throughput=1551 tok/s
2025-11-23 15:31:36,099 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=1.8351, ppl=6.27, grad_norm=2.00, lr=9.78e-06, throughput=1633 tok/s
2025-11-23 15:36:24,551 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=1.5741, ppl=4.83, grad_norm=1.38, lr=9.78e-06, throughput=1664 tok/s
2025-11-23 15:40:46,885 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=1.7876, ppl=5.98, grad_norm=4.44, lr=9.77e-06, throughput=1830 tok/s
2025-11-23 15:45:30,781 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=1.6304, ppl=5.11, grad_norm=1.20, lr=9.77e-06, throughput=1691 tok/s
2025-11-23 15:50:22,398 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=1.6829, ppl=5.38, grad_norm=1.80, lr=9.76e-06, throughput=1646 tok/s
2025-11-23 15:54:51,970 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=1.7271, ppl=5.62, grad_norm=2.12, lr=9.76e-06, throughput=1781 tok/s
2025-11-23 15:59:22,747 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=1.7655, ppl=5.84, grad_norm=1.55, lr=9.75e-06, throughput=1773 tok/s
2025-11-23 16:03:44,889 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=1.6094, ppl=5.00, grad_norm=1.35, lr=9.75e-06, throughput=1831 tok/s
2025-11-23 16:08:40,786 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=1.8451, ppl=6.33, grad_norm=1.32, lr=9.74e-06, throughput=1622 tok/s
2025-11-23 16:08:40,786 - INFO - 
Running validation at step 2000...
2025-11-23 16:22:20,399 - INFO - Validation loss: 1.7648, perplexity: 5.84
2025-11-23 16:22:20,400 - INFO - Qualitative metrics (n=5):
2025-11-23 16:22:20,400 - INFO -   BLEU: 0.1070
2025-11-23 16:22:20,400 - INFO -   METEOR: 0.1883
2025-11-23 16:22:20,401 - INFO -   Edit Distance: 0.7417
2025-11-23 16:22:20,401 - INFO -   F-measure: 0.2050
2025-11-23 16:22:20,401 - INFO - 
======================================================================
2025-11-23 16:22:20,402 - INFO - Qualitative Evaluation Samples:
2025-11-23 16:22:20,402 - INFO - ======================================================================
2025-11-23 16:22:20,402 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-23 16:22:20,402 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-23 16:22:20,403 - INFO - Generated:    ' to the work of the Beatles, saying that "the Beatles\' work is a work of art, and the work of the Beatles is a work of art. The Beatles\' work is a work of art, and the work of the Beatles is a work of...'
2025-11-23 16:22:20,404 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-23 16:22:20,404 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,404 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-23 16:22:20,405 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-23 16:22:20,405 - INFO - Generated:    "aternities, but rather among the student body. The Order of Angel's first meeting was held in the home of a student, and the Order's first official meeting was held in the home of a student. The Order..."
2025-11-23 16:22:20,405 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-23 16:22:20,406 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,406 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-23 16:22:20,407 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-23 16:22:20,407 - INFO - Generated:    " be defeated by Teimou's son, Teimou's son's son, and Teimou's son's son. Teimou's son, Teimou's son's son, and Teimou's son's son, all defeat Teimou's son, and Teimou's son's son, and Teimou's son's ..."
2025-11-23 16:22:20,408 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-23 16:22:20,408 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,408 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-23 16:22:20,409 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-23 16:22:20,409 - INFO - Generated:    ' | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x...'
2025-11-23 16:22:20,410 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-23 16:22:20,410 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,411 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-23 16:22:20,411 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-23 16:22:20,411 - INFO - Generated:    '1 | iOS  | EA Tiburon        | [ 150 ] |\n| Madden NFL 12 2011 | August 30, 2011 | Android | EA Tiburon        | [ 151 ] |\n| Madden NFL 12 2011 | August 30, 2011 | iOS  | EA Tiburon        | [ 152 ] |\n...'
2025-11-23 16:22:20,412 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-23 16:22:20,412 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,414 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_2000.jsonl
2025-11-23 16:23:24,391 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-23 16:23:24,411 - INFO - New best validation loss: 1.7648, perplexity: 5.84
2025-11-23 16:29:28,251 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=1.5840, ppl=4.87, grad_norm=1.48, lr=9.74e-06, throughput=1319 tok/s
2025-11-23 16:35:15,580 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=1.8787, ppl=6.55, grad_norm=1.56, lr=9.73e-06, throughput=1382 tok/s
2025-11-23 16:40:44,874 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=1.6081, ppl=4.99, grad_norm=1.38, lr=9.73e-06, throughput=1458 tok/s
2025-11-23 16:46:12,759 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=1.7043, ppl=5.50, grad_norm=1.53, lr=9.72e-06, throughput=1464 tok/s
2025-11-23 16:51:24,387 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=2.0032, ppl=7.41, grad_norm=2.56, lr=9.72e-06, throughput=1540 tok/s
2025-11-23 16:56:45,686 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=1.5537, ppl=4.73, grad_norm=1.30, lr=9.71e-06, throughput=1494 tok/s
2025-11-23 17:01:31,315 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=1.8877, ppl=6.60, grad_norm=1.36, lr=9.71e-06, throughput=1681 tok/s
2025-11-23 17:06:46,520 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=1.5625, ppl=4.77, grad_norm=1.55, lr=9.70e-06, throughput=1523 tok/s
2025-11-23 17:12:12,084 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=1.5253, ppl=4.60, grad_norm=1.78, lr=9.69e-06, throughput=1474 tok/s
2025-11-23 17:17:20,195 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=1.7189, ppl=5.58, grad_norm=2.14, lr=9.69e-06, throughput=1558 tok/s
2025-11-23 17:22:41,589 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=1.7053, ppl=5.50, grad_norm=1.94, lr=9.68e-06, throughput=1494 tok/s
2025-11-23 17:27:57,429 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=1.7178, ppl=5.57, grad_norm=1.29, lr=9.68e-06, throughput=1520 tok/s
2025-11-23 17:32:59,749 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=1.7913, ppl=6.00, grad_norm=1.50, lr=9.67e-06, throughput=1588 tok/s
2025-11-23 17:37:44,189 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=1.7613, ppl=5.82, grad_norm=1.55, lr=9.66e-06, throughput=1688 tok/s
2025-11-23 17:42:46,475 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=1.7173, ppl=5.57, grad_norm=1.41, lr=9.66e-06, throughput=1588 tok/s
2025-11-23 17:47:43,749 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=1.6595, ppl=5.26, grad_norm=1.48, lr=9.65e-06, throughput=1615 tok/s
2025-11-23 17:52:54,880 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=1.6350, ppl=5.13, grad_norm=1.43, lr=9.65e-06, throughput=1543 tok/s
2025-11-23 17:58:12,598 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=1.6980, ppl=5.46, grad_norm=1.35, lr=9.64e-06, throughput=1511 tok/s
2025-11-23 18:03:15,460 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=1.7360, ppl=5.67, grad_norm=1.80, lr=9.63e-06, throughput=1585 tok/s
2025-11-23 18:08:32,374 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=1.7259, ppl=5.62, grad_norm=1.47, lr=9.63e-06, throughput=1515 tok/s
2025-11-23 18:14:43,266 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=1.6563, ppl=5.24, grad_norm=1.30, lr=9.62e-06, throughput=1294 tok/s
2025-11-23 18:20:24,539 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=1.7047, ppl=5.50, grad_norm=1.62, lr=9.61e-06, throughput=1407 tok/s
2025-11-23 18:25:35,616 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=1.7395, ppl=5.69, grad_norm=1.20, lr=9.61e-06, throughput=1543 tok/s
2025-11-23 18:30:53,709 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=1.6651, ppl=5.29, grad_norm=1.91, lr=9.60e-06, throughput=1509 tok/s
2025-11-23 18:36:08,481 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=1.6573, ppl=5.25, grad_norm=1.48, lr=9.60e-06, throughput=1525 tok/s
2025-11-23 18:41:28,262 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=1.7934, ppl=6.01, grad_norm=1.52, lr=9.59e-06, throughput=1501 tok/s
2025-11-23 18:47:00,899 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=1.6940, ppl=5.44, grad_norm=1.73, lr=9.58e-06, throughput=1443 tok/s
2025-11-23 18:52:23,802 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=1.7043, ppl=5.50, grad_norm=1.18, lr=9.58e-06, throughput=1487 tok/s
2025-11-23 18:57:45,819 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=1.7401, ppl=5.70, grad_norm=1.68, lr=9.57e-06, throughput=1491 tok/s
2025-11-23 19:02:51,879 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=1.3915, ppl=4.02, grad_norm=1.76, lr=9.56e-06, throughput=1568 tok/s
2025-11-23 19:08:20,279 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=1.7187, ppl=5.58, grad_norm=1.37, lr=9.55e-06, throughput=1462 tok/s
2025-11-23 19:13:45,519 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=1.8331, ppl=6.25, grad_norm=1.18, lr=9.55e-06, throughput=1476 tok/s
2025-11-23 19:19:33,214 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=1.9408, ppl=6.96, grad_norm=1.71, lr=9.54e-06, throughput=1381 tok/s
2025-11-23 19:24:49,551 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=1.6437, ppl=5.17, grad_norm=1.62, lr=9.53e-06, throughput=1517 tok/s
2025-11-23 19:29:32,350 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=1.7757, ppl=5.90, grad_norm=1.48, lr=9.53e-06, throughput=1697 tok/s
2025-11-23 19:34:27,415 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=1.6096, ppl=5.00, grad_norm=1.28, lr=9.52e-06, throughput=1627 tok/s
2025-11-23 19:39:09,033 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=1.6994, ppl=5.47, grad_norm=1.27, lr=9.51e-06, throughput=1704 tok/s
2025-11-23 19:44:01,369 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=1.7777, ppl=5.92, grad_norm=1.48, lr=9.51e-06, throughput=1642 tok/s
2025-11-23 19:49:34,188 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=1.5939, ppl=4.92, grad_norm=1.40, lr=9.50e-06, throughput=1442 tok/s
2025-11-23 19:56:01,035 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=1.9223, ppl=6.84, grad_norm=1.26, lr=9.49e-06, throughput=1241 tok/s
2025-11-23 20:01:11,406 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=1.6921, ppl=5.43, grad_norm=1.36, lr=9.48e-06, throughput=1547 tok/s
2025-11-23 20:06:31,240 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=1.7208, ppl=5.59, grad_norm=1.23, lr=9.48e-06, throughput=1501 tok/s
2025-11-23 20:12:07,542 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=1.6635, ppl=5.28, grad_norm=1.26, lr=9.47e-06, throughput=1427 tok/s
2025-11-23 20:17:19,543 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=1.7167, ppl=5.57, grad_norm=1.40, lr=9.46e-06, throughput=1538 tok/s
2025-11-23 20:25:42,156 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=1.6049, ppl=4.98, grad_norm=1.96, lr=9.45e-06, throughput=955 tok/s
2025-11-23 20:34:37,864 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=1.6283, ppl=5.10, grad_norm=1.23, lr=9.45e-06, throughput=896 tok/s
2025-11-23 20:40:02,971 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=1.8982, ppl=6.67, grad_norm=1.90, lr=9.44e-06, throughput=1476 tok/s
2025-11-23 20:45:34,212 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=1.7487, ppl=5.75, grad_norm=1.52, lr=9.43e-06, throughput=1449 tok/s
2025-11-23 20:51:00,279 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=1.7698, ppl=5.87, grad_norm=2.34, lr=9.42e-06, throughput=1472 tok/s
2025-11-23 20:56:36,283 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=1.6904, ppl=5.42, grad_norm=1.37, lr=9.41e-06, throughput=1429 tok/s
2025-11-23 21:02:36,511 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=1.6698, ppl=5.31, grad_norm=1.56, lr=9.41e-06, throughput=1332 tok/s
2025-11-23 21:08:47,354 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=1.8006, ppl=6.05, grad_norm=1.36, lr=9.40e-06, throughput=1294 tok/s
2025-11-23 21:14:35,534 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=1.6619, ppl=5.27, grad_norm=1.85, lr=9.39e-06, throughput=1379 tok/s
2025-11-23 21:20:28,094 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=1.8045, ppl=6.08, grad_norm=1.65, lr=9.38e-06, throughput=1361 tok/s
2025-11-23 21:26:06,522 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=1.8761, ppl=6.53, grad_norm=1.59, lr=9.37e-06, throughput=1418 tok/s
2025-11-23 21:32:04,632 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=1.6369, ppl=5.14, grad_norm=1.77, lr=9.37e-06, throughput=1340 tok/s
2025-11-23 21:37:50,259 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=1.6688, ppl=5.31, grad_norm=2.12, lr=9.36e-06, throughput=1389 tok/s
2025-11-23 21:43:38,570 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=1.7748, ppl=5.90, grad_norm=1.41, lr=9.35e-06, throughput=1378 tok/s
2025-11-23 21:49:38,941 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=1.7028, ppl=5.49, grad_norm=1.73, lr=9.34e-06, throughput=1332 tok/s
2025-11-23 21:55:28,819 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=1.7614, ppl=5.82, grad_norm=1.34, lr=9.33e-06, throughput=1372 tok/s
2025-11-23 22:01:42,325 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=1.6468, ppl=5.19, grad_norm=1.38, lr=9.32e-06, throughput=1285 tok/s
2025-11-23 22:07:37,175 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=1.7488, ppl=5.75, grad_norm=1.42, lr=9.32e-06, throughput=1353 tok/s
2025-11-23 22:13:40,025 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=1.4240, ppl=4.15, grad_norm=1.95, lr=9.31e-06, throughput=1323 tok/s
2025-11-23 22:18:58,677 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=1.8505, ppl=6.36, grad_norm=2.38, lr=9.30e-06, throughput=1506 tok/s
2025-11-23 22:24:18,519 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=1.8924, ppl=6.64, grad_norm=1.27, lr=9.29e-06, throughput=1501 tok/s
2025-11-23 22:29:51,566 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=1.5050, ppl=4.50, grad_norm=1.20, lr=9.28e-06, throughput=1441 tok/s
2025-11-23 22:35:09,727 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=1.6824, ppl=5.38, grad_norm=1.56, lr=9.27e-06, throughput=1509 tok/s
2025-11-23 22:40:45,836 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=1.8919, ppl=6.63, grad_norm=2.00, lr=9.26e-06, throughput=1428 tok/s
2025-11-23 22:46:19,911 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=1.6255, ppl=5.08, grad_norm=1.25, lr=9.26e-06, throughput=1437 tok/s
2025-11-23 22:51:53,547 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=1.7758, ppl=5.90, grad_norm=1.88, lr=9.25e-06, throughput=1439 tok/s
2025-11-23 22:57:04,268 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=1.8293, ppl=6.23, grad_norm=1.91, lr=9.24e-06, throughput=1545 tok/s
2025-11-23 23:02:27,662 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=1.7584, ppl=5.80, grad_norm=1.23, lr=9.23e-06, throughput=1484 tok/s
2025-11-23 23:08:19,420 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=1.7922, ppl=6.00, grad_norm=1.37, lr=9.22e-06, throughput=1365 tok/s
2025-11-23 23:13:47,425 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=1.7002, ppl=5.47, grad_norm=1.41, lr=9.21e-06, throughput=1463 tok/s
2025-11-23 23:18:53,596 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=1.5691, ppl=4.80, grad_norm=1.39, lr=9.20e-06, throughput=1568 tok/s
2025-11-23 23:23:41,528 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=1.6779, ppl=5.35, grad_norm=1.44, lr=9.19e-06, throughput=1667 tok/s
2025-11-23 23:29:14,487 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=1.6777, ppl=5.35, grad_norm=1.19, lr=9.18e-06, throughput=1442 tok/s
2025-11-23 23:35:28,897 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=1.5896, ppl=4.90, grad_norm=1.55, lr=9.17e-06, throughput=1282 tok/s
2025-11-23 23:41:30,878 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=1.5973, ppl=4.94, grad_norm=1.32, lr=9.17e-06, throughput=1326 tok/s
2025-11-23 23:47:16,233 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=1.7595, ppl=5.81, grad_norm=1.69, lr=9.16e-06, throughput=1390 tok/s
2025-11-23 23:52:42,369 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=1.5911, ppl=4.91, grad_norm=2.72, lr=9.15e-06, throughput=1472 tok/s
2025-11-23 23:57:44,159 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=1.7860, ppl=5.97, grad_norm=1.94, lr=9.14e-06, throughput=1591 tok/s
2025-11-24 00:03:17,739 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=1.6397, ppl=5.15, grad_norm=1.30, lr=9.13e-06, throughput=1439 tok/s
2025-11-24 00:09:25,197 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=1.8479, ppl=6.35, grad_norm=1.59, lr=9.12e-06, throughput=1306 tok/s
2025-11-24 00:15:05,193 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=1.5765, ppl=4.84, grad_norm=1.45, lr=9.11e-06, throughput=1412 tok/s
2025-11-24 00:20:54,612 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=1.7694, ppl=5.87, grad_norm=1.53, lr=9.10e-06, throughput=1374 tok/s
2025-11-24 00:26:28,038 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=1.6778, ppl=5.35, grad_norm=1.23, lr=9.09e-06, throughput=1440 tok/s
2025-11-24 00:31:41,738 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=1.6309, ppl=5.11, grad_norm=1.38, lr=9.08e-06, throughput=1530 tok/s
2025-11-24 00:36:57,004 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=1.7919, ppl=6.00, grad_norm=2.22, lr=9.07e-06, throughput=1523 tok/s
2025-11-24 00:42:18,190 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=1.7632, ppl=5.83, grad_norm=1.21, lr=9.06e-06, throughput=1494 tok/s
2025-11-24 00:47:38,828 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=1.6727, ppl=5.33, grad_norm=2.03, lr=9.05e-06, throughput=1497 tok/s
2025-11-24 00:52:30,881 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=1.6450, ppl=5.18, grad_norm=1.41, lr=9.04e-06, throughput=1644 tok/s
2025-11-24 00:57:37,126 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=1.6672, ppl=5.30, grad_norm=1.30, lr=9.03e-06, throughput=1567 tok/s
2025-11-24 01:02:20,409 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=1.6456, ppl=5.18, grad_norm=1.23, lr=9.02e-06, throughput=1694 tok/s
2025-11-24 01:07:27,826 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=1.7564, ppl=5.79, grad_norm=1.60, lr=9.01e-06, throughput=1561 tok/s
2025-11-24 01:12:15,201 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=1.6244, ppl=5.08, grad_norm=1.32, lr=9.00e-06, throughput=1670 tok/s
2025-11-24 01:16:59,070 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=1.7833, ppl=5.95, grad_norm=1.48, lr=8.99e-06, throughput=1691 tok/s
2025-11-24 01:21:56,480 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=1.6224, ppl=5.07, grad_norm=1.27, lr=8.98e-06, throughput=1614 tok/s
2025-11-24 01:26:41,012 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=1.6986, ppl=5.47, grad_norm=1.91, lr=8.97e-06, throughput=1687 tok/s
2025-11-24 01:31:43,010 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=1.5507, ppl=4.71, grad_norm=1.50, lr=8.96e-06, throughput=1589 tok/s
2025-11-24 01:36:38,089 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=1.6490, ppl=5.20, grad_norm=1.27, lr=8.95e-06, throughput=1627 tok/s
2025-11-24 01:41:37,816 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=1.8260, ppl=6.21, grad_norm=1.59, lr=8.94e-06, throughput=1601 tok/s
2025-11-24 01:46:21,123 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=1.7844, ppl=5.96, grad_norm=1.46, lr=8.93e-06, throughput=1694 tok/s
2025-11-24 01:51:00,091 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=1.5789, ppl=4.85, grad_norm=2.00, lr=8.92e-06, throughput=1721 tok/s
2025-11-24 01:56:14,467 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=1.7341, ppl=5.66, grad_norm=1.31, lr=8.91e-06, throughput=1527 tok/s
2025-11-24 02:01:45,395 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=1.7360, ppl=5.67, grad_norm=1.38, lr=8.90e-06, throughput=1450 tok/s
2025-11-24 02:07:39,006 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=1.7230, ppl=5.60, grad_norm=1.21, lr=8.89e-06, throughput=1357 tok/s
2025-11-24 02:13:16,143 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=1.5886, ppl=4.90, grad_norm=1.72, lr=8.88e-06, throughput=1424 tok/s
2025-11-24 02:19:15,935 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=1.3508, ppl=3.86, grad_norm=1.16, lr=8.87e-06, throughput=1334 tok/s
2025-11-24 02:24:40,801 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=1.6252, ppl=5.08, grad_norm=1.83, lr=8.86e-06, throughput=1478 tok/s
2025-11-24 02:30:10,013 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=1.5357, ppl=4.64, grad_norm=1.09, lr=8.85e-06, throughput=1458 tok/s
2025-11-24 02:35:51,781 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=1.7571, ppl=5.80, grad_norm=1.12, lr=8.84e-06, throughput=1404 tok/s
2025-11-24 02:41:14,067 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=1.7818, ppl=5.94, grad_norm=1.23, lr=8.82e-06, throughput=1489 tok/s
2025-11-24 02:46:50,770 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=1.6905, ppl=5.42, grad_norm=1.34, lr=8.81e-06, throughput=1426 tok/s
2025-11-24 02:52:18,913 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=1.5032, ppl=4.50, grad_norm=1.27, lr=8.80e-06, throughput=1463 tok/s
2025-11-24 02:58:07,690 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=1.5661, ppl=4.79, grad_norm=1.30, lr=8.79e-06, throughput=1376 tok/s
2025-11-24 03:03:32,119 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=1.8191, ppl=6.17, grad_norm=1.62, lr=8.78e-06, throughput=1480 tok/s
2025-11-24 03:09:08,975 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=1.8804, ppl=6.56, grad_norm=1.70, lr=8.77e-06, throughput=1425 tok/s
2025-11-24 03:15:00,880 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=1.7724, ppl=5.88, grad_norm=1.08, lr=8.76e-06, throughput=1364 tok/s
2025-11-24 03:20:26,196 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=1.5930, ppl=4.92, grad_norm=1.60, lr=8.75e-06, throughput=1475 tok/s
2025-11-24 03:25:55,741 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=1.8602, ppl=6.42, grad_norm=2.20, lr=8.74e-06, throughput=1457 tok/s
2025-11-24 03:31:19,744 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=1.7019, ppl=5.48, grad_norm=1.60, lr=8.73e-06, throughput=1481 tok/s
2025-11-24 03:36:57,856 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=1.4074, ppl=4.09, grad_norm=1.30, lr=8.71e-06, throughput=1420 tok/s
2025-11-24 03:42:20,732 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=1.8178, ppl=6.16, grad_norm=1.64, lr=8.70e-06, throughput=1487 tok/s
2025-11-24 03:47:47,012 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=1.5800, ppl=4.86, grad_norm=1.63, lr=8.69e-06, throughput=1471 tok/s
2025-11-24 03:53:19,792 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=1.5935, ppl=4.92, grad_norm=1.19, lr=8.68e-06, throughput=1442 tok/s
2025-11-24 03:58:39,414 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=1.9031, ppl=6.71, grad_norm=1.32, lr=8.67e-06, throughput=1502 tok/s
2025-11-24 04:04:08,989 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=1.8715, ppl=6.50, grad_norm=1.67, lr=8.66e-06, throughput=1456 tok/s
2025-11-24 04:09:28,740 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=1.9338, ppl=6.92, grad_norm=2.42, lr=8.65e-06, throughput=1501 tok/s
2025-11-24 04:14:58,238 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=2.0750, ppl=7.96, grad_norm=1.55, lr=8.63e-06, throughput=1457 tok/s
2025-11-24 04:20:10,343 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=1.6609, ppl=5.26, grad_norm=1.47, lr=8.62e-06, throughput=1538 tok/s
2025-11-24 04:25:39,870 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=1.6898, ppl=5.42, grad_norm=1.61, lr=8.61e-06, throughput=1457 tok/s
2025-11-24 04:30:59,545 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=1.3696, ppl=3.93, grad_norm=1.20, lr=8.60e-06, throughput=1502 tok/s
2025-11-24 04:36:17,206 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=2.0836, ppl=8.03, grad_norm=1.38, lr=8.59e-06, throughput=1511 tok/s
2025-11-24 04:41:56,060 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=1.8718, ppl=6.50, grad_norm=1.69, lr=8.58e-06, throughput=1417 tok/s
2025-11-24 04:47:18,524 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=1.5494, ppl=4.71, grad_norm=1.12, lr=8.57e-06, throughput=1489 tok/s
2025-11-24 04:52:52,289 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=1.7058, ppl=5.51, grad_norm=1.77, lr=8.55e-06, throughput=1438 tok/s
2025-11-24 04:58:03,644 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=1.6674, ppl=5.30, grad_norm=1.83, lr=8.54e-06, throughput=1542 tok/s
2025-11-24 05:03:20,299 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=1.8189, ppl=6.17, grad_norm=1.97, lr=8.53e-06, throughput=1516 tok/s
2025-11-24 05:08:25,849 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=1.4819, ppl=4.40, grad_norm=1.77, lr=8.52e-06, throughput=1571 tok/s
2025-11-24 05:13:59,249 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=1.7473, ppl=5.74, grad_norm=1.65, lr=8.51e-06, throughput=1440 tok/s
2025-11-24 05:19:15,307 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=1.8197, ppl=6.17, grad_norm=1.98, lr=8.49e-06, throughput=1519 tok/s
2025-11-24 05:24:21,311 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=1.4965, ppl=4.47, grad_norm=1.45, lr=8.48e-06, throughput=1569 tok/s
2025-11-24 05:29:43,568 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=1.5994, ppl=4.95, grad_norm=1.33, lr=8.47e-06, throughput=1490 tok/s
2025-11-24 05:35:04,421 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=1.6012, ppl=4.96, grad_norm=1.20, lr=8.46e-06, throughput=1496 tok/s
2025-11-24 05:40:51,719 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=1.8142, ppl=6.14, grad_norm=2.08, lr=8.45e-06, throughput=1382 tok/s
2025-11-24 05:46:17,164 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=1.7211, ppl=5.59, grad_norm=1.55, lr=8.43e-06, throughput=1475 tok/s
2025-11-24 05:51:55,436 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=1.5514, ppl=4.72, grad_norm=1.51, lr=8.42e-06, throughput=1419 tok/s
2025-11-24 05:57:21,601 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=1.7301, ppl=5.64, grad_norm=3.50, lr=8.41e-06, throughput=1472 tok/s
2025-11-24 06:03:00,345 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=1.8923, ppl=6.63, grad_norm=2.00, lr=8.40e-06, throughput=1417 tok/s
2025-11-24 06:08:45,393 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=1.8113, ppl=6.12, grad_norm=1.16, lr=8.38e-06, throughput=1391 tok/s
2025-11-24 06:14:37,678 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=1.8193, ppl=6.17, grad_norm=1.69, lr=8.37e-06, throughput=1363 tok/s
2025-11-24 06:20:27,441 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=1.5946, ppl=4.93, grad_norm=1.19, lr=8.36e-06, throughput=1372 tok/s
2025-11-24 06:25:49,937 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=1.6871, ppl=5.40, grad_norm=1.34, lr=8.35e-06, throughput=1488 tok/s
2025-11-24 06:31:44,947 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=1.8530, ppl=6.38, grad_norm=1.33, lr=8.33e-06, throughput=1352 tok/s
2025-11-24 06:37:57,900 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=1.8897, ppl=6.62, grad_norm=1.21, lr=8.32e-06, throughput=1287 tok/s
2025-11-24 06:43:53,069 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=1.5388, ppl=4.66, grad_norm=1.54, lr=8.31e-06, throughput=1351 tok/s
2025-11-24 06:49:28,442 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=1.8543, ppl=6.39, grad_norm=1.44, lr=8.30e-06, throughput=1431 tok/s
2025-11-24 06:54:54,167 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=1.5853, ppl=4.88, grad_norm=1.80, lr=8.28e-06, throughput=1474 tok/s
2025-11-24 07:00:52,272 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=1.4740, ppl=4.37, grad_norm=1.42, lr=8.27e-06, throughput=1340 tok/s
2025-11-24 07:05:56,452 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=1.6454, ppl=5.18, grad_norm=1.46, lr=8.26e-06, throughput=1578 tok/s
2025-11-24 07:11:17,551 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=1.5001, ppl=4.48, grad_norm=2.09, lr=8.25e-06, throughput=1495 tok/s
2025-11-24 07:16:19,755 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=1.3267, ppl=3.77, grad_norm=1.27, lr=8.23e-06, throughput=1588 tok/s
2025-11-24 07:21:32,891 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=1.4707, ppl=4.35, grad_norm=1.77, lr=8.22e-06, throughput=1533 tok/s
2025-11-24 07:26:25,944 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=1.8117, ppl=6.12, grad_norm=1.45, lr=8.21e-06, throughput=1638 tok/s
2025-11-24 07:31:33,781 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=1.7129, ppl=5.55, grad_norm=1.70, lr=8.20e-06, throughput=1559 tok/s
2025-11-24 07:36:27,168 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=1.5937, ppl=4.92, grad_norm=1.70, lr=8.18e-06, throughput=1636 tok/s
2025-11-24 07:41:15,906 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=1.2619, ppl=3.53, grad_norm=2.50, lr=8.17e-06, throughput=1662 tok/s
2025-11-24 07:46:23,753 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=1.6501, ppl=5.21, grad_norm=1.36, lr=8.16e-06, throughput=1559 tok/s
2025-11-24 07:51:13,200 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=1.5883, ppl=4.90, grad_norm=1.42, lr=8.14e-06, throughput=1658 tok/s
2025-11-24 07:56:08,503 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=1.4121, ppl=4.10, grad_norm=1.22, lr=8.13e-06, throughput=1625 tok/s
2025-11-24 08:00:50,688 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=1.5653, ppl=4.78, grad_norm=1.75, lr=8.12e-06, throughput=1701 tok/s
2025-11-24 08:05:56,531 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=1.8384, ppl=6.29, grad_norm=1.90, lr=8.10e-06, throughput=1569 tok/s
2025-11-24 08:10:46,192 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=1.7941, ppl=6.01, grad_norm=4.50, lr=8.09e-06, throughput=1657 tok/s
2025-11-24 08:15:49,504 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=1.5211, ppl=4.58, grad_norm=1.43, lr=8.08e-06, throughput=1583 tok/s
2025-11-24 08:20:45,287 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=1.6703, ppl=5.31, grad_norm=1.98, lr=8.06e-06, throughput=1623 tok/s
2025-11-24 08:25:38,926 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=1.5521, ppl=4.72, grad_norm=1.57, lr=8.05e-06, throughput=1635 tok/s
2025-11-24 08:30:51,433 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=1.7336, ppl=5.66, grad_norm=1.19, lr=8.04e-06, throughput=1536 tok/s
2025-11-24 08:35:45,517 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=1.4077, ppl=4.09, grad_norm=1.45, lr=8.02e-06, throughput=1632 tok/s
2025-11-24 08:40:49,663 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=1.7765, ppl=5.91, grad_norm=1.19, lr=8.01e-06, throughput=1578 tok/s
2025-11-24 08:45:46,218 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=1.7064, ppl=5.51, grad_norm=1.70, lr=8.00e-06, throughput=1619 tok/s
2025-11-24 08:50:51,738 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=1.7030, ppl=5.49, grad_norm=1.18, lr=7.98e-06, throughput=1571 tok/s
2025-11-24 08:55:44,644 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=1.7979, ppl=6.04, grad_norm=2.44, lr=7.97e-06, throughput=1639 tok/s
2025-11-24 09:00:48,510 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=1.6486, ppl=5.20, grad_norm=1.50, lr=7.96e-06, throughput=1580 tok/s
2025-11-24 09:05:35,414 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=1.7207, ppl=5.59, grad_norm=1.24, lr=7.94e-06, throughput=1673 tok/s
2025-11-24 09:10:21,520 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=1.4644, ppl=4.32, grad_norm=1.80, lr=7.93e-06, throughput=1678 tok/s
2025-11-24 09:15:24,042 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=1.6293, ppl=5.10, grad_norm=1.85, lr=7.92e-06, throughput=1587 tok/s
2025-11-24 09:20:08,035 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=1.4442, ppl=4.24, grad_norm=2.05, lr=7.90e-06, throughput=1690 tok/s
2025-11-24 09:25:01,832 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=1.7823, ppl=5.94, grad_norm=2.50, lr=7.89e-06, throughput=1634 tok/s
2025-11-24 09:29:39,762 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=1.5292, ppl=4.61, grad_norm=1.89, lr=7.88e-06, throughput=1727 tok/s
2025-11-24 09:34:32,480 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=1.6625, ppl=5.27, grad_norm=1.19, lr=7.86e-06, throughput=1640 tok/s
2025-11-24 09:39:23,077 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=1.5899, ppl=4.90, grad_norm=1.07, lr=7.85e-06, throughput=1652 tok/s
2025-11-24 09:44:43,412 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=1.6526, ppl=5.22, grad_norm=1.96, lr=7.83e-06, throughput=1498 tok/s
2025-11-24 09:49:51,452 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=1.6522, ppl=5.22, grad_norm=2.11, lr=7.82e-06, throughput=1558 tok/s
2025-11-24 09:55:25,497 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=1.8583, ppl=6.41, grad_norm=1.56, lr=7.81e-06, throughput=1437 tok/s
2025-11-24 10:01:16,797 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=1.8830, ppl=6.57, grad_norm=1.43, lr=7.79e-06, throughput=1366 tok/s
2025-11-24 10:06:49,860 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=1.6664, ppl=5.29, grad_norm=1.27, lr=7.78e-06, throughput=1441 tok/s
2025-11-24 10:12:43,874 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=1.7669, ppl=5.85, grad_norm=1.30, lr=7.77e-06, throughput=1356 tok/s
2025-11-24 10:18:16,566 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=1.6032, ppl=4.97, grad_norm=1.44, lr=7.75e-06, throughput=1443 tok/s
2025-11-24 10:24:05,151 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=1.5491, ppl=4.71, grad_norm=2.28, lr=7.74e-06, throughput=1377 tok/s
2025-11-24 10:24:05,151 - INFO - 
Running validation at step 4000...
2025-11-24 10:34:55,631 - INFO - Validation loss: 1.6909, perplexity: 5.42
2025-11-24 10:34:55,631 - INFO - Qualitative metrics (n=5):
2025-11-24 10:34:55,632 - INFO -   BLEU: 0.1219
2025-11-24 10:34:55,632 - INFO -   METEOR: 0.2151
2025-11-24 10:34:55,632 - INFO -   Edit Distance: 0.7064
2025-11-24 10:34:55,632 - INFO -   F-measure: 0.2283
2025-11-24 10:34:55,632 - INFO - 
======================================================================
2025-11-24 10:34:55,632 - INFO - Qualitative Evaluation Samples:
2025-11-24 10:34:55,632 - INFO - ======================================================================
2025-11-24 10:34:55,633 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-24 10:34:55,633 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-24 10:34:55,633 - INFO - Generated:    ' to the band\'s 2005 album, The A.V. Club, writing that "the album is a more accessible, less experimental, and more accessible album than the band\'s last two." He also said that "the album is a more a...'
2025-11-24 10:34:55,633 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-24 10:34:55,633 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,633 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-24 10:34:55,633 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-24 10:34:55,633 - INFO - Generated:    'aternity groups in the United States. The Order of the Arrow was the first fraternity to adopt a Native American theme, and the first to adopt a Native American theme for a fraternity. The Order of th...'
2025-11-24 10:34:55,634 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-24 10:34:55,634 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,634 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-24 10:34:55,634 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-24 10:34:55,635 - INFO - Generated:    " be defeated by Oga and Miki. Teimou's group is then defeated by Oga and Miki, and the Red Knights are defeated by Oga and Miki. Teimou is then defeated by Oga and Miki, and the Red Knights are defeat..."
2025-11-24 10:34:55,635 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-24 10:34:55,635 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,635 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-24 10:34:55,635 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-24 10:34:55,635 - INFO - Generated:    '-0001 | 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0....'
2025-11-24 10:34:55,636 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-24 10:34:55,636 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,636 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-24 10:34:55,636 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-24 10:34:55,636 - INFO - Generated:    '1 | PlayStation 3 | EA Tiburon                                 | [ 150 ] |\n| Madden NFL 12                                 | August 30, 2011 | Windows Phone | EA Tiburon                               ...'
2025-11-24 10:34:55,636 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-24 10:34:55,636 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,638 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_4000.jsonl
2025-11-24 10:35:49,072 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-24 10:35:49,086 - INFO - New best validation loss: 1.6909, perplexity: 5.42
2025-11-24 10:40:49,307 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=1.5952, ppl=4.93, grad_norm=1.45, lr=7.72e-06, throughput=1599 tok/s
2025-11-24 10:46:07,227 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=1.8715, ppl=6.50, grad_norm=1.47, lr=7.71e-06, throughput=1510 tok/s
2025-11-24 10:51:03,584 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=1.5472, ppl=4.70, grad_norm=1.63, lr=7.70e-06, throughput=1620 tok/s
2025-11-24 10:55:54,532 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=1.7048, ppl=5.50, grad_norm=6.12, lr=7.68e-06, throughput=1650 tok/s
2025-11-24 11:00:52,224 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=1.4836, ppl=4.41, grad_norm=1.63, lr=7.67e-06, throughput=1612 tok/s
2025-11-24 11:05:38,789 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=1.7968, ppl=6.03, grad_norm=1.52, lr=7.65e-06, throughput=1675 tok/s
2025-11-24 11:10:39,171 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=1.5424, ppl=4.68, grad_norm=1.31, lr=7.64e-06, throughput=1598 tok/s
2025-11-24 11:15:27,041 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=1.7416, ppl=5.71, grad_norm=1.65, lr=7.62e-06, throughput=1667 tok/s
2025-11-24 11:20:15,228 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=1.7182, ppl=5.57, grad_norm=1.27, lr=7.61e-06, throughput=1666 tok/s
2025-11-24 11:25:10,295 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=1.6525, ppl=5.22, grad_norm=1.16, lr=7.60e-06, throughput=1627 tok/s
2025-11-24 11:29:58,737 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=1.5644, ppl=4.78, grad_norm=1.30, lr=7.58e-06, throughput=1664 tok/s
2025-11-24 11:34:57,396 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=1.8527, ppl=6.38, grad_norm=1.40, lr=7.57e-06, throughput=1607 tok/s
2025-11-24 11:39:41,750 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=1.7791, ppl=5.92, grad_norm=1.83, lr=7.55e-06, throughput=1688 tok/s
2025-11-24 11:44:33,971 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=1.6942, ppl=5.44, grad_norm=1.70, lr=7.54e-06, throughput=1643 tok/s
2025-11-24 11:49:16,575 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=1.7041, ppl=5.50, grad_norm=1.30, lr=7.52e-06, throughput=1698 tok/s
2025-11-24 11:53:59,857 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=1.4694, ppl=4.35, grad_norm=1.81, lr=7.51e-06, throughput=1694 tok/s
2025-11-24 11:58:58,298 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=1.8423, ppl=6.31, grad_norm=1.33, lr=7.49e-06, throughput=1608 tok/s
2025-11-24 12:03:40,686 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=1.7502, ppl=5.76, grad_norm=2.16, lr=7.48e-06, throughput=1700 tok/s
2025-11-24 12:08:32,851 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=1.4775, ppl=4.38, grad_norm=1.74, lr=7.47e-06, throughput=1643 tok/s
2025-11-24 12:13:17,832 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=1.5940, ppl=4.92, grad_norm=1.39, lr=7.45e-06, throughput=1684 tok/s
2025-11-24 12:18:12,681 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=1.6196, ppl=5.05, grad_norm=1.88, lr=7.44e-06, throughput=1628 tok/s
2025-11-24 12:22:54,536 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=1.7733, ppl=5.89, grad_norm=1.73, lr=7.42e-06, throughput=1703 tok/s
2025-11-24 12:27:47,337 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=1.7052, ppl=5.50, grad_norm=1.46, lr=7.41e-06, throughput=1639 tok/s
2025-11-24 12:32:31,319 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=1.7652, ppl=5.84, grad_norm=1.74, lr=7.39e-06, throughput=1690 tok/s
2025-11-24 12:37:13,979 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=1.7193, ppl=5.58, grad_norm=1.34, lr=7.38e-06, throughput=1698 tok/s
2025-11-24 12:42:07,589 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=1.6922, ppl=5.43, grad_norm=1.59, lr=7.36e-06, throughput=1635 tok/s
2025-11-24 12:48:03,280 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=1.6225, ppl=5.07, grad_norm=1.38, lr=7.35e-06, throughput=1349 tok/s
2025-11-24 12:54:47,243 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=1.5675, ppl=4.79, grad_norm=1.78, lr=7.33e-06, throughput=1188 tok/s
2025-11-24 13:00:25,667 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=1.5705, ppl=4.81, grad_norm=1.40, lr=7.32e-06, throughput=1418 tok/s
2025-11-24 13:06:27,594 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=1.4040, ppl=4.07, grad_norm=1.33, lr=7.30e-06, throughput=1326 tok/s
2025-11-24 13:12:23,908 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=1.3817, ppl=3.98, grad_norm=1.28, lr=7.29e-06, throughput=1347 tok/s
2025-11-24 13:18:05,840 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=1.7035, ppl=5.49, grad_norm=1.75, lr=7.27e-06, throughput=1404 tok/s
2025-11-24 13:24:23,129 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=1.5989, ppl=4.95, grad_norm=1.58, lr=7.26e-06, throughput=1272 tok/s
2025-11-24 13:30:03,346 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=1.5159, ppl=4.55, grad_norm=1.21, lr=7.24e-06, throughput=1411 tok/s
2025-11-24 13:36:08,610 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=1.6193, ppl=5.05, grad_norm=1.45, lr=7.23e-06, throughput=1314 tok/s
2025-11-24 13:42:07,704 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=1.4276, ppl=4.17, grad_norm=1.23, lr=7.21e-06, throughput=1337 tok/s
2025-11-24 13:48:14,093 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=1.7500, ppl=5.75, grad_norm=1.48, lr=7.20e-06, throughput=1310 tok/s
2025-11-24 13:54:08,079 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=1.7672, ppl=5.85, grad_norm=1.41, lr=7.18e-06, throughput=1356 tok/s
2025-11-24 13:59:35,042 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=1.7438, ppl=5.72, grad_norm=1.67, lr=7.17e-06, throughput=1468 tok/s
2025-11-24 14:05:04,279 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=1.5519, ppl=4.72, grad_norm=1.78, lr=7.15e-06, throughput=1458 tok/s
2025-11-24 14:10:30,165 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=1.7879, ppl=5.98, grad_norm=2.75, lr=7.14e-06, throughput=1473 tok/s
2025-11-24 14:16:18,511 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=1.8286, ppl=6.23, grad_norm=1.95, lr=7.12e-06, throughput=1378 tok/s
2025-11-24 14:21:20,515 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=1.6419, ppl=5.16, grad_norm=2.11, lr=7.11e-06, throughput=1589 tok/s
2025-11-24 14:26:34,189 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=1.9620, ppl=7.11, grad_norm=1.87, lr=7.09e-06, throughput=1530 tok/s
2025-11-24 14:31:04,980 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=1.5524, ppl=4.72, grad_norm=1.19, lr=7.08e-06, throughput=1773 tok/s
2025-11-24 14:35:38,887 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=1.8279, ppl=6.22, grad_norm=1.58, lr=7.06e-06, throughput=1752 tok/s
2025-11-24 14:40:07,422 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=1.6282, ppl=5.09, grad_norm=1.30, lr=7.05e-06, throughput=1787 tok/s
2025-11-24 14:44:42,449 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=1.8558, ppl=6.40, grad_norm=1.24, lr=7.03e-06, throughput=1745 tok/s
2025-11-24 14:49:22,542 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=1.7864, ppl=5.97, grad_norm=7.28, lr=7.02e-06, throughput=1714 tok/s
2025-11-24 14:53:54,527 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=1.7713, ppl=5.88, grad_norm=2.59, lr=7.00e-06, throughput=1765 tok/s
2025-11-24 14:58:31,101 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=1.6332, ppl=5.12, grad_norm=1.82, lr=6.99e-06, throughput=1736 tok/s
2025-11-24 15:02:59,034 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=1.6177, ppl=5.04, grad_norm=1.27, lr=6.97e-06, throughput=1792 tok/s
2025-11-24 15:07:57,382 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=1.5990, ppl=4.95, grad_norm=1.16, lr=6.96e-06, throughput=1609 tok/s
2025-11-24 15:13:33,320 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=1.4769, ppl=4.38, grad_norm=1.83, lr=6.94e-06, throughput=1429 tok/s
2025-11-24 15:38:05,256 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=1.6511, ppl=5.21, grad_norm=1.63, lr=6.92e-06, throughput=326 tok/s
2025-11-24 15:49:18,639 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=1.6620, ppl=5.27, grad_norm=1.73, lr=6.91e-06, throughput=713 tok/s
2025-11-24 15:59:13,640 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=1.5077, ppl=4.52, grad_norm=1.45, lr=6.89e-06, throughput=807 tok/s
2025-11-24 16:08:01,964 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=1.4775, ppl=4.38, grad_norm=1.52, lr=6.88e-06, throughput=909 tok/s
2025-11-24 16:13:46,186 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=1.6900, ppl=5.42, grad_norm=1.59, lr=6.86e-06, throughput=1394 tok/s
2025-11-24 16:18:18,138 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=1.6801, ppl=5.37, grad_norm=1.41, lr=6.85e-06, throughput=1765 tok/s
2025-11-24 16:22:39,742 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=1.5483, ppl=4.70, grad_norm=1.45, lr=6.83e-06, throughput=1835 tok/s
2025-11-24 16:26:58,750 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=1.8758, ppl=6.53, grad_norm=1.44, lr=6.82e-06, throughput=1853 tok/s
2025-11-24 16:31:26,942 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=1.6358, ppl=5.13, grad_norm=1.27, lr=6.80e-06, throughput=1790 tok/s
2025-11-24 16:35:45,149 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=1.7868, ppl=5.97, grad_norm=1.57, lr=6.78e-06, throughput=1859 tok/s
2025-11-24 16:40:13,077 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=1.4805, ppl=4.40, grad_norm=3.16, lr=6.77e-06, throughput=1792 tok/s
2025-11-24 16:44:29,825 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=1.6962, ppl=5.45, grad_norm=2.38, lr=6.75e-06, throughput=1870 tok/s
2025-11-24 16:48:49,999 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=1.5473, ppl=4.70, grad_norm=1.40, lr=6.74e-06, throughput=1845 tok/s
2025-11-24 16:53:07,333 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=1.4236, ppl=4.15, grad_norm=1.30, lr=6.72e-06, throughput=1865 tok/s
2025-11-24 16:57:24,347 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=1.6978, ppl=5.46, grad_norm=1.54, lr=6.71e-06, throughput=1868 tok/s
2025-11-24 17:01:54,819 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=1.6431, ppl=5.17, grad_norm=1.39, lr=6.69e-06, throughput=1775 tok/s
2025-11-24 17:06:15,846 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=1.5693, ppl=4.80, grad_norm=1.38, lr=6.67e-06, throughput=1839 tok/s
2025-11-24 17:10:49,945 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=1.6784, ppl=5.36, grad_norm=1.48, lr=6.66e-06, throughput=1751 tok/s
2025-11-24 17:15:15,976 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=1.8042, ppl=6.07, grad_norm=1.58, lr=6.64e-06, throughput=1804 tok/s
2025-11-24 17:19:50,486 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=1.8439, ppl=6.32, grad_norm=1.70, lr=6.63e-06, throughput=1749 tok/s
2025-11-24 17:24:21,174 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=1.7036, ppl=5.49, grad_norm=1.65, lr=6.61e-06, throughput=1773 tok/s
2025-11-24 17:29:44,257 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=1.4958, ppl=4.46, grad_norm=1.52, lr=6.60e-06, throughput=1486 tok/s
2025-11-24 17:34:37,687 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=1.8783, ppl=6.54, grad_norm=2.72, lr=6.58e-06, throughput=1636 tok/s
2025-11-24 17:39:32,720 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=1.6077, ppl=4.99, grad_norm=1.59, lr=6.56e-06, throughput=1627 tok/s
2025-11-24 17:44:14,429 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=1.5547, ppl=4.73, grad_norm=1.70, lr=6.55e-06, throughput=1704 tok/s
2025-11-24 17:51:56,493 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=1.7137, ppl=5.55, grad_norm=1.88, lr=6.53e-06, throughput=1039 tok/s
2025-11-24 17:58:09,076 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=1.7015, ppl=5.48, grad_norm=1.14, lr=6.52e-06, throughput=1288 tok/s
2025-11-24 18:02:52,954 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=1.6154, ppl=5.03, grad_norm=1.92, lr=6.50e-06, throughput=1691 tok/s
2025-11-24 18:08:17,622 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=1.5996, ppl=4.95, grad_norm=1.47, lr=6.48e-06, throughput=1478 tok/s
2025-11-24 18:12:58,704 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=1.7646, ppl=5.84, grad_norm=1.21, lr=6.47e-06, throughput=1708 tok/s
2025-11-24 18:18:32,301 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=1.6626, ppl=5.27, grad_norm=1.12, lr=6.45e-06, throughput=1439 tok/s
2025-11-24 18:23:52,736 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=1.6853, ppl=5.39, grad_norm=1.52, lr=6.44e-06, throughput=1498 tok/s
2025-11-24 18:29:07,701 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=1.7637, ppl=5.83, grad_norm=1.21, lr=6.42e-06, throughput=1524 tok/s
2025-11-24 18:35:38,534 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=1.3831, ppl=3.99, grad_norm=1.15, lr=6.40e-06, throughput=1228 tok/s
2025-11-24 18:45:29,610 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=1.6838, ppl=5.39, grad_norm=1.51, lr=6.39e-06, throughput=812 tok/s
2025-11-24 18:51:08,000 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=1.6571, ppl=5.24, grad_norm=1.37, lr=6.37e-06, throughput=1418 tok/s
2025-11-24 18:56:00,536 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=1.6703, ppl=5.31, grad_norm=1.94, lr=6.35e-06, throughput=1641 tok/s
2025-11-24 19:01:10,305 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=1.6507, ppl=5.21, grad_norm=1.88, lr=6.34e-06, throughput=1550 tok/s
2025-11-24 19:05:56,022 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=1.5251, ppl=4.60, grad_norm=1.55, lr=6.32e-06, throughput=1680 tok/s
2025-11-24 19:10:36,606 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=1.7246, ppl=5.61, grad_norm=1.54, lr=6.31e-06, throughput=1711 tok/s
2025-11-24 19:15:24,274 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=1.7868, ppl=5.97, grad_norm=1.60, lr=6.29e-06, throughput=1669 tok/s
2025-11-24 19:21:19,820 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=1.6828, ppl=5.38, grad_norm=1.81, lr=6.27e-06, throughput=1350 tok/s
2025-11-24 19:26:40,515 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=1.4798, ppl=4.39, grad_norm=2.20, lr=6.26e-06, throughput=1497 tok/s
2025-11-24 19:32:32,408 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=1.5995, ppl=4.95, grad_norm=1.47, lr=6.24e-06, throughput=1364 tok/s
2025-11-24 19:38:55,616 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=1.4227, ppl=4.15, grad_norm=1.29, lr=6.23e-06, throughput=1253 tok/s
2025-11-24 19:44:19,138 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=1.5556, ppl=4.74, grad_norm=1.41, lr=6.21e-06, throughput=1484 tok/s
2025-11-24 19:49:32,844 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=1.7958, ppl=6.02, grad_norm=1.37, lr=6.19e-06, throughput=1530 tok/s
2025-11-24 19:54:37,768 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=1.6070, ppl=4.99, grad_norm=1.14, lr=6.18e-06, throughput=1574 tok/s
2025-11-24 20:01:18,068 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=1.3993, ppl=4.05, grad_norm=1.28, lr=6.16e-06, throughput=1199 tok/s
2025-11-24 20:08:01,868 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=1.6521, ppl=5.22, grad_norm=1.72, lr=6.14e-06, throughput=1189 tok/s
2025-11-24 20:14:15,983 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=1.8766, ppl=6.53, grad_norm=1.30, lr=6.13e-06, throughput=1283 tok/s
2025-11-24 20:19:47,351 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=1.4195, ppl=4.14, grad_norm=1.38, lr=6.11e-06, throughput=1449 tok/s
2025-11-24 20:26:12,314 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=1.5076, ppl=4.52, grad_norm=1.33, lr=6.10e-06, throughput=1247 tok/s
2025-11-24 20:32:10,147 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=1.8086, ppl=6.10, grad_norm=1.33, lr=6.08e-06, throughput=1341 tok/s
2025-11-24 20:38:06,445 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=1.7168, ppl=5.57, grad_norm=1.63, lr=6.06e-06, throughput=1347 tok/s
2025-11-24 20:44:35,995 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=1.3610, ppl=3.90, grad_norm=1.21, lr=6.05e-06, throughput=1232 tok/s
2025-11-24 20:50:29,737 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=1.5522, ppl=4.72, grad_norm=1.16, lr=6.03e-06, throughput=1357 tok/s
2025-11-24 20:55:36,507 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=1.6613, ppl=5.27, grad_norm=1.71, lr=6.01e-06, throughput=1565 tok/s
2025-11-24 21:00:36,968 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=1.6936, ppl=5.44, grad_norm=1.21, lr=6.00e-06, throughput=1598 tok/s
2025-11-24 21:05:26,721 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=1.6708, ppl=5.32, grad_norm=1.38, lr=5.98e-06, throughput=1657 tok/s
2025-11-24 21:10:26,974 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=1.4891, ppl=4.43, grad_norm=1.62, lr=5.96e-06, throughput=1599 tok/s
2025-11-24 21:18:16,200 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=1.6576, ppl=5.25, grad_norm=1.14, lr=5.95e-06, throughput=1023 tok/s
2025-11-24 21:23:57,548 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=1.5366, ppl=4.65, grad_norm=1.32, lr=5.93e-06, throughput=1406 tok/s
2025-11-24 21:30:13,363 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=1.5580, ppl=4.75, grad_norm=2.41, lr=5.91e-06, throughput=1277 tok/s
2025-11-24 21:36:54,574 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=1.5646, ppl=4.78, grad_norm=1.39, lr=5.90e-06, throughput=1196 tok/s
2025-11-24 21:43:33,418 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=1.6517, ppl=5.22, grad_norm=1.87, lr=5.88e-06, throughput=1203 tok/s
2025-11-24 21:50:27,497 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=1.8906, ppl=6.62, grad_norm=1.43, lr=5.87e-06, throughput=1159 tok/s
2025-11-24 21:57:08,741 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=1.7470, ppl=5.74, grad_norm=1.31, lr=5.85e-06, throughput=1196 tok/s
2025-11-24 22:03:51,738 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=1.5311, ppl=4.62, grad_norm=1.46, lr=5.83e-06, throughput=1191 tok/s
2025-11-24 22:08:43,515 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=1.5680, ppl=4.80, grad_norm=1.16, lr=5.82e-06, throughput=1645 tok/s
2025-11-24 22:13:51,117 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=1.7145, ppl=5.55, grad_norm=1.57, lr=5.80e-06, throughput=1560 tok/s
2025-11-24 22:18:56,787 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=1.6998, ppl=5.47, grad_norm=1.27, lr=5.78e-06, throughput=1570 tok/s
2025-11-24 22:24:04,732 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=1.7779, ppl=5.92, grad_norm=1.52, lr=5.77e-06, throughput=1559 tok/s
2025-11-24 22:28:59,541 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=1.8119, ppl=6.12, grad_norm=1.16, lr=5.75e-06, throughput=1628 tok/s
2025-11-24 22:33:56,792 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=1.6627, ppl=5.27, grad_norm=1.20, lr=5.73e-06, throughput=1615 tok/s
2025-11-26 10:54:56,045 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train_arrow', output_dir='outputs/production_vision_base_lm_20251123_003859', objective='lm', val_data_path='data/training/splits_510k/val_arrow', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='\nFree OCR.', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=True, compression_target=None, conv_kernel=5, timestamp='20251123_003859', batch_size=8, gradient_accumulation_steps=6, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_vision_base_lm_20251123_003859', resume_from_checkpoint='outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt', resume='outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=4, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True, use_8bit_optimizer=False)
2025-11-26 10:54:56,045 - WARNING - --train_projection is deprecated. Use --train_encoder instead. Automatically setting --train_encoder=True.
2025-11-26 10:54:56,045 - INFO - Resuming training from checkpoint: outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:54:56,045 - INFO - Continuing outputs in directory: outputs/production_vision_base_lm_20251123_003859
2025-11-26 10:54:56,045 - INFO - Using custom vision prompt: '\nFree OCR.'
2025-11-26 10:54:56,046 - INFO - Setting random seed: 42
2025-11-26 10:54:56,855 - INFO - Peeking checkpoint metadata from outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:55:06,973 - INFO - Checkpoint metadata: epoch=0, batch_idx=23999, global_step=4000
2025-11-26 10:55:06,974 - INFO -   W&B run ID: 7aj57hve
2025-11-26 10:55:07,054 - INFO - Checkpoint has WandB run ID: 7aj57hve
2025-11-26 10:55:07,055 - INFO - Creating fresh WandB run (not resuming to avoid stale data)
2025-11-26 10:55:08,268 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_lm_20251123_003859 (ID: xyk0cc3f)
2025-11-26 10:55:08,268 - INFO - Loading model and tokenizer...
2025-11-26 10:55:18,545 - INFO - Enabling decoder gradient checkpointing...
2025-11-26 10:55:18,554 - INFO -   ✓ Decoder checkpointing enabled for 12 transformer layers
2025-11-26 10:55:18,554 - INFO -   Expected: ~30-50% activation memory reduction, ~15-20% compute overhead
2025-11-26 10:55:18,586 - INFO - Created Vision Compression trainer (mode: base)
2025-11-26 10:55:18,587 - INFO - Training objective: lm
2025-11-26 10:55:18,624 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640
2025-11-26 10:55:18,624 - INFO - Loading training data from data/training/splits_510k/train_arrow
2025-11-26 10:55:18,624 - INFO - Detected Arrow format: data/training/splits_510k/train_arrow
2025-11-26 10:55:18,625 - INFO - Loading Arrow dataset from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-26 10:55:18,677 - INFO - Loaded 500,000 samples from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-26 10:55:18,677 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-26 10:55:18,677 - INFO - Mid-epoch resume: skipping first 192000 samples at sampler level (batch 24000)
2025-11-26 10:55:18,774 - INFO - Loading validation data from data/training/splits_510k/val_arrow
2025-11-26 10:55:18,774 - INFO - Detected Arrow format: data/training/splits_510k/val_arrow
2025-11-26 10:55:18,775 - INFO - Loading Arrow dataset from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-26 10:55:18,783 - INFO - Loaded 10,000 samples from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-26 10:55:18,783 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-26 10:55:18,813 - INFO - Created AdamW optimizer with differential LR:
  Encoder: 474 param tensors @ lr=1e-05
  Decoder: 2236 param tensors @ lr=0.0001
  Fused kernels: True
2025-11-26 10:55:18,814 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
2025-11-26 10:55:18,822 - INFO - Logged optimizer config to W&B: type=adamw_fused, memory=24.86GB
2025-11-26 10:55:18,822 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:55:30,142 - INFO - ✓ Successfully loaded optimizer state from checkpoint
2025-11-26 10:55:30,143 - INFO - ✓ Successfully loaded scheduler state from checkpoint
2025-11-26 10:55:30,149 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state.
2025-11-26 10:55:30,162 - INFO - Restored training state: epoch=0, batch_idx=23999, global_step=4000, best_val_loss=1.6909
2025-11-26 10:55:30,163 - INFO - Resuming mid-epoch: will skip first 24000 batches of epoch 0
2025-11-26 10:55:30,164 - INFO - Starting training loop...
2025-11-26 10:55:30,164 - INFO - 
======================================================================
2025-11-26 10:55:30,164 - INFO - Epoch 1/1
2025-11-26 10:55:30,164 - INFO - ======================================================================
2025-11-26 10:55:42,349 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
2025-11-26 10:55:43,551 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x
2025-11-26 10:55:43,552 - INFO - Target tokens per sample: 1000
2025-11-26 10:59:07,778 - INFO - Epoch 1 Step 10 (Global: 4010): loss=1.5953, ppl=4.93, grad_norm=1.31, lr=7.72e-06, throughput=2206 tok/s
2025-11-26 11:02:31,886 - INFO - Epoch 1 Step 20 (Global: 4020): loss=1.8728, ppl=6.51, grad_norm=1.68, lr=7.71e-06, throughput=2352 tok/s
2025-11-26 11:05:59,802 - INFO - Epoch 1 Step 30 (Global: 4030): loss=1.5460, ppl=4.69, grad_norm=1.20, lr=7.70e-06, throughput=2309 tok/s
2025-11-26 11:09:26,300 - INFO - Epoch 1 Step 40 (Global: 4040): loss=1.7038, ppl=5.49, grad_norm=2.27, lr=7.68e-06, throughput=2324 tok/s
2025-11-26 11:12:51,182 - INFO - Epoch 1 Step 50 (Global: 4050): loss=1.4818, ppl=4.40, grad_norm=1.29, lr=7.67e-06, throughput=2343 tok/s
2025-11-26 11:16:16,527 - INFO - Epoch 1 Step 60 (Global: 4060): loss=1.7971, ppl=6.03, grad_norm=1.83, lr=7.65e-06, throughput=2338 tok/s
2025-11-26 11:19:40,409 - INFO - Epoch 1 Step 70 (Global: 4070): loss=1.5431, ppl=4.68, grad_norm=1.60, lr=7.64e-06, throughput=2354 tok/s
2025-11-26 11:23:04,244 - INFO - Epoch 1 Step 80 (Global: 4080): loss=1.7389, ppl=5.69, grad_norm=1.27, lr=7.62e-06, throughput=2355 tok/s
2025-11-26 11:26:27,304 - INFO - Epoch 1 Step 90 (Global: 4090): loss=1.7134, ppl=5.55, grad_norm=1.70, lr=7.61e-06, throughput=2364 tok/s
2025-11-26 11:29:51,764 - INFO - Epoch 1 Step 100 (Global: 4100): loss=1.6501, ppl=5.21, grad_norm=1.27, lr=7.60e-06, throughput=2348 tok/s
2025-11-26 11:33:15,553 - INFO - Epoch 1 Step 110 (Global: 4110): loss=1.5634, ppl=4.77, grad_norm=1.57, lr=7.58e-06, throughput=2355 tok/s
2025-11-26 11:36:41,822 - INFO - Epoch 1 Step 120 (Global: 4120): loss=1.8531, ppl=6.38, grad_norm=2.45, lr=7.57e-06, throughput=2327 tok/s
2025-11-26 11:40:05,840 - INFO - Epoch 1 Step 130 (Global: 4130): loss=1.7781, ppl=5.92, grad_norm=1.40, lr=7.55e-06, throughput=2353 tok/s
2025-11-26 11:43:31,651 - INFO - Epoch 1 Step 140 (Global: 4140): loss=1.6938, ppl=5.44, grad_norm=1.39, lr=7.54e-06, throughput=2332 tok/s
2025-11-26 11:47:00,799 - INFO - Epoch 1 Step 150 (Global: 4150): loss=1.7069, ppl=5.51, grad_norm=2.00, lr=7.52e-06, throughput=2295 tok/s
2025-11-26 11:50:26,371 - INFO - Epoch 1 Step 160 (Global: 4160): loss=1.4696, ppl=4.35, grad_norm=1.42, lr=7.51e-06, throughput=2335 tok/s
2025-11-26 11:53:55,718 - INFO - Epoch 1 Step 170 (Global: 4170): loss=1.8436, ppl=6.32, grad_norm=1.55, lr=7.49e-06, throughput=2293 tok/s
2025-11-26 11:57:21,570 - INFO - Epoch 1 Step 180 (Global: 4180): loss=1.7474, ppl=5.74, grad_norm=1.20, lr=7.48e-06, throughput=2332 tok/s
2025-11-26 12:00:47,154 - INFO - Epoch 1 Step 190 (Global: 4190): loss=1.4784, ppl=4.39, grad_norm=1.21, lr=7.47e-06, throughput=2335 tok/s
2025-11-26 12:04:10,015 - INFO - Epoch 1 Step 200 (Global: 4200): loss=1.5963, ppl=4.93, grad_norm=1.31, lr=7.45e-06, throughput=2366 tok/s
2025-11-26 12:07:32,840 - INFO - Epoch 1 Step 210 (Global: 4210): loss=1.6186, ppl=5.05, grad_norm=1.99, lr=7.44e-06, throughput=2367 tok/s
2025-11-26 12:10:56,683 - INFO - Epoch 1 Step 220 (Global: 4220): loss=1.7733, ppl=5.89, grad_norm=1.88, lr=7.42e-06, throughput=2355 tok/s
2025-11-26 12:14:19,344 - INFO - Epoch 1 Step 230 (Global: 4230): loss=1.7065, ppl=5.51, grad_norm=3.75, lr=7.41e-06, throughput=2369 tok/s
2025-11-26 12:17:44,744 - INFO - Epoch 1 Step 240 (Global: 4240): loss=1.7663, ppl=5.85, grad_norm=1.66, lr=7.39e-06, throughput=2337 tok/s
2025-11-26 12:21:09,840 - INFO - Epoch 1 Step 250 (Global: 4250): loss=1.7145, ppl=5.55, grad_norm=1.33, lr=7.38e-06, throughput=2340 tok/s
2025-11-26 12:24:36,277 - INFO - Epoch 1 Step 260 (Global: 4260): loss=1.6920, ppl=5.43, grad_norm=1.33, lr=7.36e-06, throughput=2325 tok/s
2025-11-26 12:28:01,361 - INFO - Epoch 1 Step 270 (Global: 4270): loss=1.6231, ppl=5.07, grad_norm=1.87, lr=7.35e-06, throughput=2341 tok/s
2025-11-26 12:31:24,607 - INFO - Epoch 1 Step 280 (Global: 4280): loss=1.5611, ppl=4.76, grad_norm=1.35, lr=7.33e-06, throughput=2362 tok/s
2025-11-26 12:34:47,625 - INFO - Epoch 1 Step 290 (Global: 4290): loss=1.5658, ppl=4.79, grad_norm=2.20, lr=7.32e-06, throughput=2364 tok/s
2025-11-26 12:38:11,804 - INFO - Epoch 1 Step 300 (Global: 4300): loss=1.4066, ppl=4.08, grad_norm=1.23, lr=7.30e-06, throughput=2351 tok/s
2025-11-26 12:41:35,869 - INFO - Epoch 1 Step 310 (Global: 4310): loss=1.3833, ppl=3.99, grad_norm=3.17, lr=7.29e-06, throughput=2352 tok/s
2025-11-26 12:45:02,975 - INFO - Epoch 1 Step 320 (Global: 4320): loss=1.7036, ppl=5.49, grad_norm=1.17, lr=7.27e-06, throughput=2318 tok/s
2025-11-26 12:48:29,156 - INFO - Epoch 1 Step 330 (Global: 4330): loss=1.6021, ppl=4.96, grad_norm=1.17, lr=7.26e-06, throughput=2328 tok/s
2025-11-26 12:51:55,837 - INFO - Epoch 1 Step 340 (Global: 4340): loss=1.5165, ppl=4.56, grad_norm=1.28, lr=7.24e-06, throughput=2322 tok/s
2025-11-26 12:55:22,378 - INFO - Epoch 1 Step 350 (Global: 4350): loss=1.6196, ppl=5.05, grad_norm=1.38, lr=7.23e-06, throughput=2324 tok/s
2025-11-26 12:58:49,366 - INFO - Epoch 1 Step 360 (Global: 4360): loss=1.4258, ppl=4.16, grad_norm=1.21, lr=7.21e-06, throughput=2319 tok/s
2025-11-26 13:02:14,639 - INFO - Epoch 1 Step 370 (Global: 4370): loss=1.7505, ppl=5.76, grad_norm=1.83, lr=7.20e-06, throughput=2338 tok/s
2025-11-26 13:05:40,540 - INFO - Epoch 1 Step 380 (Global: 4380): loss=1.7673, ppl=5.85, grad_norm=1.40, lr=7.18e-06, throughput=2331 tok/s
2025-11-26 13:09:04,125 - INFO - Epoch 1 Step 390 (Global: 4390): loss=1.7419, ppl=5.71, grad_norm=1.66, lr=7.17e-06, throughput=2358 tok/s
2025-11-26 13:12:27,820 - INFO - Epoch 1 Step 400 (Global: 4400): loss=1.5554, ppl=4.74, grad_norm=2.30, lr=7.15e-06, throughput=2356 tok/s
2025-11-26 13:15:52,785 - INFO - Epoch 1 Step 410 (Global: 4410): loss=1.7893, ppl=5.99, grad_norm=1.34, lr=7.14e-06, throughput=2342 tok/s
2025-11-26 13:19:17,942 - INFO - Epoch 1 Step 420 (Global: 4420): loss=1.8293, ppl=6.23, grad_norm=1.30, lr=7.12e-06, throughput=2340 tok/s
2025-11-26 13:22:47,948 - INFO - Epoch 1 Step 430 (Global: 4430): loss=1.6393, ppl=5.15, grad_norm=1.55, lr=7.11e-06, throughput=2286 tok/s
2025-11-26 13:26:13,714 - INFO - Epoch 1 Step 440 (Global: 4440): loss=1.9649, ppl=7.13, grad_norm=1.34, lr=7.09e-06, throughput=2333 tok/s
2025-11-26 13:29:39,289 - INFO - Epoch 1 Step 450 (Global: 4450): loss=1.5512, ppl=4.72, grad_norm=1.87, lr=7.08e-06, throughput=2335 tok/s
2025-11-26 13:33:04,981 - INFO - Epoch 1 Step 460 (Global: 4460): loss=1.8276, ppl=6.22, grad_norm=2.03, lr=7.06e-06, throughput=2334 tok/s
2025-11-26 13:36:30,668 - INFO - Epoch 1 Step 470 (Global: 4470): loss=1.6299, ppl=5.10, grad_norm=1.45, lr=7.05e-06, throughput=2334 tok/s
2025-11-26 13:39:57,110 - INFO - Epoch 1 Step 480 (Global: 4480): loss=1.8573, ppl=6.41, grad_norm=2.08, lr=7.03e-06, throughput=2325 tok/s
2025-11-26 13:43:22,460 - INFO - Epoch 1 Step 490 (Global: 4490): loss=1.7869, ppl=5.97, grad_norm=2.23, lr=7.02e-06, throughput=2337 tok/s
2025-11-26 13:46:47,165 - INFO - Epoch 1 Step 500 (Global: 4500): loss=1.7727, ppl=5.89, grad_norm=1.58, lr=7.00e-06, throughput=2345 tok/s
2025-11-26 13:50:13,281 - INFO - Epoch 1 Step 510 (Global: 4510): loss=1.6357, ppl=5.13, grad_norm=1.27, lr=6.99e-06, throughput=2329 tok/s
2025-11-26 13:53:39,484 - INFO - Epoch 1 Step 520 (Global: 4520): loss=1.6233, ppl=5.07, grad_norm=1.45, lr=6.97e-06, throughput=2328 tok/s
2025-11-26 13:57:04,548 - INFO - Epoch 1 Step 530 (Global: 4530): loss=1.6000, ppl=4.95, grad_norm=1.08, lr=6.96e-06, throughput=2341 tok/s
2025-11-26 14:00:29,863 - INFO - Epoch 1 Step 540 (Global: 4540): loss=1.4770, ppl=4.38, grad_norm=1.20, lr=6.94e-06, throughput=2338 tok/s
2025-11-26 14:03:55,805 - INFO - Epoch 1 Step 550 (Global: 4550): loss=1.6530, ppl=5.22, grad_norm=1.20, lr=6.92e-06, throughput=2331 tok/s
2025-11-26 14:07:23,587 - INFO - Epoch 1 Step 560 (Global: 4560): loss=1.6620, ppl=5.27, grad_norm=1.32, lr=6.91e-06, throughput=2310 tok/s
2025-11-26 14:10:54,057 - INFO - Epoch 1 Step 570 (Global: 4570): loss=1.5083, ppl=4.52, grad_norm=1.50, lr=6.89e-06, throughput=2281 tok/s
2025-11-26 14:14:25,871 - INFO - Epoch 1 Step 580 (Global: 4580): loss=1.4806, ppl=4.40, grad_norm=1.23, lr=6.88e-06, throughput=2266 tok/s
2025-11-26 14:17:53,574 - INFO - Epoch 1 Step 590 (Global: 4590): loss=1.6875, ppl=5.41, grad_norm=1.45, lr=6.86e-06, throughput=2311 tok/s
2025-11-26 14:21:18,814 - INFO - Epoch 1 Step 600 (Global: 4600): loss=1.6815, ppl=5.37, grad_norm=1.84, lr=6.85e-06, throughput=2339 tok/s
2025-11-26 14:24:44,351 - INFO - Epoch 1 Step 610 (Global: 4610): loss=1.5520, ppl=4.72, grad_norm=1.35, lr=6.83e-06, throughput=2335 tok/s
2025-11-26 14:28:09,979 - INFO - Epoch 1 Step 620 (Global: 4620): loss=1.8772, ppl=6.54, grad_norm=1.42, lr=6.82e-06, throughput=2334 tok/s
2025-11-26 14:31:35,328 - INFO - Epoch 1 Step 630 (Global: 4630): loss=1.6348, ppl=5.13, grad_norm=1.15, lr=6.80e-06, throughput=2338 tok/s
2025-11-26 14:35:02,599 - INFO - Epoch 1 Step 640 (Global: 4640): loss=1.7853, ppl=5.96, grad_norm=3.75, lr=6.78e-06, throughput=2316 tok/s
2025-11-26 14:38:31,391 - INFO - Epoch 1 Step 650 (Global: 4650): loss=1.4731, ppl=4.36, grad_norm=1.62, lr=6.77e-06, throughput=2299 tok/s
2025-11-26 14:41:58,185 - INFO - Epoch 1 Step 660 (Global: 4660): loss=1.6962, ppl=5.45, grad_norm=1.24, lr=6.75e-06, throughput=2321 tok/s
2025-11-26 14:45:23,551 - INFO - Epoch 1 Step 670 (Global: 4670): loss=1.5454, ppl=4.69, grad_norm=1.92, lr=6.74e-06, throughput=2337 tok/s
2025-11-26 14:48:49,086 - INFO - Epoch 1 Step 680 (Global: 4680): loss=1.4202, ppl=4.14, grad_norm=1.95, lr=6.72e-06, throughput=2335 tok/s
2025-11-26 14:52:13,340 - INFO - Epoch 1 Step 690 (Global: 4690): loss=1.6956, ppl=5.45, grad_norm=1.23, lr=6.71e-06, throughput=2350 tok/s
2025-11-26 14:55:37,045 - INFO - Epoch 1 Step 700 (Global: 4700): loss=1.6405, ppl=5.16, grad_norm=7.62, lr=6.69e-06, throughput=2356 tok/s
2025-11-26 14:59:00,853 - INFO - Epoch 1 Step 710 (Global: 4710): loss=1.5683, ppl=4.80, grad_norm=1.45, lr=6.67e-06, throughput=2355 tok/s
2025-11-26 15:02:24,207 - INFO - Epoch 1 Step 720 (Global: 4720): loss=1.6794, ppl=5.36, grad_norm=1.38, lr=6.66e-06, throughput=2360 tok/s
2025-11-26 15:05:49,302 - INFO - Epoch 1 Step 730 (Global: 4730): loss=1.8037, ppl=6.07, grad_norm=1.99, lr=6.64e-06, throughput=2340 tok/s
2025-11-26 15:09:14,502 - INFO - Epoch 1 Step 740 (Global: 4740): loss=1.8448, ppl=6.33, grad_norm=1.72, lr=6.63e-06, throughput=2339 tok/s
2025-11-26 15:12:41,048 - INFO - Epoch 1 Step 750 (Global: 4750): loss=1.7020, ppl=5.49, grad_norm=2.50, lr=6.61e-06, throughput=2324 tok/s
2025-11-26 15:16:07,516 - INFO - Epoch 1 Step 760 (Global: 4760): loss=1.4959, ppl=4.46, grad_norm=2.64, lr=6.60e-06, throughput=2325 tok/s
2025-11-26 15:19:34,658 - INFO - Epoch 1 Step 770 (Global: 4770): loss=1.8787, ppl=6.54, grad_norm=1.34, lr=6.58e-06, throughput=2317 tok/s
2025-11-26 15:23:01,827 - INFO - Epoch 1 Step 780 (Global: 4780): loss=1.6085, ppl=5.00, grad_norm=1.30, lr=6.56e-06, throughput=2317 tok/s
2025-11-26 15:26:28,119 - INFO - Epoch 1 Step 790 (Global: 4790): loss=1.5552, ppl=4.74, grad_norm=1.14, lr=6.55e-06, throughput=2327 tok/s
2025-11-26 15:29:54,797 - INFO - Epoch 1 Step 800 (Global: 4800): loss=1.7161, ppl=5.56, grad_norm=1.24, lr=6.53e-06, throughput=2322 tok/s
2025-11-26 15:33:22,265 - INFO - Epoch 1 Step 810 (Global: 4810): loss=1.7006, ppl=5.48, grad_norm=1.66, lr=6.52e-06, throughput=2314 tok/s
2025-11-26 15:36:50,145 - INFO - Epoch 1 Step 820 (Global: 4820): loss=1.6117, ppl=5.01, grad_norm=1.16, lr=6.50e-06, throughput=2309 tok/s
2025-11-26 15:40:17,353 - INFO - Epoch 1 Step 830 (Global: 4830): loss=1.5950, ppl=4.93, grad_norm=1.41, lr=6.48e-06, throughput=2317 tok/s
2025-11-26 15:43:46,112 - INFO - Epoch 1 Step 840 (Global: 4840): loss=1.7649, ppl=5.84, grad_norm=2.03, lr=6.47e-06, throughput=2299 tok/s
2025-11-26 15:47:13,775 - INFO - Epoch 1 Step 850 (Global: 4850): loss=1.6651, ppl=5.29, grad_norm=1.27, lr=6.45e-06, throughput=2311 tok/s
2025-11-26 15:50:44,101 - INFO - Epoch 1 Step 860 (Global: 4860): loss=1.6814, ppl=5.37, grad_norm=1.56, lr=6.44e-06, throughput=2282 tok/s
2025-11-26 15:54:11,667 - INFO - Epoch 1 Step 870 (Global: 4870): loss=1.7633, ppl=5.83, grad_norm=1.46, lr=6.42e-06, throughput=2313 tok/s
2025-11-26 15:57:38,483 - INFO - Epoch 1 Step 880 (Global: 4880): loss=1.3826, ppl=3.99, grad_norm=1.41, lr=6.40e-06, throughput=2321 tok/s
2025-11-26 16:01:04,604 - INFO - Epoch 1 Step 890 (Global: 4890): loss=1.6819, ppl=5.38, grad_norm=1.57, lr=6.39e-06, throughput=2329 tok/s
2025-11-26 16:04:30,047 - INFO - Epoch 1 Step 900 (Global: 4900): loss=1.6564, ppl=5.24, grad_norm=1.41, lr=6.37e-06, throughput=2336 tok/s
2025-11-26 16:07:55,241 - INFO - Epoch 1 Step 910 (Global: 4910): loss=1.6726, ppl=5.33, grad_norm=1.69, lr=6.35e-06, throughput=2339 tok/s
2025-11-26 16:11:23,239 - INFO - Epoch 1 Step 920 (Global: 4920): loss=1.6469, ppl=5.19, grad_norm=1.34, lr=6.34e-06, throughput=2308 tok/s
2025-11-26 16:14:52,016 - INFO - Epoch 1 Step 930 (Global: 4930): loss=1.5233, ppl=4.59, grad_norm=1.27, lr=6.32e-06, throughput=2299 tok/s
2025-11-26 16:18:19,601 - INFO - Epoch 1 Step 940 (Global: 4940): loss=1.7237, ppl=5.61, grad_norm=1.59, lr=6.31e-06, throughput=2312 tok/s
2025-11-26 16:21:48,730 - INFO - Epoch 1 Step 950 (Global: 4950): loss=1.7854, ppl=5.96, grad_norm=1.20, lr=6.29e-06, throughput=2295 tok/s
2025-11-26 16:25:17,220 - INFO - Epoch 1 Step 960 (Global: 4960): loss=1.6801, ppl=5.37, grad_norm=1.30, lr=6.27e-06, throughput=2302 tok/s
2025-11-26 16:28:44,496 - INFO - Epoch 1 Step 970 (Global: 4970): loss=1.4785, ppl=4.39, grad_norm=2.33, lr=6.26e-06, throughput=2316 tok/s
2025-11-26 16:32:10,538 - INFO - Epoch 1 Step 980 (Global: 4980): loss=1.5979, ppl=4.94, grad_norm=1.90, lr=6.24e-06, throughput=2330 tok/s
2025-11-26 16:35:36,640 - INFO - Epoch 1 Step 990 (Global: 4990): loss=1.4211, ppl=4.14, grad_norm=1.29, lr=6.23e-06, throughput=2329 tok/s
2025-11-26 16:39:06,513 - INFO - Epoch 1 Step 1000 (Global: 5000): loss=1.5556, ppl=4.74, grad_norm=1.55, lr=6.21e-06, throughput=2287 tok/s
2025-11-26 16:42:36,734 - INFO - Epoch 1 Step 1010 (Global: 5010): loss=1.7957, ppl=6.02, grad_norm=1.61, lr=6.19e-06, throughput=2283 tok/s
2025-11-26 16:46:07,831 - INFO - Epoch 1 Step 1020 (Global: 5020): loss=1.6076, ppl=4.99, grad_norm=1.27, lr=6.18e-06, throughput=2274 tok/s
2025-11-26 16:49:38,670 - INFO - Epoch 1 Step 1030 (Global: 5030): loss=1.4002, ppl=4.06, grad_norm=1.34, lr=6.16e-06, throughput=2277 tok/s
2025-11-26 16:53:08,317 - INFO - Epoch 1 Step 1040 (Global: 5040): loss=1.6525, ppl=5.22, grad_norm=1.30, lr=6.14e-06, throughput=2290 tok/s
2025-11-26 16:56:36,748 - INFO - Epoch 1 Step 1050 (Global: 5050): loss=1.8744, ppl=6.52, grad_norm=1.47, lr=6.13e-06, throughput=2303 tok/s
2025-11-26 17:00:05,207 - INFO - Epoch 1 Step 1060 (Global: 5060): loss=1.4214, ppl=4.14, grad_norm=1.38, lr=6.11e-06, throughput=2303 tok/s
2025-11-26 17:03:35,080 - INFO - Epoch 1 Step 1070 (Global: 5070): loss=1.5064, ppl=4.51, grad_norm=1.77, lr=6.10e-06, throughput=2287 tok/s
2025-11-26 17:07:03,031 - INFO - Epoch 1 Step 1080 (Global: 5080): loss=1.8061, ppl=6.09, grad_norm=1.41, lr=6.08e-06, throughput=2308 tok/s
2025-11-26 17:10:31,517 - INFO - Epoch 1 Step 1090 (Global: 5090): loss=1.7182, ppl=5.57, grad_norm=1.51, lr=6.06e-06, throughput=2302 tok/s
2025-11-26 17:14:02,333 - INFO - Epoch 1 Step 1100 (Global: 5100): loss=1.3599, ppl=3.90, grad_norm=1.27, lr=6.05e-06, throughput=2277 tok/s
2025-11-26 17:17:31,325 - INFO - Epoch 1 Step 1110 (Global: 5110): loss=1.5500, ppl=4.71, grad_norm=1.77, lr=6.03e-06, throughput=2297 tok/s
2025-11-26 17:20:59,574 - INFO - Epoch 1 Step 1120 (Global: 5120): loss=1.6610, ppl=5.26, grad_norm=1.23, lr=6.01e-06, throughput=2305 tok/s
2025-11-26 17:24:28,770 - INFO - Epoch 1 Step 1130 (Global: 5130): loss=1.6901, ppl=5.42, grad_norm=1.49, lr=6.00e-06, throughput=2295 tok/s
2025-11-26 17:27:57,204 - INFO - Epoch 1 Step 1140 (Global: 5140): loss=1.6691, ppl=5.31, grad_norm=1.83, lr=5.98e-06, throughput=2303 tok/s
2025-11-26 17:31:23,345 - INFO - Epoch 1 Step 1150 (Global: 5150): loss=1.4859, ppl=4.42, grad_norm=1.36, lr=5.96e-06, throughput=2329 tok/s
2025-11-26 17:34:49,376 - INFO - Epoch 1 Step 1160 (Global: 5160): loss=1.6571, ppl=5.24, grad_norm=1.45, lr=5.95e-06, throughput=2330 tok/s
2025-11-26 17:38:14,889 - INFO - Epoch 1 Step 1170 (Global: 5170): loss=1.5316, ppl=4.63, grad_norm=1.25, lr=5.93e-06, throughput=2336 tok/s
2025-11-26 17:41:39,399 - INFO - Epoch 1 Step 1180 (Global: 5180): loss=1.5583, ppl=4.75, grad_norm=2.02, lr=5.91e-06, throughput=2347 tok/s
2025-11-26 17:45:02,846 - INFO - Epoch 1 Step 1190 (Global: 5190): loss=1.5618, ppl=4.77, grad_norm=1.83, lr=5.90e-06, throughput=2359 tok/s
2025-11-26 17:48:26,259 - INFO - Epoch 1 Step 1200 (Global: 5200): loss=1.6518, ppl=5.22, grad_norm=1.17, lr=5.88e-06, throughput=2360 tok/s
2025-11-26 17:51:49,658 - INFO - Epoch 1 Step 1210 (Global: 5210): loss=1.8940, ppl=6.65, grad_norm=1.93, lr=5.87e-06, throughput=2360 tok/s
2025-11-26 17:55:13,466 - INFO - Epoch 1 Step 1220 (Global: 5220): loss=1.7546, ppl=5.78, grad_norm=1.72, lr=5.85e-06, throughput=2355 tok/s
2025-11-26 17:58:36,883 - INFO - Epoch 1 Step 1230 (Global: 5230): loss=1.5336, ppl=4.63, grad_norm=1.57, lr=5.83e-06, throughput=2360 tok/s
2025-11-26 18:02:00,745 - INFO - Epoch 1 Step 1240 (Global: 5240): loss=1.5682, ppl=4.80, grad_norm=1.51, lr=5.82e-06, throughput=2355 tok/s
2025-11-26 18:05:24,458 - INFO - Epoch 1 Step 1250 (Global: 5250): loss=1.7219, ppl=5.60, grad_norm=1.59, lr=5.80e-06, throughput=2356 tok/s
2025-11-26 18:08:48,646 - INFO - Epoch 1 Step 1260 (Global: 5260): loss=1.6992, ppl=5.47, grad_norm=2.08, lr=5.78e-06, throughput=2351 tok/s
2025-11-26 18:12:12,446 - INFO - Epoch 1 Step 1270 (Global: 5270): loss=1.7772, ppl=5.91, grad_norm=1.84, lr=5.77e-06, throughput=2355 tok/s
2025-11-26 18:15:36,246 - INFO - Epoch 1 Step 1280 (Global: 5280): loss=1.8137, ppl=6.13, grad_norm=1.73, lr=5.75e-06, throughput=2355 tok/s
2025-11-26 18:18:59,585 - INFO - Epoch 1 Step 1290 (Global: 5290): loss=1.6611, ppl=5.26, grad_norm=1.70, lr=5.73e-06, throughput=2361 tok/s
2025-11-26 18:22:24,341 - INFO - Epoch 1 Step 1300 (Global: 5300): loss=1.4779, ppl=4.38, grad_norm=1.27, lr=5.72e-06, throughput=2344 tok/s
2025-11-26 18:25:47,278 - INFO - Epoch 1 Step 1310 (Global: 5310): loss=1.4563, ppl=4.29, grad_norm=3.03, lr=5.70e-06, throughput=2365 tok/s
2025-11-26 18:29:10,610 - INFO - Epoch 1 Step 1320 (Global: 5320): loss=1.7373, ppl=5.68, grad_norm=16.75, lr=5.68e-06, throughput=2361 tok/s
2025-11-26 18:32:34,359 - INFO - Epoch 1 Step 1330 (Global: 5330): loss=1.6035, ppl=4.97, grad_norm=1.48, lr=5.67e-06, throughput=2356 tok/s
2025-11-26 18:35:57,773 - INFO - Epoch 1 Step 1340 (Global: 5340): loss=1.5352, ppl=4.64, grad_norm=2.11, lr=5.65e-06, throughput=2360 tok/s
2025-11-26 18:39:20,236 - INFO - Epoch 1 Step 1350 (Global: 5350): loss=1.4879, ppl=4.43, grad_norm=2.11, lr=5.63e-06, throughput=2371 tok/s
2025-11-26 18:42:43,156 - INFO - Epoch 1 Step 1360 (Global: 5360): loss=1.6878, ppl=5.41, grad_norm=1.55, lr=5.62e-06, throughput=2365 tok/s
2025-11-26 18:46:09,386 - INFO - Epoch 1 Step 1370 (Global: 5370): loss=1.8401, ppl=6.30, grad_norm=1.52, lr=5.60e-06, throughput=2328 tok/s
2025-11-26 18:49:36,563 - INFO - Epoch 1 Step 1380 (Global: 5380): loss=1.6474, ppl=5.19, grad_norm=1.58, lr=5.58e-06, throughput=2317 tok/s
2025-11-26 18:53:05,103 - INFO - Epoch 1 Step 1390 (Global: 5390): loss=1.8087, ppl=6.10, grad_norm=1.57, lr=5.57e-06, throughput=2302 tok/s
2025-11-26 18:56:33,823 - INFO - Epoch 1 Step 1400 (Global: 5400): loss=1.4765, ppl=4.38, grad_norm=2.31, lr=5.55e-06, throughput=2300 tok/s
2025-11-26 19:00:01,474 - INFO - Epoch 1 Step 1410 (Global: 5410): loss=1.6024, ppl=4.96, grad_norm=2.62, lr=5.53e-06, throughput=2312 tok/s
2025-11-26 19:03:30,966 - INFO - Epoch 1 Step 1420 (Global: 5420): loss=1.5479, ppl=4.70, grad_norm=2.50, lr=5.52e-06, throughput=2291 tok/s
2025-11-26 19:07:00,061 - INFO - Epoch 1 Step 1430 (Global: 5430): loss=1.6556, ppl=5.24, grad_norm=1.23, lr=5.50e-06, throughput=2296 tok/s
2025-11-26 19:10:28,439 - INFO - Epoch 1 Step 1440 (Global: 5440): loss=1.9449, ppl=6.99, grad_norm=1.52, lr=5.48e-06, throughput=2304 tok/s
2025-11-26 19:13:58,448 - INFO - Epoch 1 Step 1450 (Global: 5450): loss=1.7501, ppl=5.76, grad_norm=1.38, lr=5.47e-06, throughput=2286 tok/s
2025-11-26 19:17:24,719 - INFO - Epoch 1 Step 1460 (Global: 5460): loss=1.6826, ppl=5.38, grad_norm=1.16, lr=5.45e-06, throughput=2327 tok/s
2025-11-26 19:20:52,595 - INFO - Epoch 1 Step 1470 (Global: 5470): loss=1.5204, ppl=4.57, grad_norm=1.26, lr=5.43e-06, throughput=2309 tok/s
2025-11-26 19:24:18,631 - INFO - Epoch 1 Step 1480 (Global: 5480): loss=1.6249, ppl=5.08, grad_norm=1.38, lr=5.42e-06, throughput=2330 tok/s
2025-11-26 19:27:42,845 - INFO - Epoch 1 Step 1490 (Global: 5490): loss=1.5220, ppl=4.58, grad_norm=1.37, lr=5.40e-06, throughput=2350 tok/s
2025-11-26 19:31:08,476 - INFO - Epoch 1 Step 1500 (Global: 5500): loss=1.6931, ppl=5.44, grad_norm=3.03, lr=5.38e-06, throughput=2334 tok/s
2025-11-26 19:34:33,112 - INFO - Epoch 1 Step 1510 (Global: 5510): loss=1.4829, ppl=4.41, grad_norm=2.39, lr=5.37e-06, throughput=2346 tok/s
2025-11-26 19:37:57,966 - INFO - Epoch 1 Step 1520 (Global: 5520): loss=1.6404, ppl=5.16, grad_norm=1.38, lr=5.35e-06, throughput=2343 tok/s
2025-11-26 19:41:23,383 - INFO - Epoch 1 Step 1530 (Global: 5530): loss=1.7479, ppl=5.74, grad_norm=1.59, lr=5.33e-06, throughput=2337 tok/s
2025-11-26 19:44:47,957 - INFO - Epoch 1 Step 1540 (Global: 5540): loss=1.5609, ppl=4.76, grad_norm=1.78, lr=5.32e-06, throughput=2346 tok/s
2025-11-26 19:48:13,234 - INFO - Epoch 1 Step 1550 (Global: 5550): loss=1.6103, ppl=5.00, grad_norm=1.88, lr=5.30e-06, throughput=2338 tok/s
2025-11-26 19:51:37,108 - INFO - Epoch 1 Step 1560 (Global: 5560): loss=1.5914, ppl=4.91, grad_norm=1.22, lr=5.28e-06, throughput=2354 tok/s
2025-11-26 19:55:00,732 - INFO - Epoch 1 Step 1570 (Global: 5570): loss=1.7428, ppl=5.71, grad_norm=1.38, lr=5.27e-06, throughput=2357 tok/s
2025-11-26 19:58:25,710 - INFO - Epoch 1 Step 1580 (Global: 5580): loss=1.6435, ppl=5.17, grad_norm=1.71, lr=5.25e-06, throughput=2342 tok/s
2025-11-26 20:01:49,854 - INFO - Epoch 1 Step 1590 (Global: 5590): loss=1.9722, ppl=7.19, grad_norm=1.52, lr=5.23e-06, throughput=2351 tok/s
2025-11-26 20:05:14,459 - INFO - Epoch 1 Step 1600 (Global: 5600): loss=1.6881, ppl=5.41, grad_norm=1.62, lr=5.22e-06, throughput=2346 tok/s
2025-11-26 20:08:44,929 - INFO - Epoch 1 Step 1610 (Global: 5610): loss=1.5186, ppl=4.57, grad_norm=1.48, lr=5.20e-06, throughput=2281 tok/s
2025-11-26 20:12:13,095 - INFO - Epoch 1 Step 1620 (Global: 5620): loss=1.7530, ppl=5.77, grad_norm=1.49, lr=5.18e-06, throughput=2306 tok/s
2025-11-26 20:15:42,661 - INFO - Epoch 1 Step 1630 (Global: 5630): loss=1.3741, ppl=3.95, grad_norm=1.37, lr=5.17e-06, throughput=2290 tok/s
2025-11-26 20:19:10,232 - INFO - Epoch 1 Step 1640 (Global: 5640): loss=1.4650, ppl=4.33, grad_norm=1.17, lr=5.15e-06, throughput=2312 tok/s
2025-11-26 20:22:39,989 - INFO - Epoch 1 Step 1650 (Global: 5650): loss=1.7150, ppl=5.56, grad_norm=1.41, lr=5.13e-06, throughput=2288 tok/s
2025-11-26 20:26:10,344 - INFO - Epoch 1 Step 1660 (Global: 5660): loss=1.6300, ppl=5.10, grad_norm=1.61, lr=5.12e-06, throughput=2282 tok/s
2025-11-26 20:29:39,633 - INFO - Epoch 1 Step 1670 (Global: 5670): loss=1.8847, ppl=6.58, grad_norm=1.34, lr=5.10e-06, throughput=2293 tok/s
2025-11-26 20:33:09,258 - INFO - Epoch 1 Step 1680 (Global: 5680): loss=1.7402, ppl=5.70, grad_norm=1.55, lr=5.08e-06, throughput=2290 tok/s
2025-11-26 20:36:37,671 - INFO - Epoch 1 Step 1690 (Global: 5690): loss=1.5155, ppl=4.55, grad_norm=1.72, lr=5.07e-06, throughput=2303 tok/s
2025-11-26 20:40:07,810 - INFO - Epoch 1 Step 1700 (Global: 5700): loss=1.5077, ppl=4.52, grad_norm=1.34, lr=5.05e-06, throughput=2284 tok/s
2025-11-26 20:43:36,536 - INFO - Epoch 1 Step 1710 (Global: 5710): loss=1.6036, ppl=4.97, grad_norm=1.51, lr=5.03e-06, throughput=2300 tok/s
2025-11-26 20:47:06,160 - INFO - Epoch 1 Step 1720 (Global: 5720): loss=1.5908, ppl=4.91, grad_norm=1.52, lr=5.02e-06, throughput=2290 tok/s
2025-11-26 20:50:35,678 - INFO - Epoch 1 Step 1730 (Global: 5730): loss=1.6747, ppl=5.34, grad_norm=1.46, lr=5.00e-06, throughput=2291 tok/s
2025-11-26 20:54:03,974 - INFO - Epoch 1 Step 1740 (Global: 5740): loss=1.6198, ppl=5.05, grad_norm=1.50, lr=4.98e-06, throughput=2304 tok/s
2025-11-26 20:57:34,815 - INFO - Epoch 1 Step 1750 (Global: 5750): loss=1.4641, ppl=4.32, grad_norm=1.46, lr=4.96e-06, throughput=2277 tok/s
2025-11-26 21:01:03,854 - INFO - Epoch 1 Step 1760 (Global: 5760): loss=1.6753, ppl=5.34, grad_norm=2.14, lr=4.95e-06, throughput=2296 tok/s
2025-11-26 21:04:33,467 - INFO - Epoch 1 Step 1770 (Global: 5770): loss=1.5430, ppl=4.68, grad_norm=1.55, lr=4.93e-06, throughput=2290 tok/s
2025-11-26 21:08:02,792 - INFO - Epoch 1 Step 1780 (Global: 5780): loss=1.5668, ppl=4.79, grad_norm=2.19, lr=4.91e-06, throughput=2293 tok/s
2025-11-26 21:11:31,727 - INFO - Epoch 1 Step 1790 (Global: 5790): loss=1.4746, ppl=4.37, grad_norm=2.17, lr=4.90e-06, throughput=2297 tok/s
2025-11-26 21:15:01,687 - INFO - Epoch 1 Step 1800 (Global: 5800): loss=1.6367, ppl=5.14, grad_norm=3.62, lr=4.88e-06, throughput=2286 tok/s
2025-11-26 21:18:29,721 - INFO - Epoch 1 Step 1810 (Global: 5810): loss=1.7872, ppl=5.97, grad_norm=1.53, lr=4.86e-06, throughput=2307 tok/s
2025-11-26 21:21:57,639 - INFO - Epoch 1 Step 1820 (Global: 5820): loss=1.3995, ppl=4.05, grad_norm=1.04, lr=4.85e-06, throughput=2309 tok/s
2025-11-26 21:25:22,944 - INFO - Epoch 1 Step 1830 (Global: 5830): loss=1.6517, ppl=5.22, grad_norm=1.71, lr=4.83e-06, throughput=2338 tok/s
2025-11-26 21:28:51,379 - INFO - Epoch 1 Step 1840 (Global: 5840): loss=1.6664, ppl=5.29, grad_norm=1.15, lr=4.81e-06, throughput=2303 tok/s
2025-11-26 21:32:21,802 - INFO - Epoch 1 Step 1850 (Global: 5850): loss=1.6906, ppl=5.42, grad_norm=1.34, lr=4.80e-06, throughput=2281 tok/s
2025-11-26 21:35:50,515 - INFO - Epoch 1 Step 1860 (Global: 5860): loss=1.6730, ppl=5.33, grad_norm=1.34, lr=4.78e-06, throughput=2300 tok/s
2025-11-26 21:39:19,830 - INFO - Epoch 1 Step 1870 (Global: 5870): loss=1.6172, ppl=5.04, grad_norm=1.26, lr=4.76e-06, throughput=2293 tok/s
2025-11-26 21:42:49,268 - INFO - Epoch 1 Step 1880 (Global: 5880): loss=1.4436, ppl=4.24, grad_norm=1.82, lr=4.75e-06, throughput=2292 tok/s
2025-11-26 21:46:18,551 - INFO - Epoch 1 Step 1890 (Global: 5890): loss=1.8251, ppl=6.20, grad_norm=1.23, lr=4.73e-06, throughput=2294 tok/s
2025-11-26 21:49:48,315 - INFO - Epoch 1 Step 1900 (Global: 5900): loss=1.7322, ppl=5.65, grad_norm=1.78, lr=4.71e-06, throughput=2288 tok/s
2025-11-26 21:53:16,076 - INFO - Epoch 1 Step 1910 (Global: 5910): loss=1.8912, ppl=6.63, grad_norm=1.52, lr=4.70e-06, throughput=2310 tok/s
2025-11-26 21:56:45,519 - INFO - Epoch 1 Step 1920 (Global: 5920): loss=1.6152, ppl=5.03, grad_norm=1.46, lr=4.68e-06, throughput=2292 tok/s
2025-11-26 22:00:15,001 - INFO - Epoch 1 Step 1930 (Global: 5930): loss=1.5182, ppl=4.56, grad_norm=1.40, lr=4.66e-06, throughput=2291 tok/s
2025-11-26 22:03:43,597 - INFO - Epoch 1 Step 1940 (Global: 5940): loss=1.5657, ppl=4.79, grad_norm=1.23, lr=4.65e-06, throughput=2301 tok/s
2025-11-26 22:07:15,354 - INFO - Epoch 1 Step 1950 (Global: 5950): loss=1.4983, ppl=4.47, grad_norm=1.74, lr=4.63e-06, throughput=2267 tok/s
2025-11-26 22:10:44,094 - INFO - Epoch 1 Step 1960 (Global: 5960): loss=1.4744, ppl=4.37, grad_norm=1.19, lr=4.61e-06, throughput=2300 tok/s
2025-11-26 22:14:14,296 - INFO - Epoch 1 Step 1970 (Global: 5970): loss=1.6319, ppl=5.11, grad_norm=1.13, lr=4.60e-06, throughput=2284 tok/s
2025-11-26 22:17:43,214 - INFO - Epoch 1 Step 1980 (Global: 5980): loss=1.5544, ppl=4.73, grad_norm=2.16, lr=4.58e-06, throughput=2298 tok/s
2025-11-26 22:21:10,123 - INFO - Epoch 1 Step 1990 (Global: 5990): loss=1.5317, ppl=4.63, grad_norm=1.96, lr=4.56e-06, throughput=2320 tok/s
2025-11-26 22:24:36,446 - INFO - Epoch 1 Step 2000 (Global: 6000): loss=1.6058, ppl=4.98, grad_norm=1.34, lr=4.55e-06, throughput=2326 tok/s
2025-11-26 22:24:36,446 - INFO - 
Running validation at step 6000...
2025-11-26 22:36:45,891 - WARNING - NLTK wordnet data missing - METEOR score unavailable. Run: python -m nltk.downloader wordnet omw-1.4
2025-11-26 22:36:45,910 - INFO - Validation loss: 1.6441, perplexity: 5.18
2025-11-26 22:36:45,911 - INFO - 
======================================================================
2025-11-26 22:36:45,911 - INFO - Qualitative Evaluation Samples:
2025-11-26 22:36:45,911 - INFO - ======================================================================
2025-11-26 22:36:45,912 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-26 22:36:45,912 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-26 22:36:45,913 - INFO - Generated:    ' to the band\'s previous work, saying that "Death Cab for Cutie is a lot more mature, a lot more confident, a lot more confident in their own abilities, and they\'re not afraid to be themselves. And tha...'
2025-11-26 22:36:45,913 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-26 22:36:45,913 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,914 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-26 22:36:45,914 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-26 22:36:45,914 - INFO - Generated:    'aternity and sorority houses, but it was not uncommon for fraternity and sorority houses to have Native American-themed events. The Order of the Arrow has a Native American theme, and the Order of the...'
2025-11-26 22:36:45,915 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-26 22:36:45,915 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,915 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-26 22:36:45,916 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-26 22:36:45,916 - INFO - Generated:    " be killed by the Red Tail's leader, Shigeki. Oga is then forced to fight the Red Tail's leader, Shigeki, in a duel, but is defeated. Oga is then forced to fight the Red Tail's leader, Shigeki, in a d..."
2025-11-26 22:36:45,917 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-26 22:36:45,917 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,917 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-26 22:36:45,918 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-26 22:36:45,918 - INFO - Generated:    '-1 | 1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-61-62-63-64-65-66-67-68-...'
2025-11-26 22:36:45,919 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-26 22:36:45,919 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,919 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-26 22:36:45,920 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-26 22:36:45,920 - INFO - Generated:    '1 | BlackBerry PlayBook | EA Tiburon       | [ 150 ] |\n| Madden NFL 12                                 | August 30, 2011 | iOS               | EA Tiburon       | [ 150 ] |\n| Madden NFL 12             ...'
2025-11-26 22:36:45,921 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-26 22:36:45,921 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,924 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_6000.jsonl
2025-11-26 22:39:11,210 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 22:39:11,225 - INFO - New best validation loss: 1.6441, perplexity: 5.18
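The loss and perplexity figures reported throughout this log appear consistent with perplexity being computed as exp(loss), i.e. a natural-log cross-entropy loss. A minimal sanity check against the best-validation line above (assuming that convention; the training script itself is not shown here):

```python
import math

# Values taken from the "New best validation loss" log line above.
val_loss = 1.6441
reported_ppl = 5.18

# Perplexity as the exponential of the mean cross-entropy (nats).
computed_ppl = math.exp(val_loss)

# Matches the logged value once rounded to two decimals.
assert round(computed_ppl, 2) == reported_ppl
```

The same relation holds for the per-step training lines (e.g. loss=1.7546 → ppl≈5.78), which supports the assumption.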
2025-11-26 22:42:42,003 - INFO - Epoch 1 Step 2010 (Global: 6010): loss=1.5365, ppl=4.65, grad_norm=1.12, lr=4.53e-06, throughput=2278 tok/s
2025-11-26 22:46:11,278 - INFO - Epoch 1 Step 2020 (Global: 6020): loss=1.5825, ppl=4.87, grad_norm=1.89, lr=4.51e-06, throughput=2294 tok/s
2025-11-26 22:49:35,682 - INFO - Epoch 1 Step 2030 (Global: 6030): loss=1.8149, ppl=6.14, grad_norm=2.05, lr=4.50e-06, throughput=2348 tok/s
2025-11-26 22:53:04,832 - INFO - Epoch 1 Step 2040 (Global: 6040): loss=1.5978, ppl=4.94, grad_norm=1.34, lr=4.48e-06, throughput=2295 tok/s
2025-11-26 22:56:33,844 - INFO - Epoch 1 Step 2050 (Global: 6050): loss=1.6242, ppl=5.07, grad_norm=1.59, lr=4.46e-06, throughput=2297 tok/s
2025-11-26 23:00:01,675 - INFO - Epoch 1 Step 2060 (Global: 6060): loss=1.7453, ppl=5.73, grad_norm=3.45, lr=4.45e-06, throughput=2310 tok/s
2025-11-26 23:03:30,408 - INFO - Epoch 1 Step 2070 (Global: 6070): loss=1.9817, ppl=7.26, grad_norm=1.52, lr=4.43e-06, throughput=2300 tok/s
2025-11-26 23:06:57,547 - INFO - Epoch 1 Step 2080 (Global: 6080): loss=1.5477, ppl=4.70, grad_norm=1.12, lr=4.41e-06, throughput=2317 tok/s
2025-11-26 23:10:24,976 - INFO - Epoch 1 Step 2090 (Global: 6090): loss=1.6698, ppl=5.31, grad_norm=1.52, lr=4.40e-06, throughput=2314 tok/s
2025-11-26 23:13:54,344 - INFO - Epoch 1 Step 2100 (Global: 6100): loss=1.4559, ppl=4.29, grad_norm=1.73, lr=4.38e-06, throughput=2293 tok/s
2025-11-26 23:17:22,072 - INFO - Epoch 1 Step 2110 (Global: 6110): loss=1.3795, ppl=3.97, grad_norm=1.01, lr=4.36e-06, throughput=2311 tok/s
2025-11-26 23:20:50,207 - INFO - Epoch 1 Step 2120 (Global: 6120): loss=1.6584, ppl=5.25, grad_norm=2.06, lr=4.35e-06, throughput=2306 tok/s
2025-11-26 23:24:17,149 - INFO - Epoch 1 Step 2130 (Global: 6130): loss=1.6136, ppl=5.02, grad_norm=1.34, lr=4.33e-06, throughput=2320 tok/s
2025-11-26 23:27:43,576 - INFO - Epoch 1 Step 2140 (Global: 6140): loss=1.6063, ppl=4.98, grad_norm=1.98, lr=4.31e-06, throughput=2325 tok/s
2025-11-26 23:31:15,974 - INFO - Epoch 1 Step 2150 (Global: 6150): loss=1.8216, ppl=6.18, grad_norm=2.02, lr=4.30e-06, throughput=2260 tok/s
2025-11-26 23:34:45,061 - INFO - Epoch 1 Step 2160 (Global: 6160): loss=1.6107, ppl=5.01, grad_norm=1.16, lr=4.28e-06, throughput=2296 tok/s
2025-11-26 23:38:16,844 - INFO - Epoch 1 Step 2170 (Global: 6170): loss=1.8281, ppl=6.22, grad_norm=1.41, lr=4.26e-06, throughput=2266 tok/s
2025-11-26 23:41:44,869 - INFO - Epoch 1 Step 2180 (Global: 6180): loss=1.6540, ppl=5.23, grad_norm=1.55, lr=4.25e-06, throughput=2307 tok/s
2025-11-26 23:45:15,971 - INFO - Epoch 1 Step 2190 (Global: 6190): loss=1.5850, ppl=4.88, grad_norm=1.31, lr=4.23e-06, throughput=2274 tok/s
2025-11-26 23:48:45,636 - INFO - Epoch 1 Step 2200 (Global: 6200): loss=1.5717, ppl=4.81, grad_norm=1.27, lr=4.21e-06, throughput=2289 tok/s
2025-11-26 23:52:15,851 - INFO - Epoch 1 Step 2210 (Global: 6210): loss=1.5257, ppl=4.60, grad_norm=1.24, lr=4.20e-06, throughput=2283 tok/s
2025-11-26 23:55:46,844 - INFO - Epoch 1 Step 2220 (Global: 6220): loss=1.7978, ppl=6.04, grad_norm=1.55, lr=4.18e-06, throughput=2275 tok/s
2025-11-26 23:59:15,622 - INFO - Epoch 1 Step 2230 (Global: 6230): loss=1.5428, ppl=4.68, grad_norm=1.26, lr=4.16e-06, throughput=2299 tok/s
2025-11-27 00:02:47,405 - INFO - Epoch 1 Step 2240 (Global: 6240): loss=1.5888, ppl=4.90, grad_norm=1.55, lr=4.15e-06, throughput=2266 tok/s
2025-11-27 00:06:16,560 - INFO - Epoch 1 Step 2250 (Global: 6250): loss=1.3918, ppl=4.02, grad_norm=1.64, lr=4.13e-06, throughput=2295 tok/s
2025-11-27 00:09:46,152 - INFO - Epoch 1 Step 2260 (Global: 6260): loss=1.7444, ppl=5.72, grad_norm=1.93, lr=4.12e-06, throughput=2290 tok/s
2025-11-27 00:13:18,192 - INFO - Epoch 1 Step 2270 (Global: 6270): loss=1.6026, ppl=4.97, grad_norm=3.36, lr=4.10e-06, throughput=2264 tok/s
2025-11-27 00:16:47,801 - INFO - Epoch 1 Step 2280 (Global: 6280): loss=1.4486, ppl=4.26, grad_norm=1.48, lr=4.08e-06, throughput=2290 tok/s
2025-11-27 00:20:16,925 - INFO - Epoch 1 Step 2290 (Global: 6290): loss=1.5349, ppl=4.64, grad_norm=1.62, lr=4.07e-06, throughput=2295 tok/s
2025-11-27 00:23:46,492 - INFO - Epoch 1 Step 2300 (Global: 6300): loss=1.6205, ppl=5.06, grad_norm=1.27, lr=4.05e-06, throughput=2290 tok/s
2025-11-27 00:27:12,439 - INFO - Epoch 1 Step 2310 (Global: 6310): loss=1.4740, ppl=4.37, grad_norm=1.75, lr=4.03e-06, throughput=2331 tok/s
2025-11-27 00:30:39,262 - INFO - Epoch 1 Step 2320 (Global: 6320): loss=1.8079, ppl=6.10, grad_norm=1.62, lr=4.02e-06, throughput=2321 tok/s
2025-11-27 00:34:08,568 - INFO - Epoch 1 Step 2330 (Global: 6330): loss=1.5097, ppl=4.53, grad_norm=1.89, lr=4.00e-06, throughput=2293 tok/s
2025-11-27 00:37:36,224 - INFO - Epoch 1 Step 2340 (Global: 6340): loss=1.6409, ppl=5.16, grad_norm=1.66, lr=3.98e-06, throughput=2312 tok/s
2025-11-27 00:41:07,289 - INFO - Epoch 1 Step 2350 (Global: 6350): loss=1.5278, ppl=4.61, grad_norm=2.30, lr=3.97e-06, throughput=2274 tok/s
2025-11-27 00:44:38,985 - INFO - Epoch 1 Step 2360 (Global: 6360): loss=1.5100, ppl=4.53, grad_norm=1.91, lr=3.95e-06, throughput=2267 tok/s
2025-11-27 00:48:08,011 - INFO - Epoch 1 Step 2370 (Global: 6370): loss=1.5218, ppl=4.58, grad_norm=1.75, lr=3.93e-06, throughput=2296 tok/s
2025-11-27 00:51:37,415 - INFO - Epoch 1 Step 2380 (Global: 6380): loss=1.7147, ppl=5.55, grad_norm=1.14, lr=3.92e-06, throughput=2292 tok/s
2025-11-27 00:55:08,049 - INFO - Epoch 1 Step 2390 (Global: 6390): loss=1.6802, ppl=5.37, grad_norm=1.06, lr=3.90e-06, throughput=2279 tok/s
2025-11-27 00:58:38,419 - INFO - Epoch 1 Step 2400 (Global: 6400): loss=1.6613, ppl=5.27, grad_norm=1.12, lr=3.89e-06, throughput=2282 tok/s
2025-11-27 01:02:09,669 - INFO - Epoch 1 Step 2410 (Global: 6410): loss=1.6886, ppl=5.41, grad_norm=1.30, lr=3.87e-06, throughput=2272 tok/s
2025-11-27 01:05:40,972 - INFO - Epoch 1 Step 2420 (Global: 6420): loss=1.6153, ppl=5.03, grad_norm=1.31, lr=3.85e-06, throughput=2272 tok/s
2025-11-27 01:09:03,975 - INFO - Epoch 1 Step 2430 (Global: 6430): loss=1.8013, ppl=6.06, grad_norm=1.36, lr=3.84e-06, throughput=2365 tok/s
2025-11-27 01:13:01,774 - INFO - Epoch 1 Step 2440 (Global: 6440): loss=1.5954, ppl=4.93, grad_norm=2.16, lr=3.82e-06, throughput=2019 tok/s
2025-11-27 01:17:07,423 - INFO - Epoch 1 Step 2450 (Global: 6450): loss=1.4617, ppl=4.31, grad_norm=1.45, lr=3.80e-06, throughput=1954 tok/s
2025-11-27 01:20:37,024 - INFO - Epoch 1 Step 2460 (Global: 6460): loss=1.8627, ppl=6.44, grad_norm=1.40, lr=3.79e-06, throughput=2290 tok/s
2025-11-27 01:24:06,975 - INFO - Epoch 1 Step 2470 (Global: 6470): loss=1.5834, ppl=4.87, grad_norm=1.37, lr=3.77e-06, throughput=2286 tok/s
2025-11-27 01:27:41,438 - INFO - Epoch 1 Step 2480 (Global: 6480): loss=1.7328, ppl=5.66, grad_norm=1.52, lr=3.76e-06, throughput=2238 tok/s
2025-11-27 01:31:13,298 - INFO - Epoch 1 Step 2490 (Global: 6490): loss=1.4429, ppl=4.23, grad_norm=2.61, lr=3.74e-06, throughput=2266 tok/s
2025-11-27 01:34:44,742 - INFO - Epoch 1 Step 2500 (Global: 6500): loss=1.8865, ppl=6.60, grad_norm=1.91, lr=3.72e-06, throughput=2270 tok/s
2025-11-27 01:38:17,628 - INFO - Epoch 1 Step 2510 (Global: 6510): loss=1.4727, ppl=4.36, grad_norm=1.68, lr=3.71e-06, throughput=2255 tok/s
2025-11-27 01:41:51,485 - INFO - Epoch 1 Step 2520 (Global: 6520): loss=1.7039, ppl=5.50, grad_norm=2.08, lr=3.69e-06, throughput=2245 tok/s
2025-11-27 01:45:23,690 - INFO - Epoch 1 Step 2530 (Global: 6530): loss=1.7020, ppl=5.48, grad_norm=1.52, lr=3.67e-06, throughput=2262 tok/s
2025-11-27 01:48:57,051 - INFO - Epoch 1 Step 2540 (Global: 6540): loss=1.6842, ppl=5.39, grad_norm=1.38, lr=3.66e-06, throughput=2250 tok/s
2025-11-27 01:52:27,567 - INFO - Epoch 1 Step 2550 (Global: 6550): loss=1.6493, ppl=5.20, grad_norm=1.91, lr=3.64e-06, throughput=2280 tok/s
2025-11-27 01:56:01,798 - INFO - Epoch 1 Step 2560 (Global: 6560): loss=1.7493, ppl=5.75, grad_norm=2.08, lr=3.63e-06, throughput=2241 tok/s
2025-11-27 01:59:33,645 - INFO - Epoch 1 Step 2570 (Global: 6570): loss=1.3693, ppl=3.93, grad_norm=1.52, lr=3.61e-06, throughput=2266 tok/s
2025-11-27 02:03:05,696 - INFO - Epoch 1 Step 2580 (Global: 6580): loss=1.6137, ppl=5.02, grad_norm=2.17, lr=3.59e-06, throughput=2264 tok/s
2025-11-27 02:06:38,011 - INFO - Epoch 1 Step 2590 (Global: 6590): loss=1.5096, ppl=4.52, grad_norm=1.23, lr=3.58e-06, throughput=2261 tok/s
2025-11-27 02:10:09,728 - INFO - Epoch 1 Step 2600 (Global: 6600): loss=1.5652, ppl=4.78, grad_norm=1.26, lr=3.56e-06, throughput=2267 tok/s
2025-11-27 02:13:40,824 - INFO - Epoch 1 Step 2610 (Global: 6610): loss=1.6559, ppl=5.24, grad_norm=1.44, lr=3.55e-06, throughput=2274 tok/s
2025-11-27 02:17:13,358 - INFO - Epoch 1 Step 2620 (Global: 6620): loss=1.5663, ppl=4.79, grad_norm=1.62, lr=3.53e-06, throughput=2258 tok/s
2025-11-27 02:20:48,082 - INFO - Epoch 1 Step 2630 (Global: 6630): loss=1.5840, ppl=4.87, grad_norm=1.52, lr=3.51e-06, throughput=2235 tok/s
2025-11-27 02:24:20,093 - INFO - Epoch 1 Step 2640 (Global: 6640): loss=2.0053, ppl=7.43, grad_norm=1.66, lr=3.50e-06, throughput=2264 tok/s
2025-11-27 02:27:51,743 - INFO - Epoch 1 Step 2650 (Global: 6650): loss=1.4028, ppl=4.07, grad_norm=1.44, lr=3.48e-06, throughput=2268 tok/s
2025-11-27 02:31:23,911 - INFO - Epoch 1 Step 2660 (Global: 6660): loss=1.4283, ppl=4.17, grad_norm=1.24, lr=3.47e-06, throughput=2262 tok/s
2025-11-27 02:34:56,040 - INFO - Epoch 1 Step 2670 (Global: 6670): loss=1.7759, ppl=5.91, grad_norm=1.23, lr=3.45e-06, throughput=2263 tok/s
2025-11-27 02:38:30,700 - INFO - Epoch 1 Step 2680 (Global: 6680): loss=1.7219, ppl=5.60, grad_norm=1.24, lr=3.43e-06, throughput=2236 tok/s
2025-11-27 02:42:02,147 - INFO - Epoch 1 Step 2690 (Global: 6690): loss=1.6588, ppl=5.25, grad_norm=1.44, lr=3.42e-06, throughput=2270 tok/s
2025-11-27 02:45:34,746 - INFO - Epoch 1 Step 2700 (Global: 6700): loss=1.5907, ppl=4.91, grad_norm=1.29, lr=3.40e-06, throughput=2258 tok/s
2025-11-27 02:49:07,484 - INFO - Epoch 1 Step 2710 (Global: 6710): loss=1.6153, ppl=5.03, grad_norm=1.61, lr=3.39e-06, throughput=2256 tok/s
2025-11-27 02:52:42,038 - INFO - Epoch 1 Step 2720 (Global: 6720): loss=1.7161, ppl=5.56, grad_norm=2.22, lr=3.37e-06, throughput=2237 tok/s
2025-11-27 02:56:17,254 - INFO - Epoch 1 Step 2730 (Global: 6730): loss=1.5639, ppl=4.78, grad_norm=1.35, lr=3.35e-06, throughput=2230 tok/s
2025-11-27 02:59:48,692 - INFO - Epoch 1 Step 2740 (Global: 6740): loss=1.6292, ppl=5.10, grad_norm=1.47, lr=3.34e-06, throughput=2270 tok/s
2025-11-27 03:03:23,083 - INFO - Epoch 1 Step 2750 (Global: 6750): loss=1.7152, ppl=5.56, grad_norm=1.44, lr=3.32e-06, throughput=2239 tok/s
2025-11-27 03:06:56,939 - INFO - Epoch 1 Step 2760 (Global: 6760): loss=1.4593, ppl=4.30, grad_norm=1.41, lr=3.31e-06, throughput=2245 tok/s
2025-11-27 03:10:29,418 - INFO - Epoch 1 Step 2770 (Global: 6770): loss=1.5058, ppl=4.51, grad_norm=1.57, lr=3.29e-06, throughput=2259 tok/s
2025-11-27 03:14:01,934 - INFO - Epoch 1 Step 2780 (Global: 6780): loss=1.6385, ppl=5.15, grad_norm=1.85, lr=3.28e-06, throughput=2259 tok/s
2025-11-27 03:17:33,346 - INFO - Epoch 1 Step 2790 (Global: 6790): loss=1.5304, ppl=4.62, grad_norm=1.12, lr=3.26e-06, throughput=2270 tok/s
2025-11-27 03:21:05,669 - INFO - Epoch 1 Step 2800 (Global: 6800): loss=1.6664, ppl=5.29, grad_norm=1.17, lr=3.24e-06, throughput=2261 tok/s
2025-11-27 03:24:40,485 - INFO - Epoch 1 Step 2810 (Global: 6810): loss=1.7402, ppl=5.70, grad_norm=1.38, lr=3.23e-06, throughput=2234 tok/s
2025-11-27 03:28:13,414 - INFO - Epoch 1 Step 2820 (Global: 6820): loss=1.6874, ppl=5.41, grad_norm=1.63, lr=3.21e-06, throughput=2254 tok/s
2025-11-27 03:31:45,484 - INFO - Epoch 1 Step 2830 (Global: 6830): loss=1.4830, ppl=4.41, grad_norm=2.45, lr=3.20e-06, throughput=2263 tok/s
2025-11-27 03:35:20,038 - INFO - Epoch 1 Step 2840 (Global: 6840): loss=1.7287, ppl=5.63, grad_norm=1.17, lr=3.18e-06, throughput=2237 tok/s
2025-11-27 03:38:51,740 - INFO - Epoch 1 Step 2850 (Global: 6850): loss=1.5791, ppl=4.85, grad_norm=1.16, lr=3.17e-06, throughput=2267 tok/s
2025-11-27 03:42:23,947 - INFO - Epoch 1 Step 2860 (Global: 6860): loss=1.6467, ppl=5.19, grad_norm=1.68, lr=3.15e-06, throughput=2262 tok/s
2025-11-27 03:45:58,573 - INFO - Epoch 1 Step 2870 (Global: 6870): loss=1.5430, ppl=4.68, grad_norm=1.31, lr=3.13e-06, throughput=2236 tok/s
2025-11-27 03:49:28,222 - INFO - Epoch 1 Step 2880 (Global: 6880): loss=1.4145, ppl=4.11, grad_norm=2.98, lr=3.12e-06, throughput=2290 tok/s
2025-11-27 03:53:02,378 - INFO - Epoch 1 Step 2890 (Global: 6890): loss=1.5964, ppl=4.94, grad_norm=1.98, lr=3.10e-06, throughput=2241 tok/s
2025-11-27 03:56:33,482 - INFO - Epoch 1 Step 2900 (Global: 6900): loss=1.5785, ppl=4.85, grad_norm=1.33, lr=3.09e-06, throughput=2274 tok/s
2025-11-27 04:00:05,916 - INFO - Epoch 1 Step 2910 (Global: 6910): loss=1.7358, ppl=5.67, grad_norm=1.74, lr=3.07e-06, throughput=2260 tok/s
2025-11-27 04:03:39,852 - INFO - Epoch 1 Step 2920 (Global: 6920): loss=1.6620, ppl=5.27, grad_norm=1.26, lr=3.06e-06, throughput=2244 tok/s
2025-11-27 04:07:11,250 - INFO - Epoch 1 Step 2930 (Global: 6930): loss=1.6128, ppl=5.02, grad_norm=1.31, lr=3.04e-06, throughput=2271 tok/s
2025-11-27 04:10:44,629 - INFO - Epoch 1 Step 2940 (Global: 6940): loss=1.7695, ppl=5.87, grad_norm=1.07, lr=3.03e-06, throughput=2250 tok/s
2025-11-27 04:14:19,848 - INFO - Epoch 1 Step 2950 (Global: 6950): loss=1.5935, ppl=4.92, grad_norm=1.42, lr=3.01e-06, throughput=2230 tok/s
2025-11-27 04:17:54,050 - INFO - Epoch 1 Step 2960 (Global: 6960): loss=1.3491, ppl=3.85, grad_norm=1.72, lr=3.00e-06, throughput=2241 tok/s
2025-11-27 04:21:27,378 - INFO - Epoch 1 Step 2970 (Global: 6970): loss=1.6235, ppl=5.07, grad_norm=1.40, lr=2.98e-06, throughput=2250 tok/s
2025-11-27 04:25:01,629 - INFO - Epoch 1 Step 2980 (Global: 6980): loss=1.8911, ppl=6.63, grad_norm=2.16, lr=2.96e-06, throughput=2240 tok/s
2025-11-27 04:28:34,000 - INFO - Epoch 1 Step 2990 (Global: 6990): loss=1.6364, ppl=5.14, grad_norm=3.81, lr=2.95e-06, throughput=2260 tok/s
2025-11-27 04:32:05,377 - INFO - Epoch 1 Step 3000 (Global: 7000): loss=1.7852, ppl=5.96, grad_norm=2.25, lr=2.93e-06, throughput=2271 tok/s
2025-11-27 04:35:41,360 - INFO - Epoch 1 Step 3010 (Global: 7010): loss=1.5919, ppl=4.91, grad_norm=1.64, lr=2.92e-06, throughput=2222 tok/s
2025-11-27 04:39:12,066 - INFO - Epoch 1 Step 3020 (Global: 7020): loss=1.4436, ppl=4.24, grad_norm=1.89, lr=2.90e-06, throughput=2278 tok/s
2025-11-27 04:42:44,384 - INFO - Epoch 1 Step 3030 (Global: 7030): loss=1.6126, ppl=5.02, grad_norm=1.38, lr=2.89e-06, throughput=2261 tok/s
2025-11-27 04:46:15,860 - INFO - Epoch 1 Step 3040 (Global: 7040): loss=1.6120, ppl=5.01, grad_norm=1.96, lr=2.87e-06, throughput=2270 tok/s
2025-11-27 04:49:49,439 - INFO - Epoch 1 Step 3050 (Global: 7050): loss=1.3577, ppl=3.89, grad_norm=1.26, lr=2.86e-06, throughput=2247 tok/s
2025-11-27 04:53:21,224 - INFO - Epoch 1 Step 3060 (Global: 7060): loss=1.7837, ppl=5.95, grad_norm=1.59, lr=2.84e-06, throughput=2266 tok/s
2025-11-27 04:56:57,455 - INFO - Epoch 1 Step 3070 (Global: 7070): loss=1.7341, ppl=5.66, grad_norm=1.16, lr=2.83e-06, throughput=2220 tok/s
2025-11-27 05:00:28,194 - INFO - Epoch 1 Step 3080 (Global: 7080): loss=1.4054, ppl=4.08, grad_norm=1.04, lr=2.81e-06, throughput=2278 tok/s
2025-11-27 05:04:03,004 - INFO - Epoch 1 Step 3090 (Global: 7090): loss=1.6133, ppl=5.02, grad_norm=1.49, lr=2.80e-06, throughput=2235 tok/s
2025-11-27 05:07:35,155 - INFO - Epoch 1 Step 3100 (Global: 7100): loss=1.5820, ppl=4.86, grad_norm=1.10, lr=2.78e-06, throughput=2263 tok/s
2025-11-27 05:11:09,449 - INFO - Epoch 1 Step 3110 (Global: 7110): loss=1.5124, ppl=4.54, grad_norm=1.25, lr=2.77e-06, throughput=2240 tok/s
2025-11-27 05:14:44,271 - INFO - Epoch 1 Step 3120 (Global: 7120): loss=1.7204, ppl=5.59, grad_norm=1.40, lr=2.75e-06, throughput=2234 tok/s
2025-11-27 05:18:14,957 - INFO - Epoch 1 Step 3130 (Global: 7130): loss=1.5938, ppl=4.92, grad_norm=1.75, lr=2.74e-06, throughput=2278 tok/s
2025-11-27 05:21:49,940 - INFO - Epoch 1 Step 3140 (Global: 7140): loss=1.6331, ppl=5.12, grad_norm=1.55, lr=2.72e-06, throughput=2233 tok/s
2025-11-27 05:25:21,455 - INFO - Epoch 1 Step 3150 (Global: 7150): loss=1.7229, ppl=5.60, grad_norm=1.48, lr=2.71e-06, throughput=2269 tok/s
2025-11-27 05:28:56,025 - INFO - Epoch 1 Step 3160 (Global: 7160): loss=1.6770, ppl=5.35, grad_norm=1.52, lr=2.69e-06, throughput=2237 tok/s
2025-11-27 05:32:29,444 - INFO - Epoch 1 Step 3170 (Global: 7170): loss=1.6378, ppl=5.14, grad_norm=1.28, lr=2.68e-06, throughput=2249 tok/s
2025-11-27 05:36:03,748 - INFO - Epoch 1 Step 3180 (Global: 7180): loss=1.4781, ppl=4.38, grad_norm=1.46, lr=2.66e-06, throughput=2240 tok/s
2025-11-27 05:39:36,263 - INFO - Epoch 1 Step 3190 (Global: 7190): loss=1.6835, ppl=5.38, grad_norm=2.48, lr=2.65e-06, throughput=2259 tok/s
2025-11-27 05:43:07,326 - INFO - Epoch 1 Step 3200 (Global: 7200): loss=1.5887, ppl=4.90, grad_norm=1.26, lr=2.63e-06, throughput=2274 tok/s
2025-11-27 05:46:42,343 - INFO - Epoch 1 Step 3210 (Global: 7210): loss=1.7665, ppl=5.85, grad_norm=2.20, lr=2.62e-06, throughput=2232 tok/s
2025-11-27 05:50:17,497 - INFO - Epoch 1 Step 3220 (Global: 7220): loss=1.7307, ppl=5.64, grad_norm=1.27, lr=2.60e-06, throughput=2231 tok/s
2025-11-27 05:53:48,942 - INFO - Epoch 1 Step 3230 (Global: 7230): loss=1.5067, ppl=4.51, grad_norm=1.71, lr=2.59e-06, throughput=2270 tok/s
2025-11-27 05:57:18,000 - INFO - Epoch 1 Step 3240 (Global: 7240): loss=1.5121, ppl=4.54, grad_norm=1.45, lr=2.58e-06, throughput=2296 tok/s
2025-11-27 06:00:45,481 - INFO - Epoch 1 Step 3250 (Global: 7250): loss=1.7415, ppl=5.71, grad_norm=1.89, lr=2.56e-06, throughput=2313 tok/s
2025-11-27 06:04:11,378 - INFO - Epoch 1 Step 3260 (Global: 7260): loss=1.5100, ppl=4.53, grad_norm=1.92, lr=2.55e-06, throughput=2331 tok/s
2025-11-27 06:07:37,626 - INFO - Epoch 1 Step 3270 (Global: 7270): loss=1.7311, ppl=5.65, grad_norm=1.06, lr=2.53e-06, throughput=2327 tok/s
2025-11-27 06:11:02,626 - INFO - Epoch 1 Step 3280 (Global: 7280): loss=1.4662, ppl=4.33, grad_norm=1.81, lr=2.52e-06, throughput=2342 tok/s
2025-11-27 06:14:27,117 - INFO - Epoch 1 Step 3290 (Global: 7290): loss=1.4278, ppl=4.17, grad_norm=1.48, lr=2.50e-06, throughput=2347 tok/s
2025-11-27 06:17:52,734 - INFO - Epoch 1 Step 3300 (Global: 7300): loss=1.6611, ppl=5.26, grad_norm=1.73, lr=2.49e-06, throughput=2334 tok/s
2025-11-27 06:21:18,277 - INFO - Epoch 1 Step 3310 (Global: 7310): loss=1.8076, ppl=6.10, grad_norm=1.72, lr=2.47e-06, throughput=2335 tok/s
2025-11-27 06:24:42,730 - INFO - Epoch 1 Step 3320 (Global: 7320): loss=1.6935, ppl=5.44, grad_norm=1.54, lr=2.46e-06, throughput=2348 tok/s
2025-11-27 06:28:07,968 - INFO - Epoch 1 Step 3330 (Global: 7330): loss=1.7430, ppl=5.71, grad_norm=1.32, lr=2.44e-06, throughput=2339 tok/s
2025-11-27 06:31:33,086 - INFO - Epoch 1 Step 3340 (Global: 7340): loss=1.5017, ppl=4.49, grad_norm=2.42, lr=2.43e-06, throughput=2340 tok/s
2025-11-27 06:34:57,839 - INFO - Epoch 1 Step 3350 (Global: 7350): loss=1.5907, ppl=4.91, grad_norm=1.55, lr=2.42e-06, throughput=2344 tok/s
2025-11-27 06:38:23,287 - INFO - Epoch 1 Step 3360 (Global: 7360): loss=1.6143, ppl=5.02, grad_norm=1.73, lr=2.40e-06, throughput=2336 tok/s
2025-11-27 06:41:48,191 - INFO - Epoch 1 Step 3370 (Global: 7370): loss=1.7264, ppl=5.62, grad_norm=1.26, lr=2.39e-06, throughput=2343 tok/s
2025-11-27 06:45:12,994 - INFO - Epoch 1 Step 3380 (Global: 7380): loss=1.4600, ppl=4.31, grad_norm=1.06, lr=2.37e-06, throughput=2344 tok/s
2025-11-27 06:48:37,156 - INFO - Epoch 1 Step 3390 (Global: 7390): loss=1.8470, ppl=6.34, grad_norm=1.35, lr=2.36e-06, throughput=2351 tok/s
2025-11-27 06:52:01,075 - INFO - Epoch 1 Step 3400 (Global: 7400): loss=1.5430, ppl=4.68, grad_norm=3.88, lr=2.34e-06, throughput=2354 tok/s
2025-11-27 06:55:25,850 - INFO - Epoch 1 Step 3410 (Global: 7410): loss=1.6049, ppl=4.98, grad_norm=1.44, lr=2.33e-06, throughput=2344 tok/s
2025-11-27 06:58:49,637 - INFO - Epoch 1 Step 3420 (Global: 7420): loss=1.7217, ppl=5.59, grad_norm=1.66, lr=2.32e-06, throughput=2355 tok/s
2025-11-27 07:02:19,931 - INFO - Epoch 1 Step 3430 (Global: 7430): loss=1.9944, ppl=7.35, grad_norm=1.33, lr=2.30e-06, throughput=2283 tok/s
2025-11-27 07:05:49,217 - INFO - Epoch 1 Step 3440 (Global: 7440): loss=1.6351, ppl=5.13, grad_norm=1.51, lr=2.29e-06, throughput=2294 tok/s
2025-11-27 07:09:18,712 - INFO - Epoch 1 Step 3450 (Global: 7450): loss=1.6559, ppl=5.24, grad_norm=1.51, lr=2.27e-06, throughput=2291 tok/s
2025-11-27 07:12:49,970 - INFO - Epoch 1 Step 3460 (Global: 7460): loss=1.6631, ppl=5.28, grad_norm=1.40, lr=2.26e-06, throughput=2272 tok/s
2025-11-27 07:16:21,851 - INFO - Epoch 1 Step 3470 (Global: 7470): loss=1.7262, ppl=5.62, grad_norm=1.09, lr=2.25e-06, throughput=2265 tok/s
2025-11-27 07:19:49,361 - INFO - Epoch 1 Step 3480 (Global: 7480): loss=1.5154, ppl=4.55, grad_norm=1.38, lr=2.23e-06, throughput=2313 tok/s
2025-11-27 07:23:22,965 - INFO - Epoch 1 Step 3490 (Global: 7490): loss=1.8209, ppl=6.18, grad_norm=1.34, lr=2.22e-06, throughput=2247 tok/s
2025-11-27 07:26:50,157 - INFO - Epoch 1 Step 3500 (Global: 7500): loss=1.6126, ppl=5.02, grad_norm=1.37, lr=2.20e-06, throughput=2317 tok/s
2025-11-27 07:30:20,635 - INFO - Epoch 1 Step 3510 (Global: 7510): loss=1.7271, ppl=5.62, grad_norm=1.79, lr=2.19e-06, throughput=2281 tok/s
2025-11-27 07:33:50,580 - INFO - Epoch 1 Step 3520 (Global: 7520): loss=1.4353, ppl=4.20, grad_norm=1.56, lr=2.18e-06, throughput=2286 tok/s
2025-11-27 07:37:15,090 - INFO - Epoch 1 Step 3530 (Global: 7530): loss=1.6192, ppl=5.05, grad_norm=1.27, lr=2.16e-06, throughput=2347 tok/s
2025-11-27 07:40:39,104 - INFO - Epoch 1 Step 3540 (Global: 7540): loss=1.3397, ppl=3.82, grad_norm=1.90, lr=2.15e-06, throughput=2353 tok/s
2025-11-27 07:44:03,219 - INFO - Epoch 1 Step 3550 (Global: 7550): loss=1.4825, ppl=4.40, grad_norm=1.41, lr=2.14e-06, throughput=2352 tok/s
2025-11-27 07:47:28,483 - INFO - Epoch 1 Step 3560 (Global: 7560): loss=1.7056, ppl=5.50, grad_norm=2.42, lr=2.12e-06, throughput=2338 tok/s
2025-11-27 07:50:53,076 - INFO - Epoch 1 Step 3570 (Global: 7570): loss=1.6305, ppl=5.11, grad_norm=2.27, lr=2.11e-06, throughput=2346 tok/s
2025-11-27 07:54:17,314 - INFO - Epoch 1 Step 3580 (Global: 7580): loss=1.5124, ppl=4.54, grad_norm=1.38, lr=2.09e-06, throughput=2350 tok/s
2025-11-27 07:57:41,930 - INFO - Epoch 1 Step 3590 (Global: 7590): loss=1.7213, ppl=5.59, grad_norm=1.35, lr=2.08e-06, throughput=2346 tok/s
2025-11-27 08:01:06,471 - INFO - Epoch 1 Step 3600 (Global: 7600): loss=1.4019, ppl=4.06, grad_norm=1.80, lr=2.07e-06, throughput=2347 tok/s
2025-11-27 08:04:31,537 - INFO - Epoch 1 Step 3610 (Global: 7610): loss=1.6517, ppl=5.22, grad_norm=1.52, lr=2.05e-06, throughput=2341 tok/s
2025-11-27 08:07:55,379 - INFO - Epoch 1 Step 3620 (Global: 7620): loss=1.4607, ppl=4.31, grad_norm=1.21, lr=2.04e-06, throughput=2355 tok/s
2025-11-27 08:11:19,959 - INFO - Epoch 1 Step 3630 (Global: 7630): loss=1.5729, ppl=4.82, grad_norm=1.47, lr=2.03e-06, throughput=2346 tok/s
2025-11-27 08:14:44,291 - INFO - Epoch 1 Step 3640 (Global: 7640): loss=1.4847, ppl=4.41, grad_norm=1.61, lr=2.01e-06, throughput=2349 tok/s
2025-11-27 08:18:07,792 - INFO - Epoch 1 Step 3650 (Global: 7650): loss=1.4906, ppl=4.44, grad_norm=1.46, lr=2.00e-06, throughput=2359 tok/s
2025-11-27 08:21:33,483 - INFO - Epoch 1 Step 3660 (Global: 7660): loss=1.6911, ppl=5.43, grad_norm=1.34, lr=1.99e-06, throughput=2334 tok/s
2025-11-27 08:25:04,883 - INFO - Epoch 1 Step 3670 (Global: 7670): loss=1.6647, ppl=5.28, grad_norm=1.20, lr=1.97e-06, throughput=2271 tok/s
2025-11-27 08:28:36,231 - INFO - Epoch 1 Step 3680 (Global: 7680): loss=1.5911, ppl=4.91, grad_norm=1.55, lr=1.96e-06, throughput=2271 tok/s
2025-11-27 08:32:08,692 - INFO - Epoch 1 Step 3690 (Global: 7690): loss=1.7500, ppl=5.75, grad_norm=1.76, lr=1.95e-06, throughput=2259 tok/s
2025-11-27 08:35:39,741 - INFO - Epoch 1 Step 3700 (Global: 7700): loss=1.6109, ppl=5.01, grad_norm=1.62, lr=1.93e-06, throughput=2274 tok/s
2025-11-27 08:39:10,504 - INFO - Epoch 1 Step 3710 (Global: 7710): loss=1.5751, ppl=4.83, grad_norm=1.41, lr=1.92e-06, throughput=2277 tok/s
2025-11-27 08:42:40,977 - INFO - Epoch 1 Step 3720 (Global: 7720): loss=1.7817, ppl=5.94, grad_norm=2.14, lr=1.91e-06, throughput=2281 tok/s
2025-11-27 08:46:12,148 - INFO - Epoch 1 Step 3730 (Global: 7730): loss=1.5511, ppl=4.72, grad_norm=1.33, lr=1.89e-06, throughput=2273 tok/s
2025-11-27 08:49:42,209 - INFO - Epoch 1 Step 3740 (Global: 7740): loss=1.5275, ppl=4.61, grad_norm=1.43, lr=1.88e-06, throughput=2285 tok/s
2025-11-27 08:53:08,202 - INFO - Epoch 1 Step 3750 (Global: 7750): loss=1.3528, ppl=3.87, grad_norm=1.44, lr=1.87e-06, throughput=2330 tok/s
2025-11-27 08:56:39,746 - INFO - Epoch 1 Step 3760 (Global: 7760): loss=1.6997, ppl=5.47, grad_norm=1.44, lr=1.85e-06, throughput=2269 tok/s
2025-11-27 09:00:11,055 - INFO - Epoch 1 Step 3770 (Global: 7770): loss=1.6531, ppl=5.22, grad_norm=2.16, lr=1.84e-06, throughput=2272 tok/s
2025-11-27 09:03:45,342 - INFO - Epoch 1 Step 3780 (Global: 7780): loss=1.6415, ppl=5.16, grad_norm=1.21, lr=1.83e-06, throughput=2240 tok/s
2025-11-27 09:07:18,453 - INFO - Epoch 1 Step 3790 (Global: 7790): loss=1.6298, ppl=5.10, grad_norm=1.71, lr=1.82e-06, throughput=2252 tok/s
2025-11-27 09:10:48,890 - INFO - Epoch 1 Step 3800 (Global: 7800): loss=1.7036, ppl=5.49, grad_norm=1.50, lr=1.80e-06, throughput=2281 tok/s
2025-11-27 09:14:19,119 - INFO - Epoch 1 Step 3810 (Global: 7810): loss=1.7550, ppl=5.78, grad_norm=2.45, lr=1.79e-06, throughput=2283 tok/s
2025-11-27 09:17:51,891 - INFO - Epoch 1 Step 3820 (Global: 7820): loss=1.6258, ppl=5.08, grad_norm=1.13, lr=1.78e-06, throughput=2256 tok/s
2025-11-27 09:21:23,338 - INFO - Epoch 1 Step 3830 (Global: 7830): loss=1.6043, ppl=4.97, grad_norm=1.38, lr=1.76e-06, throughput=2270 tok/s
2025-11-27 09:24:54,182 - INFO - Epoch 1 Step 3840 (Global: 7840): loss=1.8708, ppl=6.49, grad_norm=2.42, lr=1.75e-06, throughput=2277 tok/s
2025-11-27 09:28:24,902 - INFO - Epoch 1 Step 3850 (Global: 7850): loss=1.5286, ppl=4.61, grad_norm=1.38, lr=1.74e-06, throughput=2278 tok/s
2025-11-27 09:31:54,887 - INFO - Epoch 1 Step 3860 (Global: 7860): loss=1.3482, ppl=3.85, grad_norm=1.13, lr=1.73e-06, throughput=2286 tok/s
2025-11-27 09:35:24,821 - INFO - Epoch 1 Step 3870 (Global: 7870): loss=1.6552, ppl=5.23, grad_norm=1.18, lr=1.71e-06, throughput=2286 tok/s
2025-11-27 09:38:53,605 - INFO - Epoch 1 Step 3880 (Global: 7880): loss=1.5843, ppl=4.88, grad_norm=1.59, lr=1.70e-06, throughput=2299 tok/s
2025-11-27 09:42:18,640 - INFO - Epoch 1 Step 3890 (Global: 7890): loss=1.5446, ppl=4.69, grad_norm=1.52, lr=1.69e-06, throughput=2341 tok/s
2025-11-27 09:45:52,026 - INFO - Epoch 1 Step 3900 (Global: 7900): loss=1.3641, ppl=3.91, grad_norm=1.57, lr=1.68e-06, throughput=2249 tok/s
2025-11-27 09:49:23,985 - INFO - Epoch 1 Step 3910 (Global: 7910): loss=1.6889, ppl=5.41, grad_norm=1.37, lr=1.66e-06, throughput=2265 tok/s
2025-11-27 09:52:53,669 - INFO - Epoch 1 Step 3920 (Global: 7920): loss=1.4384, ppl=4.21, grad_norm=1.15, lr=1.65e-06, throughput=2289 tok/s
2025-11-27 09:56:25,184 - INFO - Epoch 1 Step 3930 (Global: 7930): loss=1.6867, ppl=5.40, grad_norm=1.60, lr=1.64e-06, throughput=2269 tok/s
2025-11-27 09:59:57,979 - INFO - Epoch 1 Step 3940 (Global: 7940): loss=1.4542, ppl=4.28, grad_norm=1.25, lr=1.63e-06, throughput=2256 tok/s
2025-11-27 10:03:28,185 - INFO - Epoch 1 Step 3950 (Global: 7950): loss=1.5389, ppl=4.66, grad_norm=1.52, lr=1.61e-06, throughput=2283 tok/s
2025-11-27 10:06:58,826 - INFO - Epoch 1 Step 3960 (Global: 7960): loss=1.5950, ppl=4.93, grad_norm=1.35, lr=1.60e-06, throughput=2279 tok/s
2025-11-27 10:10:30,538 - INFO - Epoch 1 Step 3970 (Global: 7970): loss=1.7780, ppl=5.92, grad_norm=1.70, lr=1.59e-06, throughput=2267 tok/s
2025-11-27 10:14:02,617 - INFO - Epoch 1 Step 3980 (Global: 7980): loss=1.5373, ppl=4.65, grad_norm=2.06, lr=1.58e-06, throughput=2263 tok/s
2025-11-27 10:17:29,544 - INFO - Epoch 1 Step 3990 (Global: 7990): loss=1.7778, ppl=5.92, grad_norm=1.57, lr=1.56e-06, throughput=2320 tok/s
2025-11-27 10:20:54,130 - INFO - Epoch 1 Step 4000 (Global: 8000): loss=1.8192, ppl=6.17, grad_norm=1.26, lr=1.55e-06, throughput=2346 tok/s
2025-11-27 10:20:54,131 - INFO - 
Running validation at step 8000...
2025-11-27 10:32:54,275 - INFO - Validation loss: 1.6272, perplexity: 5.09
2025-11-27 10:32:54,276 - INFO - 
======================================================================
2025-11-27 10:32:54,276 - INFO - Qualitative Evaluation Samples:
2025-11-27 10:32:54,277 - INFO - ======================================================================
2025-11-27 10:32:54,277 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-27 10:32:54,277 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-27 10:32:54,277 - INFO - Generated:    ' to the band\'s previous work, saying that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Keys...'
2025-11-27 10:32:54,278 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-27 10:32:54,278 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,279 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-27 10:32:54,279 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-27 10:32:54,279 - INFO - Generated:    'aternalistic fraternal organizations. The Order of the Arrow was founded in 1926, and the first national convention was held in 1927. The Order of the Arrow was founded in 1928, and the first national...'
2025-11-27 10:32:54,280 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-27 10:32:54,280 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,281 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-27 10:32:54,281 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-27 10:32:54,282 - INFO - Generated:    " be defeated by Oga. Teimou's shadow group then defeated the Red Tails and the Shingetsu Teimou's shadow group. Teimou's shadow group then defeated the Red Tails and the Shingetsu Teimou's shadow grou..."
2025-11-27 10:32:54,282 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-27 10:32:54,282 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,283 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-27 10:32:54,283 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-27 10:32:54,284 - INFO - Generated:    '-01-01 | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff |...'
2025-11-27 10:32:54,284 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-27 10:32:54,285 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,285 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-27 10:32:54,285 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-27 10:32:54,286 - INFO - Generated:    '1 | PlayStation 3 | EA Tiburon                                 | [ 150 ] |\n| Madden NFL 12                                 | August 30, 2011 | PlayStation 3 | EA Tiburon                               ...'
2025-11-27 10:32:54,286 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-27 10:32:54,286 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,288 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_8000.jsonl
2025-11-27 10:35:24,351 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-27 10:35:24,371 - INFO - New best validation loss: 1.6272, perplexity: 5.09
2025-11-27 10:38:47,829 - INFO - Epoch 1 Step 4010 (Global: 8010): loss=1.6169, ppl=5.04, grad_norm=1.58, lr=1.54e-06, throughput=2360 tok/s
2025-11-27 10:42:12,374 - INFO - Epoch 1 Step 4020 (Global: 8020): loss=1.5769, ppl=4.84, grad_norm=2.05, lr=1.53e-06, throughput=2347 tok/s
2025-11-27 10:45:34,363 - INFO - Epoch 1 Step 4030 (Global: 8030): loss=1.8082, ppl=6.10, grad_norm=1.34, lr=1.52e-06, throughput=2376 tok/s
2025-11-27 10:49:03,119 - INFO - Epoch 1 Step 4040 (Global: 8040): loss=1.7761, ppl=5.91, grad_norm=1.43, lr=1.50e-06, throughput=2299 tok/s
2025-11-27 10:52:34,907 - INFO - Epoch 1 Step 4050 (Global: 8050): loss=1.6383, ppl=5.15, grad_norm=1.64, lr=1.49e-06, throughput=2266 tok/s
2025-11-27 10:56:06,357 - INFO - Epoch 1 Step 4060 (Global: 8060): loss=1.5201, ppl=4.57, grad_norm=2.16, lr=1.48e-06, throughput=2270 tok/s
2025-11-27 10:59:37,783 - INFO - Epoch 1 Step 4070 (Global: 8070): loss=1.7167, ppl=5.57, grad_norm=1.34, lr=1.47e-06, throughput=2270 tok/s
2025-11-27 11:03:08,268 - INFO - Epoch 1 Step 4080 (Global: 8080): loss=1.5123, ppl=4.54, grad_norm=1.17, lr=1.46e-06, throughput=2280 tok/s
2025-11-27 11:06:38,312 - INFO - Epoch 1 Step 4090 (Global: 8090): loss=1.5805, ppl=4.86, grad_norm=1.23, lr=1.44e-06, throughput=2285 tok/s
2025-11-27 11:10:10,480 - INFO - Epoch 1 Step 4100 (Global: 8100): loss=1.5481, ppl=4.70, grad_norm=1.41, lr=1.43e-06, throughput=2262 tok/s
2025-11-27 11:13:40,862 - INFO - Epoch 1 Step 4110 (Global: 8110): loss=1.6666, ppl=5.29, grad_norm=1.17, lr=1.42e-06, throughput=2282 tok/s
2025-11-27 11:17:11,528 - INFO - Epoch 1 Step 4120 (Global: 8120): loss=1.5787, ppl=4.85, grad_norm=1.76, lr=1.41e-06, throughput=2278 tok/s
2025-11-27 11:20:39,851 - INFO - Epoch 1 Step 4130 (Global: 8130): loss=1.7777, ppl=5.92, grad_norm=1.86, lr=1.40e-06, throughput=2304 tok/s
2025-11-27 11:24:09,760 - INFO - Epoch 1 Step 4140 (Global: 8140): loss=1.8461, ppl=6.34, grad_norm=1.44, lr=1.39e-06, throughput=2287 tok/s
2025-11-27 11:27:39,304 - INFO - Epoch 1 Step 4150 (Global: 8150): loss=1.6895, ppl=5.42, grad_norm=2.03, lr=1.37e-06, throughput=2291 tok/s
2025-11-27 11:31:09,428 - INFO - Epoch 1 Step 4160 (Global: 8160): loss=1.5404, ppl=4.67, grad_norm=1.09, lr=1.36e-06, throughput=2284 tok/s
2025-11-27 11:34:40,033 - INFO - Epoch 1 Step 4170 (Global: 8170): loss=1.6484, ppl=5.20, grad_norm=1.38, lr=1.35e-06, throughput=2279 tok/s
2025-11-27 11:38:11,850 - INFO - Epoch 1 Step 4180 (Global: 8180): loss=1.5886, ppl=4.90, grad_norm=1.46, lr=1.34e-06, throughput=2266 tok/s
2025-11-27 11:41:41,933 - INFO - Epoch 1 Step 4190 (Global: 8190): loss=1.7589, ppl=5.81, grad_norm=1.58, lr=1.33e-06, throughput=2285 tok/s
2025-11-27 11:45:10,504 - INFO - Epoch 1 Step 4200 (Global: 8200): loss=1.5511, ppl=4.72, grad_norm=1.42, lr=1.32e-06, throughput=2301 tok/s
2025-11-27 11:48:37,068 - INFO - Epoch 1 Step 4210 (Global: 8210): loss=2.0304, ppl=7.62, grad_norm=1.53, lr=1.31e-06, throughput=2324 tok/s
2025-11-27 11:52:03,812 - INFO - Epoch 1 Step 4220 (Global: 8220): loss=1.8178, ppl=6.16, grad_norm=1.40, lr=1.29e-06, throughput=2322 tok/s
2025-11-27 11:55:29,639 - INFO - Epoch 1 Step 4230 (Global: 8230): loss=1.6033, ppl=4.97, grad_norm=1.45, lr=1.28e-06, throughput=2332 tok/s
2025-11-27 11:58:55,103 - INFO - Epoch 1 Step 4240 (Global: 8240): loss=1.5554, ppl=4.74, grad_norm=1.13, lr=1.27e-06, throughput=2336 tok/s
2025-11-27 12:02:19,766 - INFO - Epoch 1 Step 4250 (Global: 8250): loss=1.3641, ppl=3.91, grad_norm=1.24, lr=1.26e-06, throughput=2345 tok/s
2025-11-27 12:05:44,995 - INFO - Epoch 1 Step 4260 (Global: 8260): loss=1.6582, ppl=5.25, grad_norm=1.56, lr=1.25e-06, throughput=2339 tok/s
2025-11-27 12:09:09,473 - INFO - Epoch 1 Step 4270 (Global: 8270): loss=1.6418, ppl=5.16, grad_norm=1.34, lr=1.24e-06, throughput=2347 tok/s
2025-11-27 12:12:31,949 - INFO - Epoch 1 Step 4280 (Global: 8280): loss=1.7062, ppl=5.51, grad_norm=1.27, lr=1.23e-06, throughput=2371 tok/s
2025-11-27 12:15:54,938 - INFO - Epoch 1 Step 4290 (Global: 8290): loss=1.7156, ppl=5.56, grad_norm=1.49, lr=1.22e-06, throughput=2365 tok/s
2025-11-27 12:19:17,487 - INFO - Epoch 1 Step 4300 (Global: 8300): loss=1.5761, ppl=4.84, grad_norm=1.45, lr=1.21e-06, throughput=2370 tok/s
2025-11-27 12:22:40,527 - INFO - Epoch 1 Step 4310 (Global: 8310): loss=1.7613, ppl=5.82, grad_norm=1.15, lr=1.20e-06, throughput=2364 tok/s
2025-11-27 12:26:04,423 - INFO - Epoch 1 Step 4320 (Global: 8320): loss=1.5934, ppl=4.92, grad_norm=1.23, lr=1.18e-06, throughput=2354 tok/s
2025-11-27 12:29:32,426 - INFO - Epoch 1 Step 4330 (Global: 8330): loss=1.5424, ppl=4.68, grad_norm=1.31, lr=1.17e-06, throughput=2308 tok/s
2025-11-27 12:32:59,198 - INFO - Epoch 1 Step 4340 (Global: 8340): loss=1.6180, ppl=5.04, grad_norm=1.55, lr=1.16e-06, throughput=2321 tok/s
2025-11-27 12:36:24,747 - INFO - Epoch 1 Step 4350 (Global: 8350): loss=1.9459, ppl=7.00, grad_norm=1.33, lr=1.15e-06, throughput=2335 tok/s
2025-11-27 12:39:51,139 - INFO - Epoch 1 Step 4360 (Global: 8360): loss=1.5864, ppl=4.89, grad_norm=1.27, lr=1.14e-06, throughput=2326 tok/s
2025-11-27 12:43:15,943 - INFO - Epoch 1 Step 4370 (Global: 8370): loss=1.5319, ppl=4.63, grad_norm=1.53, lr=1.13e-06, throughput=2344 tok/s
2025-11-27 12:46:39,322 - INFO - Epoch 1 Step 4380 (Global: 8380): loss=1.6704, ppl=5.31, grad_norm=1.16, lr=1.12e-06, throughput=2360 tok/s
2025-11-27 12:50:02,945 - INFO - Epoch 1 Step 4390 (Global: 8390): loss=1.6581, ppl=5.25, grad_norm=1.27, lr=1.11e-06, throughput=2357 tok/s
2025-11-27 12:53:27,610 - INFO - Epoch 1 Step 4400 (Global: 8400): loss=1.4405, ppl=4.22, grad_norm=1.10, lr=1.10e-06, throughput=2345 tok/s
2025-11-27 12:56:50,335 - INFO - Epoch 1 Step 4410 (Global: 8410): loss=1.3959, ppl=4.04, grad_norm=4.31, lr=1.09e-06, throughput=2368 tok/s
2025-11-27 13:00:21,053 - INFO - Epoch 1 Step 4420 (Global: 8420): loss=1.3936, ppl=4.03, grad_norm=1.73, lr=1.08e-06, throughput=2278 tok/s
2025-11-27 13:03:56,224 - INFO - Epoch 1 Step 4430 (Global: 8430): loss=1.5997, ppl=4.95, grad_norm=2.05, lr=1.07e-06, throughput=2231 tok/s
2025-11-27 13:07:31,974 - INFO - Epoch 1 Step 4440 (Global: 8440): loss=1.6549, ppl=5.23, grad_norm=1.55, lr=1.06e-06, throughput=2225 tok/s
2025-11-27 13:11:05,763 - INFO - Epoch 1 Step 4450 (Global: 8450): loss=1.5972, ppl=4.94, grad_norm=1.37, lr=1.05e-06, throughput=2245 tok/s
2025-11-27 13:14:37,380 - INFO - Epoch 1 Step 4460 (Global: 8460): loss=1.4822, ppl=4.40, grad_norm=2.23, lr=1.04e-06, throughput=2268 tok/s
2025-11-27 13:18:09,818 - INFO - Epoch 1 Step 4470 (Global: 8470): loss=1.3327, ppl=3.79, grad_norm=1.17, lr=1.03e-06, throughput=2259 tok/s
2025-11-27 13:21:40,540 - INFO - Epoch 1 Step 4480 (Global: 8480): loss=1.6039, ppl=4.97, grad_norm=1.55, lr=1.02e-06, throughput=2278 tok/s
2025-11-27 13:25:13,910 - INFO - Epoch 1 Step 4490 (Global: 8490): loss=1.5592, ppl=4.75, grad_norm=1.16, lr=1.01e-06, throughput=2250 tok/s
2025-11-27 13:28:45,359 - INFO - Epoch 1 Step 4500 (Global: 8500): loss=1.5205, ppl=4.57, grad_norm=1.27, lr=9.96e-07, throughput=2270 tok/s
2025-11-27 13:32:18,632 - INFO - Epoch 1 Step 4510 (Global: 8510): loss=1.6607, ppl=5.26, grad_norm=2.19, lr=9.86e-07, throughput=2251 tok/s
2025-11-27 13:35:54,203 - INFO - Epoch 1 Step 4520 (Global: 8520): loss=1.5474, ppl=4.70, grad_norm=1.21, lr=9.76e-07, throughput=2227 tok/s
2025-11-27 13:39:28,969 - INFO - Epoch 1 Step 4530 (Global: 8530): loss=1.7638, ppl=5.83, grad_norm=1.96, lr=9.67e-07, throughput=2235 tok/s
2025-11-27 13:43:05,159 - INFO - Epoch 1 Step 4540 (Global: 8540): loss=1.6483, ppl=5.20, grad_norm=1.55, lr=9.57e-07, throughput=2220 tok/s
2025-11-27 13:46:38,594 - INFO - Epoch 1 Step 4550 (Global: 8550): loss=1.7897, ppl=5.99, grad_norm=1.45, lr=9.47e-07, throughput=2249 tok/s
2025-11-27 13:50:12,398 - INFO - Epoch 1 Step 4560 (Global: 8560): loss=1.6006, ppl=4.96, grad_norm=1.30, lr=9.37e-07, throughput=2245 tok/s
2025-11-27 13:53:46,524 - INFO - Epoch 1 Step 4570 (Global: 8570): loss=1.5500, ppl=4.71, grad_norm=1.49, lr=9.27e-07, throughput=2242 tok/s
2025-11-27 13:57:19,414 - INFO - Epoch 1 Step 4580 (Global: 8580): loss=1.8392, ppl=6.29, grad_norm=1.34, lr=9.18e-07, throughput=2255 tok/s
2025-11-27 14:00:52,756 - INFO - Epoch 1 Step 4590 (Global: 8590): loss=1.3857, ppl=4.00, grad_norm=1.46, lr=9.08e-07, throughput=2250 tok/s
2025-11-27 14:04:26,845 - INFO - Epoch 1 Step 4600 (Global: 8600): loss=1.3973, ppl=4.04, grad_norm=1.28, lr=8.98e-07, throughput=2242 tok/s
2025-11-27 14:08:00,326 - INFO - Epoch 1 Step 4610 (Global: 8610): loss=1.4807, ppl=4.40, grad_norm=2.14, lr=8.89e-07, throughput=2248 tok/s
2025-11-27 14:11:37,103 - INFO - Epoch 1 Step 4620 (Global: 8620): loss=1.4324, ppl=4.19, grad_norm=2.70, lr=8.79e-07, throughput=2214 tok/s
2025-11-27 14:15:10,056 - INFO - Epoch 1 Step 4630 (Global: 8630): loss=1.7810, ppl=5.94, grad_norm=1.35, lr=8.70e-07, throughput=2254 tok/s
2025-11-27 14:18:41,040 - INFO - Epoch 1 Step 4640 (Global: 8640): loss=1.2881, ppl=3.63, grad_norm=1.95, lr=8.60e-07, throughput=2275 tok/s
2025-11-27 14:22:16,435 - INFO - Epoch 1 Step 4650 (Global: 8650): loss=1.6443, ppl=5.18, grad_norm=1.48, lr=8.51e-07, throughput=2228 tok/s
2025-11-27 14:25:48,432 - INFO - Epoch 1 Step 4660 (Global: 8660): loss=1.4110, ppl=4.10, grad_norm=1.92, lr=8.42e-07, throughput=2264 tok/s
2025-11-27 14:29:23,294 - INFO - Epoch 1 Step 4670 (Global: 8670): loss=1.4625, ppl=4.32, grad_norm=1.20, lr=8.32e-07, throughput=2234 tok/s
2025-11-27 14:32:57,336 - INFO - Epoch 1 Step 4680 (Global: 8680): loss=1.6921, ppl=5.43, grad_norm=1.73, lr=8.23e-07, throughput=2243 tok/s
2025-11-27 14:36:30,703 - INFO - Epoch 1 Step 4690 (Global: 8690): loss=1.4586, ppl=4.30, grad_norm=1.23, lr=8.14e-07, throughput=2250 tok/s
2025-11-27 14:40:02,647 - INFO - Epoch 1 Step 4700 (Global: 8700): loss=1.8357, ppl=6.27, grad_norm=1.52, lr=8.05e-07, throughput=2265 tok/s
2025-11-27 14:43:36,865 - INFO - Epoch 1 Step 4710 (Global: 8710): loss=1.4507, ppl=4.27, grad_norm=2.09, lr=7.96e-07, throughput=2241 tok/s
2025-11-27 14:47:07,839 - INFO - Epoch 1 Step 4720 (Global: 8720): loss=1.6126, ppl=5.02, grad_norm=2.44, lr=7.87e-07, throughput=2275 tok/s
2025-11-27 14:50:42,135 - INFO - Epoch 1 Step 4730 (Global: 8730): loss=1.5844, ppl=4.88, grad_norm=2.08, lr=7.78e-07, throughput=2240 tok/s
2025-11-27 14:54:16,557 - INFO - Epoch 1 Step 4740 (Global: 8740): loss=1.3759, ppl=3.96, grad_norm=1.50, lr=7.69e-07, throughput=2239 tok/s
2025-11-27 14:57:48,939 - INFO - Epoch 1 Step 4750 (Global: 8750): loss=1.6488, ppl=5.20, grad_norm=1.63, lr=7.60e-07, throughput=2260 tok/s
2025-11-27 15:01:24,002 - INFO - Epoch 1 Step 4760 (Global: 8760): loss=1.7100, ppl=5.53, grad_norm=1.40, lr=7.51e-07, throughput=2232 tok/s
2025-11-27 15:04:55,790 - INFO - Epoch 1 Step 4770 (Global: 8770): loss=1.5890, ppl=4.90, grad_norm=1.30, lr=7.42e-07, throughput=2266 tok/s
2025-11-27 15:08:30,219 - INFO - Epoch 1 Step 4780 (Global: 8780): loss=1.6121, ppl=5.01, grad_norm=1.54, lr=7.33e-07, throughput=2239 tok/s
2025-11-27 15:12:02,797 - INFO - Epoch 1 Step 4790 (Global: 8790): loss=1.6103, ppl=5.00, grad_norm=1.26, lr=7.25e-07, throughput=2258 tok/s
2025-11-27 15:15:38,173 - INFO - Epoch 1 Step 4800 (Global: 8800): loss=1.3870, ppl=4.00, grad_norm=1.17, lr=7.16e-07, throughput=2229 tok/s
2025-11-27 15:19:10,323 - INFO - Epoch 1 Step 4810 (Global: 8810): loss=1.8532, ppl=6.38, grad_norm=1.17, lr=7.07e-07, throughput=2263 tok/s
2025-11-27 15:22:46,125 - INFO - Epoch 1 Step 4820 (Global: 8820): loss=1.5743, ppl=4.83, grad_norm=1.46, lr=6.99e-07, throughput=2224 tok/s
2025-11-27 15:26:21,093 - INFO - Epoch 1 Step 4830 (Global: 8830): loss=1.5621, ppl=4.77, grad_norm=1.52, lr=6.90e-07, throughput=2233 tok/s
2025-11-27 15:29:53,346 - INFO - Epoch 1 Step 4840 (Global: 8840): loss=1.5694, ppl=4.80, grad_norm=2.22, lr=6.82e-07, throughput=2261 tok/s
2025-11-27 15:33:29,510 - INFO - Epoch 1 Step 4850 (Global: 8850): loss=1.7044, ppl=5.50, grad_norm=1.42, lr=6.74e-07, throughput=2221 tok/s
2025-11-27 15:37:03,147 - INFO - Epoch 1 Step 4860 (Global: 8860): loss=1.5101, ppl=4.53, grad_norm=1.35, lr=6.65e-07, throughput=2247 tok/s
2025-11-27 15:40:36,832 - INFO - Epoch 1 Step 4870 (Global: 8870): loss=1.8138, ppl=6.13, grad_norm=1.19, lr=6.57e-07, throughput=2246 tok/s
2025-11-27 15:44:09,397 - INFO - Epoch 1 Step 4880 (Global: 8880): loss=1.5810, ppl=4.86, grad_norm=1.39, lr=6.49e-07, throughput=2258 tok/s
2025-11-27 15:47:45,166 - INFO - Epoch 1 Step 4890 (Global: 8890): loss=1.6636, ppl=5.28, grad_norm=1.36, lr=6.40e-07, throughput=2225 tok/s
2025-11-27 15:51:15,830 - INFO - Epoch 1 Step 4900 (Global: 8900): loss=1.6452, ppl=5.18, grad_norm=1.58, lr=6.32e-07, throughput=2279 tok/s
2025-11-27 15:54:49,104 - INFO - Epoch 1 Step 4910 (Global: 8910): loss=1.7205, ppl=5.59, grad_norm=1.78, lr=6.24e-07, throughput=2251 tok/s
2025-11-27 15:58:21,882 - INFO - Epoch 1 Step 4920 (Global: 8920): loss=1.6994, ppl=5.47, grad_norm=1.44, lr=6.16e-07, throughput=2256 tok/s
2025-11-27 16:01:56,198 - INFO - Epoch 1 Step 4930 (Global: 8930): loss=1.5468, ppl=4.70, grad_norm=2.00, lr=6.08e-07, throughput=2240 tok/s
2025-11-27 16:05:30,658 - INFO - Epoch 1 Step 4940 (Global: 8940): loss=1.5956, ppl=4.93, grad_norm=1.38, lr=6.00e-07, throughput=2238 tok/s
2025-11-27 16:09:01,476 - INFO - Epoch 1 Step 4950 (Global: 8950): loss=1.3347, ppl=3.80, grad_norm=1.47, lr=5.92e-07, throughput=2277 tok/s
2025-11-27 16:12:35,661 - INFO - Epoch 1 Step 4960 (Global: 8960): loss=1.5259, ppl=4.60, grad_norm=1.63, lr=5.84e-07, throughput=2241 tok/s
2025-11-27 16:16:07,366 - INFO - Epoch 1 Step 4970 (Global: 8970): loss=1.6130, ppl=5.02, grad_norm=1.55, lr=5.76e-07, throughput=2267 tok/s
2025-11-27 16:19:42,070 - INFO - Epoch 1 Step 4980 (Global: 8980): loss=1.6231, ppl=5.07, grad_norm=1.24, lr=5.68e-07, throughput=2236 tok/s
2025-11-27 16:23:13,213 - INFO - Epoch 1 Step 4990 (Global: 8990): loss=1.3964, ppl=4.04, grad_norm=1.48, lr=5.61e-07, throughput=2273 tok/s
2025-11-27 16:26:46,175 - INFO - Epoch 1 Step 5000 (Global: 9000): loss=1.3289, ppl=3.78, grad_norm=1.29, lr=5.53e-07, throughput=2254 tok/s
2025-11-27 16:30:21,009 - INFO - Epoch 1 Step 5010 (Global: 9010): loss=1.7395, ppl=5.69, grad_norm=1.27, lr=5.45e-07, throughput=2234 tok/s
2025-11-27 16:33:51,669 - INFO - Epoch 1 Step 5020 (Global: 9020): loss=1.5989, ppl=4.95, grad_norm=1.40, lr=5.38e-07, throughput=2279 tok/s
2025-11-27 16:37:24,749 - INFO - Epoch 1 Step 5030 (Global: 9030): loss=1.5521, ppl=4.72, grad_norm=1.20, lr=5.30e-07, throughput=2253 tok/s
2025-11-27 16:40:59,702 - INFO - Epoch 1 Step 5040 (Global: 9040): loss=1.6465, ppl=5.19, grad_norm=2.00, lr=5.23e-07, throughput=2233 tok/s
2025-11-27 16:44:32,529 - INFO - Epoch 1 Step 5050 (Global: 9050): loss=1.8447, ppl=6.33, grad_norm=1.61, lr=5.15e-07, throughput=2255 tok/s
2025-11-27 16:48:04,774 - INFO - Epoch 1 Step 5060 (Global: 9060): loss=1.6904, ppl=5.42, grad_norm=1.17, lr=5.08e-07, throughput=2262 tok/s
2025-11-27 16:51:39,208 - INFO - Epoch 1 Step 5070 (Global: 9070): loss=1.5692, ppl=4.80, grad_norm=1.60, lr=5.01e-07, throughput=2238 tok/s
2025-11-27 16:55:12,500 - INFO - Epoch 1 Step 5080 (Global: 9080): loss=1.4458, ppl=4.25, grad_norm=1.13, lr=4.93e-07, throughput=2250 tok/s
2025-11-27 16:58:44,072 - INFO - Epoch 1 Step 5090 (Global: 9090): loss=1.4829, ppl=4.41, grad_norm=1.19, lr=4.86e-07, throughput=2269 tok/s
2025-11-27 17:02:17,357 - INFO - Epoch 1 Step 5100 (Global: 9100): loss=1.6237, ppl=5.07, grad_norm=1.68, lr=4.79e-07, throughput=2251 tok/s
2025-11-27 17:05:51,243 - INFO - Epoch 1 Step 5110 (Global: 9110): loss=1.6518, ppl=5.22, grad_norm=1.37, lr=4.72e-07, throughput=2244 tok/s
2025-11-27 17:09:24,468 - INFO - Epoch 1 Step 5120 (Global: 9120): loss=1.4125, ppl=4.11, grad_norm=1.54, lr=4.65e-07, throughput=2251 tok/s
2025-11-27 17:12:55,464 - INFO - Epoch 1 Step 5130 (Global: 9130): loss=1.4459, ppl=4.25, grad_norm=1.84, lr=4.58e-07, throughput=2275 tok/s
2025-11-27 17:16:27,875 - INFO - Epoch 1 Step 5140 (Global: 9140): loss=1.5491, ppl=4.71, grad_norm=1.63, lr=4.51e-07, throughput=2260 tok/s
2025-11-27 17:20:00,915 - INFO - Epoch 1 Step 5150 (Global: 9150): loss=1.4207, ppl=4.14, grad_norm=1.54, lr=4.44e-07, throughput=2253 tok/s
2025-11-27 17:23:32,991 - INFO - Epoch 1 Step 5160 (Global: 9160): loss=1.8070, ppl=6.09, grad_norm=1.30, lr=4.37e-07, throughput=2263 tok/s
2025-11-27 17:27:01,914 - INFO - Epoch 1 Step 5170 (Global: 9170): loss=1.5653, ppl=4.78, grad_norm=1.12, lr=4.30e-07, throughput=2298 tok/s
2025-11-27 17:30:33,418 - INFO - Epoch 1 Step 5180 (Global: 9180): loss=1.4778, ppl=4.38, grad_norm=1.46, lr=4.23e-07, throughput=2269 tok/s
2025-11-27 17:34:04,641 - INFO - Epoch 1 Step 5190 (Global: 9190): loss=1.8118, ppl=6.12, grad_norm=1.39, lr=4.17e-07, throughput=2272 tok/s
2025-11-27 17:37:37,933 - INFO - Epoch 1 Step 5200 (Global: 9200): loss=1.7735, ppl=5.89, grad_norm=1.95, lr=4.10e-07, throughput=2250 tok/s
2025-11-27 17:41:09,322 - INFO - Epoch 1 Step 5210 (Global: 9210): loss=1.8240, ppl=6.20, grad_norm=1.12, lr=4.03e-07, throughput=2271 tok/s
2025-11-27 17:44:39,621 - INFO - Epoch 1 Step 5220 (Global: 9220): loss=1.7174, ppl=5.57, grad_norm=1.45, lr=3.97e-07, throughput=2282 tok/s
2025-11-27 17:48:12,323 - INFO - Epoch 1 Step 5230 (Global: 9230): loss=1.3940, ppl=4.03, grad_norm=1.73, lr=3.90e-07, throughput=2257 tok/s
2025-11-27 17:51:45,668 - INFO - Epoch 1 Step 5240 (Global: 9240): loss=1.5710, ppl=4.81, grad_norm=2.22, lr=3.84e-07, throughput=2250 tok/s
2025-11-27 17:55:16,907 - INFO - Epoch 1 Step 5250 (Global: 9250): loss=1.5609, ppl=4.76, grad_norm=1.78, lr=3.77e-07, throughput=2272 tok/s
2025-11-27 17:58:49,802 - INFO - Epoch 1 Step 5260 (Global: 9260): loss=1.5694, ppl=4.80, grad_norm=1.62, lr=3.71e-07, throughput=2255 tok/s
2025-11-27 18:02:24,012 - INFO - Epoch 1 Step 5270 (Global: 9270): loss=1.5508, ppl=4.72, grad_norm=1.15, lr=3.65e-07, throughput=2241 tok/s
2025-11-27 18:05:57,942 - INFO - Epoch 1 Step 5280 (Global: 9280): loss=1.5768, ppl=4.84, grad_norm=1.30, lr=3.58e-07, throughput=2244 tok/s
2025-11-27 18:09:31,469 - INFO - Epoch 1 Step 5290 (Global: 9290): loss=1.4960, ppl=4.46, grad_norm=1.20, lr=3.52e-07, throughput=2248 tok/s
2025-11-27 18:13:03,849 - INFO - Epoch 1 Step 5300 (Global: 9300): loss=1.4750, ppl=4.37, grad_norm=1.53, lr=3.46e-07, throughput=2260 tok/s
2025-11-27 18:16:34,927 - INFO - Epoch 1 Step 5310 (Global: 9310): loss=1.3901, ppl=4.02, grad_norm=1.44, lr=3.40e-07, throughput=2274 tok/s
2025-11-27 18:20:08,431 - INFO - Epoch 1 Step 5320 (Global: 9320): loss=1.5485, ppl=4.70, grad_norm=1.69, lr=3.34e-07, throughput=2248 tok/s
2025-11-27 18:23:41,502 - INFO - Epoch 1 Step 5330 (Global: 9330): loss=1.6942, ppl=5.44, grad_norm=1.37, lr=3.28e-07, throughput=2253 tok/s
2025-11-27 18:27:15,281 - INFO - Epoch 1 Step 5340 (Global: 9340): loss=1.5931, ppl=4.92, grad_norm=1.87, lr=3.22e-07, throughput=2245 tok/s
2025-11-27 18:30:47,817 - INFO - Epoch 1 Step 5350 (Global: 9350): loss=1.5405, ppl=4.67, grad_norm=1.24, lr=3.16e-07, throughput=2258 tok/s
2025-11-27 18:34:18,938 - INFO - Epoch 1 Step 5360 (Global: 9360): loss=1.3512, ppl=3.86, grad_norm=1.48, lr=3.10e-07, throughput=2274 tok/s
2025-11-27 18:37:50,990 - INFO - Epoch 1 Step 5370 (Global: 9370): loss=1.5083, ppl=4.52, grad_norm=1.63, lr=3.05e-07, throughput=2264 tok/s
2025-11-27 18:41:24,277 - INFO - Epoch 1 Step 5380 (Global: 9380): loss=1.7409, ppl=5.70, grad_norm=2.09, lr=2.99e-07, throughput=2251 tok/s
2025-11-27 18:44:57,980 - INFO - Epoch 1 Step 5390 (Global: 9390): loss=1.3469, ppl=3.85, grad_norm=1.70, lr=2.93e-07, throughput=2246 tok/s
2025-11-27 18:48:30,379 - INFO - Epoch 1 Step 5400 (Global: 9400): loss=1.5145, ppl=4.55, grad_norm=1.27, lr=2.88e-07, throughput=2260 tok/s
2025-11-27 18:52:03,238 - INFO - Epoch 1 Step 5410 (Global: 9410): loss=1.6546, ppl=5.23, grad_norm=1.77, lr=2.82e-07, throughput=2255 tok/s
2025-11-27 18:55:34,767 - INFO - Epoch 1 Step 5420 (Global: 9420): loss=1.5002, ppl=4.48, grad_norm=1.41, lr=2.76e-07, throughput=2269 tok/s
2025-11-27 18:59:05,365 - INFO - Epoch 1 Step 5430 (Global: 9430): loss=1.7195, ppl=5.58, grad_norm=2.06, lr=2.71e-07, throughput=2279 tok/s
2025-11-27 19:02:32,445 - INFO - Epoch 1 Step 5440 (Global: 9440): loss=1.5165, ppl=4.56, grad_norm=1.55, lr=2.66e-07, throughput=2318 tok/s
2025-11-27 19:05:57,046 - INFO - Epoch 1 Step 5450 (Global: 9450): loss=1.5921, ppl=4.91, grad_norm=1.29, lr=2.60e-07, throughput=2346 tok/s
2025-11-27 19:09:21,278 - INFO - Epoch 1 Step 5460 (Global: 9460): loss=1.8472, ppl=6.34, grad_norm=1.35, lr=2.55e-07, throughput=2350 tok/s
2025-11-27 19:12:46,679 - INFO - Epoch 1 Step 5470 (Global: 9470): loss=1.6808, ppl=5.37, grad_norm=1.72, lr=2.50e-07, throughput=2337 tok/s
2025-11-27 19:16:10,957 - INFO - Epoch 1 Step 5480 (Global: 9480): loss=1.6737, ppl=5.33, grad_norm=1.27, lr=2.44e-07, throughput=2350 tok/s
2025-11-27 19:19:35,185 - INFO - Epoch 1 Step 5490 (Global: 9490): loss=1.8276, ppl=6.22, grad_norm=1.32, lr=2.39e-07, throughput=2350 tok/s
2025-11-27 19:22:59,284 - INFO - Epoch 1 Step 5500 (Global: 9500): loss=1.5507, ppl=4.71, grad_norm=1.49, lr=2.34e-07, throughput=2352 tok/s
2025-11-27 19:26:23,626 - INFO - Epoch 1 Step 5510 (Global: 9510): loss=1.6971, ppl=5.46, grad_norm=1.20, lr=2.29e-07, throughput=2349 tok/s
2025-11-27 19:29:51,020 - INFO - Epoch 1 Step 5520 (Global: 9520): loss=1.8513, ppl=6.37, grad_norm=1.38, lr=2.24e-07, throughput=2314 tok/s
2025-11-27 19:33:26,812 - INFO - Epoch 1 Step 5530 (Global: 9530): loss=1.6369, ppl=5.14, grad_norm=1.30, lr=2.19e-07, throughput=2224 tok/s
2025-11-27 19:37:01,127 - INFO - Epoch 1 Step 5540 (Global: 9540): loss=1.6972, ppl=5.46, grad_norm=1.37, lr=2.14e-07, throughput=2240 tok/s
2025-11-27 19:40:33,251 - INFO - Epoch 1 Step 5550 (Global: 9550): loss=1.6032, ppl=4.97, grad_norm=1.09, lr=2.10e-07, throughput=2263 tok/s
2025-11-27 19:44:05,945 - INFO - Epoch 1 Step 5560 (Global: 9560): loss=1.6616, ppl=5.27, grad_norm=1.48, lr=2.05e-07, throughput=2257 tok/s
2025-11-27 19:47:37,637 - INFO - Epoch 1 Step 5570 (Global: 9570): loss=1.9210, ppl=6.83, grad_norm=1.33, lr=2.00e-07, throughput=2267 tok/s
2025-11-27 19:51:10,833 - INFO - Epoch 1 Step 5580 (Global: 9580): loss=1.5385, ppl=4.66, grad_norm=1.73, lr=1.95e-07, throughput=2251 tok/s
2025-11-27 19:54:44,814 - INFO - Epoch 1 Step 5590 (Global: 9590): loss=1.6455, ppl=5.18, grad_norm=2.11, lr=1.91e-07, throughput=2243 tok/s
2025-11-27 19:58:17,420 - INFO - Epoch 1 Step 5600 (Global: 9600): loss=1.5719, ppl=4.82, grad_norm=1.27, lr=1.86e-07, throughput=2258 tok/s
2025-11-27 20:01:49,899 - INFO - Epoch 1 Step 5610 (Global: 9610): loss=1.6549, ppl=5.23, grad_norm=2.44, lr=1.82e-07, throughput=2259 tok/s
2025-11-27 20:05:23,285 - INFO - Epoch 1 Step 5620 (Global: 9620): loss=1.8481, ppl=6.35, grad_norm=1.28, lr=1.77e-07, throughput=2249 tok/s
2025-11-27 20:09:00,581 - INFO - Epoch 1 Step 5630 (Global: 9630): loss=1.8733, ppl=6.51, grad_norm=1.73, lr=1.73e-07, throughput=2209 tok/s
2025-11-27 20:12:29,542 - INFO - Epoch 1 Step 5640 (Global: 9640): loss=1.7226, ppl=5.60, grad_norm=1.57, lr=1.68e-07, throughput=2297 tok/s
2025-11-27 20:15:59,749 - INFO - Epoch 1 Step 5650 (Global: 9650): loss=1.5968, ppl=4.94, grad_norm=1.52, lr=1.64e-07, throughput=2283 tok/s
2025-11-27 20:19:27,760 - INFO - Epoch 1 Step 5660 (Global: 9660): loss=1.7941, ppl=6.01, grad_norm=1.27, lr=1.60e-07, throughput=2308 tok/s
2025-11-27 20:22:56,487 - INFO - Epoch 1 Step 5670 (Global: 9670): loss=1.7409, ppl=5.70, grad_norm=1.79, lr=1.56e-07, throughput=2300 tok/s
2025-11-27 20:26:23,063 - INFO - Epoch 1 Step 5680 (Global: 9680): loss=1.4597, ppl=4.30, grad_norm=1.11, lr=1.52e-07, throughput=2324 tok/s
2025-11-27 20:29:58,805 - INFO - Epoch 1 Step 5690 (Global: 9690): loss=1.5771, ppl=4.84, grad_norm=1.23, lr=1.48e-07, throughput=2225 tok/s
2025-11-27 20:33:25,268 - INFO - Epoch 1 Step 5700 (Global: 9700): loss=1.7536, ppl=5.78, grad_norm=1.37, lr=1.44e-07, throughput=2325 tok/s
2025-11-27 20:36:51,193 - INFO - Epoch 1 Step 5710 (Global: 9710): loss=1.7731, ppl=5.89, grad_norm=1.96, lr=1.40e-07, throughput=2331 tok/s
2025-11-27 20:40:18,492 - INFO - Epoch 1 Step 5720 (Global: 9720): loss=1.5060, ppl=4.51, grad_norm=1.91, lr=1.36e-07, throughput=2316 tok/s
2025-11-27 20:43:45,709 - INFO - Epoch 1 Step 5730 (Global: 9730): loss=1.7986, ppl=6.04, grad_norm=1.66, lr=1.32e-07, throughput=2316 tok/s
2025-11-27 20:47:12,315 - INFO - Epoch 1 Step 5740 (Global: 9740): loss=1.5726, ppl=4.82, grad_norm=1.30, lr=1.28e-07, throughput=2323 tok/s
2025-11-27 20:50:39,756 - INFO - Epoch 1 Step 5750 (Global: 9750): loss=1.5318, ppl=4.63, grad_norm=2.17, lr=1.24e-07, throughput=2314 tok/s
2025-11-27 20:54:05,181 - INFO - Epoch 1 Step 5760 (Global: 9760): loss=1.6843, ppl=5.39, grad_norm=1.51, lr=1.21e-07, throughput=2337 tok/s
2025-11-27 20:57:31,667 - INFO - Epoch 1 Step 5770 (Global: 9770): loss=1.5998, ppl=4.95, grad_norm=1.59, lr=1.17e-07, throughput=2325 tok/s
2025-11-27 21:00:59,084 - INFO - Epoch 1 Step 5780 (Global: 9780): loss=1.7273, ppl=5.63, grad_norm=1.91, lr=1.13e-07, throughput=2314 tok/s
2025-11-27 21:04:25,574 - INFO - Epoch 1 Step 5790 (Global: 9790): loss=1.7644, ppl=5.84, grad_norm=1.48, lr=1.10e-07, throughput=2325 tok/s
2025-11-27 21:07:51,160 - INFO - Epoch 1 Step 5800 (Global: 9800): loss=1.7346, ppl=5.67, grad_norm=1.80, lr=1.06e-07, throughput=2335 tok/s
2025-11-27 21:11:17,308 - INFO - Epoch 1 Step 5810 (Global: 9810): loss=1.5369, ppl=4.65, grad_norm=1.71, lr=1.03e-07, throughput=2328 tok/s
2025-11-27 21:14:42,597 - INFO - Epoch 1 Step 5820 (Global: 9820): loss=1.7478, ppl=5.74, grad_norm=1.62, lr=9.97e-08, throughput=2338 tok/s
2025-11-27 21:18:08,923 - INFO - Epoch 1 Step 5830 (Global: 9830): loss=1.5048, ppl=4.50, grad_norm=1.41, lr=9.64e-08, throughput=2326 tok/s
2025-11-27 21:21:34,472 - INFO - Epoch 1 Step 5840 (Global: 9840): loss=1.6680, ppl=5.30, grad_norm=1.82, lr=9.32e-08, throughput=2335 tok/s
2025-11-27 21:24:56,657 - INFO - Epoch 1 Step 5850 (Global: 9850): loss=1.6059, ppl=4.98, grad_norm=1.17, lr=9.00e-08, throughput=2374 tok/s
2025-11-27 21:28:20,460 - INFO - Epoch 1 Step 5860 (Global: 9860): loss=1.6501, ppl=5.21, grad_norm=1.49, lr=8.68e-08, throughput=2355 tok/s
2025-11-27 21:31:45,765 - INFO - Epoch 1 Step 5870 (Global: 9870): loss=1.3523, ppl=3.87, grad_norm=1.23, lr=8.37e-08, throughput=2338 tok/s
2025-11-27 21:35:18,286 - INFO - Epoch 1 Step 5880 (Global: 9880): loss=1.5160, ppl=4.55, grad_norm=1.43, lr=8.07e-08, throughput=2259 tok/s
2025-11-27 21:38:51,300 - INFO - Epoch 1 Step 5890 (Global: 9890): loss=1.5679, ppl=4.80, grad_norm=1.58, lr=7.77e-08, throughput=2253 tok/s
2025-11-27 21:42:28,975 - INFO - Epoch 1 Step 5900 (Global: 9900): loss=1.3739, ppl=3.95, grad_norm=1.41, lr=7.48e-08, throughput=2205 tok/s
2025-11-27 21:46:03,161 - INFO - Epoch 1 Step 5910 (Global: 9910): loss=1.5115, ppl=4.53, grad_norm=1.31, lr=7.20e-08, throughput=2241 tok/s
2025-11-27 21:49:39,458 - INFO - Epoch 1 Step 5920 (Global: 9920): loss=1.4874, ppl=4.43, grad_norm=1.63, lr=6.92e-08, throughput=2219 tok/s
2025-11-27 21:53:14,104 - INFO - Epoch 1 Step 5930 (Global: 9930): loss=1.7029, ppl=5.49, grad_norm=2.80, lr=6.64e-08, throughput=2236 tok/s
2025-11-27 21:56:46,292 - INFO - Epoch 1 Step 5940 (Global: 9940): loss=1.4983, ppl=4.47, grad_norm=1.58, lr=6.37e-08, throughput=2262 tok/s
2025-11-27 22:00:21,581 - INFO - Epoch 1 Step 5950 (Global: 9950): loss=1.6700, ppl=5.31, grad_norm=1.46, lr=6.11e-08, throughput=2230 tok/s
2025-11-27 22:03:56,980 - INFO - Epoch 1 Step 5960 (Global: 9960): loss=1.7459, ppl=5.73, grad_norm=1.41, lr=5.85e-08, throughput=2228 tok/s
2025-11-27 22:07:31,743 - INFO - Epoch 1 Step 5970 (Global: 9970): loss=1.7514, ppl=5.76, grad_norm=1.21, lr=5.60e-08, throughput=2235 tok/s
2025-11-27 22:11:09,650 - INFO - Epoch 1 Step 5980 (Global: 9980): loss=1.6806, ppl=5.37, grad_norm=1.30, lr=5.35e-08, throughput=2203 tok/s
2025-11-27 22:14:43,468 - INFO - Epoch 1 Step 5990 (Global: 9990): loss=1.5732, ppl=4.82, grad_norm=1.39, lr=5.11e-08, throughput=2245 tok/s
2025-11-27 22:18:19,058 - INFO - Epoch 1 Step 6000 (Global: 10000): loss=1.7657, ppl=5.85, grad_norm=1.66, lr=4.87e-08, throughput=2226 tok/s
2025-11-27 22:18:19,059 - INFO - 
Running validation at step 10000...
2025-11-27 22:30:37,913 - INFO - Validation loss: 1.6258, perplexity: 5.08
2025-11-27 22:30:37,914 - INFO - 
======================================================================
2025-11-27 22:30:37,914 - INFO - Qualitative Evaluation Samples:
2025-11-27 22:30:37,914 - INFO - ======================================================================
2025-11-27 22:30:37,915 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-27 22:30:37,915 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-27 22:30:37,916 - INFO - Generated:    ' to the band\'s previous work, stating that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Key...'
2025-11-27 22:30:37,916 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-27 22:30:37,916 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,917 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-27 22:30:37,917 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-27 22:30:37,917 - INFO - Generated:    'aternalistic fraternal organizations, which were often seen as "white" and "Anglo" in nature. The Order of the Arrow was founded in 1920 by the Grand Lodge of the United States of America, and the fir...'
2025-11-27 22:30:37,918 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-27 22:30:37,918 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,919 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-27 22:30:37,919 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-27 22:30:37,920 - INFO - Generated:    " be defeated by Oga. Teimou's shadow group is then defeated by Oga and Miki, and the four fighters are taken to the Shingetsu Temple to be killed by the Shingetsu's own shadow group. Teimou's shadow g..."
2025-11-27 22:30:37,920 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-27 22:30:37,920 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,921 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-27 22:30:37,921 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-27 22:30:37,921 - INFO - Generated:    '-01-01 | 1             | 1             | 1             | 1             | 1             |\n| 1.0.0   | U+0B01..0B03, 0B05..0B0C, 0B0F..0B10, 0B13..0B28, 0B2A..0B30, 0B32..0B33, 0B36..0B39, 0B3C..0B43, 0...'
2025-11-27 22:30:37,922 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-27 22:30:37,922 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,923 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-27 22:30:37,923 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-27 22:30:37,923 - INFO - Generated:    '1 | BlackBerry PlayBook | EA Tiburon                                 | [ 150 ] |\n| Madden NFL 12                                                      | August 30, 2011 | PlayStation 3       | EA Tibur...'
2025-11-27 22:30:37,924 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-27 22:30:37,924 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,925 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_10000.jsonl
2025-11-27 22:34:00,082 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-27 22:34:00,099 - INFO - New best validation loss: 1.6258, perplexity: 5.08
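[Note: the perplexity values reported throughout this log are consistent with `ppl = exp(loss)`. A minimal check against the validation line above:]

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss (nats).
loss = 1.6258
ppl = math.exp(loss)
# round(ppl, 2) -> 5.08, matching "Validation loss: 1.6258, perplexity: 5.08"
```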
2025-11-27 22:37:34,452 - INFO - Epoch 1 Step 6010 (Global: 10010): loss=1.7488, ppl=5.75, grad_norm=1.53, lr=4.64e-08, throughput=2240 tok/s
2025-11-27 22:41:12,002 - INFO - Epoch 1 Step 6020 (Global: 10020): loss=1.6746, ppl=5.34, grad_norm=1.66, lr=4.42e-08, throughput=2206 tok/s
2025-11-27 22:44:46,127 - INFO - Epoch 1 Step 6030 (Global: 10030): loss=1.6560, ppl=5.24, grad_norm=2.31, lr=4.20e-08, throughput=2242 tok/s
2025-11-27 22:48:22,442 - INFO - Epoch 1 Step 6040 (Global: 10040): loss=1.8665, ppl=6.47, grad_norm=1.38, lr=3.98e-08, throughput=2219 tok/s
2025-11-27 22:51:57,206 - INFO - Epoch 1 Step 6050 (Global: 10050): loss=1.7092, ppl=5.52, grad_norm=1.77, lr=3.78e-08, throughput=2235 tok/s
2025-11-27 22:55:31,548 - INFO - Epoch 1 Step 6060 (Global: 10060): loss=1.4928, ppl=4.45, grad_norm=1.16, lr=3.57e-08, throughput=2239 tok/s
2025-11-27 22:59:08,999 - INFO - Epoch 1 Step 6070 (Global: 10070): loss=1.6630, ppl=5.28, grad_norm=1.54, lr=3.38e-08, throughput=2207 tok/s
2025-11-27 23:02:43,245 - INFO - Epoch 1 Step 6080 (Global: 10080): loss=1.5916, ppl=4.91, grad_norm=1.35, lr=3.18e-08, throughput=2240 tok/s
2025-11-27 23:06:19,421 - INFO - Epoch 1 Step 6090 (Global: 10090): loss=1.6389, ppl=5.15, grad_norm=1.30, lr=3.00e-08, throughput=2220 tok/s
2025-11-27 23:09:54,624 - INFO - Epoch 1 Step 6100 (Global: 10100): loss=1.6017, ppl=4.96, grad_norm=1.44, lr=2.82e-08, throughput=2230 tok/s
2025-11-27 23:13:30,259 - INFO - Epoch 1 Step 6110 (Global: 10110): loss=1.6520, ppl=5.22, grad_norm=1.27, lr=2.64e-08, throughput=2226 tok/s
2025-11-27 23:17:04,305 - INFO - Epoch 1 Step 6120 (Global: 10120): loss=1.5504, ppl=4.71, grad_norm=1.36, lr=2.47e-08, throughput=2243 tok/s
2025-11-27 23:20:36,135 - INFO - Epoch 1 Step 6130 (Global: 10130): loss=1.9117, ppl=6.76, grad_norm=2.11, lr=2.31e-08, throughput=2266 tok/s
2025-11-27 23:24:14,085 - INFO - Epoch 1 Step 6140 (Global: 10140): loss=1.5247, ppl=4.59, grad_norm=1.74, lr=2.15e-08, throughput=2202 tok/s
2025-11-27 23:27:53,121 - INFO - Epoch 1 Step 6150 (Global: 10150): loss=1.7456, ppl=5.73, grad_norm=1.70, lr=2.00e-08, throughput=2191 tok/s
2025-11-27 23:31:32,824 - INFO - Epoch 1 Step 6160 (Global: 10160): loss=1.7847, ppl=5.96, grad_norm=1.80, lr=1.85e-08, throughput=2185 tok/s
2025-11-27 23:34:59,952 - INFO - Epoch 1 Step 6170 (Global: 10170): loss=1.7517, ppl=5.76, grad_norm=1.42, lr=1.71e-08, throughput=2317 tok/s
2025-11-27 23:38:27,852 - INFO - Epoch 1 Step 6180 (Global: 10180): loss=1.5839, ppl=4.87, grad_norm=2.78, lr=1.58e-08, throughput=2309 tok/s
2025-11-27 23:41:55,534 - INFO - Epoch 1 Step 6190 (Global: 10190): loss=1.7643, ppl=5.84, grad_norm=1.53, lr=1.45e-08, throughput=2311 tok/s
2025-11-27 23:45:22,593 - INFO - Epoch 1 Step 6200 (Global: 10200): loss=1.7653, ppl=5.84, grad_norm=2.02, lr=1.32e-08, throughput=2318 tok/s
2025-11-27 23:48:49,265 - INFO - Epoch 1 Step 6210 (Global: 10210): loss=1.4867, ppl=4.42, grad_norm=1.44, lr=1.20e-08, throughput=2323 tok/s
2025-11-27 23:52:15,136 - INFO - Epoch 1 Step 6220 (Global: 10220): loss=1.5656, ppl=4.79, grad_norm=1.30, lr=1.09e-08, throughput=2332 tok/s
2025-11-27 23:55:42,499 - INFO - Epoch 1 Step 6230 (Global: 10230): loss=1.6320, ppl=5.11, grad_norm=1.91, lr=9.81e-09, throughput=2315 tok/s
2025-11-27 23:59:10,145 - INFO - Epoch 1 Step 6240 (Global: 10240): loss=1.6798, ppl=5.36, grad_norm=1.40, lr=8.79e-09, throughput=2312 tok/s
2025-11-28 00:02:37,306 - INFO - Epoch 1 Step 6250 (Global: 10250): loss=1.6438, ppl=5.17, grad_norm=1.42, lr=7.83e-09, throughput=2317 tok/s
2025-11-28 00:06:05,018 - INFO - Epoch 1 Step 6260 (Global: 10260): loss=1.8199, ppl=6.17, grad_norm=1.22, lr=6.92e-09, throughput=2311 tok/s
2025-11-28 00:09:31,536 - INFO - Epoch 1 Step 6270 (Global: 10270): loss=1.7332, ppl=5.66, grad_norm=1.90, lr=6.06e-09, throughput=2324 tok/s
2025-11-28 00:12:58,319 - INFO - Epoch 1 Step 6280 (Global: 10280): loss=1.7339, ppl=5.66, grad_norm=1.45, lr=5.27e-09, throughput=2321 tok/s
2025-11-28 00:16:25,951 - INFO - Epoch 1 Step 6290 (Global: 10290): loss=1.7052, ppl=5.50, grad_norm=1.55, lr=4.53e-09, throughput=2312 tok/s
2025-11-28 00:19:54,748 - INFO - Epoch 1 Step 6300 (Global: 10300): loss=1.7043, ppl=5.50, grad_norm=2.03, lr=3.84e-09, throughput=2299 tok/s
2025-11-28 00:23:22,238 - INFO - Epoch 1 Step 6310 (Global: 10310): loss=1.7098, ppl=5.53, grad_norm=2.17, lr=3.21e-09, throughput=2313 tok/s
2025-11-28 00:26:48,410 - INFO - Epoch 1 Step 6320 (Global: 10320): loss=1.6309, ppl=5.11, grad_norm=1.38, lr=2.64e-09, throughput=2328 tok/s
2025-11-28 00:30:15,949 - INFO - Epoch 1 Step 6330 (Global: 10330): loss=1.7720, ppl=5.88, grad_norm=1.57, lr=2.12e-09, throughput=2313 tok/s
2025-11-28 00:33:44,204 - INFO - Epoch 1 Step 6340 (Global: 10340): loss=1.6188, ppl=5.05, grad_norm=1.33, lr=1.66e-09, throughput=2305 tok/s
2025-11-28 00:37:12,443 - INFO - Epoch 1 Step 6350 (Global: 10350): loss=1.6697, ppl=5.31, grad_norm=1.48, lr=1.26e-09, throughput=2305 tok/s
2025-11-28 00:40:39,111 - INFO - Epoch 1 Step 6360 (Global: 10360): loss=1.6755, ppl=5.34, grad_norm=1.23, lr=9.12e-10, throughput=2323 tok/s
2025-11-28 00:44:06,723 - INFO - Epoch 1 Step 6370 (Global: 10370): loss=1.5299, ppl=4.62, grad_norm=1.95, lr=6.20e-10, throughput=2312 tok/s
2025-11-28 00:47:30,860 - INFO - Epoch 1 Step 6380 (Global: 10380): loss=1.5091, ppl=4.52, grad_norm=1.87, lr=3.84e-10, throughput=2351 tok/s
2025-11-28 00:50:56,526 - INFO - Epoch 1 Step 6390 (Global: 10390): loss=1.3304, ppl=3.78, grad_norm=1.04, lr=2.05e-10, throughput=2334 tok/s
2025-11-28 00:54:20,885 - INFO - Epoch 1 Step 6400 (Global: 10400): loss=1.5872, ppl=4.89, grad_norm=4.62, lr=8.11e-11, throughput=2349 tok/s
2025-11-28 00:57:45,721 - INFO - Epoch 1 Step 6410 (Global: 10410): loss=1.6529, ppl=5.22, grad_norm=1.38, lr=1.38e-11, throughput=2343 tok/s
2025-11-28 01:00:00,874 - INFO - Flushing 4 remainder batches from gradient accumulation
2025-11-28 01:00:00,877 - INFO -   Rescaling gradients by 1.50x (compensating for 4/6 batches)
2025-11-28 01:00:01,218 - INFO - Remainder batch: loss=1.6686, ppl=5.30, grad_norm=1.38
2025-11-28 01:00:01,242 - INFO - Epoch 1 training: loss=1.6270, ppl=5.09, grad_norm=1.65, throughput=2247 tok/s (137071.0s total)
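[Note: the "Rescaling gradients by 1.50x" line above reflects a standard fix for a short final gradient-accumulation window. A hedged sketch of the idea, assuming gradients are averaged over a fixed target of micro-batches; `flush_remainder` is a hypothetical helper, not the trainer's actual code:]

```python
def flush_remainder(grads, accum_target, accum_actual):
    """Rescale accumulated gradients when the final accumulation window
    is short (here: 4 of 6 micro-batches left at end of epoch).

    If each micro-batch contributed grad/accum_target during accumulation,
    a short window under-weights the remainder; multiplying by
    accum_target/accum_actual restores a proper per-batch mean.
    """
    scale = accum_target / accum_actual
    return [g * scale for g in grads], scale

# 6-batch target, 4 batches remaining -> 1.5x, matching the log line.
grads, scale = flush_remainder([0.6, 1.2], accum_target=6, accum_actual=4)
```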
2025-11-28 01:00:01,250 - INFO - 
Running final validation...
2025-11-28 01:11:53,249 - INFO - Validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:11:53,250 - INFO - 
======================================================================
2025-11-28 01:11:53,250 - INFO - Qualitative Evaluation Samples:
2025-11-28 01:11:53,250 - INFO - ======================================================================
2025-11-28 01:11:53,250 - INFO - 
Sample 1 (ID: sample_141920_chunk_1):
2025-11-28 01:11:53,250 - INFO - Context:      [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-28 01:11:53,251 - INFO - Generated:    ' to the band\'s previous work, stating that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Key...'
2025-11-28 01:11:53,251 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-28 01:11:53,251 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,251 - INFO - 
Sample 2 (ID: sample_170543_chunk_2):
2025-11-28 01:11:53,251 - INFO - Context:      [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-28 01:11:53,252 - INFO - Generated:    "aternalistic fraternities, and the Order of the Arrow was no exception. The Order's Native American-themed chapters were founded in the 1930s, and the Order's first chapter was established in 1934. Th..."
2025-11-28 01:11:53,252 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-28 01:11:53,252 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,253 - INFO - 
Sample 3 (ID: sample_107152_chunk_9):
2025-11-28 01:11:53,253 - INFO - Context:      [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-28 01:11:53,254 - INFO - Generated:    " be defeated by Oga and Mikii. Teimou's shadow group then defeated the Red Tails, and Oga and Mikii were able to get back at Teimou. Teimou's shadow group then defeated the Red Tails, and Oga and Miki..."
2025-11-28 01:11:53,254 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-28 01:11:53,255 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,255 - INFO - 
Sample 4 (ID: sample_069148_chunk_0):
2025-11-28 01:11:53,256 - INFO - Context:      [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-28 01:11:53,256 - INFO - Generated:    '-01-01 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ...'
2025-11-28 01:11:53,257 - INFO - Ground Truth: '-056  |             |                  | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam                                                 ...'
2025-11-28 01:11:53,257 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,257 - INFO - 
Sample 5 (ID: sample_103176_chunk_4):
2025-11-28 01:11:53,258 - INFO - Context:      [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-28 01:11:53,258 - INFO - Generated:    '1 | PlayStation 3 | EA Tiburon                                 | [ 150 ] |\n| Madden NFL 12                                                              | August 30, 2011 | PlayStation 3 | EA Tiburon  ...'
2025-11-28 01:11:53,259 - INFO - Ground Truth: '1                     | PlayStation 2             | EA Tiburon                                                        | [ 150 ]                 |\n| Madden NFL 12                                       ...'
2025-11-28 01:11:53,259 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,260 - INFO - 
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_10417.jsonl
2025-11-28 01:14:49,678 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-28 01:14:49,689 - INFO - New best validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:14:49,691 - INFO - 
Training complete!
2025-11-28 01:14:49,691 - INFO - Final checkpoint is best, created symlink to save space (~2GB saved)
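[Note: the "created symlink to save space" step above avoids writing a second multi-GB checkpoint when the final model is also the best one. A minimal sketch of that pattern; the function and filenames here are illustrative assumptions, not the trainer's actual code:]

```python
import os

def finalize_checkpoint(out_dir, best_name="best_checkpoint.pt",
                        final_name="final_checkpoint.pt"):
    """Point the final checkpoint at the best one via a symlink instead of
    copying it, saving roughly one checkpoint's worth of disk (~2GB here)."""
    final = os.path.join(out_dir, final_name)
    if os.path.lexists(final):
        os.remove(final)
    # Relative link target keeps the output directory relocatable.
    os.symlink(best_name, final)
    return final
```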
2025-11-28 01:14:49,691 - INFO - Best validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:14:49,692 - INFO - Checkpoints saved to outputs/production_vision_base_lm_20251123_003859
2025-11-28 01:14:50,370 - INFO - W&B run finished