2025-11-23 00:39:18,735 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_base_lm_20251123_003859', objective='lm', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251123_003859', batch_size=8, gradient_accumulation_steps=6, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_vision_base_lm_20251123_003859', resume_from_checkpoint=None, resume=None, init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=4, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True)
2025-11-23 00:39:18,735 - INFO - Using preset vision prompt 'free_ocr': '\nFree OCR.'
2025-11-23 00:39:18,735 - INFO - Setting random seed: 42
2025-11-23 00:39:20,423 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_lm_20251123_003859 (ID: 7aj57hve)
2025-11-23 00:39:20,423 - INFO - Loading model and tokenizer...
2025-11-23 00:39:30,234 - INFO - Enabling decoder gradient checkpointing...
2025-11-23 00:39:30,242 - INFO - ✓ Decoder checkpointing enabled for 12 transformer layers
2025-11-23 00:39:30,242 - INFO - Expected: ~30-50% activation memory reduction, ~15-20% compute overhead
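The checkpointing trade-off logged above (less activation memory, more recompute) can be sketched with PyTorch's generic checkpoint utility; the module and sizes below are toy stand-ins, not the actual encoder/decoder:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy illustration of activation checkpointing: each layer's forward is
# wrapped so activations are recomputed during backward instead of stored.
class CheckpointedStack(nn.Module):
    def __init__(self, num_layers=12, dim=64):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            # use_reentrant=False is the recommended mode in recent PyTorch
            x = checkpoint(layer, x, use_reentrant=False)
        return x

model = CheckpointedStack()
x = torch.randn(2, 64, requires_grad=True)
out = model(x)
out.sum().backward()  # activations recomputed here, then gradients flow
```

In the run itself the equivalent switch is the decoder's gradient checkpointing flag (`use_decoder_checkpointing=True` in the args above).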
2025-11-23 00:39:30,270 - INFO - Created Vision Compression trainer (mode: base)
2025-11-23 00:39:30,270 - INFO - Training objective: lm
2025-11-23 00:39:30,303 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640
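A quick sanity check on the logged counts: the encoder and decoder together account for the full 3.34B trainable parameters, with the encoder at roughly 12% of the total.

```python
# Verify the logged parameter accounting.
encoder = 401_369_600
decoder = 2_934_736_640
total = 3_336_106_240
assert encoder + decoder == total
encoder_share = encoder / total  # ~0.12
```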
2025-11-23 00:39:30,303 - INFO - Loading training data from data/training/splits_510k/train.jsonl
2025-11-23 00:42:11,102 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl
2025-11-23 00:42:11,103 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-23 00:42:11,143 - INFO - Loading validation data from data/training/splits_510k/val.jsonl
2025-11-23 00:42:13,810 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl
2025-11-23 00:42:13,811 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-23 00:42:13,843 - INFO - Created AdamW optimizer with differential LR:
Encoder: 474 param tensors @ lr=1e-05
Decoder: 2236 param tensors @ lr=0.0001
Fused kernels: True
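The differential-LR setup logged above amounts to a single AdamW with two parameter groups, encoder at 1e-5 and decoder at 1e-4; a minimal sketch with toy modules standing in for the real encoder/decoder:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the encoder and decoder.
encoder = nn.Linear(8, 8)
decoder = nn.Linear(8, 8)

# One optimizer, two param groups with different learning rates.
# The logged run also used fused kernels (fused=True), which requires CUDA,
# so it is omitted from this CPU-safe sketch.
optimizer = torch.optim.AdamW(
    [
        {"params": encoder.parameters(), "lr": 1e-5},
        {"params": decoder.parameters(), "lr": 1e-4},
    ],
    weight_decay=0.01,
)
assert [g["lr"] for g in optimizer.param_groups] == [1e-5, 1e-4]
```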
2025-11-23 00:42:13,843 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
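The step counts follow directly from the run config: 500k samples, batch size 8, gradient accumulation 6, one epoch, warmup_ratio 0.1.

```python
import math

# Reproduce the logged warmup_steps=1041, total_steps=10417.
samples, batch_size, grad_accum, epochs = 500_000, 8, 6, 1
steps_per_epoch = math.ceil(samples / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs   # 10417
warmup_steps = int(0.1 * total_steps)    # 1041
assert total_steps == 10_417
assert warmup_steps == 1_041
```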
2025-11-23 00:42:13,843 - INFO - Starting training loop...
2025-11-23 00:42:13,844 - INFO -
======================================================================
2025-11-23 00:42:13,844 - INFO - Running initial validation (before any training)...
2025-11-23 00:42:13,844 - INFO - ======================================================================
2025-11-23 00:52:15,914 - INFO - Validation loss: 2.0308, perplexity: 7.62
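The two numbers on that line are one quantity: perplexity is the exponential of the mean cross-entropy loss.

```python
import math

# The logged pair loss=2.0308 -> perplexity=7.62.
loss = 2.0308
ppl = math.exp(loss)
assert round(ppl, 2) == 7.62
```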
2025-11-23 00:52:15,914 - INFO - Qualitative metrics (n=5):
2025-11-23 00:52:15,914 - INFO - BLEU: 0.0539
2025-11-23 00:52:15,914 - INFO - METEOR: 0.2296
2025-11-23 00:52:15,914 - INFO - Edit Distance: 0.7797
2025-11-23 00:52:15,915 - INFO - F-measure: 0.2083
2025-11-23 00:52:15,915 - INFO -
======================================================================
2025-11-23 00:52:15,915 - INFO - Qualitative Evaluation Samples:
2025-11-23 00:52:15,915 - INFO - ======================================================================
2025-11-23 00:52:15,915 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-23 00:52:15,915 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-23 00:52:15,915 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-23 00:52:15,916 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-23 00:52:15,916 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,916 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-23 00:52:15,916 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-23 00:52:15,916 - INFO - Generated: 'was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROTC;...'
2025-11-23 00:52:15,916 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-23 00:52:15,916 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,916 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-23 00:52:15,916 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-23 00:52:15,917 - INFO - Generated: 'at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and bo...'
2025-11-23 00:52:15,917 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-23 00:52:15,917 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,917 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-23 00:52:15,917 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-23 00:52:15,917 - INFO - Generated: '# Oriya (Unicode block) Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-23 00:52:15,918 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-23 00:52:15,918 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,918 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-23 00:52:15,919 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-23 00:52:15,919 - INFO - Generated: '| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores |\n|-----------------------|------------|---------|---------------------|\n| [ 132 ] | Ultima Underworld: The Stygian Abyss and ...'
2025-11-23 00:52:15,919 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-23 00:52:15,919 - INFO - ----------------------------------------------------------------------
2025-11-23 00:52:15,920 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_0.jsonl
2025-11-23 00:52:17,178 - INFO - Initial validation - Loss: 2.0308, Perplexity: 7.62
2025-11-23 00:52:17,178 - INFO - ======================================================================
2025-11-23 00:52:17,178 - INFO -
======================================================================
2025-11-23 00:52:17,179 - INFO - Epoch 1/1
2025-11-23 00:52:17,179 - INFO - ======================================================================
2025-11-23 00:52:43,666 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
2025-11-23 00:52:45,490 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x
2025-11-23 00:52:45,491 - INFO - Target tokens per sample: 1000
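The logged 3.60x ratio is simply target text tokens over effective context tokens (presumably the 273 base-mode vision tokens plus a few prompt tokens, giving 278).

```python
# Reproduce the logged compression ratio.
target_tokens = 1000
context_tokens = 278
ratio = target_tokens / context_tokens
assert round(ratio, 2) == 3.60
```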
2025-11-23 00:57:24,015 - INFO - Epoch 1 Step 10 (Global: 10): loss=1.7593, ppl=5.81, grad_norm=1.97, lr=1.09e-06, throughput=1564 tok/s
2025-11-23 01:01:57,820 - INFO - Epoch 1 Step 20 (Global: 20): loss=1.8714, ppl=6.50, grad_norm=2.02, lr=1.17e-06, throughput=1753 tok/s
2025-11-23 01:06:41,116 - INFO - Epoch 1 Step 30 (Global: 30): loss=1.9781, ppl=7.23, grad_norm=1.91, lr=1.26e-06, throughput=1694 tok/s
2025-11-23 01:11:05,187 - INFO - Epoch 1 Step 40 (Global: 40): loss=1.9456, ppl=7.00, grad_norm=1.62, lr=1.35e-06, throughput=1818 tok/s
2025-11-23 01:15:37,139 - INFO - Epoch 1 Step 50 (Global: 50): loss=1.8491, ppl=6.35, grad_norm=1.58, lr=1.43e-06, throughput=1765 tok/s
2025-11-23 01:20:04,821 - INFO - Epoch 1 Step 60 (Global: 60): loss=1.9780, ppl=7.23, grad_norm=1.67, lr=1.52e-06, throughput=1793 tok/s
2025-11-23 01:24:37,195 - INFO - Epoch 1 Step 70 (Global: 70): loss=1.6874, ppl=5.41, grad_norm=1.50, lr=1.61e-06, throughput=1762 tok/s
2025-11-23 01:29:04,834 - INFO - Epoch 1 Step 80 (Global: 80): loss=1.9371, ppl=6.94, grad_norm=1.38, lr=1.69e-06, throughput=1793 tok/s
2025-11-23 01:33:37,286 - INFO - Epoch 1 Step 90 (Global: 90): loss=1.9360, ppl=6.93, grad_norm=1.56, lr=1.78e-06, throughput=1762 tok/s
2025-11-23 01:38:03,630 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.8629, ppl=6.44, grad_norm=1.52, lr=1.86e-06, throughput=1802 tok/s
2025-11-23 01:42:29,466 - INFO - Epoch 1 Step 110 (Global: 110): loss=1.6511, ppl=5.21, grad_norm=1.53, lr=1.95e-06, throughput=1806 tok/s
2025-11-23 01:47:02,233 - INFO - Epoch 1 Step 120 (Global: 120): loss=1.9235, ppl=6.84, grad_norm=1.64, lr=2.04e-06, throughput=1760 tok/s
2025-11-23 01:51:33,410 - INFO - Epoch 1 Step 130 (Global: 130): loss=1.9405, ppl=6.96, grad_norm=1.54, lr=2.12e-06, throughput=1770 tok/s
2025-11-23 01:56:07,108 - INFO - Epoch 1 Step 140 (Global: 140): loss=1.8248, ppl=6.20, grad_norm=1.50, lr=2.21e-06, throughput=1754 tok/s
2025-11-23 02:00:35,146 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.8491, ppl=6.35, grad_norm=1.62, lr=2.30e-06, throughput=1791 tok/s
2025-11-23 02:05:05,505 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.8049, ppl=6.08, grad_norm=1.58, lr=2.38e-06, throughput=1775 tok/s
2025-11-23 02:09:30,912 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.6813, ppl=5.37, grad_norm=1.81, lr=2.47e-06, throughput=1809 tok/s
2025-11-23 02:14:01,439 - INFO - Epoch 1 Step 180 (Global: 180): loss=2.0785, ppl=7.99, grad_norm=2.08, lr=2.56e-06, throughput=1774 tok/s
2025-11-23 02:18:25,224 - INFO - Epoch 1 Step 190 (Global: 190): loss=1.9836, ppl=7.27, grad_norm=1.52, lr=2.64e-06, throughput=1820 tok/s
2025-11-23 02:22:58,056 - INFO - Epoch 1 Step 200 (Global: 200): loss=1.9136, ppl=6.78, grad_norm=1.78, lr=2.73e-06, throughput=1759 tok/s
2025-11-23 02:27:24,835 - INFO - Epoch 1 Step 210 (Global: 210): loss=1.8671, ppl=6.47, grad_norm=1.83, lr=2.82e-06, throughput=1799 tok/s
2025-11-23 02:32:01,169 - INFO - Epoch 1 Step 220 (Global: 220): loss=1.7657, ppl=5.85, grad_norm=1.38, lr=2.90e-06, throughput=1737 tok/s
2025-11-23 02:36:25,089 - INFO - Epoch 1 Step 230 (Global: 230): loss=1.7978, ppl=6.04, grad_norm=1.97, lr=2.99e-06, throughput=1819 tok/s
2025-11-23 02:40:56,278 - INFO - Epoch 1 Step 240 (Global: 240): loss=1.7170, ppl=5.57, grad_norm=1.44, lr=3.07e-06, throughput=1770 tok/s
2025-11-23 02:45:22,171 - INFO - Epoch 1 Step 250 (Global: 250): loss=1.5863, ppl=4.89, grad_norm=3.17, lr=3.16e-06, throughput=1805 tok/s
2025-11-23 02:49:47,545 - INFO - Epoch 1 Step 260 (Global: 260): loss=1.6857, ppl=5.40, grad_norm=1.66, lr=3.25e-06, throughput=1809 tok/s
2025-11-23 02:54:21,931 - INFO - Epoch 1 Step 270 (Global: 270): loss=1.6470, ppl=5.19, grad_norm=1.64, lr=3.33e-06, throughput=1749 tok/s
2025-11-23 02:58:46,065 - INFO - Epoch 1 Step 280 (Global: 280): loss=1.7513, ppl=5.76, grad_norm=1.70, lr=3.42e-06, throughput=1817 tok/s
2025-11-23 03:03:17,628 - INFO - Epoch 1 Step 290 (Global: 290): loss=1.7990, ppl=6.04, grad_norm=1.77, lr=3.51e-06, throughput=1768 tok/s
2025-11-23 03:07:39,529 - INFO - Epoch 1 Step 300 (Global: 300): loss=1.8349, ppl=6.26, grad_norm=1.75, lr=3.59e-06, throughput=1833 tok/s
2025-11-23 03:12:10,072 - INFO - Epoch 1 Step 310 (Global: 310): loss=1.7573, ppl=5.80, grad_norm=1.47, lr=3.68e-06, throughput=1774 tok/s
2025-11-23 03:16:33,201 - INFO - Epoch 1 Step 320 (Global: 320): loss=1.7261, ppl=5.62, grad_norm=1.39, lr=3.77e-06, throughput=1824 tok/s
2025-11-23 03:21:06,264 - INFO - Epoch 1 Step 330 (Global: 330): loss=1.8710, ppl=6.49, grad_norm=1.66, lr=3.85e-06, throughput=1758 tok/s
2025-11-23 03:25:30,199 - INFO - Epoch 1 Step 340 (Global: 340): loss=1.8300, ppl=6.23, grad_norm=1.73, lr=3.94e-06, throughput=1819 tok/s
2025-11-23 03:30:01,991 - INFO - Epoch 1 Step 350 (Global: 350): loss=1.8026, ppl=6.07, grad_norm=1.97, lr=4.03e-06, throughput=1766 tok/s
2025-11-23 03:34:23,484 - INFO - Epoch 1 Step 360 (Global: 360): loss=1.7302, ppl=5.64, grad_norm=2.05, lr=4.11e-06, throughput=1836 tok/s
2025-11-23 03:38:55,945 - INFO - Epoch 1 Step 370 (Global: 370): loss=1.9307, ppl=6.89, grad_norm=1.64, lr=4.20e-06, throughput=1762 tok/s
2025-11-23 03:43:13,440 - INFO - Epoch 1 Step 380 (Global: 380): loss=1.8742, ppl=6.52, grad_norm=1.98, lr=4.29e-06, throughput=1864 tok/s
2025-11-23 03:47:40,900 - INFO - Epoch 1 Step 390 (Global: 390): loss=1.8013, ppl=6.06, grad_norm=2.14, lr=4.37e-06, throughput=1795 tok/s
2025-11-23 03:52:06,421 - INFO - Epoch 1 Step 400 (Global: 400): loss=1.8168, ppl=6.15, grad_norm=3.77, lr=4.46e-06, throughput=1808 tok/s
2025-11-23 03:56:41,183 - INFO - Epoch 1 Step 410 (Global: 410): loss=1.7967, ppl=6.03, grad_norm=1.97, lr=4.54e-06, throughput=1747 tok/s
2025-11-23 04:01:07,390 - INFO - Epoch 1 Step 420 (Global: 420): loss=1.9419, ppl=6.97, grad_norm=1.66, lr=4.63e-06, throughput=1803 tok/s
2025-11-23 04:05:26,674 - INFO - Epoch 1 Step 430 (Global: 430): loss=2.0034, ppl=7.41, grad_norm=1.66, lr=4.72e-06, throughput=1851 tok/s
2025-11-23 04:09:57,532 - INFO - Epoch 1 Step 440 (Global: 440): loss=1.8251, ppl=6.20, grad_norm=1.52, lr=4.80e-06, throughput=1772 tok/s
2025-11-23 04:14:22,077 - INFO - Epoch 1 Step 450 (Global: 450): loss=1.7433, ppl=5.72, grad_norm=1.47, lr=4.89e-06, throughput=1814 tok/s
2025-11-23 04:18:52,978 - INFO - Epoch 1 Step 460 (Global: 460): loss=1.7405, ppl=5.70, grad_norm=1.48, lr=4.98e-06, throughput=1772 tok/s
2025-11-23 04:23:13,662 - INFO - Epoch 1 Step 470 (Global: 470): loss=2.0292, ppl=7.61, grad_norm=1.52, lr=5.06e-06, throughput=1841 tok/s
2025-11-23 04:27:52,546 - INFO - Epoch 1 Step 480 (Global: 480): loss=1.7619, ppl=5.82, grad_norm=1.92, lr=5.15e-06, throughput=1721 tok/s
2025-11-23 04:32:22,310 - INFO - Epoch 1 Step 490 (Global: 490): loss=1.6590, ppl=5.25, grad_norm=1.68, lr=5.24e-06, throughput=1779 tok/s
2025-11-23 04:36:57,430 - INFO - Epoch 1 Step 500 (Global: 500): loss=1.9220, ppl=6.83, grad_norm=1.53, lr=5.32e-06, throughput=1745 tok/s
2025-11-23 04:41:18,878 - INFO - Epoch 1 Step 510 (Global: 510): loss=1.9820, ppl=7.26, grad_norm=1.72, lr=5.41e-06, throughput=1836 tok/s
2025-11-23 04:45:47,594 - INFO - Epoch 1 Step 520 (Global: 520): loss=1.7555, ppl=5.79, grad_norm=1.45, lr=5.50e-06, throughput=1786 tok/s
2025-11-23 04:50:31,443 - INFO - Epoch 1 Step 530 (Global: 530): loss=1.8434, ppl=6.32, grad_norm=1.84, lr=5.58e-06, throughput=1691 tok/s
2025-11-23 04:55:13,819 - INFO - Epoch 1 Step 540 (Global: 540): loss=1.8080, ppl=6.10, grad_norm=1.96, lr=5.67e-06, throughput=1700 tok/s
2025-11-23 04:59:41,762 - INFO - Epoch 1 Step 550 (Global: 550): loss=1.7525, ppl=5.77, grad_norm=1.88, lr=5.76e-06, throughput=1791 tok/s
2025-11-23 05:04:19,053 - INFO - Epoch 1 Step 560 (Global: 560): loss=1.6107, ppl=5.01, grad_norm=2.19, lr=5.84e-06, throughput=1731 tok/s
2025-11-23 05:08:48,835 - INFO - Epoch 1 Step 570 (Global: 570): loss=1.7103, ppl=5.53, grad_norm=1.76, lr=5.93e-06, throughput=1779 tok/s
2025-11-23 05:13:24,176 - INFO - Epoch 1 Step 580 (Global: 580): loss=1.6759, ppl=5.34, grad_norm=1.56, lr=6.01e-06, throughput=1743 tok/s
2025-11-23 05:17:49,658 - INFO - Epoch 1 Step 590 (Global: 590): loss=1.7054, ppl=5.50, grad_norm=1.86, lr=6.10e-06, throughput=1808 tok/s
2025-11-23 05:22:16,893 - INFO - Epoch 1 Step 600 (Global: 600): loss=1.8302, ppl=6.24, grad_norm=2.12, lr=6.19e-06, throughput=1796 tok/s
2025-11-23 05:26:56,819 - INFO - Epoch 1 Step 610 (Global: 610): loss=1.7796, ppl=5.93, grad_norm=1.88, lr=6.27e-06, throughput=1715 tok/s
2025-11-23 05:31:23,370 - INFO - Epoch 1 Step 620 (Global: 620): loss=1.8601, ppl=6.42, grad_norm=1.50, lr=6.36e-06, throughput=1801 tok/s
2025-11-23 05:35:54,665 - INFO - Epoch 1 Step 630 (Global: 630): loss=1.9594, ppl=7.10, grad_norm=1.54, lr=6.45e-06, throughput=1769 tok/s
2025-11-23 05:40:18,241 - INFO - Epoch 1 Step 640 (Global: 640): loss=1.7596, ppl=5.81, grad_norm=1.97, lr=6.53e-06, throughput=1821 tok/s
2025-11-23 05:44:57,561 - INFO - Epoch 1 Step 650 (Global: 650): loss=1.8469, ppl=6.34, grad_norm=1.32, lr=6.62e-06, throughput=1718 tok/s
2025-11-23 05:49:29,856 - INFO - Epoch 1 Step 660 (Global: 660): loss=1.4370, ppl=4.21, grad_norm=1.66, lr=6.71e-06, throughput=1763 tok/s
2025-11-23 05:54:04,170 - INFO - Epoch 1 Step 670 (Global: 670): loss=1.8127, ppl=6.13, grad_norm=1.54, lr=6.79e-06, throughput=1750 tok/s
2025-11-23 05:58:34,903 - INFO - Epoch 1 Step 680 (Global: 680): loss=1.8977, ppl=6.67, grad_norm=1.72, lr=6.88e-06, throughput=1773 tok/s
2025-11-23 06:03:13,360 - INFO - Epoch 1 Step 690 (Global: 690): loss=1.6731, ppl=5.33, grad_norm=2.06, lr=6.97e-06, throughput=1724 tok/s
2025-11-23 06:07:43,687 - INFO - Epoch 1 Step 700 (Global: 700): loss=1.9364, ppl=6.93, grad_norm=1.72, lr=7.05e-06, throughput=1776 tok/s
2025-11-23 06:12:22,812 - INFO - Epoch 1 Step 710 (Global: 710): loss=1.7367, ppl=5.68, grad_norm=1.79, lr=7.14e-06, throughput=1720 tok/s
2025-11-23 06:16:54,320 - INFO - Epoch 1 Step 720 (Global: 720): loss=1.9686, ppl=7.16, grad_norm=1.56, lr=7.22e-06, throughput=1768 tok/s
2025-11-23 06:21:37,629 - INFO - Epoch 1 Step 730 (Global: 730): loss=1.9925, ppl=7.33, grad_norm=1.66, lr=7.31e-06, throughput=1694 tok/s
2025-11-23 06:26:12,390 - INFO - Epoch 1 Step 740 (Global: 740): loss=1.7124, ppl=5.54, grad_norm=1.61, lr=7.40e-06, throughput=1747 tok/s
2025-11-23 06:30:42,732 - INFO - Epoch 1 Step 750 (Global: 750): loss=1.6240, ppl=5.07, grad_norm=1.38, lr=7.48e-06, throughput=1776 tok/s
2025-11-23 06:35:19,563 - INFO - Epoch 1 Step 760 (Global: 760): loss=1.8446, ppl=6.33, grad_norm=1.66, lr=7.57e-06, throughput=1734 tok/s
2025-11-23 06:39:49,085 - INFO - Epoch 1 Step 770 (Global: 770): loss=1.7033, ppl=5.49, grad_norm=1.75, lr=7.66e-06, throughput=1781 tok/s
2025-11-23 06:44:24,611 - INFO - Epoch 1 Step 780 (Global: 780): loss=1.6641, ppl=5.28, grad_norm=1.59, lr=7.74e-06, throughput=1742 tok/s
2025-11-23 06:48:49,166 - INFO - Epoch 1 Step 790 (Global: 790): loss=1.5717, ppl=4.81, grad_norm=2.38, lr=7.83e-06, throughput=1814 tok/s
2025-11-23 06:53:27,207 - INFO - Epoch 1 Step 800 (Global: 800): loss=1.8194, ppl=6.17, grad_norm=1.51, lr=7.92e-06, throughput=1726 tok/s
2025-11-23 06:57:59,735 - INFO - Epoch 1 Step 810 (Global: 810): loss=1.8363, ppl=6.27, grad_norm=1.62, lr=8.00e-06, throughput=1761 tok/s
2025-11-23 07:02:39,560 - INFO - Epoch 1 Step 820 (Global: 820): loss=1.6740, ppl=5.33, grad_norm=1.87, lr=8.09e-06, throughput=1715 tok/s
2025-11-23 07:07:10,158 - INFO - Epoch 1 Step 830 (Global: 830): loss=1.8374, ppl=6.28, grad_norm=1.66, lr=8.18e-06, throughput=1774 tok/s
2025-11-23 07:11:55,310 - INFO - Epoch 1 Step 840 (Global: 840): loss=1.8046, ppl=6.08, grad_norm=1.80, lr=8.26e-06, throughput=1683 tok/s
2025-11-23 07:16:27,710 - INFO - Epoch 1 Step 850 (Global: 850): loss=2.1125, ppl=8.27, grad_norm=1.51, lr=8.35e-06, throughput=1762 tok/s
2025-11-23 07:21:05,529 - INFO - Epoch 1 Step 860 (Global: 860): loss=1.8099, ppl=6.11, grad_norm=1.74, lr=8.44e-06, throughput=1728 tok/s
2025-11-23 07:25:34,528 - INFO - Epoch 1 Step 870 (Global: 870): loss=1.5395, ppl=4.66, grad_norm=1.65, lr=8.52e-06, throughput=1784 tok/s
2025-11-23 07:30:20,213 - INFO - Epoch 1 Step 880 (Global: 880): loss=1.8677, ppl=6.47, grad_norm=1.95, lr=8.61e-06, throughput=1680 tok/s
2025-11-23 07:34:52,863 - INFO - Epoch 1 Step 890 (Global: 890): loss=1.6371, ppl=5.14, grad_norm=2.11, lr=8.69e-06, throughput=1761 tok/s
2025-11-23 07:39:32,779 - INFO - Epoch 1 Step 900 (Global: 900): loss=1.6667, ppl=5.29, grad_norm=1.55, lr=8.78e-06, throughput=1715 tok/s
2025-11-23 07:44:00,274 - INFO - Epoch 1 Step 910 (Global: 910): loss=1.7994, ppl=6.05, grad_norm=1.45, lr=8.87e-06, throughput=1794 tok/s
2025-11-23 07:48:30,835 - INFO - Epoch 1 Step 920 (Global: 920): loss=1.7632, ppl=5.83, grad_norm=1.41, lr=8.95e-06, throughput=1774 tok/s
2025-11-23 07:53:09,484 - INFO - Epoch 1 Step 930 (Global: 930): loss=1.9497, ppl=7.03, grad_norm=2.33, lr=9.04e-06, throughput=1723 tok/s
2025-11-23 07:57:36,327 - INFO - Epoch 1 Step 940 (Global: 940): loss=1.6791, ppl=5.36, grad_norm=2.25, lr=9.13e-06, throughput=1799 tok/s
2025-11-23 08:02:11,875 - INFO - Epoch 1 Step 950 (Global: 950): loss=1.7534, ppl=5.77, grad_norm=1.56, lr=9.21e-06, throughput=1742 tok/s
2025-11-23 08:06:40,795 - INFO - Epoch 1 Step 960 (Global: 960): loss=1.7496, ppl=5.75, grad_norm=1.69, lr=9.30e-06, throughput=1785 tok/s
2025-11-23 08:11:22,399 - INFO - Epoch 1 Step 970 (Global: 970): loss=1.8325, ppl=6.25, grad_norm=1.59, lr=9.39e-06, throughput=1705 tok/s
2025-11-23 08:15:57,763 - INFO - Epoch 1 Step 980 (Global: 980): loss=1.7496, ppl=5.75, grad_norm=1.56, lr=9.47e-06, throughput=1743 tok/s
2025-11-23 08:20:38,394 - INFO - Epoch 1 Step 990 (Global: 990): loss=1.7299, ppl=5.64, grad_norm=1.63, lr=9.56e-06, throughput=1710 tok/s
2025-11-23 08:25:11,235 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=1.8158, ppl=6.15, grad_norm=2.00, lr=9.65e-06, throughput=1759 tok/s
2025-11-23 08:29:51,571 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=2.0486, ppl=7.76, grad_norm=1.88, lr=9.73e-06, throughput=1712 tok/s
2025-11-23 08:34:18,589 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=2.0579, ppl=7.83, grad_norm=1.54, lr=9.82e-06, throughput=1798 tok/s
2025-11-23 08:38:57,262 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=1.8081, ppl=6.10, grad_norm=1.70, lr=9.90e-06, throughput=1722 tok/s
2025-11-23 08:43:27,900 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=1.6845, ppl=5.39, grad_norm=1.29, lr=9.99e-06, throughput=1774 tok/s
2025-11-23 08:48:04,489 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=1.7266, ppl=5.62, grad_norm=1.70, lr=1.00e-05, throughput=1735 tok/s
2025-11-23 08:52:39,228 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=1.7992, ppl=6.04, grad_norm=1.59, lr=1.00e-05, throughput=1747 tok/s
2025-11-23 08:57:09,915 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=1.7306, ppl=5.64, grad_norm=1.70, lr=1.00e-05, throughput=1773 tok/s
2025-11-23 09:01:50,384 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=1.9913, ppl=7.33, grad_norm=1.55, lr=1.00e-05, throughput=1711 tok/s
2025-11-23 09:06:30,346 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=1.7767, ppl=5.91, grad_norm=1.89, lr=1.00e-05, throughput=1715 tok/s
2025-11-23 09:11:52,775 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=1.7927, ppl=6.01, grad_norm=1.58, lr=1.00e-05, throughput=1489 tok/s
2025-11-23 09:17:02,533 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=1.6752, ppl=5.34, grad_norm=1.80, lr=1.00e-05, throughput=1550 tok/s
2025-11-23 09:22:25,918 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=1.9292, ppl=6.88, grad_norm=1.73, lr=1.00e-05, throughput=1484 tok/s
2025-11-23 09:27:38,164 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=1.7571, ppl=5.80, grad_norm=1.81, lr=1.00e-05, throughput=1537 tok/s
2025-11-23 09:32:55,186 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=1.7812, ppl=5.94, grad_norm=1.58, lr=1.00e-05, throughput=1514 tok/s
2025-11-23 09:38:00,286 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=1.8185, ppl=6.16, grad_norm=1.55, lr=1.00e-05, throughput=1573 tok/s
2025-11-23 09:43:21,131 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=1.8385, ppl=6.29, grad_norm=1.74, lr=1.00e-05, throughput=1496 tok/s
2025-11-23 09:48:16,144 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=1.6718, ppl=5.32, grad_norm=1.78, lr=1.00e-05, throughput=1627 tok/s
2025-11-23 09:53:03,079 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=1.7708, ppl=5.88, grad_norm=1.36, lr=9.99e-06, throughput=1673 tok/s
2025-11-23 09:57:38,189 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=1.9106, ppl=6.76, grad_norm=1.41, lr=9.99e-06, throughput=1745 tok/s
2025-11-23 10:02:19,571 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=1.5958, ppl=4.93, grad_norm=1.42, lr=9.99e-06, throughput=1706 tok/s
2025-11-23 10:06:57,515 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=1.6352, ppl=5.13, grad_norm=2.16, lr=9.99e-06, throughput=1727 tok/s
2025-11-23 10:11:33,689 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=1.8879, ppl=6.61, grad_norm=1.84, lr=9.99e-06, throughput=1738 tok/s
2025-11-23 10:16:20,558 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=1.7523, ppl=5.77, grad_norm=1.75, lr=9.99e-06, throughput=1673 tok/s
2025-11-23 10:20:56,565 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=1.5437, ppl=4.68, grad_norm=2.09, lr=9.99e-06, throughput=1739 tok/s
2025-11-23 10:25:41,681 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=1.7735, ppl=5.89, grad_norm=3.05, lr=9.99e-06, throughput=1684 tok/s
2025-11-23 10:30:18,488 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=1.8289, ppl=6.23, grad_norm=1.38, lr=9.99e-06, throughput=1734 tok/s
2025-11-23 10:34:59,212 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=1.8474, ppl=6.34, grad_norm=1.41, lr=9.99e-06, throughput=1710 tok/s
2025-11-23 10:39:33,590 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=1.7974, ppl=6.03, grad_norm=1.38, lr=9.98e-06, throughput=1749 tok/s
2025-11-23 10:44:16,615 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=1.7240, ppl=5.61, grad_norm=1.48, lr=9.98e-06, throughput=1696 tok/s
2025-11-23 10:48:52,263 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=1.6631, ppl=5.28, grad_norm=1.70, lr=9.98e-06, throughput=1741 tok/s
2025-11-23 10:53:36,619 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=1.5080, ppl=4.52, grad_norm=1.40, lr=9.98e-06, throughput=1688 tok/s
2025-11-23 10:58:09,923 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=1.9739, ppl=7.20, grad_norm=1.45, lr=9.98e-06, throughput=1756 tok/s
2025-11-23 11:02:53,499 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=2.0231, ppl=7.56, grad_norm=1.62, lr=9.98e-06, throughput=1693 tok/s
2025-11-23 11:07:26,633 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=1.7390, ppl=5.69, grad_norm=2.58, lr=9.97e-06, throughput=1757 tok/s
2025-11-23 11:12:12,535 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=1.9460, ppl=7.00, grad_norm=1.43, lr=9.97e-06, throughput=1679 tok/s
2025-11-23 11:16:46,129 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=1.9002, ppl=6.69, grad_norm=1.53, lr=9.97e-06, throughput=1754 tok/s
2025-11-23 11:21:25,242 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=1.9676, ppl=7.15, grad_norm=1.77, lr=9.97e-06, throughput=1720 tok/s
2025-11-23 11:26:14,529 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=1.6781, ppl=5.36, grad_norm=1.28, lr=9.97e-06, throughput=1659 tok/s
2025-11-23 11:30:47,107 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=1.8783, ppl=6.54, grad_norm=1.48, lr=9.97e-06, throughput=1761 tok/s
2025-11-23 11:35:34,554 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=1.9413, ppl=6.97, grad_norm=1.79, lr=9.96e-06, throughput=1671 tok/s
2025-11-23 11:40:12,920 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=1.7207, ppl=5.59, grad_norm=1.60, lr=9.96e-06, throughput=1724 tok/s
2025-11-23 11:44:53,839 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=1.7518, ppl=5.76, grad_norm=1.48, lr=9.96e-06, throughput=1709 tok/s
2025-11-23 11:49:29,268 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=1.6266, ppl=5.09, grad_norm=1.66, lr=9.96e-06, throughput=1743 tok/s
2025-11-23 11:54:10,230 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=1.8610, ppl=6.43, grad_norm=1.78, lr=9.96e-06, throughput=1708 tok/s
2025-11-23 11:58:35,404 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=2.0201, ppl=7.54, grad_norm=2.25, lr=9.95e-06, throughput=1810 tok/s
2025-11-23 12:02:57,173 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=1.6170, ppl=5.04, grad_norm=1.30, lr=9.95e-06, throughput=1834 tok/s
2025-11-23 12:07:14,296 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=1.8996, ppl=6.68, grad_norm=1.90, lr=9.95e-06, throughput=1867 tok/s
2025-11-23 12:11:41,080 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=1.8149, ppl=6.14, grad_norm=1.54, lr=9.95e-06, throughput=1799 tok/s
2025-11-23 12:15:58,189 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=1.7161, ppl=5.56, grad_norm=1.42, lr=9.94e-06, throughput=1867 tok/s
2025-11-23 12:20:23,729 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=1.8604, ppl=6.43, grad_norm=1.43, lr=9.94e-06, throughput=1808 tok/s
2025-11-23 12:24:40,943 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=1.8503, ppl=6.36, grad_norm=1.25, lr=9.94e-06, throughput=1866 tok/s
2025-11-23 12:28:57,145 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=1.7975, ppl=6.03, grad_norm=1.33, lr=9.94e-06, throughput=1874 tok/s
2025-11-23 12:33:19,906 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=1.6239, ppl=5.07, grad_norm=1.44, lr=9.93e-06, throughput=1827 tok/s
2025-11-23 12:37:33,860 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=1.3864, ppl=4.00, grad_norm=1.95, lr=9.93e-06, throughput=1890 tok/s
2025-11-23 12:41:56,450 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=1.5598, ppl=4.76, grad_norm=1.52, lr=9.93e-06, throughput=1828 tok/s
2025-11-23 12:46:11,614 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=2.0108, ppl=7.47, grad_norm=1.69, lr=9.92e-06, throughput=1881 tok/s
2025-11-23 12:50:37,762 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=1.8453, ppl=6.33, grad_norm=1.24, lr=9.92e-06, throughput=1804 tok/s
2025-11-23 12:55:03,335 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=1.8278, ppl=6.22, grad_norm=1.59, lr=9.92e-06, throughput=1807 tok/s
2025-11-23 12:59:53,884 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=1.7180, ppl=5.57, grad_norm=1.38, lr=9.92e-06, throughput=1652 tok/s
2025-11-23 13:04:32,368 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=1.7084, ppl=5.52, grad_norm=2.12, lr=9.91e-06, throughput=1724 tok/s
2025-11-23 13:09:14,567 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=1.8012, ppl=6.06, grad_norm=1.38, lr=9.91e-06, throughput=1701 tok/s
2025-11-23 13:13:49,979 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=1.5019, ppl=4.49, grad_norm=1.57, lr=9.91e-06, throughput=1743 tok/s
2025-11-23 13:18:35,141 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=1.8531, ppl=6.38, grad_norm=2.42, lr=9.90e-06, throughput=1683 tok/s
2025-11-23 13:23:04,933 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=1.6424, ppl=5.17, grad_norm=1.34, lr=9.90e-06, throughput=1779 tok/s
2025-11-23 13:27:38,269 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=1.8881, ppl=6.61, grad_norm=1.42, lr=9.90e-06, throughput=1756 tok/s
2025-11-23 13:32:07,003 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=1.9101, ppl=6.75, grad_norm=1.31, lr=9.89e-06, throughput=1786 tok/s
2025-11-23 13:36:24,095 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=1.9131, ppl=6.77, grad_norm=1.22, lr=9.89e-06, throughput=1867 tok/s
2025-11-23 13:40:44,944 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=1.6259, ppl=5.08, grad_norm=1.61, lr=9.89e-06, throughput=1840 tok/s
2025-11-23 13:45:03,642 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=1.7649, ppl=5.84, grad_norm=1.59, lr=9.88e-06, throughput=1855 tok/s
2025-11-23 13:49:27,295 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=1.8114, ppl=6.12, grad_norm=1.99, lr=9.88e-06, throughput=1821 tok/s
2025-11-23 13:53:45,537 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=1.8832, ppl=6.57, grad_norm=1.64, lr=9.87e-06, throughput=1859 tok/s
2025-11-23 13:58:11,232 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=1.8805, ppl=6.56, grad_norm=1.81, lr=9.87e-06, throughput=1807 tok/s
2025-11-23 14:02:25,345 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=1.5991, ppl=4.95, grad_norm=1.86, lr=9.87e-06, throughput=1889 tok/s
2025-11-23 14:07:02,287 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=1.8932, ppl=6.64, grad_norm=1.60, lr=9.86e-06, throughput=1733 tok/s
2025-11-23 14:11:19,478 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=1.6949, ppl=5.45, grad_norm=1.55, lr=9.86e-06, throughput=1866 tok/s
2025-11-23 14:16:01,709 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=1.5372, ppl=4.65, grad_norm=1.60, lr=9.86e-06, throughput=1701 tok/s
2025-11-23 14:20:41,772 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=1.7446, ppl=5.72, grad_norm=1.45, lr=9.85e-06, throughput=1714 tok/s
2025-11-23 14:25:42,085 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=1.6518, ppl=5.22, grad_norm=1.41, lr=9.85e-06, throughput=1598 tok/s
2025-11-23 14:30:19,646 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=1.7556, ppl=5.79, grad_norm=1.40, lr=9.84e-06, throughput=1729 tok/s
2025-11-23 14:35:10,893 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=1.6196, ppl=5.05, grad_norm=1.24, lr=9.84e-06, throughput=1648 tok/s
2025-11-23 14:39:46,086 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=1.7911, ppl=6.00, grad_norm=1.59, lr=9.83e-06, throughput=1744 tok/s
2025-11-23 14:44:27,445 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=1.6682, ppl=5.30, grad_norm=1.61, lr=9.83e-06, throughput=1706 tok/s
2025-11-23 14:48:58,949 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=1.5032, ppl=4.50, grad_norm=1.55, lr=9.83e-06, throughput=1768 tok/s
2025-11-23 14:53:26,038 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=1.8502, ppl=6.36, grad_norm=1.35, lr=9.82e-06, throughput=1797 tok/s
2025-11-23 14:57:54,220 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=1.8354, ppl=6.27, grad_norm=1.79, lr=9.82e-06, throughput=1790 tok/s
2025-11-23 15:02:09,241 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=1.7477, ppl=5.74, grad_norm=1.36, lr=9.81e-06, throughput=1882 tok/s
2025-11-23 15:06:54,167 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=1.7156, ppl=5.56, grad_norm=1.34, lr=9.81e-06, throughput=1685 tok/s
2025-11-23 15:11:30,550 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=1.4651, ppl=4.33, grad_norm=1.43, lr=9.80e-06, throughput=1737 tok/s
2025-11-23 15:16:29,033 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=1.7502, ppl=5.76, grad_norm=1.38, lr=9.80e-06, throughput=1608 tok/s
2025-11-23 15:21:32,586 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=1.6253, ppl=5.08, grad_norm=1.41, lr=9.79e-06, throughput=1581 tok/s
2025-11-23 15:26:42,126 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=1.8108, ppl=6.12, grad_norm=1.51, lr=9.79e-06, throughput=1551 tok/s
2025-11-23 15:31:36,099 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=1.8351, ppl=6.27, grad_norm=2.00, lr=9.78e-06, throughput=1633 tok/s
2025-11-23 15:36:24,551 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=1.5741, ppl=4.83, grad_norm=1.38, lr=9.78e-06, throughput=1664 tok/s
2025-11-23 15:40:46,885 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=1.7876, ppl=5.98, grad_norm=4.44, lr=9.77e-06, throughput=1830 tok/s
2025-11-23 15:45:30,781 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=1.6304, ppl=5.11, grad_norm=1.20, lr=9.77e-06, throughput=1691 tok/s
2025-11-23 15:50:22,398 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=1.6829, ppl=5.38, grad_norm=1.80, lr=9.76e-06, throughput=1646 tok/s
2025-11-23 15:54:51,970 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=1.7271, ppl=5.62, grad_norm=2.12, lr=9.76e-06, throughput=1781 tok/s
2025-11-23 15:59:22,747 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=1.7655, ppl=5.84, grad_norm=1.55, lr=9.75e-06, throughput=1773 tok/s
2025-11-23 16:03:44,889 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=1.6094, ppl=5.00, grad_norm=1.35, lr=9.75e-06, throughput=1831 tok/s
2025-11-23 16:08:40,786 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=1.8451, ppl=6.33, grad_norm=1.32, lr=9.74e-06, throughput=1622 tok/s
2025-11-23 16:08:40,786 - INFO -
Running validation at step 2000...
2025-11-23 16:22:20,399 - INFO - Validation loss: 1.7648, perplexity: 5.84
2025-11-23 16:22:20,400 - INFO - Qualitative metrics (n=5):
2025-11-23 16:22:20,400 - INFO - BLEU: 0.1070
2025-11-23 16:22:20,400 - INFO - METEOR: 0.1883
2025-11-23 16:22:20,401 - INFO - Edit Distance: 0.7417
2025-11-23 16:22:20,401 - INFO - F-measure: 0.2050
2025-11-23 16:22:20,401 - INFO -
======================================================================
2025-11-23 16:22:20,402 - INFO - Qualitative Evaluation Samples:
2025-11-23 16:22:20,402 - INFO - ======================================================================
2025-11-23 16:22:20,402 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-23 16:22:20,402 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-23 16:22:20,403 - INFO - Generated: ' to the work of the Beatles, saying that "the Beatles\' work is a work of art, and the work of the Beatles is a work of art. The Beatles\' work is a work of art, and the work of the Beatles is a work of...'
2025-11-23 16:22:20,404 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-23 16:22:20,404 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,404 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-23 16:22:20,405 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-23 16:22:20,405 - INFO - Generated: "aternities, but rather among the student body. The Order of Angel's first meeting was held in the home of a student, and the Order's first official meeting was held in the home of a student. The Order..."
2025-11-23 16:22:20,405 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-23 16:22:20,406 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,406 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-23 16:22:20,407 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-23 16:22:20,407 - INFO - Generated: " be defeated by Teimou's son, Teimou's son's son, and Teimou's son's son. Teimou's son, Teimou's son's son, and Teimou's son's son, all defeat Teimou's son, and Teimou's son's son, and Teimou's son's ..."
2025-11-23 16:22:20,408 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-23 16:22:20,408 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,408 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-23 16:22:20,409 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-23 16:22:20,409 - INFO - Generated: ' | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x00..0x0f | 0x...'
2025-11-23 16:22:20,410 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-23 16:22:20,410 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,411 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-23 16:22:20,411 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-23 16:22:20,411 - INFO - Generated: '1 | iOS | EA Tiburon | [ 150 ] |\n| Madden NFL 12 2011 | August 30, 2011 | Android | EA Tiburon | [ 151 ] |\n| Madden NFL 12 2011 | August 30, 2011 | iOS | EA Tiburon | [ 152 ] |\n...'
2025-11-23 16:22:20,412 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-23 16:22:20,412 - INFO - ----------------------------------------------------------------------
2025-11-23 16:22:20,414 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_2000.jsonl
2025-11-23 16:23:24,391 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-23 16:23:24,411 - INFO - New best validation loss: 1.7648, perplexity: 5.84
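The `loss`/`ppl` pairs throughout this log (including the best validation loss just reported) are consistent with perplexity being the exponential of the mean per-token cross-entropy loss. A minimal sketch of that relationship — assuming the trainer logs mean token cross-entropy in nats, which the logged values support but the source does not state explicitly:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy loss (in nats)."""
    return math.exp(mean_ce_loss)

# Checking against the validation line above: loss=1.7648 -> ppl 5.84
print(round(perplexity(1.7648), 2))  # 5.84
```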
2025-11-23 16:29:28,251 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=1.5840, ppl=4.87, grad_norm=1.48, lr=9.74e-06, throughput=1319 tok/s
2025-11-23 16:35:15,580 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=1.8787, ppl=6.55, grad_norm=1.56, lr=9.73e-06, throughput=1382 tok/s
2025-11-23 16:40:44,874 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=1.6081, ppl=4.99, grad_norm=1.38, lr=9.73e-06, throughput=1458 tok/s
2025-11-23 16:46:12,759 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=1.7043, ppl=5.50, grad_norm=1.53, lr=9.72e-06, throughput=1464 tok/s
2025-11-23 16:51:24,387 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=2.0032, ppl=7.41, grad_norm=2.56, lr=9.72e-06, throughput=1540 tok/s
2025-11-23 16:56:45,686 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=1.5537, ppl=4.73, grad_norm=1.30, lr=9.71e-06, throughput=1494 tok/s
2025-11-23 17:01:31,315 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=1.8877, ppl=6.60, grad_norm=1.36, lr=9.71e-06, throughput=1681 tok/s
2025-11-23 17:06:46,520 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=1.5625, ppl=4.77, grad_norm=1.55, lr=9.70e-06, throughput=1523 tok/s
2025-11-23 17:12:12,084 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=1.5253, ppl=4.60, grad_norm=1.78, lr=9.69e-06, throughput=1474 tok/s
2025-11-23 17:17:20,195 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=1.7189, ppl=5.58, grad_norm=2.14, lr=9.69e-06, throughput=1558 tok/s
2025-11-23 17:22:41,589 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=1.7053, ppl=5.50, grad_norm=1.94, lr=9.68e-06, throughput=1494 tok/s
2025-11-23 17:27:57,429 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=1.7178, ppl=5.57, grad_norm=1.29, lr=9.68e-06, throughput=1520 tok/s
2025-11-23 17:32:59,749 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=1.7913, ppl=6.00, grad_norm=1.50, lr=9.67e-06, throughput=1588 tok/s
2025-11-23 17:37:44,189 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=1.7613, ppl=5.82, grad_norm=1.55, lr=9.66e-06, throughput=1688 tok/s
2025-11-23 17:42:46,475 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=1.7173, ppl=5.57, grad_norm=1.41, lr=9.66e-06, throughput=1588 tok/s
2025-11-23 17:47:43,749 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=1.6595, ppl=5.26, grad_norm=1.48, lr=9.65e-06, throughput=1615 tok/s
2025-11-23 17:52:54,880 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=1.6350, ppl=5.13, grad_norm=1.43, lr=9.65e-06, throughput=1543 tok/s
2025-11-23 17:58:12,598 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=1.6980, ppl=5.46, grad_norm=1.35, lr=9.64e-06, throughput=1511 tok/s
2025-11-23 18:03:15,460 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=1.7360, ppl=5.67, grad_norm=1.80, lr=9.63e-06, throughput=1585 tok/s
2025-11-23 18:08:32,374 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=1.7259, ppl=5.62, grad_norm=1.47, lr=9.63e-06, throughput=1515 tok/s
2025-11-23 18:14:43,266 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=1.6563, ppl=5.24, grad_norm=1.30, lr=9.62e-06, throughput=1294 tok/s
2025-11-23 18:20:24,539 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=1.7047, ppl=5.50, grad_norm=1.62, lr=9.61e-06, throughput=1407 tok/s
2025-11-23 18:25:35,616 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=1.7395, ppl=5.69, grad_norm=1.20, lr=9.61e-06, throughput=1543 tok/s
2025-11-23 18:30:53,709 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=1.6651, ppl=5.29, grad_norm=1.91, lr=9.60e-06, throughput=1509 tok/s
2025-11-23 18:36:08,481 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=1.6573, ppl=5.25, grad_norm=1.48, lr=9.60e-06, throughput=1525 tok/s
2025-11-23 18:41:28,262 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=1.7934, ppl=6.01, grad_norm=1.52, lr=9.59e-06, throughput=1501 tok/s
2025-11-23 18:47:00,899 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=1.6940, ppl=5.44, grad_norm=1.73, lr=9.58e-06, throughput=1443 tok/s
2025-11-23 18:52:23,802 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=1.7043, ppl=5.50, grad_norm=1.18, lr=9.58e-06, throughput=1487 tok/s
2025-11-23 18:57:45,819 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=1.7401, ppl=5.70, grad_norm=1.68, lr=9.57e-06, throughput=1491 tok/s
2025-11-23 19:02:51,879 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=1.3915, ppl=4.02, grad_norm=1.76, lr=9.56e-06, throughput=1568 tok/s
2025-11-23 19:08:20,279 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=1.7187, ppl=5.58, grad_norm=1.37, lr=9.55e-06, throughput=1462 tok/s
2025-11-23 19:13:45,519 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=1.8331, ppl=6.25, grad_norm=1.18, lr=9.55e-06, throughput=1476 tok/s
2025-11-23 19:19:33,214 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=1.9408, ppl=6.96, grad_norm=1.71, lr=9.54e-06, throughput=1381 tok/s
2025-11-23 19:24:49,551 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=1.6437, ppl=5.17, grad_norm=1.62, lr=9.53e-06, throughput=1517 tok/s
2025-11-23 19:29:32,350 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=1.7757, ppl=5.90, grad_norm=1.48, lr=9.53e-06, throughput=1697 tok/s
2025-11-23 19:34:27,415 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=1.6096, ppl=5.00, grad_norm=1.28, lr=9.52e-06, throughput=1627 tok/s
2025-11-23 19:39:09,033 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=1.6994, ppl=5.47, grad_norm=1.27, lr=9.51e-06, throughput=1704 tok/s
2025-11-23 19:44:01,369 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=1.7777, ppl=5.92, grad_norm=1.48, lr=9.51e-06, throughput=1642 tok/s
2025-11-23 19:49:34,188 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=1.5939, ppl=4.92, grad_norm=1.40, lr=9.50e-06, throughput=1442 tok/s
2025-11-23 19:56:01,035 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=1.9223, ppl=6.84, grad_norm=1.26, lr=9.49e-06, throughput=1241 tok/s
2025-11-23 20:01:11,406 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=1.6921, ppl=5.43, grad_norm=1.36, lr=9.48e-06, throughput=1547 tok/s
2025-11-23 20:06:31,240 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=1.7208, ppl=5.59, grad_norm=1.23, lr=9.48e-06, throughput=1501 tok/s
2025-11-23 20:12:07,542 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=1.6635, ppl=5.28, grad_norm=1.26, lr=9.47e-06, throughput=1427 tok/s
2025-11-23 20:17:19,543 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=1.7167, ppl=5.57, grad_norm=1.40, lr=9.46e-06, throughput=1538 tok/s
2025-11-23 20:25:42,156 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=1.6049, ppl=4.98, grad_norm=1.96, lr=9.45e-06, throughput=955 tok/s
2025-11-23 20:34:37,864 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=1.6283, ppl=5.10, grad_norm=1.23, lr=9.45e-06, throughput=896 tok/s
2025-11-23 20:40:02,971 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=1.8982, ppl=6.67, grad_norm=1.90, lr=9.44e-06, throughput=1476 tok/s
2025-11-23 20:45:34,212 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=1.7487, ppl=5.75, grad_norm=1.52, lr=9.43e-06, throughput=1449 tok/s
2025-11-23 20:51:00,279 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=1.7698, ppl=5.87, grad_norm=2.34, lr=9.42e-06, throughput=1472 tok/s
2025-11-23 20:56:36,283 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=1.6904, ppl=5.42, grad_norm=1.37, lr=9.41e-06, throughput=1429 tok/s
2025-11-23 21:02:36,511 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=1.6698, ppl=5.31, grad_norm=1.56, lr=9.41e-06, throughput=1332 tok/s
2025-11-23 21:08:47,354 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=1.8006, ppl=6.05, grad_norm=1.36, lr=9.40e-06, throughput=1294 tok/s
2025-11-23 21:14:35,534 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=1.6619, ppl=5.27, grad_norm=1.85, lr=9.39e-06, throughput=1379 tok/s
2025-11-23 21:20:28,094 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=1.8045, ppl=6.08, grad_norm=1.65, lr=9.38e-06, throughput=1361 tok/s
2025-11-23 21:26:06,522 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=1.8761, ppl=6.53, grad_norm=1.59, lr=9.37e-06, throughput=1418 tok/s
2025-11-23 21:32:04,632 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=1.6369, ppl=5.14, grad_norm=1.77, lr=9.37e-06, throughput=1340 tok/s
2025-11-23 21:37:50,259 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=1.6688, ppl=5.31, grad_norm=2.12, lr=9.36e-06, throughput=1389 tok/s
2025-11-23 21:43:38,570 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=1.7748, ppl=5.90, grad_norm=1.41, lr=9.35e-06, throughput=1378 tok/s
2025-11-23 21:49:38,941 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=1.7028, ppl=5.49, grad_norm=1.73, lr=9.34e-06, throughput=1332 tok/s
2025-11-23 21:55:28,819 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=1.7614, ppl=5.82, grad_norm=1.34, lr=9.33e-06, throughput=1372 tok/s
2025-11-23 22:01:42,325 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=1.6468, ppl=5.19, grad_norm=1.38, lr=9.32e-06, throughput=1285 tok/s
2025-11-23 22:07:37,175 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=1.7488, ppl=5.75, grad_norm=1.42, lr=9.32e-06, throughput=1353 tok/s
2025-11-23 22:13:40,025 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=1.4240, ppl=4.15, grad_norm=1.95, lr=9.31e-06, throughput=1323 tok/s
2025-11-23 22:18:58,677 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=1.8505, ppl=6.36, grad_norm=2.38, lr=9.30e-06, throughput=1506 tok/s
2025-11-23 22:24:18,519 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=1.8924, ppl=6.64, grad_norm=1.27, lr=9.29e-06, throughput=1501 tok/s
2025-11-23 22:29:51,566 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=1.5050, ppl=4.50, grad_norm=1.20, lr=9.28e-06, throughput=1441 tok/s
2025-11-23 22:35:09,727 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=1.6824, ppl=5.38, grad_norm=1.56, lr=9.27e-06, throughput=1509 tok/s
2025-11-23 22:40:45,836 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=1.8919, ppl=6.63, grad_norm=2.00, lr=9.26e-06, throughput=1428 tok/s
2025-11-23 22:46:19,911 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=1.6255, ppl=5.08, grad_norm=1.25, lr=9.26e-06, throughput=1437 tok/s
2025-11-23 22:51:53,547 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=1.7758, ppl=5.90, grad_norm=1.88, lr=9.25e-06, throughput=1439 tok/s
2025-11-23 22:57:04,268 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=1.8293, ppl=6.23, grad_norm=1.91, lr=9.24e-06, throughput=1545 tok/s
2025-11-23 23:02:27,662 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=1.7584, ppl=5.80, grad_norm=1.23, lr=9.23e-06, throughput=1484 tok/s
2025-11-23 23:08:19,420 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=1.7922, ppl=6.00, grad_norm=1.37, lr=9.22e-06, throughput=1365 tok/s
2025-11-23 23:13:47,425 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=1.7002, ppl=5.47, grad_norm=1.41, lr=9.21e-06, throughput=1463 tok/s
2025-11-23 23:18:53,596 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=1.5691, ppl=4.80, grad_norm=1.39, lr=9.20e-06, throughput=1568 tok/s
2025-11-23 23:23:41,528 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=1.6779, ppl=5.35, grad_norm=1.44, lr=9.19e-06, throughput=1667 tok/s
2025-11-23 23:29:14,487 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=1.6777, ppl=5.35, grad_norm=1.19, lr=9.18e-06, throughput=1442 tok/s
2025-11-23 23:35:28,897 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=1.5896, ppl=4.90, grad_norm=1.55, lr=9.17e-06, throughput=1282 tok/s
2025-11-23 23:41:30,878 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=1.5973, ppl=4.94, grad_norm=1.32, lr=9.17e-06, throughput=1326 tok/s
2025-11-23 23:47:16,233 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=1.7595, ppl=5.81, grad_norm=1.69, lr=9.16e-06, throughput=1390 tok/s
2025-11-23 23:52:42,369 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=1.5911, ppl=4.91, grad_norm=2.72, lr=9.15e-06, throughput=1472 tok/s
2025-11-23 23:57:44,159 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=1.7860, ppl=5.97, grad_norm=1.94, lr=9.14e-06, throughput=1591 tok/s
2025-11-24 00:03:17,739 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=1.6397, ppl=5.15, grad_norm=1.30, lr=9.13e-06, throughput=1439 tok/s
2025-11-24 00:09:25,197 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=1.8479, ppl=6.35, grad_norm=1.59, lr=9.12e-06, throughput=1306 tok/s
2025-11-24 00:15:05,193 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=1.5765, ppl=4.84, grad_norm=1.45, lr=9.11e-06, throughput=1412 tok/s
2025-11-24 00:20:54,612 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=1.7694, ppl=5.87, grad_norm=1.53, lr=9.10e-06, throughput=1374 tok/s
2025-11-24 00:26:28,038 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=1.6778, ppl=5.35, grad_norm=1.23, lr=9.09e-06, throughput=1440 tok/s
2025-11-24 00:31:41,738 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=1.6309, ppl=5.11, grad_norm=1.38, lr=9.08e-06, throughput=1530 tok/s
2025-11-24 00:36:57,004 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=1.7919, ppl=6.00, grad_norm=2.22, lr=9.07e-06, throughput=1523 tok/s
2025-11-24 00:42:18,190 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=1.7632, ppl=5.83, grad_norm=1.21, lr=9.06e-06, throughput=1494 tok/s
2025-11-24 00:47:38,828 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=1.6727, ppl=5.33, grad_norm=2.03, lr=9.05e-06, throughput=1497 tok/s
2025-11-24 00:52:30,881 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=1.6450, ppl=5.18, grad_norm=1.41, lr=9.04e-06, throughput=1644 tok/s
2025-11-24 00:57:37,126 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=1.6672, ppl=5.30, grad_norm=1.30, lr=9.03e-06, throughput=1567 tok/s
2025-11-24 01:02:20,409 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=1.6456, ppl=5.18, grad_norm=1.23, lr=9.02e-06, throughput=1694 tok/s
2025-11-24 01:07:27,826 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=1.7564, ppl=5.79, grad_norm=1.60, lr=9.01e-06, throughput=1561 tok/s
2025-11-24 01:12:15,201 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=1.6244, ppl=5.08, grad_norm=1.32, lr=9.00e-06, throughput=1670 tok/s
2025-11-24 01:16:59,070 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=1.7833, ppl=5.95, grad_norm=1.48, lr=8.99e-06, throughput=1691 tok/s
2025-11-24 01:21:56,480 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=1.6224, ppl=5.07, grad_norm=1.27, lr=8.98e-06, throughput=1614 tok/s
2025-11-24 01:26:41,012 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=1.6986, ppl=5.47, grad_norm=1.91, lr=8.97e-06, throughput=1687 tok/s
2025-11-24 01:31:43,010 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=1.5507, ppl=4.71, grad_norm=1.50, lr=8.96e-06, throughput=1589 tok/s
2025-11-24 01:36:38,089 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=1.6490, ppl=5.20, grad_norm=1.27, lr=8.95e-06, throughput=1627 tok/s
2025-11-24 01:41:37,816 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=1.8260, ppl=6.21, grad_norm=1.59, lr=8.94e-06, throughput=1601 tok/s
2025-11-24 01:46:21,123 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=1.7844, ppl=5.96, grad_norm=1.46, lr=8.93e-06, throughput=1694 tok/s
2025-11-24 01:51:00,091 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=1.5789, ppl=4.85, grad_norm=2.00, lr=8.92e-06, throughput=1721 tok/s
2025-11-24 01:56:14,467 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=1.7341, ppl=5.66, grad_norm=1.31, lr=8.91e-06, throughput=1527 tok/s
2025-11-24 02:01:45,395 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=1.7360, ppl=5.67, grad_norm=1.38, lr=8.90e-06, throughput=1450 tok/s
2025-11-24 02:07:39,006 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=1.7230, ppl=5.60, grad_norm=1.21, lr=8.89e-06, throughput=1357 tok/s
2025-11-24 02:13:16,143 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=1.5886, ppl=4.90, grad_norm=1.72, lr=8.88e-06, throughput=1424 tok/s
2025-11-24 02:19:15,935 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=1.3508, ppl=3.86, grad_norm=1.16, lr=8.87e-06, throughput=1334 tok/s
2025-11-24 02:24:40,801 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=1.6252, ppl=5.08, grad_norm=1.83, lr=8.86e-06, throughput=1478 tok/s
2025-11-24 02:30:10,013 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=1.5357, ppl=4.64, grad_norm=1.09, lr=8.85e-06, throughput=1458 tok/s
2025-11-24 02:35:51,781 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=1.7571, ppl=5.80, grad_norm=1.12, lr=8.84e-06, throughput=1404 tok/s
2025-11-24 02:41:14,067 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=1.7818, ppl=5.94, grad_norm=1.23, lr=8.82e-06, throughput=1489 tok/s
2025-11-24 02:46:50,770 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=1.6905, ppl=5.42, grad_norm=1.34, lr=8.81e-06, throughput=1426 tok/s
2025-11-24 02:52:18,913 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=1.5032, ppl=4.50, grad_norm=1.27, lr=8.80e-06, throughput=1463 tok/s
2025-11-24 02:58:07,690 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=1.5661, ppl=4.79, grad_norm=1.30, lr=8.79e-06, throughput=1376 tok/s
2025-11-24 03:03:32,119 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=1.8191, ppl=6.17, grad_norm=1.62, lr=8.78e-06, throughput=1480 tok/s
2025-11-24 03:09:08,975 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=1.8804, ppl=6.56, grad_norm=1.70, lr=8.77e-06, throughput=1425 tok/s
2025-11-24 03:15:00,880 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=1.7724, ppl=5.88, grad_norm=1.08, lr=8.76e-06, throughput=1364 tok/s
2025-11-24 03:20:26,196 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=1.5930, ppl=4.92, grad_norm=1.60, lr=8.75e-06, throughput=1475 tok/s
2025-11-24 03:25:55,741 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=1.8602, ppl=6.42, grad_norm=2.20, lr=8.74e-06, throughput=1457 tok/s
2025-11-24 03:31:19,744 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=1.7019, ppl=5.48, grad_norm=1.60, lr=8.73e-06, throughput=1481 tok/s
2025-11-24 03:36:57,856 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=1.4074, ppl=4.09, grad_norm=1.30, lr=8.71e-06, throughput=1420 tok/s
2025-11-24 03:42:20,732 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=1.8178, ppl=6.16, grad_norm=1.64, lr=8.70e-06, throughput=1487 tok/s
2025-11-24 03:47:47,012 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=1.5800, ppl=4.86, grad_norm=1.63, lr=8.69e-06, throughput=1471 tok/s
2025-11-24 03:53:19,792 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=1.5935, ppl=4.92, grad_norm=1.19, lr=8.68e-06, throughput=1442 tok/s
2025-11-24 03:58:39,414 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=1.9031, ppl=6.71, grad_norm=1.32, lr=8.67e-06, throughput=1502 tok/s
2025-11-24 04:04:08,989 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=1.8715, ppl=6.50, grad_norm=1.67, lr=8.66e-06, throughput=1456 tok/s
2025-11-24 04:09:28,740 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=1.9338, ppl=6.92, grad_norm=2.42, lr=8.65e-06, throughput=1501 tok/s
2025-11-24 04:14:58,238 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=2.0750, ppl=7.96, grad_norm=1.55, lr=8.63e-06, throughput=1457 tok/s
2025-11-24 04:20:10,343 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=1.6609, ppl=5.26, grad_norm=1.47, lr=8.62e-06, throughput=1538 tok/s
2025-11-24 04:25:39,870 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=1.6898, ppl=5.42, grad_norm=1.61, lr=8.61e-06, throughput=1457 tok/s
2025-11-24 04:30:59,545 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=1.3696, ppl=3.93, grad_norm=1.20, lr=8.60e-06, throughput=1502 tok/s
2025-11-24 04:36:17,206 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=2.0836, ppl=8.03, grad_norm=1.38, lr=8.59e-06, throughput=1511 tok/s
2025-11-24 04:41:56,060 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=1.8718, ppl=6.50, grad_norm=1.69, lr=8.58e-06, throughput=1417 tok/s
2025-11-24 04:47:18,524 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=1.5494, ppl=4.71, grad_norm=1.12, lr=8.57e-06, throughput=1489 tok/s
2025-11-24 04:52:52,289 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=1.7058, ppl=5.51, grad_norm=1.77, lr=8.55e-06, throughput=1438 tok/s
2025-11-24 04:58:03,644 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=1.6674, ppl=5.30, grad_norm=1.83, lr=8.54e-06, throughput=1542 tok/s
2025-11-24 05:03:20,299 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=1.8189, ppl=6.17, grad_norm=1.97, lr=8.53e-06, throughput=1516 tok/s
2025-11-24 05:08:25,849 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=1.4819, ppl=4.40, grad_norm=1.77, lr=8.52e-06, throughput=1571 tok/s
2025-11-24 05:13:59,249 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=1.7473, ppl=5.74, grad_norm=1.65, lr=8.51e-06, throughput=1440 tok/s
2025-11-24 05:19:15,307 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=1.8197, ppl=6.17, grad_norm=1.98, lr=8.49e-06, throughput=1519 tok/s
2025-11-24 05:24:21,311 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=1.4965, ppl=4.47, grad_norm=1.45, lr=8.48e-06, throughput=1569 tok/s
2025-11-24 05:29:43,568 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=1.5994, ppl=4.95, grad_norm=1.33, lr=8.47e-06, throughput=1490 tok/s
2025-11-24 05:35:04,421 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=1.6012, ppl=4.96, grad_norm=1.20, lr=8.46e-06, throughput=1496 tok/s
2025-11-24 05:40:51,719 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=1.8142, ppl=6.14, grad_norm=2.08, lr=8.45e-06, throughput=1382 tok/s
2025-11-24 05:46:17,164 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=1.7211, ppl=5.59, grad_norm=1.55, lr=8.43e-06, throughput=1475 tok/s
2025-11-24 05:51:55,436 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=1.5514, ppl=4.72, grad_norm=1.51, lr=8.42e-06, throughput=1419 tok/s
2025-11-24 05:57:21,601 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=1.7301, ppl=5.64, grad_norm=3.50, lr=8.41e-06, throughput=1472 tok/s
2025-11-24 06:03:00,345 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=1.8923, ppl=6.63, grad_norm=2.00, lr=8.40e-06, throughput=1417 tok/s
2025-11-24 06:08:45,393 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=1.8113, ppl=6.12, grad_norm=1.16, lr=8.38e-06, throughput=1391 tok/s
2025-11-24 06:14:37,678 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=1.8193, ppl=6.17, grad_norm=1.69, lr=8.37e-06, throughput=1363 tok/s
2025-11-24 06:20:27,441 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=1.5946, ppl=4.93, grad_norm=1.19, lr=8.36e-06, throughput=1372 tok/s
2025-11-24 06:25:49,937 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=1.6871, ppl=5.40, grad_norm=1.34, lr=8.35e-06, throughput=1488 tok/s
2025-11-24 06:31:44,947 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=1.8530, ppl=6.38, grad_norm=1.33, lr=8.33e-06, throughput=1352 tok/s
2025-11-24 06:37:57,900 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=1.8897, ppl=6.62, grad_norm=1.21, lr=8.32e-06, throughput=1287 tok/s
2025-11-24 06:43:53,069 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=1.5388, ppl=4.66, grad_norm=1.54, lr=8.31e-06, throughput=1351 tok/s
2025-11-24 06:49:28,442 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=1.8543, ppl=6.39, grad_norm=1.44, lr=8.30e-06, throughput=1431 tok/s
2025-11-24 06:54:54,167 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=1.5853, ppl=4.88, grad_norm=1.80, lr=8.28e-06, throughput=1474 tok/s
2025-11-24 07:00:52,272 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=1.4740, ppl=4.37, grad_norm=1.42, lr=8.27e-06, throughput=1340 tok/s
2025-11-24 07:05:56,452 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=1.6454, ppl=5.18, grad_norm=1.46, lr=8.26e-06, throughput=1578 tok/s
2025-11-24 07:11:17,551 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=1.5001, ppl=4.48, grad_norm=2.09, lr=8.25e-06, throughput=1495 tok/s
2025-11-24 07:16:19,755 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=1.3267, ppl=3.77, grad_norm=1.27, lr=8.23e-06, throughput=1588 tok/s
2025-11-24 07:21:32,891 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=1.4707, ppl=4.35, grad_norm=1.77, lr=8.22e-06, throughput=1533 tok/s
2025-11-24 07:26:25,944 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=1.8117, ppl=6.12, grad_norm=1.45, lr=8.21e-06, throughput=1638 tok/s
2025-11-24 07:31:33,781 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=1.7129, ppl=5.55, grad_norm=1.70, lr=8.20e-06, throughput=1559 tok/s
2025-11-24 07:36:27,168 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=1.5937, ppl=4.92, grad_norm=1.70, lr=8.18e-06, throughput=1636 tok/s
2025-11-24 07:41:15,906 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=1.2619, ppl=3.53, grad_norm=2.50, lr=8.17e-06, throughput=1662 tok/s
2025-11-24 07:46:23,753 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=1.6501, ppl=5.21, grad_norm=1.36, lr=8.16e-06, throughput=1559 tok/s
2025-11-24 07:51:13,200 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=1.5883, ppl=4.90, grad_norm=1.42, lr=8.14e-06, throughput=1658 tok/s
2025-11-24 07:56:08,503 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=1.4121, ppl=4.10, grad_norm=1.22, lr=8.13e-06, throughput=1625 tok/s
2025-11-24 08:00:50,688 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=1.5653, ppl=4.78, grad_norm=1.75, lr=8.12e-06, throughput=1701 tok/s
2025-11-24 08:05:56,531 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=1.8384, ppl=6.29, grad_norm=1.90, lr=8.10e-06, throughput=1569 tok/s
2025-11-24 08:10:46,192 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=1.7941, ppl=6.01, grad_norm=4.50, lr=8.09e-06, throughput=1657 tok/s
2025-11-24 08:15:49,504 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=1.5211, ppl=4.58, grad_norm=1.43, lr=8.08e-06, throughput=1583 tok/s
2025-11-24 08:20:45,287 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=1.6703, ppl=5.31, grad_norm=1.98, lr=8.06e-06, throughput=1623 tok/s
2025-11-24 08:25:38,926 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=1.5521, ppl=4.72, grad_norm=1.57, lr=8.05e-06, throughput=1635 tok/s
2025-11-24 08:30:51,433 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=1.7336, ppl=5.66, grad_norm=1.19, lr=8.04e-06, throughput=1536 tok/s
2025-11-24 08:35:45,517 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=1.4077, ppl=4.09, grad_norm=1.45, lr=8.02e-06, throughput=1632 tok/s
2025-11-24 08:40:49,663 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=1.7765, ppl=5.91, grad_norm=1.19, lr=8.01e-06, throughput=1578 tok/s
2025-11-24 08:45:46,218 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=1.7064, ppl=5.51, grad_norm=1.70, lr=8.00e-06, throughput=1619 tok/s
2025-11-24 08:50:51,738 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=1.7030, ppl=5.49, grad_norm=1.18, lr=7.98e-06, throughput=1571 tok/s
2025-11-24 08:55:44,644 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=1.7979, ppl=6.04, grad_norm=2.44, lr=7.97e-06, throughput=1639 tok/s
2025-11-24 09:00:48,510 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=1.6486, ppl=5.20, grad_norm=1.50, lr=7.96e-06, throughput=1580 tok/s
2025-11-24 09:05:35,414 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=1.7207, ppl=5.59, grad_norm=1.24, lr=7.94e-06, throughput=1673 tok/s
2025-11-24 09:10:21,520 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=1.4644, ppl=4.32, grad_norm=1.80, lr=7.93e-06, throughput=1678 tok/s
2025-11-24 09:15:24,042 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=1.6293, ppl=5.10, grad_norm=1.85, lr=7.92e-06, throughput=1587 tok/s
2025-11-24 09:20:08,035 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=1.4442, ppl=4.24, grad_norm=2.05, lr=7.90e-06, throughput=1690 tok/s
2025-11-24 09:25:01,832 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=1.7823, ppl=5.94, grad_norm=2.50, lr=7.89e-06, throughput=1634 tok/s
2025-11-24 09:29:39,762 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=1.5292, ppl=4.61, grad_norm=1.89, lr=7.88e-06, throughput=1727 tok/s
2025-11-24 09:34:32,480 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=1.6625, ppl=5.27, grad_norm=1.19, lr=7.86e-06, throughput=1640 tok/s
2025-11-24 09:39:23,077 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=1.5899, ppl=4.90, grad_norm=1.07, lr=7.85e-06, throughput=1652 tok/s
2025-11-24 09:44:43,412 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=1.6526, ppl=5.22, grad_norm=1.96, lr=7.83e-06, throughput=1498 tok/s
2025-11-24 09:49:51,452 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=1.6522, ppl=5.22, grad_norm=2.11, lr=7.82e-06, throughput=1558 tok/s
2025-11-24 09:55:25,497 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=1.8583, ppl=6.41, grad_norm=1.56, lr=7.81e-06, throughput=1437 tok/s
2025-11-24 10:01:16,797 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=1.8830, ppl=6.57, grad_norm=1.43, lr=7.79e-06, throughput=1366 tok/s
2025-11-24 10:06:49,860 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=1.6664, ppl=5.29, grad_norm=1.27, lr=7.78e-06, throughput=1441 tok/s
2025-11-24 10:12:43,874 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=1.7669, ppl=5.85, grad_norm=1.30, lr=7.77e-06, throughput=1356 tok/s
2025-11-24 10:18:16,566 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=1.6032, ppl=4.97, grad_norm=1.44, lr=7.75e-06, throughput=1443 tok/s
2025-11-24 10:24:05,151 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=1.5491, ppl=4.71, grad_norm=2.28, lr=7.74e-06, throughput=1377 tok/s
2025-11-24 10:24:05,151 - INFO -
Running validation at step 4000...
2025-11-24 10:34:55,631 - INFO - Validation loss: 1.6909, perplexity: 5.42
2025-11-24 10:34:55,631 - INFO - Qualitative metrics (n=5):
2025-11-24 10:34:55,632 - INFO - BLEU: 0.1219
2025-11-24 10:34:55,632 - INFO - METEOR: 0.2151
2025-11-24 10:34:55,632 - INFO - Edit Distance: 0.7064
2025-11-24 10:34:55,632 - INFO - F-measure: 0.2283
2025-11-24 10:34:55,632 - INFO -
======================================================================
2025-11-24 10:34:55,632 - INFO - Qualitative Evaluation Samples:
2025-11-24 10:34:55,632 - INFO - ======================================================================
2025-11-24 10:34:55,633 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-24 10:34:55,633 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-24 10:34:55,633 - INFO - Generated: ' to the band\'s 2005 album, The A.V. Club, writing that "the album is a more accessible, less experimental, and more accessible album than the band\'s last two." He also said that "the album is a more a...'
2025-11-24 10:34:55,633 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-24 10:34:55,633 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,633 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-24 10:34:55,633 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-24 10:34:55,633 - INFO - Generated: 'aternity groups in the United States. The Order of the Arrow was the first fraternity to adopt a Native American theme, and the first to adopt a Native American theme for a fraternity. The Order of th...'
2025-11-24 10:34:55,634 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-24 10:34:55,634 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,634 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-24 10:34:55,634 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-24 10:34:55,635 - INFO - Generated: " be defeated by Oga and Miki. Teimou's group is then defeated by Oga and Miki, and the Red Knights are defeated by Oga and Miki. Teimou is then defeated by Oga and Miki, and the Red Knights are defeat..."
2025-11-24 10:34:55,635 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-24 10:34:55,635 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,635 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-24 10:34:55,635 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-24 10:34:55,635 - INFO - Generated: '-0001 | 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0....'
2025-11-24 10:34:55,636 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-24 10:34:55,636 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,636 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-24 10:34:55,636 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-24 10:34:55,636 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | Windows Phone | EA Tiburon ...'
2025-11-24 10:34:55,636 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-24 10:34:55,636 - INFO - ----------------------------------------------------------------------
2025-11-24 10:34:55,638 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_4000.jsonl
2025-11-24 10:35:49,072 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-24 10:35:49,086 - INFO - New best validation loss: 1.6909, perplexity: 5.42
2025-11-24 10:40:49,307 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=1.5952, ppl=4.93, grad_norm=1.45, lr=7.72e-06, throughput=1599 tok/s
2025-11-24 10:46:07,227 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=1.8715, ppl=6.50, grad_norm=1.47, lr=7.71e-06, throughput=1510 tok/s
2025-11-24 10:51:03,584 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=1.5472, ppl=4.70, grad_norm=1.63, lr=7.70e-06, throughput=1620 tok/s
2025-11-24 10:55:54,532 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=1.7048, ppl=5.50, grad_norm=6.12, lr=7.68e-06, throughput=1650 tok/s
2025-11-24 11:00:52,224 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=1.4836, ppl=4.41, grad_norm=1.63, lr=7.67e-06, throughput=1612 tok/s
2025-11-24 11:05:38,789 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=1.7968, ppl=6.03, grad_norm=1.52, lr=7.65e-06, throughput=1675 tok/s
2025-11-24 11:10:39,171 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=1.5424, ppl=4.68, grad_norm=1.31, lr=7.64e-06, throughput=1598 tok/s
2025-11-24 11:15:27,041 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=1.7416, ppl=5.71, grad_norm=1.65, lr=7.62e-06, throughput=1667 tok/s
2025-11-24 11:20:15,228 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=1.7182, ppl=5.57, grad_norm=1.27, lr=7.61e-06, throughput=1666 tok/s
2025-11-24 11:25:10,295 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=1.6525, ppl=5.22, grad_norm=1.16, lr=7.60e-06, throughput=1627 tok/s
2025-11-24 11:29:58,737 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=1.5644, ppl=4.78, grad_norm=1.30, lr=7.58e-06, throughput=1664 tok/s
2025-11-24 11:34:57,396 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=1.8527, ppl=6.38, grad_norm=1.40, lr=7.57e-06, throughput=1607 tok/s
2025-11-24 11:39:41,750 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=1.7791, ppl=5.92, grad_norm=1.83, lr=7.55e-06, throughput=1688 tok/s
2025-11-24 11:44:33,971 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=1.6942, ppl=5.44, grad_norm=1.70, lr=7.54e-06, throughput=1643 tok/s
2025-11-24 11:49:16,575 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=1.7041, ppl=5.50, grad_norm=1.30, lr=7.52e-06, throughput=1698 tok/s
2025-11-24 11:53:59,857 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=1.4694, ppl=4.35, grad_norm=1.81, lr=7.51e-06, throughput=1694 tok/s
2025-11-24 11:58:58,298 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=1.8423, ppl=6.31, grad_norm=1.33, lr=7.49e-06, throughput=1608 tok/s
2025-11-24 12:03:40,686 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=1.7502, ppl=5.76, grad_norm=2.16, lr=7.48e-06, throughput=1700 tok/s
2025-11-24 12:08:32,851 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=1.4775, ppl=4.38, grad_norm=1.74, lr=7.47e-06, throughput=1643 tok/s
2025-11-24 12:13:17,832 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=1.5940, ppl=4.92, grad_norm=1.39, lr=7.45e-06, throughput=1684 tok/s
2025-11-24 12:18:12,681 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=1.6196, ppl=5.05, grad_norm=1.88, lr=7.44e-06, throughput=1628 tok/s
2025-11-24 12:22:54,536 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=1.7733, ppl=5.89, grad_norm=1.73, lr=7.42e-06, throughput=1703 tok/s
2025-11-24 12:27:47,337 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=1.7052, ppl=5.50, grad_norm=1.46, lr=7.41e-06, throughput=1639 tok/s
2025-11-24 12:32:31,319 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=1.7652, ppl=5.84, grad_norm=1.74, lr=7.39e-06, throughput=1690 tok/s
2025-11-24 12:37:13,979 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=1.7193, ppl=5.58, grad_norm=1.34, lr=7.38e-06, throughput=1698 tok/s
2025-11-24 12:42:07,589 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=1.6922, ppl=5.43, grad_norm=1.59, lr=7.36e-06, throughput=1635 tok/s
2025-11-24 12:48:03,280 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=1.6225, ppl=5.07, grad_norm=1.38, lr=7.35e-06, throughput=1349 tok/s
2025-11-24 12:54:47,243 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=1.5675, ppl=4.79, grad_norm=1.78, lr=7.33e-06, throughput=1188 tok/s
2025-11-24 13:00:25,667 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=1.5705, ppl=4.81, grad_norm=1.40, lr=7.32e-06, throughput=1418 tok/s
2025-11-24 13:06:27,594 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=1.4040, ppl=4.07, grad_norm=1.33, lr=7.30e-06, throughput=1326 tok/s
2025-11-24 13:12:23,908 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=1.3817, ppl=3.98, grad_norm=1.28, lr=7.29e-06, throughput=1347 tok/s
2025-11-24 13:18:05,840 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=1.7035, ppl=5.49, grad_norm=1.75, lr=7.27e-06, throughput=1404 tok/s
2025-11-24 13:24:23,129 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=1.5989, ppl=4.95, grad_norm=1.58, lr=7.26e-06, throughput=1272 tok/s
2025-11-24 13:30:03,346 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=1.5159, ppl=4.55, grad_norm=1.21, lr=7.24e-06, throughput=1411 tok/s
2025-11-24 13:36:08,610 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=1.6193, ppl=5.05, grad_norm=1.45, lr=7.23e-06, throughput=1314 tok/s
2025-11-24 13:42:07,704 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=1.4276, ppl=4.17, grad_norm=1.23, lr=7.21e-06, throughput=1337 tok/s
2025-11-24 13:48:14,093 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=1.7500, ppl=5.75, grad_norm=1.48, lr=7.20e-06, throughput=1310 tok/s
2025-11-24 13:54:08,079 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=1.7672, ppl=5.85, grad_norm=1.41, lr=7.18e-06, throughput=1356 tok/s
2025-11-24 13:59:35,042 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=1.7438, ppl=5.72, grad_norm=1.67, lr=7.17e-06, throughput=1468 tok/s
2025-11-24 14:05:04,279 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=1.5519, ppl=4.72, grad_norm=1.78, lr=7.15e-06, throughput=1458 tok/s
2025-11-24 14:10:30,165 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=1.7879, ppl=5.98, grad_norm=2.75, lr=7.14e-06, throughput=1473 tok/s
2025-11-24 14:16:18,511 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=1.8286, ppl=6.23, grad_norm=1.95, lr=7.12e-06, throughput=1378 tok/s
2025-11-24 14:21:20,515 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=1.6419, ppl=5.16, grad_norm=2.11, lr=7.11e-06, throughput=1589 tok/s
2025-11-24 14:26:34,189 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=1.9620, ppl=7.11, grad_norm=1.87, lr=7.09e-06, throughput=1530 tok/s
2025-11-24 14:31:04,980 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=1.5524, ppl=4.72, grad_norm=1.19, lr=7.08e-06, throughput=1773 tok/s
2025-11-24 14:35:38,887 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=1.8279, ppl=6.22, grad_norm=1.58, lr=7.06e-06, throughput=1752 tok/s
2025-11-24 14:40:07,422 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=1.6282, ppl=5.09, grad_norm=1.30, lr=7.05e-06, throughput=1787 tok/s
2025-11-24 14:44:42,449 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=1.8558, ppl=6.40, grad_norm=1.24, lr=7.03e-06, throughput=1745 tok/s
2025-11-24 14:49:22,542 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=1.7864, ppl=5.97, grad_norm=7.28, lr=7.02e-06, throughput=1714 tok/s
2025-11-24 14:53:54,527 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=1.7713, ppl=5.88, grad_norm=2.59, lr=7.00e-06, throughput=1765 tok/s
2025-11-24 14:58:31,101 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=1.6332, ppl=5.12, grad_norm=1.82, lr=6.99e-06, throughput=1736 tok/s
2025-11-24 15:02:59,034 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=1.6177, ppl=5.04, grad_norm=1.27, lr=6.97e-06, throughput=1792 tok/s
2025-11-24 15:07:57,382 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=1.5990, ppl=4.95, grad_norm=1.16, lr=6.96e-06, throughput=1609 tok/s
2025-11-24 15:13:33,320 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=1.4769, ppl=4.38, grad_norm=1.83, lr=6.94e-06, throughput=1429 tok/s
2025-11-24 15:38:05,256 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=1.6511, ppl=5.21, grad_norm=1.63, lr=6.92e-06, throughput=326 tok/s
2025-11-24 15:49:18,639 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=1.6620, ppl=5.27, grad_norm=1.73, lr=6.91e-06, throughput=713 tok/s
2025-11-24 15:59:13,640 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=1.5077, ppl=4.52, grad_norm=1.45, lr=6.89e-06, throughput=807 tok/s
2025-11-24 16:08:01,964 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=1.4775, ppl=4.38, grad_norm=1.52, lr=6.88e-06, throughput=909 tok/s
2025-11-24 16:13:46,186 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=1.6900, ppl=5.42, grad_norm=1.59, lr=6.86e-06, throughput=1394 tok/s
2025-11-24 16:18:18,138 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=1.6801, ppl=5.37, grad_norm=1.41, lr=6.85e-06, throughput=1765 tok/s
2025-11-24 16:22:39,742 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=1.5483, ppl=4.70, grad_norm=1.45, lr=6.83e-06, throughput=1835 tok/s
2025-11-24 16:26:58,750 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=1.8758, ppl=6.53, grad_norm=1.44, lr=6.82e-06, throughput=1853 tok/s
2025-11-24 16:31:26,942 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=1.6358, ppl=5.13, grad_norm=1.27, lr=6.80e-06, throughput=1790 tok/s
2025-11-24 16:35:45,149 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=1.7868, ppl=5.97, grad_norm=1.57, lr=6.78e-06, throughput=1859 tok/s
2025-11-24 16:40:13,077 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=1.4805, ppl=4.40, grad_norm=3.16, lr=6.77e-06, throughput=1792 tok/s
2025-11-24 16:44:29,825 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=1.6962, ppl=5.45, grad_norm=2.38, lr=6.75e-06, throughput=1870 tok/s
2025-11-24 16:48:49,999 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=1.5473, ppl=4.70, grad_norm=1.40, lr=6.74e-06, throughput=1845 tok/s
2025-11-24 16:53:07,333 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=1.4236, ppl=4.15, grad_norm=1.30, lr=6.72e-06, throughput=1865 tok/s
2025-11-24 16:57:24,347 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=1.6978, ppl=5.46, grad_norm=1.54, lr=6.71e-06, throughput=1868 tok/s
2025-11-24 17:01:54,819 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=1.6431, ppl=5.17, grad_norm=1.39, lr=6.69e-06, throughput=1775 tok/s
2025-11-24 17:06:15,846 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=1.5693, ppl=4.80, grad_norm=1.38, lr=6.67e-06, throughput=1839 tok/s
2025-11-24 17:10:49,945 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=1.6784, ppl=5.36, grad_norm=1.48, lr=6.66e-06, throughput=1751 tok/s
2025-11-24 17:15:15,976 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=1.8042, ppl=6.07, grad_norm=1.58, lr=6.64e-06, throughput=1804 tok/s
2025-11-24 17:19:50,486 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=1.8439, ppl=6.32, grad_norm=1.70, lr=6.63e-06, throughput=1749 tok/s
2025-11-24 17:24:21,174 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=1.7036, ppl=5.49, grad_norm=1.65, lr=6.61e-06, throughput=1773 tok/s
2025-11-24 17:29:44,257 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=1.4958, ppl=4.46, grad_norm=1.52, lr=6.60e-06, throughput=1486 tok/s
2025-11-24 17:34:37,687 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=1.8783, ppl=6.54, grad_norm=2.72, lr=6.58e-06, throughput=1636 tok/s
2025-11-24 17:39:32,720 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=1.6077, ppl=4.99, grad_norm=1.59, lr=6.56e-06, throughput=1627 tok/s
2025-11-24 17:44:14,429 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=1.5547, ppl=4.73, grad_norm=1.70, lr=6.55e-06, throughput=1704 tok/s
2025-11-24 17:51:56,493 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=1.7137, ppl=5.55, grad_norm=1.88, lr=6.53e-06, throughput=1039 tok/s
2025-11-24 17:58:09,076 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=1.7015, ppl=5.48, grad_norm=1.14, lr=6.52e-06, throughput=1288 tok/s
2025-11-24 18:02:52,954 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=1.6154, ppl=5.03, grad_norm=1.92, lr=6.50e-06, throughput=1691 tok/s
2025-11-24 18:08:17,622 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=1.5996, ppl=4.95, grad_norm=1.47, lr=6.48e-06, throughput=1478 tok/s
2025-11-24 18:12:58,704 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=1.7646, ppl=5.84, grad_norm=1.21, lr=6.47e-06, throughput=1708 tok/s
2025-11-24 18:18:32,301 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=1.6626, ppl=5.27, grad_norm=1.12, lr=6.45e-06, throughput=1439 tok/s
2025-11-24 18:23:52,736 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=1.6853, ppl=5.39, grad_norm=1.52, lr=6.44e-06, throughput=1498 tok/s
2025-11-24 18:29:07,701 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=1.7637, ppl=5.83, grad_norm=1.21, lr=6.42e-06, throughput=1524 tok/s
2025-11-24 18:35:38,534 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=1.3831, ppl=3.99, grad_norm=1.15, lr=6.40e-06, throughput=1228 tok/s
2025-11-24 18:45:29,610 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=1.6838, ppl=5.39, grad_norm=1.51, lr=6.39e-06, throughput=812 tok/s
2025-11-24 18:51:08,000 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=1.6571, ppl=5.24, grad_norm=1.37, lr=6.37e-06, throughput=1418 tok/s
2025-11-24 18:56:00,536 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=1.6703, ppl=5.31, grad_norm=1.94, lr=6.35e-06, throughput=1641 tok/s
2025-11-24 19:01:10,305 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=1.6507, ppl=5.21, grad_norm=1.88, lr=6.34e-06, throughput=1550 tok/s
2025-11-24 19:05:56,022 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=1.5251, ppl=4.60, grad_norm=1.55, lr=6.32e-06, throughput=1680 tok/s
2025-11-24 19:10:36,606 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=1.7246, ppl=5.61, grad_norm=1.54, lr=6.31e-06, throughput=1711 tok/s
2025-11-24 19:15:24,274 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=1.7868, ppl=5.97, grad_norm=1.60, lr=6.29e-06, throughput=1669 tok/s
2025-11-24 19:21:19,820 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=1.6828, ppl=5.38, grad_norm=1.81, lr=6.27e-06, throughput=1350 tok/s
2025-11-24 19:26:40,515 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=1.4798, ppl=4.39, grad_norm=2.20, lr=6.26e-06, throughput=1497 tok/s
2025-11-24 19:32:32,408 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=1.5995, ppl=4.95, grad_norm=1.47, lr=6.24e-06, throughput=1364 tok/s
2025-11-24 19:38:55,616 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=1.4227, ppl=4.15, grad_norm=1.29, lr=6.23e-06, throughput=1253 tok/s
2025-11-24 19:44:19,138 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=1.5556, ppl=4.74, grad_norm=1.41, lr=6.21e-06, throughput=1484 tok/s
2025-11-24 19:49:32,844 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=1.7958, ppl=6.02, grad_norm=1.37, lr=6.19e-06, throughput=1530 tok/s
2025-11-24 19:54:37,768 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=1.6070, ppl=4.99, grad_norm=1.14, lr=6.18e-06, throughput=1574 tok/s
2025-11-24 20:01:18,068 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=1.3993, ppl=4.05, grad_norm=1.28, lr=6.16e-06, throughput=1199 tok/s
2025-11-24 20:08:01,868 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=1.6521, ppl=5.22, grad_norm=1.72, lr=6.14e-06, throughput=1189 tok/s
2025-11-24 20:14:15,983 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=1.8766, ppl=6.53, grad_norm=1.30, lr=6.13e-06, throughput=1283 tok/s
2025-11-24 20:19:47,351 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=1.4195, ppl=4.14, grad_norm=1.38, lr=6.11e-06, throughput=1449 tok/s
2025-11-24 20:26:12,314 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=1.5076, ppl=4.52, grad_norm=1.33, lr=6.10e-06, throughput=1247 tok/s
2025-11-24 20:32:10,147 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=1.8086, ppl=6.10, grad_norm=1.33, lr=6.08e-06, throughput=1341 tok/s
2025-11-24 20:38:06,445 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=1.7168, ppl=5.57, grad_norm=1.63, lr=6.06e-06, throughput=1347 tok/s
2025-11-24 20:44:35,995 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=1.3610, ppl=3.90, grad_norm=1.21, lr=6.05e-06, throughput=1232 tok/s
2025-11-24 20:50:29,737 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=1.5522, ppl=4.72, grad_norm=1.16, lr=6.03e-06, throughput=1357 tok/s
2025-11-24 20:55:36,507 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=1.6613, ppl=5.27, grad_norm=1.71, lr=6.01e-06, throughput=1565 tok/s
2025-11-24 21:00:36,968 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=1.6936, ppl=5.44, grad_norm=1.21, lr=6.00e-06, throughput=1598 tok/s
2025-11-24 21:05:26,721 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=1.6708, ppl=5.32, grad_norm=1.38, lr=5.98e-06, throughput=1657 tok/s
2025-11-24 21:10:26,974 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=1.4891, ppl=4.43, grad_norm=1.62, lr=5.96e-06, throughput=1599 tok/s
2025-11-24 21:18:16,200 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=1.6576, ppl=5.25, grad_norm=1.14, lr=5.95e-06, throughput=1023 tok/s
2025-11-24 21:23:57,548 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=1.5366, ppl=4.65, grad_norm=1.32, lr=5.93e-06, throughput=1406 tok/s
2025-11-24 21:30:13,363 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=1.5580, ppl=4.75, grad_norm=2.41, lr=5.91e-06, throughput=1277 tok/s
2025-11-24 21:36:54,574 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=1.5646, ppl=4.78, grad_norm=1.39, lr=5.90e-06, throughput=1196 tok/s
2025-11-24 21:43:33,418 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=1.6517, ppl=5.22, grad_norm=1.87, lr=5.88e-06, throughput=1203 tok/s
2025-11-24 21:50:27,497 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=1.8906, ppl=6.62, grad_norm=1.43, lr=5.87e-06, throughput=1159 tok/s
2025-11-24 21:57:08,741 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=1.7470, ppl=5.74, grad_norm=1.31, lr=5.85e-06, throughput=1196 tok/s
2025-11-24 22:03:51,738 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=1.5311, ppl=4.62, grad_norm=1.46, lr=5.83e-06, throughput=1191 tok/s
2025-11-24 22:08:43,515 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=1.5680, ppl=4.80, grad_norm=1.16, lr=5.82e-06, throughput=1645 tok/s
2025-11-24 22:13:51,117 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=1.7145, ppl=5.55, grad_norm=1.57, lr=5.80e-06, throughput=1560 tok/s
2025-11-24 22:18:56,787 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=1.6998, ppl=5.47, grad_norm=1.27, lr=5.78e-06, throughput=1570 tok/s
2025-11-24 22:24:04,732 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=1.7779, ppl=5.92, grad_norm=1.52, lr=5.77e-06, throughput=1559 tok/s
2025-11-24 22:28:59,541 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=1.8119, ppl=6.12, grad_norm=1.16, lr=5.75e-06, throughput=1628 tok/s
2025-11-24 22:33:56,792 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=1.6627, ppl=5.27, grad_norm=1.20, lr=5.73e-06, throughput=1615 tok/s
2025-11-26 10:54:56,045 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train_arrow', output_dir='outputs/production_vision_base_lm_20251123_003859', objective='lm', val_data_path='data/training/splits_510k/val_arrow', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='\nFree OCR.', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=True, compression_target=None, conv_kernel=5, timestamp='20251123_003859', batch_size=8, gradient_accumulation_steps=6, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=False, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_vision_base_lm_20251123_003859', resume_from_checkpoint='outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt', resume='outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt', init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=16, prefetch_factor=4, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=False, compile_mode='default', use_optimized_model=True, use_encoder_checkpointing=True, use_decoder_checkpointing=True, use_8bit_optimizer=False)
2025-11-26 10:54:56,045 - WARNING - --train_projection is deprecated. Use --train_encoder instead. Automatically setting --train_encoder=True.
2025-11-26 10:54:56,045 - INFO - Resuming training from checkpoint: outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:54:56,045 - INFO - Continuing outputs in directory: outputs/production_vision_base_lm_20251123_003859
2025-11-26 10:54:56,045 - INFO - Using custom vision prompt: '\nFree OCR.'
2025-11-26 10:54:56,046 - INFO - Setting random seed: 42
2025-11-26 10:54:56,855 - INFO - Peeking checkpoint metadata from outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:55:06,973 - INFO - Checkpoint metadata: epoch=0, batch_idx=23999, global_step=4000
2025-11-26 10:55:06,974 - INFO - W&B run ID: 7aj57hve
2025-11-26 10:55:07,054 - INFO - Checkpoint has WandB run ID: 7aj57hve
2025-11-26 10:55:07,055 - INFO - Creating fresh WandB run (not resuming to avoid stale data)
2025-11-26 10:55:08,268 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_lm_20251123_003859 (ID: xyk0cc3f)
2025-11-26 10:55:08,268 - INFO - Loading model and tokenizer...
2025-11-26 10:55:18,545 - INFO - Enabling decoder gradient checkpointing...
2025-11-26 10:55:18,554 - INFO - ✓ Decoder checkpointing enabled for 12 transformer layers
2025-11-26 10:55:18,554 - INFO - Expected: ~30-50% activation memory reduction, ~15-20% compute overhead
2025-11-26 10:55:18,586 - INFO - Created Vision Compression trainer (mode: base)
2025-11-26 10:55:18,587 - INFO - Training objective: lm
2025-11-26 10:55:18,624 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640
2025-11-26 10:55:18,624 - INFO - Loading training data from data/training/splits_510k/train_arrow
2025-11-26 10:55:18,624 - INFO - Detected Arrow format: data/training/splits_510k/train_arrow
2025-11-26 10:55:18,625 - INFO - Loading Arrow dataset from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-26 10:55:18,677 - INFO - Loaded 500,000 samples from data/training/splits_510k/train_arrow (memory-mapped)
2025-11-26 10:55:18,677 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-26 10:55:18,677 - INFO - Mid-epoch resume: skipping first 192000 samples at sampler level (batch 24000)
2025-11-26 10:55:18,774 - INFO - Loading validation data from data/training/splits_510k/val_arrow
2025-11-26 10:55:18,774 - INFO - Detected Arrow format: data/training/splits_510k/val_arrow
2025-11-26 10:55:18,775 - INFO - Loading Arrow dataset from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-26 10:55:18,783 - INFO - Loaded 10,000 samples from data/training/splits_510k/val_arrow (memory-mapped)
2025-11-26 10:55:18,783 - INFO - Vision mode: base (273 tokens, 1024x1024)
2025-11-26 10:55:18,813 - INFO - Created AdamW optimizer with differential LR:
Encoder: 474 param tensors @ lr=1e-05
Decoder: 2236 param tensors @ lr=0.0001
Fused kernels: True
2025-11-26 10:55:18,814 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
2025-11-26 10:55:18,822 - INFO - Logged optimizer config to W&B: type=adamw_fused, memory=24.86GB
2025-11-26 10:55:18,822 - INFO - Loading checkpoint state (model/optimizer/scheduler) from outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 10:55:30,142 - INFO - ✓ Successfully loaded optimizer state from checkpoint
2025-11-26 10:55:30,143 - INFO - ✓ Successfully loaded scheduler state from checkpoint
2025-11-26 10:55:30,149 - WARNING - Failed to restore RNG states: RNG state must be a torch.ByteTensor. Continuing with current RNG state.
2025-11-26 10:55:30,162 - INFO - Restored training state: epoch=0, batch_idx=23999, global_step=4000, best_val_loss=1.6909
2025-11-26 10:55:30,163 - INFO - Resuming mid-epoch: will skip first 24000 batches of epoch 0
2025-11-26 10:55:30,164 - INFO - Starting training loop...
2025-11-26 10:55:30,164 - INFO -
======================================================================
2025-11-26 10:55:30,164 - INFO - Epoch 1/1
2025-11-26 10:55:30,164 - INFO - ======================================================================
2025-11-26 10:55:42,349 - WARNING - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
2025-11-26 10:55:43,551 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x
2025-11-26 10:55:43,552 - INFO - Target tokens per sample: 1000
2025-11-26 10:59:07,778 - INFO - Epoch 1 Step 10 (Global: 4010): loss=1.5953, ppl=4.93, grad_norm=1.31, lr=7.72e-06, throughput=2206 tok/s
2025-11-26 11:02:31,886 - INFO - Epoch 1 Step 20 (Global: 4020): loss=1.8728, ppl=6.51, grad_norm=1.68, lr=7.71e-06, throughput=2352 tok/s
2025-11-26 11:05:59,802 - INFO - Epoch 1 Step 30 (Global: 4030): loss=1.5460, ppl=4.69, grad_norm=1.20, lr=7.70e-06, throughput=2309 tok/s
2025-11-26 11:09:26,300 - INFO - Epoch 1 Step 40 (Global: 4040): loss=1.7038, ppl=5.49, grad_norm=2.27, lr=7.68e-06, throughput=2324 tok/s
2025-11-26 11:12:51,182 - INFO - Epoch 1 Step 50 (Global: 4050): loss=1.4818, ppl=4.40, grad_norm=1.29, lr=7.67e-06, throughput=2343 tok/s
2025-11-26 11:16:16,527 - INFO - Epoch 1 Step 60 (Global: 4060): loss=1.7971, ppl=6.03, grad_norm=1.83, lr=7.65e-06, throughput=2338 tok/s
2025-11-26 11:19:40,409 - INFO - Epoch 1 Step 70 (Global: 4070): loss=1.5431, ppl=4.68, grad_norm=1.60, lr=7.64e-06, throughput=2354 tok/s
2025-11-26 11:23:04,244 - INFO - Epoch 1 Step 80 (Global: 4080): loss=1.7389, ppl=5.69, grad_norm=1.27, lr=7.62e-06, throughput=2355 tok/s
2025-11-26 11:26:27,304 - INFO - Epoch 1 Step 90 (Global: 4090): loss=1.7134, ppl=5.55, grad_norm=1.70, lr=7.61e-06, throughput=2364 tok/s
2025-11-26 11:29:51,764 - INFO - Epoch 1 Step 100 (Global: 4100): loss=1.6501, ppl=5.21, grad_norm=1.27, lr=7.60e-06, throughput=2348 tok/s
2025-11-26 11:33:15,553 - INFO - Epoch 1 Step 110 (Global: 4110): loss=1.5634, ppl=4.77, grad_norm=1.57, lr=7.58e-06, throughput=2355 tok/s
2025-11-26 11:36:41,822 - INFO - Epoch 1 Step 120 (Global: 4120): loss=1.8531, ppl=6.38, grad_norm=2.45, lr=7.57e-06, throughput=2327 tok/s
2025-11-26 11:40:05,840 - INFO - Epoch 1 Step 130 (Global: 4130): loss=1.7781, ppl=5.92, grad_norm=1.40, lr=7.55e-06, throughput=2353 tok/s
2025-11-26 11:43:31,651 - INFO - Epoch 1 Step 140 (Global: 4140): loss=1.6938, ppl=5.44, grad_norm=1.39, lr=7.54e-06, throughput=2332 tok/s
2025-11-26 11:47:00,799 - INFO - Epoch 1 Step 150 (Global: 4150): loss=1.7069, ppl=5.51, grad_norm=2.00, lr=7.52e-06, throughput=2295 tok/s
2025-11-26 11:50:26,371 - INFO - Epoch 1 Step 160 (Global: 4160): loss=1.4696, ppl=4.35, grad_norm=1.42, lr=7.51e-06, throughput=2335 tok/s
2025-11-26 11:53:55,718 - INFO - Epoch 1 Step 170 (Global: 4170): loss=1.8436, ppl=6.32, grad_norm=1.55, lr=7.49e-06, throughput=2293 tok/s
2025-11-26 11:57:21,570 - INFO - Epoch 1 Step 180 (Global: 4180): loss=1.7474, ppl=5.74, grad_norm=1.20, lr=7.48e-06, throughput=2332 tok/s
2025-11-26 12:00:47,154 - INFO - Epoch 1 Step 190 (Global: 4190): loss=1.4784, ppl=4.39, grad_norm=1.21, lr=7.47e-06, throughput=2335 tok/s
2025-11-26 12:04:10,015 - INFO - Epoch 1 Step 200 (Global: 4200): loss=1.5963, ppl=4.93, grad_norm=1.31, lr=7.45e-06, throughput=2366 tok/s
2025-11-26 12:07:32,840 - INFO - Epoch 1 Step 210 (Global: 4210): loss=1.6186, ppl=5.05, grad_norm=1.99, lr=7.44e-06, throughput=2367 tok/s
2025-11-26 12:10:56,683 - INFO - Epoch 1 Step 220 (Global: 4220): loss=1.7733, ppl=5.89, grad_norm=1.88, lr=7.42e-06, throughput=2355 tok/s
2025-11-26 12:14:19,344 - INFO - Epoch 1 Step 230 (Global: 4230): loss=1.7065, ppl=5.51, grad_norm=3.75, lr=7.41e-06, throughput=2369 tok/s
2025-11-26 12:17:44,744 - INFO - Epoch 1 Step 240 (Global: 4240): loss=1.7663, ppl=5.85, grad_norm=1.66, lr=7.39e-06, throughput=2337 tok/s
2025-11-26 12:21:09,840 - INFO - Epoch 1 Step 250 (Global: 4250): loss=1.7145, ppl=5.55, grad_norm=1.33, lr=7.38e-06, throughput=2340 tok/s
2025-11-26 12:24:36,277 - INFO - Epoch 1 Step 260 (Global: 4260): loss=1.6920, ppl=5.43, grad_norm=1.33, lr=7.36e-06, throughput=2325 tok/s
2025-11-26 12:28:01,361 - INFO - Epoch 1 Step 270 (Global: 4270): loss=1.6231, ppl=5.07, grad_norm=1.87, lr=7.35e-06, throughput=2341 tok/s
2025-11-26 12:31:24,607 - INFO - Epoch 1 Step 280 (Global: 4280): loss=1.5611, ppl=4.76, grad_norm=1.35, lr=7.33e-06, throughput=2362 tok/s
2025-11-26 12:34:47,625 - INFO - Epoch 1 Step 290 (Global: 4290): loss=1.5658, ppl=4.79, grad_norm=2.20, lr=7.32e-06, throughput=2364 tok/s
2025-11-26 12:38:11,804 - INFO - Epoch 1 Step 300 (Global: 4300): loss=1.4066, ppl=4.08, grad_norm=1.23, lr=7.30e-06, throughput=2351 tok/s
2025-11-26 12:41:35,869 - INFO - Epoch 1 Step 310 (Global: 4310): loss=1.3833, ppl=3.99, grad_norm=3.17, lr=7.29e-06, throughput=2352 tok/s
2025-11-26 12:45:02,975 - INFO - Epoch 1 Step 320 (Global: 4320): loss=1.7036, ppl=5.49, grad_norm=1.17, lr=7.27e-06, throughput=2318 tok/s
2025-11-26 12:48:29,156 - INFO - Epoch 1 Step 330 (Global: 4330): loss=1.6021, ppl=4.96, grad_norm=1.17, lr=7.26e-06, throughput=2328 tok/s
2025-11-26 12:51:55,837 - INFO - Epoch 1 Step 340 (Global: 4340): loss=1.5165, ppl=4.56, grad_norm=1.28, lr=7.24e-06, throughput=2322 tok/s
2025-11-26 12:55:22,378 - INFO - Epoch 1 Step 350 (Global: 4350): loss=1.6196, ppl=5.05, grad_norm=1.38, lr=7.23e-06, throughput=2324 tok/s
2025-11-26 12:58:49,366 - INFO - Epoch 1 Step 360 (Global: 4360): loss=1.4258, ppl=4.16, grad_norm=1.21, lr=7.21e-06, throughput=2319 tok/s
2025-11-26 13:02:14,639 - INFO - Epoch 1 Step 370 (Global: 4370): loss=1.7505, ppl=5.76, grad_norm=1.83, lr=7.20e-06, throughput=2338 tok/s
2025-11-26 13:05:40,540 - INFO - Epoch 1 Step 380 (Global: 4380): loss=1.7673, ppl=5.85, grad_norm=1.40, lr=7.18e-06, throughput=2331 tok/s
2025-11-26 13:09:04,125 - INFO - Epoch 1 Step 390 (Global: 4390): loss=1.7419, ppl=5.71, grad_norm=1.66, lr=7.17e-06, throughput=2358 tok/s
2025-11-26 13:12:27,820 - INFO - Epoch 1 Step 400 (Global: 4400): loss=1.5554, ppl=4.74, grad_norm=2.30, lr=7.15e-06, throughput=2356 tok/s
2025-11-26 13:15:52,785 - INFO - Epoch 1 Step 410 (Global: 4410): loss=1.7893, ppl=5.99, grad_norm=1.34, lr=7.14e-06, throughput=2342 tok/s
2025-11-26 13:19:17,942 - INFO - Epoch 1 Step 420 (Global: 4420): loss=1.8293, ppl=6.23, grad_norm=1.30, lr=7.12e-06, throughput=2340 tok/s
2025-11-26 13:22:47,948 - INFO - Epoch 1 Step 430 (Global: 4430): loss=1.6393, ppl=5.15, grad_norm=1.55, lr=7.11e-06, throughput=2286 tok/s
2025-11-26 13:26:13,714 - INFO - Epoch 1 Step 440 (Global: 4440): loss=1.9649, ppl=7.13, grad_norm=1.34, lr=7.09e-06, throughput=2333 tok/s
2025-11-26 13:29:39,289 - INFO - Epoch 1 Step 450 (Global: 4450): loss=1.5512, ppl=4.72, grad_norm=1.87, lr=7.08e-06, throughput=2335 tok/s
2025-11-26 13:33:04,981 - INFO - Epoch 1 Step 460 (Global: 4460): loss=1.8276, ppl=6.22, grad_norm=2.03, lr=7.06e-06, throughput=2334 tok/s
2025-11-26 13:36:30,668 - INFO - Epoch 1 Step 470 (Global: 4470): loss=1.6299, ppl=5.10, grad_norm=1.45, lr=7.05e-06, throughput=2334 tok/s
2025-11-26 13:39:57,110 - INFO - Epoch 1 Step 480 (Global: 4480): loss=1.8573, ppl=6.41, grad_norm=2.08, lr=7.03e-06, throughput=2325 tok/s
2025-11-26 13:43:22,460 - INFO - Epoch 1 Step 490 (Global: 4490): loss=1.7869, ppl=5.97, grad_norm=2.23, lr=7.02e-06, throughput=2337 tok/s
2025-11-26 13:46:47,165 - INFO - Epoch 1 Step 500 (Global: 4500): loss=1.7727, ppl=5.89, grad_norm=1.58, lr=7.00e-06, throughput=2345 tok/s
2025-11-26 13:50:13,281 - INFO - Epoch 1 Step 510 (Global: 4510): loss=1.6357, ppl=5.13, grad_norm=1.27, lr=6.99e-06, throughput=2329 tok/s
2025-11-26 13:53:39,484 - INFO - Epoch 1 Step 520 (Global: 4520): loss=1.6233, ppl=5.07, grad_norm=1.45, lr=6.97e-06, throughput=2328 tok/s
2025-11-26 13:57:04,548 - INFO - Epoch 1 Step 530 (Global: 4530): loss=1.6000, ppl=4.95, grad_norm=1.08, lr=6.96e-06, throughput=2341 tok/s
2025-11-26 14:00:29,863 - INFO - Epoch 1 Step 540 (Global: 4540): loss=1.4770, ppl=4.38, grad_norm=1.20, lr=6.94e-06, throughput=2338 tok/s
2025-11-26 14:03:55,805 - INFO - Epoch 1 Step 550 (Global: 4550): loss=1.6530, ppl=5.22, grad_norm=1.20, lr=6.92e-06, throughput=2331 tok/s
2025-11-26 14:07:23,587 - INFO - Epoch 1 Step 560 (Global: 4560): loss=1.6620, ppl=5.27, grad_norm=1.32, lr=6.91e-06, throughput=2310 tok/s
2025-11-26 14:10:54,057 - INFO - Epoch 1 Step 570 (Global: 4570): loss=1.5083, ppl=4.52, grad_norm=1.50, lr=6.89e-06, throughput=2281 tok/s
2025-11-26 14:14:25,871 - INFO - Epoch 1 Step 580 (Global: 4580): loss=1.4806, ppl=4.40, grad_norm=1.23, lr=6.88e-06, throughput=2266 tok/s
2025-11-26 14:17:53,574 - INFO - Epoch 1 Step 590 (Global: 4590): loss=1.6875, ppl=5.41, grad_norm=1.45, lr=6.86e-06, throughput=2311 tok/s
2025-11-26 14:21:18,814 - INFO - Epoch 1 Step 600 (Global: 4600): loss=1.6815, ppl=5.37, grad_norm=1.84, lr=6.85e-06, throughput=2339 tok/s
2025-11-26 14:24:44,351 - INFO - Epoch 1 Step 610 (Global: 4610): loss=1.5520, ppl=4.72, grad_norm=1.35, lr=6.83e-06, throughput=2335 tok/s
2025-11-26 14:28:09,979 - INFO - Epoch 1 Step 620 (Global: 4620): loss=1.8772, ppl=6.54, grad_norm=1.42, lr=6.82e-06, throughput=2334 tok/s
2025-11-26 14:31:35,328 - INFO - Epoch 1 Step 630 (Global: 4630): loss=1.6348, ppl=5.13, grad_norm=1.15, lr=6.80e-06, throughput=2338 tok/s
2025-11-26 14:35:02,599 - INFO - Epoch 1 Step 640 (Global: 4640): loss=1.7853, ppl=5.96, grad_norm=3.75, lr=6.78e-06, throughput=2316 tok/s
2025-11-26 14:38:31,391 - INFO - Epoch 1 Step 650 (Global: 4650): loss=1.4731, ppl=4.36, grad_norm=1.62, lr=6.77e-06, throughput=2299 tok/s
2025-11-26 14:41:58,185 - INFO - Epoch 1 Step 660 (Global: 4660): loss=1.6962, ppl=5.45, grad_norm=1.24, lr=6.75e-06, throughput=2321 tok/s
2025-11-26 14:45:23,551 - INFO - Epoch 1 Step 670 (Global: 4670): loss=1.5454, ppl=4.69, grad_norm=1.92, lr=6.74e-06, throughput=2337 tok/s
2025-11-26 14:48:49,086 - INFO - Epoch 1 Step 680 (Global: 4680): loss=1.4202, ppl=4.14, grad_norm=1.95, lr=6.72e-06, throughput=2335 tok/s
2025-11-26 14:52:13,340 - INFO - Epoch 1 Step 690 (Global: 4690): loss=1.6956, ppl=5.45, grad_norm=1.23, lr=6.71e-06, throughput=2350 tok/s
2025-11-26 14:55:37,045 - INFO - Epoch 1 Step 700 (Global: 4700): loss=1.6405, ppl=5.16, grad_norm=7.62, lr=6.69e-06, throughput=2356 tok/s
2025-11-26 14:59:00,853 - INFO - Epoch 1 Step 710 (Global: 4710): loss=1.5683, ppl=4.80, grad_norm=1.45, lr=6.67e-06, throughput=2355 tok/s
2025-11-26 15:02:24,207 - INFO - Epoch 1 Step 720 (Global: 4720): loss=1.6794, ppl=5.36, grad_norm=1.38, lr=6.66e-06, throughput=2360 tok/s
2025-11-26 15:05:49,302 - INFO - Epoch 1 Step 730 (Global: 4730): loss=1.8037, ppl=6.07, grad_norm=1.99, lr=6.64e-06, throughput=2340 tok/s
2025-11-26 15:09:14,502 - INFO - Epoch 1 Step 740 (Global: 4740): loss=1.8448, ppl=6.33, grad_norm=1.72, lr=6.63e-06, throughput=2339 tok/s
2025-11-26 15:12:41,048 - INFO - Epoch 1 Step 750 (Global: 4750): loss=1.7020, ppl=5.49, grad_norm=2.50, lr=6.61e-06, throughput=2324 tok/s
2025-11-26 15:16:07,516 - INFO - Epoch 1 Step 760 (Global: 4760): loss=1.4959, ppl=4.46, grad_norm=2.64, lr=6.60e-06, throughput=2325 tok/s
2025-11-26 15:19:34,658 - INFO - Epoch 1 Step 770 (Global: 4770): loss=1.8787, ppl=6.54, grad_norm=1.34, lr=6.58e-06, throughput=2317 tok/s
2025-11-26 15:23:01,827 - INFO - Epoch 1 Step 780 (Global: 4780): loss=1.6085, ppl=5.00, grad_norm=1.30, lr=6.56e-06, throughput=2317 tok/s
2025-11-26 15:26:28,119 - INFO - Epoch 1 Step 790 (Global: 4790): loss=1.5552, ppl=4.74, grad_norm=1.14, lr=6.55e-06, throughput=2327 tok/s
2025-11-26 15:29:54,797 - INFO - Epoch 1 Step 800 (Global: 4800): loss=1.7161, ppl=5.56, grad_norm=1.24, lr=6.53e-06, throughput=2322 tok/s
2025-11-26 15:33:22,265 - INFO - Epoch 1 Step 810 (Global: 4810): loss=1.7006, ppl=5.48, grad_norm=1.66, lr=6.52e-06, throughput=2314 tok/s
2025-11-26 15:36:50,145 - INFO - Epoch 1 Step 820 (Global: 4820): loss=1.6117, ppl=5.01, grad_norm=1.16, lr=6.50e-06, throughput=2309 tok/s
2025-11-26 15:40:17,353 - INFO - Epoch 1 Step 830 (Global: 4830): loss=1.5950, ppl=4.93, grad_norm=1.41, lr=6.48e-06, throughput=2317 tok/s
2025-11-26 15:43:46,112 - INFO - Epoch 1 Step 840 (Global: 4840): loss=1.7649, ppl=5.84, grad_norm=2.03, lr=6.47e-06, throughput=2299 tok/s
2025-11-26 15:47:13,775 - INFO - Epoch 1 Step 850 (Global: 4850): loss=1.6651, ppl=5.29, grad_norm=1.27, lr=6.45e-06, throughput=2311 tok/s
2025-11-26 15:50:44,101 - INFO - Epoch 1 Step 860 (Global: 4860): loss=1.6814, ppl=5.37, grad_norm=1.56, lr=6.44e-06, throughput=2282 tok/s
2025-11-26 15:54:11,667 - INFO - Epoch 1 Step 870 (Global: 4870): loss=1.7633, ppl=5.83, grad_norm=1.46, lr=6.42e-06, throughput=2313 tok/s
2025-11-26 15:57:38,483 - INFO - Epoch 1 Step 880 (Global: 4880): loss=1.3826, ppl=3.99, grad_norm=1.41, lr=6.40e-06, throughput=2321 tok/s
2025-11-26 16:01:04,604 - INFO - Epoch 1 Step 890 (Global: 4890): loss=1.6819, ppl=5.38, grad_norm=1.57, lr=6.39e-06, throughput=2329 tok/s
2025-11-26 16:04:30,047 - INFO - Epoch 1 Step 900 (Global: 4900): loss=1.6564, ppl=5.24, grad_norm=1.41, lr=6.37e-06, throughput=2336 tok/s
2025-11-26 16:07:55,241 - INFO - Epoch 1 Step 910 (Global: 4910): loss=1.6726, ppl=5.33, grad_norm=1.69, lr=6.35e-06, throughput=2339 tok/s
2025-11-26 16:11:23,239 - INFO - Epoch 1 Step 920 (Global: 4920): loss=1.6469, ppl=5.19, grad_norm=1.34, lr=6.34e-06, throughput=2308 tok/s
2025-11-26 16:14:52,016 - INFO - Epoch 1 Step 930 (Global: 4930): loss=1.5233, ppl=4.59, grad_norm=1.27, lr=6.32e-06, throughput=2299 tok/s
2025-11-26 16:18:19,601 - INFO - Epoch 1 Step 940 (Global: 4940): loss=1.7237, ppl=5.61, grad_norm=1.59, lr=6.31e-06, throughput=2312 tok/s
2025-11-26 16:21:48,730 - INFO - Epoch 1 Step 950 (Global: 4950): loss=1.7854, ppl=5.96, grad_norm=1.20, lr=6.29e-06, throughput=2295 tok/s
2025-11-26 16:25:17,220 - INFO - Epoch 1 Step 960 (Global: 4960): loss=1.6801, ppl=5.37, grad_norm=1.30, lr=6.27e-06, throughput=2302 tok/s
2025-11-26 16:28:44,496 - INFO - Epoch 1 Step 970 (Global: 4970): loss=1.4785, ppl=4.39, grad_norm=2.33, lr=6.26e-06, throughput=2316 tok/s
2025-11-26 16:32:10,538 - INFO - Epoch 1 Step 980 (Global: 4980): loss=1.5979, ppl=4.94, grad_norm=1.90, lr=6.24e-06, throughput=2330 tok/s
2025-11-26 16:35:36,640 - INFO - Epoch 1 Step 990 (Global: 4990): loss=1.4211, ppl=4.14, grad_norm=1.29, lr=6.23e-06, throughput=2329 tok/s
2025-11-26 16:39:06,513 - INFO - Epoch 1 Step 1000 (Global: 5000): loss=1.5556, ppl=4.74, grad_norm=1.55, lr=6.21e-06, throughput=2287 tok/s
2025-11-26 16:42:36,734 - INFO - Epoch 1 Step 1010 (Global: 5010): loss=1.7957, ppl=6.02, grad_norm=1.61, lr=6.19e-06, throughput=2283 tok/s
2025-11-26 16:46:07,831 - INFO - Epoch 1 Step 1020 (Global: 5020): loss=1.6076, ppl=4.99, grad_norm=1.27, lr=6.18e-06, throughput=2274 tok/s
2025-11-26 16:49:38,670 - INFO - Epoch 1 Step 1030 (Global: 5030): loss=1.4002, ppl=4.06, grad_norm=1.34, lr=6.16e-06, throughput=2277 tok/s
2025-11-26 16:53:08,317 - INFO - Epoch 1 Step 1040 (Global: 5040): loss=1.6525, ppl=5.22, grad_norm=1.30, lr=6.14e-06, throughput=2290 tok/s
2025-11-26 16:56:36,748 - INFO - Epoch 1 Step 1050 (Global: 5050): loss=1.8744, ppl=6.52, grad_norm=1.47, lr=6.13e-06, throughput=2303 tok/s
2025-11-26 17:00:05,207 - INFO - Epoch 1 Step 1060 (Global: 5060): loss=1.4214, ppl=4.14, grad_norm=1.38, lr=6.11e-06, throughput=2303 tok/s
2025-11-26 17:03:35,080 - INFO - Epoch 1 Step 1070 (Global: 5070): loss=1.5064, ppl=4.51, grad_norm=1.77, lr=6.10e-06, throughput=2287 tok/s
2025-11-26 17:07:03,031 - INFO - Epoch 1 Step 1080 (Global: 5080): loss=1.8061, ppl=6.09, grad_norm=1.41, lr=6.08e-06, throughput=2308 tok/s
2025-11-26 17:10:31,517 - INFO - Epoch 1 Step 1090 (Global: 5090): loss=1.7182, ppl=5.57, grad_norm=1.51, lr=6.06e-06, throughput=2302 tok/s
2025-11-26 17:14:02,333 - INFO - Epoch 1 Step 1100 (Global: 5100): loss=1.3599, ppl=3.90, grad_norm=1.27, lr=6.05e-06, throughput=2277 tok/s
2025-11-26 17:17:31,325 - INFO - Epoch 1 Step 1110 (Global: 5110): loss=1.5500, ppl=4.71, grad_norm=1.77, lr=6.03e-06, throughput=2297 tok/s
2025-11-26 17:20:59,574 - INFO - Epoch 1 Step 1120 (Global: 5120): loss=1.6610, ppl=5.26, grad_norm=1.23, lr=6.01e-06, throughput=2305 tok/s
2025-11-26 17:24:28,770 - INFO - Epoch 1 Step 1130 (Global: 5130): loss=1.6901, ppl=5.42, grad_norm=1.49, lr=6.00e-06, throughput=2295 tok/s
2025-11-26 17:27:57,204 - INFO - Epoch 1 Step 1140 (Global: 5140): loss=1.6691, ppl=5.31, grad_norm=1.83, lr=5.98e-06, throughput=2303 tok/s
2025-11-26 17:31:23,345 - INFO - Epoch 1 Step 1150 (Global: 5150): loss=1.4859, ppl=4.42, grad_norm=1.36, lr=5.96e-06, throughput=2329 tok/s
2025-11-26 17:34:49,376 - INFO - Epoch 1 Step 1160 (Global: 5160): loss=1.6571, ppl=5.24, grad_norm=1.45, lr=5.95e-06, throughput=2330 tok/s
2025-11-26 17:38:14,889 - INFO - Epoch 1 Step 1170 (Global: 5170): loss=1.5316, ppl=4.63, grad_norm=1.25, lr=5.93e-06, throughput=2336 tok/s
2025-11-26 17:41:39,399 - INFO - Epoch 1 Step 1180 (Global: 5180): loss=1.5583, ppl=4.75, grad_norm=2.02, lr=5.91e-06, throughput=2347 tok/s
2025-11-26 17:45:02,846 - INFO - Epoch 1 Step 1190 (Global: 5190): loss=1.5618, ppl=4.77, grad_norm=1.83, lr=5.90e-06, throughput=2359 tok/s
2025-11-26 17:48:26,259 - INFO - Epoch 1 Step 1200 (Global: 5200): loss=1.6518, ppl=5.22, grad_norm=1.17, lr=5.88e-06, throughput=2360 tok/s
2025-11-26 17:51:49,658 - INFO - Epoch 1 Step 1210 (Global: 5210): loss=1.8940, ppl=6.65, grad_norm=1.93, lr=5.87e-06, throughput=2360 tok/s
2025-11-26 17:55:13,466 - INFO - Epoch 1 Step 1220 (Global: 5220): loss=1.7546, ppl=5.78, grad_norm=1.72, lr=5.85e-06, throughput=2355 tok/s
2025-11-26 17:58:36,883 - INFO - Epoch 1 Step 1230 (Global: 5230): loss=1.5336, ppl=4.63, grad_norm=1.57, lr=5.83e-06, throughput=2360 tok/s
2025-11-26 18:02:00,745 - INFO - Epoch 1 Step 1240 (Global: 5240): loss=1.5682, ppl=4.80, grad_norm=1.51, lr=5.82e-06, throughput=2355 tok/s
2025-11-26 18:05:24,458 - INFO - Epoch 1 Step 1250 (Global: 5250): loss=1.7219, ppl=5.60, grad_norm=1.59, lr=5.80e-06, throughput=2356 tok/s
2025-11-26 18:08:48,646 - INFO - Epoch 1 Step 1260 (Global: 5260): loss=1.6992, ppl=5.47, grad_norm=2.08, lr=5.78e-06, throughput=2351 tok/s
2025-11-26 18:12:12,446 - INFO - Epoch 1 Step 1270 (Global: 5270): loss=1.7772, ppl=5.91, grad_norm=1.84, lr=5.77e-06, throughput=2355 tok/s
2025-11-26 18:15:36,246 - INFO - Epoch 1 Step 1280 (Global: 5280): loss=1.8137, ppl=6.13, grad_norm=1.73, lr=5.75e-06, throughput=2355 tok/s
2025-11-26 18:18:59,585 - INFO - Epoch 1 Step 1290 (Global: 5290): loss=1.6611, ppl=5.26, grad_norm=1.70, lr=5.73e-06, throughput=2361 tok/s
2025-11-26 18:22:24,341 - INFO - Epoch 1 Step 1300 (Global: 5300): loss=1.4779, ppl=4.38, grad_norm=1.27, lr=5.72e-06, throughput=2344 tok/s
2025-11-26 18:25:47,278 - INFO - Epoch 1 Step 1310 (Global: 5310): loss=1.4563, ppl=4.29, grad_norm=3.03, lr=5.70e-06, throughput=2365 tok/s
2025-11-26 18:29:10,610 - INFO - Epoch 1 Step 1320 (Global: 5320): loss=1.7373, ppl=5.68, grad_norm=16.75, lr=5.68e-06, throughput=2361 tok/s
2025-11-26 18:32:34,359 - INFO - Epoch 1 Step 1330 (Global: 5330): loss=1.6035, ppl=4.97, grad_norm=1.48, lr=5.67e-06, throughput=2356 tok/s
2025-11-26 18:35:57,773 - INFO - Epoch 1 Step 1340 (Global: 5340): loss=1.5352, ppl=4.64, grad_norm=2.11, lr=5.65e-06, throughput=2360 tok/s
2025-11-26 18:39:20,236 - INFO - Epoch 1 Step 1350 (Global: 5350): loss=1.4879, ppl=4.43, grad_norm=2.11, lr=5.63e-06, throughput=2371 tok/s
2025-11-26 18:42:43,156 - INFO - Epoch 1 Step 1360 (Global: 5360): loss=1.6878, ppl=5.41, grad_norm=1.55, lr=5.62e-06, throughput=2365 tok/s
2025-11-26 18:46:09,386 - INFO - Epoch 1 Step 1370 (Global: 5370): loss=1.8401, ppl=6.30, grad_norm=1.52, lr=5.60e-06, throughput=2328 tok/s
2025-11-26 18:49:36,563 - INFO - Epoch 1 Step 1380 (Global: 5380): loss=1.6474, ppl=5.19, grad_norm=1.58, lr=5.58e-06, throughput=2317 tok/s
2025-11-26 18:53:05,103 - INFO - Epoch 1 Step 1390 (Global: 5390): loss=1.8087, ppl=6.10, grad_norm=1.57, lr=5.57e-06, throughput=2302 tok/s
2025-11-26 18:56:33,823 - INFO - Epoch 1 Step 1400 (Global: 5400): loss=1.4765, ppl=4.38, grad_norm=2.31, lr=5.55e-06, throughput=2300 tok/s
2025-11-26 19:00:01,474 - INFO - Epoch 1 Step 1410 (Global: 5410): loss=1.6024, ppl=4.96, grad_norm=2.62, lr=5.53e-06, throughput=2312 tok/s
2025-11-26 19:03:30,966 - INFO - Epoch 1 Step 1420 (Global: 5420): loss=1.5479, ppl=4.70, grad_norm=2.50, lr=5.52e-06, throughput=2291 tok/s
2025-11-26 19:07:00,061 - INFO - Epoch 1 Step 1430 (Global: 5430): loss=1.6556, ppl=5.24, grad_norm=1.23, lr=5.50e-06, throughput=2296 tok/s
2025-11-26 19:10:28,439 - INFO - Epoch 1 Step 1440 (Global: 5440): loss=1.9449, ppl=6.99, grad_norm=1.52, lr=5.48e-06, throughput=2304 tok/s
2025-11-26 19:13:58,448 - INFO - Epoch 1 Step 1450 (Global: 5450): loss=1.7501, ppl=5.76, grad_norm=1.38, lr=5.47e-06, throughput=2286 tok/s
2025-11-26 19:17:24,719 - INFO - Epoch 1 Step 1460 (Global: 5460): loss=1.6826, ppl=5.38, grad_norm=1.16, lr=5.45e-06, throughput=2327 tok/s
2025-11-26 19:20:52,595 - INFO - Epoch 1 Step 1470 (Global: 5470): loss=1.5204, ppl=4.57, grad_norm=1.26, lr=5.43e-06, throughput=2309 tok/s
2025-11-26 19:24:18,631 - INFO - Epoch 1 Step 1480 (Global: 5480): loss=1.6249, ppl=5.08, grad_norm=1.38, lr=5.42e-06, throughput=2330 tok/s
2025-11-26 19:27:42,845 - INFO - Epoch 1 Step 1490 (Global: 5490): loss=1.5220, ppl=4.58, grad_norm=1.37, lr=5.40e-06, throughput=2350 tok/s
2025-11-26 19:31:08,476 - INFO - Epoch 1 Step 1500 (Global: 5500): loss=1.6931, ppl=5.44, grad_norm=3.03, lr=5.38e-06, throughput=2334 tok/s
2025-11-26 19:34:33,112 - INFO - Epoch 1 Step 1510 (Global: 5510): loss=1.4829, ppl=4.41, grad_norm=2.39, lr=5.37e-06, throughput=2346 tok/s
2025-11-26 19:37:57,966 - INFO - Epoch 1 Step 1520 (Global: 5520): loss=1.6404, ppl=5.16, grad_norm=1.38, lr=5.35e-06, throughput=2343 tok/s
2025-11-26 19:41:23,383 - INFO - Epoch 1 Step 1530 (Global: 5530): loss=1.7479, ppl=5.74, grad_norm=1.59, lr=5.33e-06, throughput=2337 tok/s
2025-11-26 19:44:47,957 - INFO - Epoch 1 Step 1540 (Global: 5540): loss=1.5609, ppl=4.76, grad_norm=1.78, lr=5.32e-06, throughput=2346 tok/s
2025-11-26 19:48:13,234 - INFO - Epoch 1 Step 1550 (Global: 5550): loss=1.6103, ppl=5.00, grad_norm=1.88, lr=5.30e-06, throughput=2338 tok/s
2025-11-26 19:51:37,108 - INFO - Epoch 1 Step 1560 (Global: 5560): loss=1.5914, ppl=4.91, grad_norm=1.22, lr=5.28e-06, throughput=2354 tok/s
2025-11-26 19:55:00,732 - INFO - Epoch 1 Step 1570 (Global: 5570): loss=1.7428, ppl=5.71, grad_norm=1.38, lr=5.27e-06, throughput=2357 tok/s
2025-11-26 19:58:25,710 - INFO - Epoch 1 Step 1580 (Global: 5580): loss=1.6435, ppl=5.17, grad_norm=1.71, lr=5.25e-06, throughput=2342 tok/s
2025-11-26 20:01:49,854 - INFO - Epoch 1 Step 1590 (Global: 5590): loss=1.9722, ppl=7.19, grad_norm=1.52, lr=5.23e-06, throughput=2351 tok/s
2025-11-26 20:05:14,459 - INFO - Epoch 1 Step 1600 (Global: 5600): loss=1.6881, ppl=5.41, grad_norm=1.62, lr=5.22e-06, throughput=2346 tok/s
2025-11-26 20:08:44,929 - INFO - Epoch 1 Step 1610 (Global: 5610): loss=1.5186, ppl=4.57, grad_norm=1.48, lr=5.20e-06, throughput=2281 tok/s
2025-11-26 20:12:13,095 - INFO - Epoch 1 Step 1620 (Global: 5620): loss=1.7530, ppl=5.77, grad_norm=1.49, lr=5.18e-06, throughput=2306 tok/s
2025-11-26 20:15:42,661 - INFO - Epoch 1 Step 1630 (Global: 5630): loss=1.3741, ppl=3.95, grad_norm=1.37, lr=5.17e-06, throughput=2290 tok/s
2025-11-26 20:19:10,232 - INFO - Epoch 1 Step 1640 (Global: 5640): loss=1.4650, ppl=4.33, grad_norm=1.17, lr=5.15e-06, throughput=2312 tok/s
2025-11-26 20:22:39,989 - INFO - Epoch 1 Step 1650 (Global: 5650): loss=1.7150, ppl=5.56, grad_norm=1.41, lr=5.13e-06, throughput=2288 tok/s
2025-11-26 20:26:10,344 - INFO - Epoch 1 Step 1660 (Global: 5660): loss=1.6300, ppl=5.10, grad_norm=1.61, lr=5.12e-06, throughput=2282 tok/s
2025-11-26 20:29:39,633 - INFO - Epoch 1 Step 1670 (Global: 5670): loss=1.8847, ppl=6.58, grad_norm=1.34, lr=5.10e-06, throughput=2293 tok/s
2025-11-26 20:33:09,258 - INFO - Epoch 1 Step 1680 (Global: 5680): loss=1.7402, ppl=5.70, grad_norm=1.55, lr=5.08e-06, throughput=2290 tok/s
2025-11-26 20:36:37,671 - INFO - Epoch 1 Step 1690 (Global: 5690): loss=1.5155, ppl=4.55, grad_norm=1.72, lr=5.07e-06, throughput=2303 tok/s
2025-11-26 20:40:07,810 - INFO - Epoch 1 Step 1700 (Global: 5700): loss=1.5077, ppl=4.52, grad_norm=1.34, lr=5.05e-06, throughput=2284 tok/s
2025-11-26 20:43:36,536 - INFO - Epoch 1 Step 1710 (Global: 5710): loss=1.6036, ppl=4.97, grad_norm=1.51, lr=5.03e-06, throughput=2300 tok/s
2025-11-26 20:47:06,160 - INFO - Epoch 1 Step 1720 (Global: 5720): loss=1.5908, ppl=4.91, grad_norm=1.52, lr=5.02e-06, throughput=2290 tok/s
2025-11-26 20:50:35,678 - INFO - Epoch 1 Step 1730 (Global: 5730): loss=1.6747, ppl=5.34, grad_norm=1.46, lr=5.00e-06, throughput=2291 tok/s
2025-11-26 20:54:03,974 - INFO - Epoch 1 Step 1740 (Global: 5740): loss=1.6198, ppl=5.05, grad_norm=1.50, lr=4.98e-06, throughput=2304 tok/s
2025-11-26 20:57:34,815 - INFO - Epoch 1 Step 1750 (Global: 5750): loss=1.4641, ppl=4.32, grad_norm=1.46, lr=4.96e-06, throughput=2277 tok/s
2025-11-26 21:01:03,854 - INFO - Epoch 1 Step 1760 (Global: 5760): loss=1.6753, ppl=5.34, grad_norm=2.14, lr=4.95e-06, throughput=2296 tok/s
2025-11-26 21:04:33,467 - INFO - Epoch 1 Step 1770 (Global: 5770): loss=1.5430, ppl=4.68, grad_norm=1.55, lr=4.93e-06, throughput=2290 tok/s
2025-11-26 21:08:02,792 - INFO - Epoch 1 Step 1780 (Global: 5780): loss=1.5668, ppl=4.79, grad_norm=2.19, lr=4.91e-06, throughput=2293 tok/s
2025-11-26 21:11:31,727 - INFO - Epoch 1 Step 1790 (Global: 5790): loss=1.4746, ppl=4.37, grad_norm=2.17, lr=4.90e-06, throughput=2297 tok/s
2025-11-26 21:15:01,687 - INFO - Epoch 1 Step 1800 (Global: 5800): loss=1.6367, ppl=5.14, grad_norm=3.62, lr=4.88e-06, throughput=2286 tok/s
2025-11-26 21:18:29,721 - INFO - Epoch 1 Step 1810 (Global: 5810): loss=1.7872, ppl=5.97, grad_norm=1.53, lr=4.86e-06, throughput=2307 tok/s
2025-11-26 21:21:57,639 - INFO - Epoch 1 Step 1820 (Global: 5820): loss=1.3995, ppl=4.05, grad_norm=1.04, lr=4.85e-06, throughput=2309 tok/s
2025-11-26 21:25:22,944 - INFO - Epoch 1 Step 1830 (Global: 5830): loss=1.6517, ppl=5.22, grad_norm=1.71, lr=4.83e-06, throughput=2338 tok/s
2025-11-26 21:28:51,379 - INFO - Epoch 1 Step 1840 (Global: 5840): loss=1.6664, ppl=5.29, grad_norm=1.15, lr=4.81e-06, throughput=2303 tok/s
2025-11-26 21:32:21,802 - INFO - Epoch 1 Step 1850 (Global: 5850): loss=1.6906, ppl=5.42, grad_norm=1.34, lr=4.80e-06, throughput=2281 tok/s
2025-11-26 21:35:50,515 - INFO - Epoch 1 Step 1860 (Global: 5860): loss=1.6730, ppl=5.33, grad_norm=1.34, lr=4.78e-06, throughput=2300 tok/s
2025-11-26 21:39:19,830 - INFO - Epoch 1 Step 1870 (Global: 5870): loss=1.6172, ppl=5.04, grad_norm=1.26, lr=4.76e-06, throughput=2293 tok/s
2025-11-26 21:42:49,268 - INFO - Epoch 1 Step 1880 (Global: 5880): loss=1.4436, ppl=4.24, grad_norm=1.82, lr=4.75e-06, throughput=2292 tok/s
2025-11-26 21:46:18,551 - INFO - Epoch 1 Step 1890 (Global: 5890): loss=1.8251, ppl=6.20, grad_norm=1.23, lr=4.73e-06, throughput=2294 tok/s
2025-11-26 21:49:48,315 - INFO - Epoch 1 Step 1900 (Global: 5900): loss=1.7322, ppl=5.65, grad_norm=1.78, lr=4.71e-06, throughput=2288 tok/s
2025-11-26 21:53:16,076 - INFO - Epoch 1 Step 1910 (Global: 5910): loss=1.8912, ppl=6.63, grad_norm=1.52, lr=4.70e-06, throughput=2310 tok/s
2025-11-26 21:56:45,519 - INFO - Epoch 1 Step 1920 (Global: 5920): loss=1.6152, ppl=5.03, grad_norm=1.46, lr=4.68e-06, throughput=2292 tok/s
2025-11-26 22:00:15,001 - INFO - Epoch 1 Step 1930 (Global: 5930): loss=1.5182, ppl=4.56, grad_norm=1.40, lr=4.66e-06, throughput=2291 tok/s
2025-11-26 22:03:43,597 - INFO - Epoch 1 Step 1940 (Global: 5940): loss=1.5657, ppl=4.79, grad_norm=1.23, lr=4.65e-06, throughput=2301 tok/s
2025-11-26 22:07:15,354 - INFO - Epoch 1 Step 1950 (Global: 5950): loss=1.4983, ppl=4.47, grad_norm=1.74, lr=4.63e-06, throughput=2267 tok/s
2025-11-26 22:10:44,094 - INFO - Epoch 1 Step 1960 (Global: 5960): loss=1.4744, ppl=4.37, grad_norm=1.19, lr=4.61e-06, throughput=2300 tok/s
2025-11-26 22:14:14,296 - INFO - Epoch 1 Step 1970 (Global: 5970): loss=1.6319, ppl=5.11, grad_norm=1.13, lr=4.60e-06, throughput=2284 tok/s
2025-11-26 22:17:43,214 - INFO - Epoch 1 Step 1980 (Global: 5980): loss=1.5544, ppl=4.73, grad_norm=2.16, lr=4.58e-06, throughput=2298 tok/s
2025-11-26 22:21:10,123 - INFO - Epoch 1 Step 1990 (Global: 5990): loss=1.5317, ppl=4.63, grad_norm=1.96, lr=4.56e-06, throughput=2320 tok/s
2025-11-26 22:24:36,446 - INFO - Epoch 1 Step 2000 (Global: 6000): loss=1.6058, ppl=4.98, grad_norm=1.34, lr=4.55e-06, throughput=2326 tok/s
2025-11-26 22:24:36,446 - INFO -
Running validation at step 6000...
2025-11-26 22:36:45,891 - WARNING - NLTK wordnet data missing - METEOR score unavailable. Run: python -m nltk.downloader wordnet omw-1.4
2025-11-26 22:36:45,910 - INFO - Validation loss: 1.6441, perplexity: 5.18
2025-11-26 22:36:45,911 - INFO -
======================================================================
2025-11-26 22:36:45,911 - INFO - Qualitative Evaluation Samples:
2025-11-26 22:36:45,911 - INFO - ======================================================================
2025-11-26 22:36:45,912 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-26 22:36:45,912 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-26 22:36:45,913 - INFO - Generated: ' to the band\'s previous work, saying that "Death Cab for Cutie is a lot more mature, a lot more confident, a lot more confident in their own abilities, and they\'re not afraid to be themselves. And tha...'
2025-11-26 22:36:45,913 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-26 22:36:45,913 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,914 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-26 22:36:45,914 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-26 22:36:45,914 - INFO - Generated: 'aternity and sorority houses, but it was not uncommon for fraternity and sorority houses to have Native American-themed events. The Order of the Arrow has a Native American theme, and the Order of the...'
2025-11-26 22:36:45,915 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-26 22:36:45,915 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,915 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-26 22:36:45,916 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-26 22:36:45,916 - INFO - Generated: " be killed by the Red Tail's leader, Shigeki. Oga is then forced to fight the Red Tail's leader, Shigeki, in a duel, but is defeated. Oga is then forced to fight the Red Tail's leader, Shigeki, in a d..."
2025-11-26 22:36:45,917 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-26 22:36:45,917 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,917 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-26 22:36:45,918 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-26 22:36:45,918 - INFO - Generated: '-1 | 1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-61-62-63-64-65-66-67-68-...'
2025-11-26 22:36:45,919 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-26 22:36:45,919 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,919 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-26 22:36:45,920 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-26 22:36:45,920 - INFO - Generated: '1 | BlackBerry PlayBook | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | iOS | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-26 22:36:45,921 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-26 22:36:45,921 - INFO - ----------------------------------------------------------------------
2025-11-26 22:36:45,924 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_6000.jsonl
2025-11-26 22:39:11,210 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-26 22:39:11,225 - INFO - New best validation loss: 1.6441, perplexity: 5.18
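(Note on reading these logs: the paired `loss`/`ppl` fields above are consistent with perplexity being computed as `exp(loss)`; the exact reduction used by the training script is not shown here, so this is a sanity-check sketch, not the script's own code.)

```python
import math

# Logged validation metrics from the step-6000 block above.
val_loss = 1.6441  # mean token-level cross-entropy (assumed nats)
val_ppl = 5.18     # perplexity as reported by the logger

# Perplexity should be exp(loss) when loss is mean NLL in nats.
assert round(math.exp(val_loss), 2) == val_ppl

# Same check for a training step line, e.g. step 2000: loss=1.6058, ppl=4.98.
assert round(math.exp(1.6058), 2) == 4.98
```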
2025-11-26 22:42:42,003 - INFO - Epoch 1 Step 2010 (Global: 6010): loss=1.5365, ppl=4.65, grad_norm=1.12, lr=4.53e-06, throughput=2278 tok/s
2025-11-26 22:46:11,278 - INFO - Epoch 1 Step 2020 (Global: 6020): loss=1.5825, ppl=4.87, grad_norm=1.89, lr=4.51e-06, throughput=2294 tok/s
2025-11-26 22:49:35,682 - INFO - Epoch 1 Step 2030 (Global: 6030): loss=1.8149, ppl=6.14, grad_norm=2.05, lr=4.50e-06, throughput=2348 tok/s
2025-11-26 22:53:04,832 - INFO - Epoch 1 Step 2040 (Global: 6040): loss=1.5978, ppl=4.94, grad_norm=1.34, lr=4.48e-06, throughput=2295 tok/s
2025-11-26 22:56:33,844 - INFO - Epoch 1 Step 2050 (Global: 6050): loss=1.6242, ppl=5.07, grad_norm=1.59, lr=4.46e-06, throughput=2297 tok/s
2025-11-26 23:00:01,675 - INFO - Epoch 1 Step 2060 (Global: 6060): loss=1.7453, ppl=5.73, grad_norm=3.45, lr=4.45e-06, throughput=2310 tok/s
2025-11-26 23:03:30,408 - INFO - Epoch 1 Step 2070 (Global: 6070): loss=1.9817, ppl=7.26, grad_norm=1.52, lr=4.43e-06, throughput=2300 tok/s
2025-11-26 23:06:57,547 - INFO - Epoch 1 Step 2080 (Global: 6080): loss=1.5477, ppl=4.70, grad_norm=1.12, lr=4.41e-06, throughput=2317 tok/s
2025-11-26 23:10:24,976 - INFO - Epoch 1 Step 2090 (Global: 6090): loss=1.6698, ppl=5.31, grad_norm=1.52, lr=4.40e-06, throughput=2314 tok/s
2025-11-26 23:13:54,344 - INFO - Epoch 1 Step 2100 (Global: 6100): loss=1.4559, ppl=4.29, grad_norm=1.73, lr=4.38e-06, throughput=2293 tok/s
2025-11-26 23:17:22,072 - INFO - Epoch 1 Step 2110 (Global: 6110): loss=1.3795, ppl=3.97, grad_norm=1.01, lr=4.36e-06, throughput=2311 tok/s
2025-11-26 23:20:50,207 - INFO - Epoch 1 Step 2120 (Global: 6120): loss=1.6584, ppl=5.25, grad_norm=2.06, lr=4.35e-06, throughput=2306 tok/s
2025-11-26 23:24:17,149 - INFO - Epoch 1 Step 2130 (Global: 6130): loss=1.6136, ppl=5.02, grad_norm=1.34, lr=4.33e-06, throughput=2320 tok/s
2025-11-26 23:27:43,576 - INFO - Epoch 1 Step 2140 (Global: 6140): loss=1.6063, ppl=4.98, grad_norm=1.98, lr=4.31e-06, throughput=2325 tok/s
2025-11-26 23:31:15,974 - INFO - Epoch 1 Step 2150 (Global: 6150): loss=1.8216, ppl=6.18, grad_norm=2.02, lr=4.30e-06, throughput=2260 tok/s
2025-11-26 23:34:45,061 - INFO - Epoch 1 Step 2160 (Global: 6160): loss=1.6107, ppl=5.01, grad_norm=1.16, lr=4.28e-06, throughput=2296 tok/s
2025-11-26 23:38:16,844 - INFO - Epoch 1 Step 2170 (Global: 6170): loss=1.8281, ppl=6.22, grad_norm=1.41, lr=4.26e-06, throughput=2266 tok/s
2025-11-26 23:41:44,869 - INFO - Epoch 1 Step 2180 (Global: 6180): loss=1.6540, ppl=5.23, grad_norm=1.55, lr=4.25e-06, throughput=2307 tok/s
2025-11-26 23:45:15,971 - INFO - Epoch 1 Step 2190 (Global: 6190): loss=1.5850, ppl=4.88, grad_norm=1.31, lr=4.23e-06, throughput=2274 tok/s
2025-11-26 23:48:45,636 - INFO - Epoch 1 Step 2200 (Global: 6200): loss=1.5717, ppl=4.81, grad_norm=1.27, lr=4.21e-06, throughput=2289 tok/s
2025-11-26 23:52:15,851 - INFO - Epoch 1 Step 2210 (Global: 6210): loss=1.5257, ppl=4.60, grad_norm=1.24, lr=4.20e-06, throughput=2283 tok/s
2025-11-26 23:55:46,844 - INFO - Epoch 1 Step 2220 (Global: 6220): loss=1.7978, ppl=6.04, grad_norm=1.55, lr=4.18e-06, throughput=2275 tok/s
2025-11-26 23:59:15,622 - INFO - Epoch 1 Step 2230 (Global: 6230): loss=1.5428, ppl=4.68, grad_norm=1.26, lr=4.16e-06, throughput=2299 tok/s
2025-11-27 00:02:47,405 - INFO - Epoch 1 Step 2240 (Global: 6240): loss=1.5888, ppl=4.90, grad_norm=1.55, lr=4.15e-06, throughput=2266 tok/s
2025-11-27 00:06:16,560 - INFO - Epoch 1 Step 2250 (Global: 6250): loss=1.3918, ppl=4.02, grad_norm=1.64, lr=4.13e-06, throughput=2295 tok/s
2025-11-27 00:09:46,152 - INFO - Epoch 1 Step 2260 (Global: 6260): loss=1.7444, ppl=5.72, grad_norm=1.93, lr=4.12e-06, throughput=2290 tok/s
2025-11-27 00:13:18,192 - INFO - Epoch 1 Step 2270 (Global: 6270): loss=1.6026, ppl=4.97, grad_norm=3.36, lr=4.10e-06, throughput=2264 tok/s
2025-11-27 00:16:47,801 - INFO - Epoch 1 Step 2280 (Global: 6280): loss=1.4486, ppl=4.26, grad_norm=1.48, lr=4.08e-06, throughput=2290 tok/s
2025-11-27 00:20:16,925 - INFO - Epoch 1 Step 2290 (Global: 6290): loss=1.5349, ppl=4.64, grad_norm=1.62, lr=4.07e-06, throughput=2295 tok/s
2025-11-27 00:23:46,492 - INFO - Epoch 1 Step 2300 (Global: 6300): loss=1.6205, ppl=5.06, grad_norm=1.27, lr=4.05e-06, throughput=2290 tok/s
2025-11-27 00:27:12,439 - INFO - Epoch 1 Step 2310 (Global: 6310): loss=1.4740, ppl=4.37, grad_norm=1.75, lr=4.03e-06, throughput=2331 tok/s
2025-11-27 00:30:39,262 - INFO - Epoch 1 Step 2320 (Global: 6320): loss=1.8079, ppl=6.10, grad_norm=1.62, lr=4.02e-06, throughput=2321 tok/s
2025-11-27 00:34:08,568 - INFO - Epoch 1 Step 2330 (Global: 6330): loss=1.5097, ppl=4.53, grad_norm=1.89, lr=4.00e-06, throughput=2293 tok/s
2025-11-27 00:37:36,224 - INFO - Epoch 1 Step 2340 (Global: 6340): loss=1.6409, ppl=5.16, grad_norm=1.66, lr=3.98e-06, throughput=2312 tok/s
2025-11-27 00:41:07,289 - INFO - Epoch 1 Step 2350 (Global: 6350): loss=1.5278, ppl=4.61, grad_norm=2.30, lr=3.97e-06, throughput=2274 tok/s
2025-11-27 00:44:38,985 - INFO - Epoch 1 Step 2360 (Global: 6360): loss=1.5100, ppl=4.53, grad_norm=1.91, lr=3.95e-06, throughput=2267 tok/s
2025-11-27 00:48:08,011 - INFO - Epoch 1 Step 2370 (Global: 6370): loss=1.5218, ppl=4.58, grad_norm=1.75, lr=3.93e-06, throughput=2296 tok/s
2025-11-27 00:51:37,415 - INFO - Epoch 1 Step 2380 (Global: 6380): loss=1.7147, ppl=5.55, grad_norm=1.14, lr=3.92e-06, throughput=2292 tok/s
2025-11-27 00:55:08,049 - INFO - Epoch 1 Step 2390 (Global: 6390): loss=1.6802, ppl=5.37, grad_norm=1.06, lr=3.90e-06, throughput=2279 tok/s
2025-11-27 00:58:38,419 - INFO - Epoch 1 Step 2400 (Global: 6400): loss=1.6613, ppl=5.27, grad_norm=1.12, lr=3.89e-06, throughput=2282 tok/s
2025-11-27 01:02:09,669 - INFO - Epoch 1 Step 2410 (Global: 6410): loss=1.6886, ppl=5.41, grad_norm=1.30, lr=3.87e-06, throughput=2272 tok/s
2025-11-27 01:05:40,972 - INFO - Epoch 1 Step 2420 (Global: 6420): loss=1.6153, ppl=5.03, grad_norm=1.31, lr=3.85e-06, throughput=2272 tok/s
2025-11-27 01:09:03,975 - INFO - Epoch 1 Step 2430 (Global: 6430): loss=1.8013, ppl=6.06, grad_norm=1.36, lr=3.84e-06, throughput=2365 tok/s
2025-11-27 01:13:01,774 - INFO - Epoch 1 Step 2440 (Global: 6440): loss=1.5954, ppl=4.93, grad_norm=2.16, lr=3.82e-06, throughput=2019 tok/s
2025-11-27 01:17:07,423 - INFO - Epoch 1 Step 2450 (Global: 6450): loss=1.4617, ppl=4.31, grad_norm=1.45, lr=3.80e-06, throughput=1954 tok/s
2025-11-27 01:20:37,024 - INFO - Epoch 1 Step 2460 (Global: 6460): loss=1.8627, ppl=6.44, grad_norm=1.40, lr=3.79e-06, throughput=2290 tok/s
2025-11-27 01:24:06,975 - INFO - Epoch 1 Step 2470 (Global: 6470): loss=1.5834, ppl=4.87, grad_norm=1.37, lr=3.77e-06, throughput=2286 tok/s
2025-11-27 01:27:41,438 - INFO - Epoch 1 Step 2480 (Global: 6480): loss=1.7328, ppl=5.66, grad_norm=1.52, lr=3.76e-06, throughput=2238 tok/s
2025-11-27 01:31:13,298 - INFO - Epoch 1 Step 2490 (Global: 6490): loss=1.4429, ppl=4.23, grad_norm=2.61, lr=3.74e-06, throughput=2266 tok/s
2025-11-27 01:34:44,742 - INFO - Epoch 1 Step 2500 (Global: 6500): loss=1.8865, ppl=6.60, grad_norm=1.91, lr=3.72e-06, throughput=2270 tok/s
2025-11-27 01:38:17,628 - INFO - Epoch 1 Step 2510 (Global: 6510): loss=1.4727, ppl=4.36, grad_norm=1.68, lr=3.71e-06, throughput=2255 tok/s
2025-11-27 01:41:51,485 - INFO - Epoch 1 Step 2520 (Global: 6520): loss=1.7039, ppl=5.50, grad_norm=2.08, lr=3.69e-06, throughput=2245 tok/s
2025-11-27 01:45:23,690 - INFO - Epoch 1 Step 2530 (Global: 6530): loss=1.7020, ppl=5.48, grad_norm=1.52, lr=3.67e-06, throughput=2262 tok/s
2025-11-27 01:48:57,051 - INFO - Epoch 1 Step 2540 (Global: 6540): loss=1.6842, ppl=5.39, grad_norm=1.38, lr=3.66e-06, throughput=2250 tok/s
2025-11-27 01:52:27,567 - INFO - Epoch 1 Step 2550 (Global: 6550): loss=1.6493, ppl=5.20, grad_norm=1.91, lr=3.64e-06, throughput=2280 tok/s
2025-11-27 01:56:01,798 - INFO - Epoch 1 Step 2560 (Global: 6560): loss=1.7493, ppl=5.75, grad_norm=2.08, lr=3.63e-06, throughput=2241 tok/s
2025-11-27 01:59:33,645 - INFO - Epoch 1 Step 2570 (Global: 6570): loss=1.3693, ppl=3.93, grad_norm=1.52, lr=3.61e-06, throughput=2266 tok/s
2025-11-27 02:03:05,696 - INFO - Epoch 1 Step 2580 (Global: 6580): loss=1.6137, ppl=5.02, grad_norm=2.17, lr=3.59e-06, throughput=2264 tok/s
2025-11-27 02:06:38,011 - INFO - Epoch 1 Step 2590 (Global: 6590): loss=1.5096, ppl=4.52, grad_norm=1.23, lr=3.58e-06, throughput=2261 tok/s
2025-11-27 02:10:09,728 - INFO - Epoch 1 Step 2600 (Global: 6600): loss=1.5652, ppl=4.78, grad_norm=1.26, lr=3.56e-06, throughput=2267 tok/s
2025-11-27 02:13:40,824 - INFO - Epoch 1 Step 2610 (Global: 6610): loss=1.6559, ppl=5.24, grad_norm=1.44, lr=3.55e-06, throughput=2274 tok/s
2025-11-27 02:17:13,358 - INFO - Epoch 1 Step 2620 (Global: 6620): loss=1.5663, ppl=4.79, grad_norm=1.62, lr=3.53e-06, throughput=2258 tok/s
2025-11-27 02:20:48,082 - INFO - Epoch 1 Step 2630 (Global: 6630): loss=1.5840, ppl=4.87, grad_norm=1.52, lr=3.51e-06, throughput=2235 tok/s
2025-11-27 02:24:20,093 - INFO - Epoch 1 Step 2640 (Global: 6640): loss=2.0053, ppl=7.43, grad_norm=1.66, lr=3.50e-06, throughput=2264 tok/s
2025-11-27 02:27:51,743 - INFO - Epoch 1 Step 2650 (Global: 6650): loss=1.4028, ppl=4.07, grad_norm=1.44, lr=3.48e-06, throughput=2268 tok/s
2025-11-27 02:31:23,911 - INFO - Epoch 1 Step 2660 (Global: 6660): loss=1.4283, ppl=4.17, grad_norm=1.24, lr=3.47e-06, throughput=2262 tok/s
2025-11-27 02:34:56,040 - INFO - Epoch 1 Step 2670 (Global: 6670): loss=1.7759, ppl=5.91, grad_norm=1.23, lr=3.45e-06, throughput=2263 tok/s
2025-11-27 02:38:30,700 - INFO - Epoch 1 Step 2680 (Global: 6680): loss=1.7219, ppl=5.60, grad_norm=1.24, lr=3.43e-06, throughput=2236 tok/s
2025-11-27 02:42:02,147 - INFO - Epoch 1 Step 2690 (Global: 6690): loss=1.6588, ppl=5.25, grad_norm=1.44, lr=3.42e-06, throughput=2270 tok/s
2025-11-27 02:45:34,746 - INFO - Epoch 1 Step 2700 (Global: 6700): loss=1.5907, ppl=4.91, grad_norm=1.29, lr=3.40e-06, throughput=2258 tok/s
2025-11-27 02:49:07,484 - INFO - Epoch 1 Step 2710 (Global: 6710): loss=1.6153, ppl=5.03, grad_norm=1.61, lr=3.39e-06, throughput=2256 tok/s
2025-11-27 02:52:42,038 - INFO - Epoch 1 Step 2720 (Global: 6720): loss=1.7161, ppl=5.56, grad_norm=2.22, lr=3.37e-06, throughput=2237 tok/s
2025-11-27 02:56:17,254 - INFO - Epoch 1 Step 2730 (Global: 6730): loss=1.5639, ppl=4.78, grad_norm=1.35, lr=3.35e-06, throughput=2230 tok/s
2025-11-27 02:59:48,692 - INFO - Epoch 1 Step 2740 (Global: 6740): loss=1.6292, ppl=5.10, grad_norm=1.47, lr=3.34e-06, throughput=2270 tok/s
2025-11-27 03:03:23,083 - INFO - Epoch 1 Step 2750 (Global: 6750): loss=1.7152, ppl=5.56, grad_norm=1.44, lr=3.32e-06, throughput=2239 tok/s
2025-11-27 03:06:56,939 - INFO - Epoch 1 Step 2760 (Global: 6760): loss=1.4593, ppl=4.30, grad_norm=1.41, lr=3.31e-06, throughput=2245 tok/s
2025-11-27 03:10:29,418 - INFO - Epoch 1 Step 2770 (Global: 6770): loss=1.5058, ppl=4.51, grad_norm=1.57, lr=3.29e-06, throughput=2259 tok/s
2025-11-27 03:14:01,934 - INFO - Epoch 1 Step 2780 (Global: 6780): loss=1.6385, ppl=5.15, grad_norm=1.85, lr=3.28e-06, throughput=2259 tok/s
2025-11-27 03:17:33,346 - INFO - Epoch 1 Step 2790 (Global: 6790): loss=1.5304, ppl=4.62, grad_norm=1.12, lr=3.26e-06, throughput=2270 tok/s
2025-11-27 03:21:05,669 - INFO - Epoch 1 Step 2800 (Global: 6800): loss=1.6664, ppl=5.29, grad_norm=1.17, lr=3.24e-06, throughput=2261 tok/s
2025-11-27 03:24:40,485 - INFO - Epoch 1 Step 2810 (Global: 6810): loss=1.7402, ppl=5.70, grad_norm=1.38, lr=3.23e-06, throughput=2234 tok/s
2025-11-27 03:28:13,414 - INFO - Epoch 1 Step 2820 (Global: 6820): loss=1.6874, ppl=5.41, grad_norm=1.63, lr=3.21e-06, throughput=2254 tok/s
2025-11-27 03:31:45,484 - INFO - Epoch 1 Step 2830 (Global: 6830): loss=1.4830, ppl=4.41, grad_norm=2.45, lr=3.20e-06, throughput=2263 tok/s
2025-11-27 03:35:20,038 - INFO - Epoch 1 Step 2840 (Global: 6840): loss=1.7287, ppl=5.63, grad_norm=1.17, lr=3.18e-06, throughput=2237 tok/s
2025-11-27 03:38:51,740 - INFO - Epoch 1 Step 2850 (Global: 6850): loss=1.5791, ppl=4.85, grad_norm=1.16, lr=3.17e-06, throughput=2267 tok/s
2025-11-27 03:42:23,947 - INFO - Epoch 1 Step 2860 (Global: 6860): loss=1.6467, ppl=5.19, grad_norm=1.68, lr=3.15e-06, throughput=2262 tok/s
2025-11-27 03:45:58,573 - INFO - Epoch 1 Step 2870 (Global: 6870): loss=1.5430, ppl=4.68, grad_norm=1.31, lr=3.13e-06, throughput=2236 tok/s
2025-11-27 03:49:28,222 - INFO - Epoch 1 Step 2880 (Global: 6880): loss=1.4145, ppl=4.11, grad_norm=2.98, lr=3.12e-06, throughput=2290 tok/s
2025-11-27 03:53:02,378 - INFO - Epoch 1 Step 2890 (Global: 6890): loss=1.5964, ppl=4.94, grad_norm=1.98, lr=3.10e-06, throughput=2241 tok/s
2025-11-27 03:56:33,482 - INFO - Epoch 1 Step 2900 (Global: 6900): loss=1.5785, ppl=4.85, grad_norm=1.33, lr=3.09e-06, throughput=2274 tok/s
2025-11-27 04:00:05,916 - INFO - Epoch 1 Step 2910 (Global: 6910): loss=1.7358, ppl=5.67, grad_norm=1.74, lr=3.07e-06, throughput=2260 tok/s
2025-11-27 04:03:39,852 - INFO - Epoch 1 Step 2920 (Global: 6920): loss=1.6620, ppl=5.27, grad_norm=1.26, lr=3.06e-06, throughput=2244 tok/s
2025-11-27 04:07:11,250 - INFO - Epoch 1 Step 2930 (Global: 6930): loss=1.6128, ppl=5.02, grad_norm=1.31, lr=3.04e-06, throughput=2271 tok/s
2025-11-27 04:10:44,629 - INFO - Epoch 1 Step 2940 (Global: 6940): loss=1.7695, ppl=5.87, grad_norm=1.07, lr=3.03e-06, throughput=2250 tok/s
2025-11-27 04:14:19,848 - INFO - Epoch 1 Step 2950 (Global: 6950): loss=1.5935, ppl=4.92, grad_norm=1.42, lr=3.01e-06, throughput=2230 tok/s
2025-11-27 04:17:54,050 - INFO - Epoch 1 Step 2960 (Global: 6960): loss=1.3491, ppl=3.85, grad_norm=1.72, lr=3.00e-06, throughput=2241 tok/s
2025-11-27 04:21:27,378 - INFO - Epoch 1 Step 2970 (Global: 6970): loss=1.6235, ppl=5.07, grad_norm=1.40, lr=2.98e-06, throughput=2250 tok/s
2025-11-27 04:25:01,629 - INFO - Epoch 1 Step 2980 (Global: 6980): loss=1.8911, ppl=6.63, grad_norm=2.16, lr=2.96e-06, throughput=2240 tok/s
2025-11-27 04:28:34,000 - INFO - Epoch 1 Step 2990 (Global: 6990): loss=1.6364, ppl=5.14, grad_norm=3.81, lr=2.95e-06, throughput=2260 tok/s
2025-11-27 04:32:05,377 - INFO - Epoch 1 Step 3000 (Global: 7000): loss=1.7852, ppl=5.96, grad_norm=2.25, lr=2.93e-06, throughput=2271 tok/s
2025-11-27 04:35:41,360 - INFO - Epoch 1 Step 3010 (Global: 7010): loss=1.5919, ppl=4.91, grad_norm=1.64, lr=2.92e-06, throughput=2222 tok/s
2025-11-27 04:39:12,066 - INFO - Epoch 1 Step 3020 (Global: 7020): loss=1.4436, ppl=4.24, grad_norm=1.89, lr=2.90e-06, throughput=2278 tok/s
2025-11-27 04:42:44,384 - INFO - Epoch 1 Step 3030 (Global: 7030): loss=1.6126, ppl=5.02, grad_norm=1.38, lr=2.89e-06, throughput=2261 tok/s
2025-11-27 04:46:15,860 - INFO - Epoch 1 Step 3040 (Global: 7040): loss=1.6120, ppl=5.01, grad_norm=1.96, lr=2.87e-06, throughput=2270 tok/s
2025-11-27 04:49:49,439 - INFO - Epoch 1 Step 3050 (Global: 7050): loss=1.3577, ppl=3.89, grad_norm=1.26, lr=2.86e-06, throughput=2247 tok/s
2025-11-27 04:53:21,224 - INFO - Epoch 1 Step 3060 (Global: 7060): loss=1.7837, ppl=5.95, grad_norm=1.59, lr=2.84e-06, throughput=2266 tok/s
2025-11-27 04:56:57,455 - INFO - Epoch 1 Step 3070 (Global: 7070): loss=1.7341, ppl=5.66, grad_norm=1.16, lr=2.83e-06, throughput=2220 tok/s
2025-11-27 05:00:28,194 - INFO - Epoch 1 Step 3080 (Global: 7080): loss=1.4054, ppl=4.08, grad_norm=1.04, lr=2.81e-06, throughput=2278 tok/s
2025-11-27 05:04:03,004 - INFO - Epoch 1 Step 3090 (Global: 7090): loss=1.6133, ppl=5.02, grad_norm=1.49, lr=2.80e-06, throughput=2235 tok/s
2025-11-27 05:07:35,155 - INFO - Epoch 1 Step 3100 (Global: 7100): loss=1.5820, ppl=4.86, grad_norm=1.10, lr=2.78e-06, throughput=2263 tok/s
2025-11-27 05:11:09,449 - INFO - Epoch 1 Step 3110 (Global: 7110): loss=1.5124, ppl=4.54, grad_norm=1.25, lr=2.77e-06, throughput=2240 tok/s
2025-11-27 05:14:44,271 - INFO - Epoch 1 Step 3120 (Global: 7120): loss=1.7204, ppl=5.59, grad_norm=1.40, lr=2.75e-06, throughput=2234 tok/s
2025-11-27 05:18:14,957 - INFO - Epoch 1 Step 3130 (Global: 7130): loss=1.5938, ppl=4.92, grad_norm=1.75, lr=2.74e-06, throughput=2278 tok/s
2025-11-27 05:21:49,940 - INFO - Epoch 1 Step 3140 (Global: 7140): loss=1.6331, ppl=5.12, grad_norm=1.55, lr=2.72e-06, throughput=2233 tok/s
2025-11-27 05:25:21,455 - INFO - Epoch 1 Step 3150 (Global: 7150): loss=1.7229, ppl=5.60, grad_norm=1.48, lr=2.71e-06, throughput=2269 tok/s
2025-11-27 05:28:56,025 - INFO - Epoch 1 Step 3160 (Global: 7160): loss=1.6770, ppl=5.35, grad_norm=1.52, lr=2.69e-06, throughput=2237 tok/s
2025-11-27 05:32:29,444 - INFO - Epoch 1 Step 3170 (Global: 7170): loss=1.6378, ppl=5.14, grad_norm=1.28, lr=2.68e-06, throughput=2249 tok/s
2025-11-27 05:36:03,748 - INFO - Epoch 1 Step 3180 (Global: 7180): loss=1.4781, ppl=4.38, grad_norm=1.46, lr=2.66e-06, throughput=2240 tok/s
2025-11-27 05:39:36,263 - INFO - Epoch 1 Step 3190 (Global: 7190): loss=1.6835, ppl=5.38, grad_norm=2.48, lr=2.65e-06, throughput=2259 tok/s
2025-11-27 05:43:07,326 - INFO - Epoch 1 Step 3200 (Global: 7200): loss=1.5887, ppl=4.90, grad_norm=1.26, lr=2.63e-06, throughput=2274 tok/s
2025-11-27 05:46:42,343 - INFO - Epoch 1 Step 3210 (Global: 7210): loss=1.7665, ppl=5.85, grad_norm=2.20, lr=2.62e-06, throughput=2232 tok/s
2025-11-27 05:50:17,497 - INFO - Epoch 1 Step 3220 (Global: 7220): loss=1.7307, ppl=5.64, grad_norm=1.27, lr=2.60e-06, throughput=2231 tok/s
2025-11-27 05:53:48,942 - INFO - Epoch 1 Step 3230 (Global: 7230): loss=1.5067, ppl=4.51, grad_norm=1.71, lr=2.59e-06, throughput=2270 tok/s
2025-11-27 05:57:18,000 - INFO - Epoch 1 Step 3240 (Global: 7240): loss=1.5121, ppl=4.54, grad_norm=1.45, lr=2.58e-06, throughput=2296 tok/s
2025-11-27 06:00:45,481 - INFO - Epoch 1 Step 3250 (Global: 7250): loss=1.7415, ppl=5.71, grad_norm=1.89, lr=2.56e-06, throughput=2313 tok/s
2025-11-27 06:04:11,378 - INFO - Epoch 1 Step 3260 (Global: 7260): loss=1.5100, ppl=4.53, grad_norm=1.92, lr=2.55e-06, throughput=2331 tok/s
2025-11-27 06:07:37,626 - INFO - Epoch 1 Step 3270 (Global: 7270): loss=1.7311, ppl=5.65, grad_norm=1.06, lr=2.53e-06, throughput=2327 tok/s
2025-11-27 06:11:02,626 - INFO - Epoch 1 Step 3280 (Global: 7280): loss=1.4662, ppl=4.33, grad_norm=1.81, lr=2.52e-06, throughput=2342 tok/s
2025-11-27 06:14:27,117 - INFO - Epoch 1 Step 3290 (Global: 7290): loss=1.4278, ppl=4.17, grad_norm=1.48, lr=2.50e-06, throughput=2347 tok/s
2025-11-27 06:17:52,734 - INFO - Epoch 1 Step 3300 (Global: 7300): loss=1.6611, ppl=5.26, grad_norm=1.73, lr=2.49e-06, throughput=2334 tok/s
2025-11-27 06:21:18,277 - INFO - Epoch 1 Step 3310 (Global: 7310): loss=1.8076, ppl=6.10, grad_norm=1.72, lr=2.47e-06, throughput=2335 tok/s
2025-11-27 06:24:42,730 - INFO - Epoch 1 Step 3320 (Global: 7320): loss=1.6935, ppl=5.44, grad_norm=1.54, lr=2.46e-06, throughput=2348 tok/s
2025-11-27 06:28:07,968 - INFO - Epoch 1 Step 3330 (Global: 7330): loss=1.7430, ppl=5.71, grad_norm=1.32, lr=2.44e-06, throughput=2339 tok/s
2025-11-27 06:31:33,086 - INFO - Epoch 1 Step 3340 (Global: 7340): loss=1.5017, ppl=4.49, grad_norm=2.42, lr=2.43e-06, throughput=2340 tok/s
2025-11-27 06:34:57,839 - INFO - Epoch 1 Step 3350 (Global: 7350): loss=1.5907, ppl=4.91, grad_norm=1.55, lr=2.42e-06, throughput=2344 tok/s
2025-11-27 06:38:23,287 - INFO - Epoch 1 Step 3360 (Global: 7360): loss=1.6143, ppl=5.02, grad_norm=1.73, lr=2.40e-06, throughput=2336 tok/s
2025-11-27 06:41:48,191 - INFO - Epoch 1 Step 3370 (Global: 7370): loss=1.7264, ppl=5.62, grad_norm=1.26, lr=2.39e-06, throughput=2343 tok/s
2025-11-27 06:45:12,994 - INFO - Epoch 1 Step 3380 (Global: 7380): loss=1.4600, ppl=4.31, grad_norm=1.06, lr=2.37e-06, throughput=2344 tok/s
2025-11-27 06:48:37,156 - INFO - Epoch 1 Step 3390 (Global: 7390): loss=1.8470, ppl=6.34, grad_norm=1.35, lr=2.36e-06, throughput=2351 tok/s
2025-11-27 06:52:01,075 - INFO - Epoch 1 Step 3400 (Global: 7400): loss=1.5430, ppl=4.68, grad_norm=3.88, lr=2.34e-06, throughput=2354 tok/s
2025-11-27 06:55:25,850 - INFO - Epoch 1 Step 3410 (Global: 7410): loss=1.6049, ppl=4.98, grad_norm=1.44, lr=2.33e-06, throughput=2344 tok/s
2025-11-27 06:58:49,637 - INFO - Epoch 1 Step 3420 (Global: 7420): loss=1.7217, ppl=5.59, grad_norm=1.66, lr=2.32e-06, throughput=2355 tok/s
2025-11-27 07:02:19,931 - INFO - Epoch 1 Step 3430 (Global: 7430): loss=1.9944, ppl=7.35, grad_norm=1.33, lr=2.30e-06, throughput=2283 tok/s
2025-11-27 07:05:49,217 - INFO - Epoch 1 Step 3440 (Global: 7440): loss=1.6351, ppl=5.13, grad_norm=1.51, lr=2.29e-06, throughput=2294 tok/s
2025-11-27 07:09:18,712 - INFO - Epoch 1 Step 3450 (Global: 7450): loss=1.6559, ppl=5.24, grad_norm=1.51, lr=2.27e-06, throughput=2291 tok/s
2025-11-27 07:12:49,970 - INFO - Epoch 1 Step 3460 (Global: 7460): loss=1.6631, ppl=5.28, grad_norm=1.40, lr=2.26e-06, throughput=2272 tok/s
2025-11-27 07:16:21,851 - INFO - Epoch 1 Step 3470 (Global: 7470): loss=1.7262, ppl=5.62, grad_norm=1.09, lr=2.25e-06, throughput=2265 tok/s
2025-11-27 07:19:49,361 - INFO - Epoch 1 Step 3480 (Global: 7480): loss=1.5154, ppl=4.55, grad_norm=1.38, lr=2.23e-06, throughput=2313 tok/s
2025-11-27 07:23:22,965 - INFO - Epoch 1 Step 3490 (Global: 7490): loss=1.8209, ppl=6.18, grad_norm=1.34, lr=2.22e-06, throughput=2247 tok/s
2025-11-27 07:26:50,157 - INFO - Epoch 1 Step 3500 (Global: 7500): loss=1.6126, ppl=5.02, grad_norm=1.37, lr=2.20e-06, throughput=2317 tok/s
2025-11-27 07:30:20,635 - INFO - Epoch 1 Step 3510 (Global: 7510): loss=1.7271, ppl=5.62, grad_norm=1.79, lr=2.19e-06, throughput=2281 tok/s
2025-11-27 07:33:50,580 - INFO - Epoch 1 Step 3520 (Global: 7520): loss=1.4353, ppl=4.20, grad_norm=1.56, lr=2.18e-06, throughput=2286 tok/s
2025-11-27 07:37:15,090 - INFO - Epoch 1 Step 3530 (Global: 7530): loss=1.6192, ppl=5.05, grad_norm=1.27, lr=2.16e-06, throughput=2347 tok/s
2025-11-27 07:40:39,104 - INFO - Epoch 1 Step 3540 (Global: 7540): loss=1.3397, ppl=3.82, grad_norm=1.90, lr=2.15e-06, throughput=2353 tok/s
2025-11-27 07:44:03,219 - INFO - Epoch 1 Step 3550 (Global: 7550): loss=1.4825, ppl=4.40, grad_norm=1.41, lr=2.14e-06, throughput=2352 tok/s
2025-11-27 07:47:28,483 - INFO - Epoch 1 Step 3560 (Global: 7560): loss=1.7056, ppl=5.50, grad_norm=2.42, lr=2.12e-06, throughput=2338 tok/s
2025-11-27 07:50:53,076 - INFO - Epoch 1 Step 3570 (Global: 7570): loss=1.6305, ppl=5.11, grad_norm=2.27, lr=2.11e-06, throughput=2346 tok/s
2025-11-27 07:54:17,314 - INFO - Epoch 1 Step 3580 (Global: 7580): loss=1.5124, ppl=4.54, grad_norm=1.38, lr=2.09e-06, throughput=2350 tok/s
2025-11-27 07:57:41,930 - INFO - Epoch 1 Step 3590 (Global: 7590): loss=1.7213, ppl=5.59, grad_norm=1.35, lr=2.08e-06, throughput=2346 tok/s
2025-11-27 08:01:06,471 - INFO - Epoch 1 Step 3600 (Global: 7600): loss=1.4019, ppl=4.06, grad_norm=1.80, lr=2.07e-06, throughput=2347 tok/s
2025-11-27 08:04:31,537 - INFO - Epoch 1 Step 3610 (Global: 7610): loss=1.6517, ppl=5.22, grad_norm=1.52, lr=2.05e-06, throughput=2341 tok/s
2025-11-27 08:07:55,379 - INFO - Epoch 1 Step 3620 (Global: 7620): loss=1.4607, ppl=4.31, grad_norm=1.21, lr=2.04e-06, throughput=2355 tok/s
2025-11-27 08:11:19,959 - INFO - Epoch 1 Step 3630 (Global: 7630): loss=1.5729, ppl=4.82, grad_norm=1.47, lr=2.03e-06, throughput=2346 tok/s
2025-11-27 08:14:44,291 - INFO - Epoch 1 Step 3640 (Global: 7640): loss=1.4847, ppl=4.41, grad_norm=1.61, lr=2.01e-06, throughput=2349 tok/s
2025-11-27 08:18:07,792 - INFO - Epoch 1 Step 3650 (Global: 7650): loss=1.4906, ppl=4.44, grad_norm=1.46, lr=2.00e-06, throughput=2359 tok/s
2025-11-27 08:21:33,483 - INFO - Epoch 1 Step 3660 (Global: 7660): loss=1.6911, ppl=5.43, grad_norm=1.34, lr=1.99e-06, throughput=2334 tok/s
2025-11-27 08:25:04,883 - INFO - Epoch 1 Step 3670 (Global: 7670): loss=1.6647, ppl=5.28, grad_norm=1.20, lr=1.97e-06, throughput=2271 tok/s
2025-11-27 08:28:36,231 - INFO - Epoch 1 Step 3680 (Global: 7680): loss=1.5911, ppl=4.91, grad_norm=1.55, lr=1.96e-06, throughput=2271 tok/s
2025-11-27 08:32:08,692 - INFO - Epoch 1 Step 3690 (Global: 7690): loss=1.7500, ppl=5.75, grad_norm=1.76, lr=1.95e-06, throughput=2259 tok/s
2025-11-27 08:35:39,741 - INFO - Epoch 1 Step 3700 (Global: 7700): loss=1.6109, ppl=5.01, grad_norm=1.62, lr=1.93e-06, throughput=2274 tok/s
2025-11-27 08:39:10,504 - INFO - Epoch 1 Step 3710 (Global: 7710): loss=1.5751, ppl=4.83, grad_norm=1.41, lr=1.92e-06, throughput=2277 tok/s
2025-11-27 08:42:40,977 - INFO - Epoch 1 Step 3720 (Global: 7720): loss=1.7817, ppl=5.94, grad_norm=2.14, lr=1.91e-06, throughput=2281 tok/s
2025-11-27 08:46:12,148 - INFO - Epoch 1 Step 3730 (Global: 7730): loss=1.5511, ppl=4.72, grad_norm=1.33, lr=1.89e-06, throughput=2273 tok/s
2025-11-27 08:49:42,209 - INFO - Epoch 1 Step 3740 (Global: 7740): loss=1.5275, ppl=4.61, grad_norm=1.43, lr=1.88e-06, throughput=2285 tok/s
2025-11-27 08:53:08,202 - INFO - Epoch 1 Step 3750 (Global: 7750): loss=1.3528, ppl=3.87, grad_norm=1.44, lr=1.87e-06, throughput=2330 tok/s
2025-11-27 08:56:39,746 - INFO - Epoch 1 Step 3760 (Global: 7760): loss=1.6997, ppl=5.47, grad_norm=1.44, lr=1.85e-06, throughput=2269 tok/s
2025-11-27 09:00:11,055 - INFO - Epoch 1 Step 3770 (Global: 7770): loss=1.6531, ppl=5.22, grad_norm=2.16, lr=1.84e-06, throughput=2272 tok/s
2025-11-27 09:03:45,342 - INFO - Epoch 1 Step 3780 (Global: 7780): loss=1.6415, ppl=5.16, grad_norm=1.21, lr=1.83e-06, throughput=2240 tok/s
2025-11-27 09:07:18,453 - INFO - Epoch 1 Step 3790 (Global: 7790): loss=1.6298, ppl=5.10, grad_norm=1.71, lr=1.82e-06, throughput=2252 tok/s
2025-11-27 09:10:48,890 - INFO - Epoch 1 Step 3800 (Global: 7800): loss=1.7036, ppl=5.49, grad_norm=1.50, lr=1.80e-06, throughput=2281 tok/s
2025-11-27 09:14:19,119 - INFO - Epoch 1 Step 3810 (Global: 7810): loss=1.7550, ppl=5.78, grad_norm=2.45, lr=1.79e-06, throughput=2283 tok/s
2025-11-27 09:17:51,891 - INFO - Epoch 1 Step 3820 (Global: 7820): loss=1.6258, ppl=5.08, grad_norm=1.13, lr=1.78e-06, throughput=2256 tok/s
2025-11-27 09:21:23,338 - INFO - Epoch 1 Step 3830 (Global: 7830): loss=1.6043, ppl=4.97, grad_norm=1.38, lr=1.76e-06, throughput=2270 tok/s
2025-11-27 09:24:54,182 - INFO - Epoch 1 Step 3840 (Global: 7840): loss=1.8708, ppl=6.49, grad_norm=2.42, lr=1.75e-06, throughput=2277 tok/s
2025-11-27 09:28:24,902 - INFO - Epoch 1 Step 3850 (Global: 7850): loss=1.5286, ppl=4.61, grad_norm=1.38, lr=1.74e-06, throughput=2278 tok/s
2025-11-27 09:31:54,887 - INFO - Epoch 1 Step 3860 (Global: 7860): loss=1.3482, ppl=3.85, grad_norm=1.13, lr=1.73e-06, throughput=2286 tok/s
2025-11-27 09:35:24,821 - INFO - Epoch 1 Step 3870 (Global: 7870): loss=1.6552, ppl=5.23, grad_norm=1.18, lr=1.71e-06, throughput=2286 tok/s
2025-11-27 09:38:53,605 - INFO - Epoch 1 Step 3880 (Global: 7880): loss=1.5843, ppl=4.88, grad_norm=1.59, lr=1.70e-06, throughput=2299 tok/s
2025-11-27 09:42:18,640 - INFO - Epoch 1 Step 3890 (Global: 7890): loss=1.5446, ppl=4.69, grad_norm=1.52, lr=1.69e-06, throughput=2341 tok/s
2025-11-27 09:45:52,026 - INFO - Epoch 1 Step 3900 (Global: 7900): loss=1.3641, ppl=3.91, grad_norm=1.57, lr=1.68e-06, throughput=2249 tok/s
2025-11-27 09:49:23,985 - INFO - Epoch 1 Step 3910 (Global: 7910): loss=1.6889, ppl=5.41, grad_norm=1.37, lr=1.66e-06, throughput=2265 tok/s
2025-11-27 09:52:53,669 - INFO - Epoch 1 Step 3920 (Global: 7920): loss=1.4384, ppl=4.21, grad_norm=1.15, lr=1.65e-06, throughput=2289 tok/s
2025-11-27 09:56:25,184 - INFO - Epoch 1 Step 3930 (Global: 7930): loss=1.6867, ppl=5.40, grad_norm=1.60, lr=1.64e-06, throughput=2269 tok/s
2025-11-27 09:59:57,979 - INFO - Epoch 1 Step 3940 (Global: 7940): loss=1.4542, ppl=4.28, grad_norm=1.25, lr=1.63e-06, throughput=2256 tok/s
2025-11-27 10:03:28,185 - INFO - Epoch 1 Step 3950 (Global: 7950): loss=1.5389, ppl=4.66, grad_norm=1.52, lr=1.61e-06, throughput=2283 tok/s
2025-11-27 10:06:58,826 - INFO - Epoch 1 Step 3960 (Global: 7960): loss=1.5950, ppl=4.93, grad_norm=1.35, lr=1.60e-06, throughput=2279 tok/s
2025-11-27 10:10:30,538 - INFO - Epoch 1 Step 3970 (Global: 7970): loss=1.7780, ppl=5.92, grad_norm=1.70, lr=1.59e-06, throughput=2267 tok/s
2025-11-27 10:14:02,617 - INFO - Epoch 1 Step 3980 (Global: 7980): loss=1.5373, ppl=4.65, grad_norm=2.06, lr=1.58e-06, throughput=2263 tok/s
2025-11-27 10:17:29,544 - INFO - Epoch 1 Step 3990 (Global: 7990): loss=1.7778, ppl=5.92, grad_norm=1.57, lr=1.56e-06, throughput=2320 tok/s
2025-11-27 10:20:54,130 - INFO - Epoch 1 Step 4000 (Global: 8000): loss=1.8192, ppl=6.17, grad_norm=1.26, lr=1.55e-06, throughput=2346 tok/s
2025-11-27 10:20:54,131 - INFO -
Running validation at step 8000...
2025-11-27 10:32:54,275 - INFO - Validation loss: 1.6272, perplexity: 5.09
2025-11-27 10:32:54,276 - INFO -
======================================================================
2025-11-27 10:32:54,276 - INFO - Qualitative Evaluation Samples:
2025-11-27 10:32:54,277 - INFO - ======================================================================
2025-11-27 10:32:54,277 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-27 10:32:54,277 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-27 10:32:54,277 - INFO - Generated: ' to the band\'s previous work, saying that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Keys...'
2025-11-27 10:32:54,278 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-27 10:32:54,278 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,279 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-27 10:32:54,279 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-27 10:32:54,279 - INFO - Generated: 'aternalistic fraternal organizations. The Order of the Arrow was founded in 1926, and the first national convention was held in 1927. The Order of the Arrow was founded in 1928, and the first national...'
2025-11-27 10:32:54,280 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-27 10:32:54,280 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,281 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-27 10:32:54,281 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-27 10:32:54,282 - INFO - Generated: " be defeated by Oga. Teimou's shadow group then defeated the Red Tails and the Shingetsu Teimou's shadow group. Teimou's shadow group then defeated the Red Tails and the Shingetsu Teimou's shadow grou..."
2025-11-27 10:32:54,282 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-27 10:32:54,282 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,283 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-27 10:32:54,283 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-27 10:32:54,284 - INFO - Generated: '-01-01 | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff | 0x0000-0x00ff |...'
2025-11-27 10:32:54,284 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-27 10:32:54,285 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,285 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-27 10:32:54,285 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-27 10:32:54,286 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | PlayStation 3 | EA Tiburon ...'
2025-11-27 10:32:54,286 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-27 10:32:54,286 - INFO - ----------------------------------------------------------------------
2025-11-27 10:32:54,288 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_8000.jsonl
2025-11-27 10:35:24,351 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-27 10:35:24,371 - INFO - New best validation loss: 1.6272, perplexity: 5.09
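The `ppl` fields in these logs are the exponential of the reported loss. A standalone sanity check against the two validation summaries in this log (not part of the training script itself):

```python
import math

# Values copied from the validation log lines at global steps 8000 and 10000.
logged = [
    (1.6272, 5.09),  # validation at global step 8000
    (1.6258, 5.08),  # validation at global step 10000
]

# Perplexity as logged should equal exp(loss), rounded to two decimals.
for loss, ppl in logged:
    assert round(math.exp(loss), 2) == ppl
```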
2025-11-27 10:38:47,829 - INFO - Epoch 1 Step 4010 (Global: 8010): loss=1.6169, ppl=5.04, grad_norm=1.58, lr=1.54e-06, throughput=2360 tok/s
2025-11-27 10:42:12,374 - INFO - Epoch 1 Step 4020 (Global: 8020): loss=1.5769, ppl=4.84, grad_norm=2.05, lr=1.53e-06, throughput=2347 tok/s
2025-11-27 10:45:34,363 - INFO - Epoch 1 Step 4030 (Global: 8030): loss=1.8082, ppl=6.10, grad_norm=1.34, lr=1.52e-06, throughput=2376 tok/s
2025-11-27 10:49:03,119 - INFO - Epoch 1 Step 4040 (Global: 8040): loss=1.7761, ppl=5.91, grad_norm=1.43, lr=1.50e-06, throughput=2299 tok/s
2025-11-27 10:52:34,907 - INFO - Epoch 1 Step 4050 (Global: 8050): loss=1.6383, ppl=5.15, grad_norm=1.64, lr=1.49e-06, throughput=2266 tok/s
2025-11-27 10:56:06,357 - INFO - Epoch 1 Step 4060 (Global: 8060): loss=1.5201, ppl=4.57, grad_norm=2.16, lr=1.48e-06, throughput=2270 tok/s
2025-11-27 10:59:37,783 - INFO - Epoch 1 Step 4070 (Global: 8070): loss=1.7167, ppl=5.57, grad_norm=1.34, lr=1.47e-06, throughput=2270 tok/s
2025-11-27 11:03:08,268 - INFO - Epoch 1 Step 4080 (Global: 8080): loss=1.5123, ppl=4.54, grad_norm=1.17, lr=1.46e-06, throughput=2280 tok/s
2025-11-27 11:06:38,312 - INFO - Epoch 1 Step 4090 (Global: 8090): loss=1.5805, ppl=4.86, grad_norm=1.23, lr=1.44e-06, throughput=2285 tok/s
2025-11-27 11:10:10,480 - INFO - Epoch 1 Step 4100 (Global: 8100): loss=1.5481, ppl=4.70, grad_norm=1.41, lr=1.43e-06, throughput=2262 tok/s
2025-11-27 11:13:40,862 - INFO - Epoch 1 Step 4110 (Global: 8110): loss=1.6666, ppl=5.29, grad_norm=1.17, lr=1.42e-06, throughput=2282 tok/s
2025-11-27 11:17:11,528 - INFO - Epoch 1 Step 4120 (Global: 8120): loss=1.5787, ppl=4.85, grad_norm=1.76, lr=1.41e-06, throughput=2278 tok/s
2025-11-27 11:20:39,851 - INFO - Epoch 1 Step 4130 (Global: 8130): loss=1.7777, ppl=5.92, grad_norm=1.86, lr=1.40e-06, throughput=2304 tok/s
2025-11-27 11:24:09,760 - INFO - Epoch 1 Step 4140 (Global: 8140): loss=1.8461, ppl=6.34, grad_norm=1.44, lr=1.39e-06, throughput=2287 tok/s
2025-11-27 11:27:39,304 - INFO - Epoch 1 Step 4150 (Global: 8150): loss=1.6895, ppl=5.42, grad_norm=2.03, lr=1.37e-06, throughput=2291 tok/s
2025-11-27 11:31:09,428 - INFO - Epoch 1 Step 4160 (Global: 8160): loss=1.5404, ppl=4.67, grad_norm=1.09, lr=1.36e-06, throughput=2284 tok/s
2025-11-27 11:34:40,033 - INFO - Epoch 1 Step 4170 (Global: 8170): loss=1.6484, ppl=5.20, grad_norm=1.38, lr=1.35e-06, throughput=2279 tok/s
2025-11-27 11:38:11,850 - INFO - Epoch 1 Step 4180 (Global: 8180): loss=1.5886, ppl=4.90, grad_norm=1.46, lr=1.34e-06, throughput=2266 tok/s
2025-11-27 11:41:41,933 - INFO - Epoch 1 Step 4190 (Global: 8190): loss=1.7589, ppl=5.81, grad_norm=1.58, lr=1.33e-06, throughput=2285 tok/s
2025-11-27 11:45:10,504 - INFO - Epoch 1 Step 4200 (Global: 8200): loss=1.5511, ppl=4.72, grad_norm=1.42, lr=1.32e-06, throughput=2301 tok/s
2025-11-27 11:48:37,068 - INFO - Epoch 1 Step 4210 (Global: 8210): loss=2.0304, ppl=7.62, grad_norm=1.53, lr=1.31e-06, throughput=2324 tok/s
2025-11-27 11:52:03,812 - INFO - Epoch 1 Step 4220 (Global: 8220): loss=1.8178, ppl=6.16, grad_norm=1.40, lr=1.29e-06, throughput=2322 tok/s
2025-11-27 11:55:29,639 - INFO - Epoch 1 Step 4230 (Global: 8230): loss=1.6033, ppl=4.97, grad_norm=1.45, lr=1.28e-06, throughput=2332 tok/s
2025-11-27 11:58:55,103 - INFO - Epoch 1 Step 4240 (Global: 8240): loss=1.5554, ppl=4.74, grad_norm=1.13, lr=1.27e-06, throughput=2336 tok/s
2025-11-27 12:02:19,766 - INFO - Epoch 1 Step 4250 (Global: 8250): loss=1.3641, ppl=3.91, grad_norm=1.24, lr=1.26e-06, throughput=2345 tok/s
2025-11-27 12:05:44,995 - INFO - Epoch 1 Step 4260 (Global: 8260): loss=1.6582, ppl=5.25, grad_norm=1.56, lr=1.25e-06, throughput=2339 tok/s
2025-11-27 12:09:09,473 - INFO - Epoch 1 Step 4270 (Global: 8270): loss=1.6418, ppl=5.16, grad_norm=1.34, lr=1.24e-06, throughput=2347 tok/s
2025-11-27 12:12:31,949 - INFO - Epoch 1 Step 4280 (Global: 8280): loss=1.7062, ppl=5.51, grad_norm=1.27, lr=1.23e-06, throughput=2371 tok/s
2025-11-27 12:15:54,938 - INFO - Epoch 1 Step 4290 (Global: 8290): loss=1.7156, ppl=5.56, grad_norm=1.49, lr=1.22e-06, throughput=2365 tok/s
2025-11-27 12:19:17,487 - INFO - Epoch 1 Step 4300 (Global: 8300): loss=1.5761, ppl=4.84, grad_norm=1.45, lr=1.21e-06, throughput=2370 tok/s
2025-11-27 12:22:40,527 - INFO - Epoch 1 Step 4310 (Global: 8310): loss=1.7613, ppl=5.82, grad_norm=1.15, lr=1.20e-06, throughput=2364 tok/s
2025-11-27 12:26:04,423 - INFO - Epoch 1 Step 4320 (Global: 8320): loss=1.5934, ppl=4.92, grad_norm=1.23, lr=1.18e-06, throughput=2354 tok/s
2025-11-27 12:29:32,426 - INFO - Epoch 1 Step 4330 (Global: 8330): loss=1.5424, ppl=4.68, grad_norm=1.31, lr=1.17e-06, throughput=2308 tok/s
2025-11-27 12:32:59,198 - INFO - Epoch 1 Step 4340 (Global: 8340): loss=1.6180, ppl=5.04, grad_norm=1.55, lr=1.16e-06, throughput=2321 tok/s
2025-11-27 12:36:24,747 - INFO - Epoch 1 Step 4350 (Global: 8350): loss=1.9459, ppl=7.00, grad_norm=1.33, lr=1.15e-06, throughput=2335 tok/s
2025-11-27 12:39:51,139 - INFO - Epoch 1 Step 4360 (Global: 8360): loss=1.5864, ppl=4.89, grad_norm=1.27, lr=1.14e-06, throughput=2326 tok/s
2025-11-27 12:43:15,943 - INFO - Epoch 1 Step 4370 (Global: 8370): loss=1.5319, ppl=4.63, grad_norm=1.53, lr=1.13e-06, throughput=2344 tok/s
2025-11-27 12:46:39,322 - INFO - Epoch 1 Step 4380 (Global: 8380): loss=1.6704, ppl=5.31, grad_norm=1.16, lr=1.12e-06, throughput=2360 tok/s
2025-11-27 12:50:02,945 - INFO - Epoch 1 Step 4390 (Global: 8390): loss=1.6581, ppl=5.25, grad_norm=1.27, lr=1.11e-06, throughput=2357 tok/s
2025-11-27 12:53:27,610 - INFO - Epoch 1 Step 4400 (Global: 8400): loss=1.4405, ppl=4.22, grad_norm=1.10, lr=1.10e-06, throughput=2345 tok/s
2025-11-27 12:56:50,335 - INFO - Epoch 1 Step 4410 (Global: 8410): loss=1.3959, ppl=4.04, grad_norm=4.31, lr=1.09e-06, throughput=2368 tok/s
2025-11-27 13:00:21,053 - INFO - Epoch 1 Step 4420 (Global: 8420): loss=1.3936, ppl=4.03, grad_norm=1.73, lr=1.08e-06, throughput=2278 tok/s
2025-11-27 13:03:56,224 - INFO - Epoch 1 Step 4430 (Global: 8430): loss=1.5997, ppl=4.95, grad_norm=2.05, lr=1.07e-06, throughput=2231 tok/s
2025-11-27 13:07:31,974 - INFO - Epoch 1 Step 4440 (Global: 8440): loss=1.6549, ppl=5.23, grad_norm=1.55, lr=1.06e-06, throughput=2225 tok/s
2025-11-27 13:11:05,763 - INFO - Epoch 1 Step 4450 (Global: 8450): loss=1.5972, ppl=4.94, grad_norm=1.37, lr=1.05e-06, throughput=2245 tok/s
2025-11-27 13:14:37,380 - INFO - Epoch 1 Step 4460 (Global: 8460): loss=1.4822, ppl=4.40, grad_norm=2.23, lr=1.04e-06, throughput=2268 tok/s
2025-11-27 13:18:09,818 - INFO - Epoch 1 Step 4470 (Global: 8470): loss=1.3327, ppl=3.79, grad_norm=1.17, lr=1.03e-06, throughput=2259 tok/s
2025-11-27 13:21:40,540 - INFO - Epoch 1 Step 4480 (Global: 8480): loss=1.6039, ppl=4.97, grad_norm=1.55, lr=1.02e-06, throughput=2278 tok/s
2025-11-27 13:25:13,910 - INFO - Epoch 1 Step 4490 (Global: 8490): loss=1.5592, ppl=4.75, grad_norm=1.16, lr=1.01e-06, throughput=2250 tok/s
2025-11-27 13:28:45,359 - INFO - Epoch 1 Step 4500 (Global: 8500): loss=1.5205, ppl=4.57, grad_norm=1.27, lr=9.96e-07, throughput=2270 tok/s
2025-11-27 13:32:18,632 - INFO - Epoch 1 Step 4510 (Global: 8510): loss=1.6607, ppl=5.26, grad_norm=2.19, lr=9.86e-07, throughput=2251 tok/s
2025-11-27 13:35:54,203 - INFO - Epoch 1 Step 4520 (Global: 8520): loss=1.5474, ppl=4.70, grad_norm=1.21, lr=9.76e-07, throughput=2227 tok/s
2025-11-27 13:39:28,969 - INFO - Epoch 1 Step 4530 (Global: 8530): loss=1.7638, ppl=5.83, grad_norm=1.96, lr=9.67e-07, throughput=2235 tok/s
2025-11-27 13:43:05,159 - INFO - Epoch 1 Step 4540 (Global: 8540): loss=1.6483, ppl=5.20, grad_norm=1.55, lr=9.57e-07, throughput=2220 tok/s
2025-11-27 13:46:38,594 - INFO - Epoch 1 Step 4550 (Global: 8550): loss=1.7897, ppl=5.99, grad_norm=1.45, lr=9.47e-07, throughput=2249 tok/s
2025-11-27 13:50:12,398 - INFO - Epoch 1 Step 4560 (Global: 8560): loss=1.6006, ppl=4.96, grad_norm=1.30, lr=9.37e-07, throughput=2245 tok/s
2025-11-27 13:53:46,524 - INFO - Epoch 1 Step 4570 (Global: 8570): loss=1.5500, ppl=4.71, grad_norm=1.49, lr=9.27e-07, throughput=2242 tok/s
2025-11-27 13:57:19,414 - INFO - Epoch 1 Step 4580 (Global: 8580): loss=1.8392, ppl=6.29, grad_norm=1.34, lr=9.18e-07, throughput=2255 tok/s
2025-11-27 14:00:52,756 - INFO - Epoch 1 Step 4590 (Global: 8590): loss=1.3857, ppl=4.00, grad_norm=1.46, lr=9.08e-07, throughput=2250 tok/s
2025-11-27 14:04:26,845 - INFO - Epoch 1 Step 4600 (Global: 8600): loss=1.3973, ppl=4.04, grad_norm=1.28, lr=8.98e-07, throughput=2242 tok/s
2025-11-27 14:08:00,326 - INFO - Epoch 1 Step 4610 (Global: 8610): loss=1.4807, ppl=4.40, grad_norm=2.14, lr=8.89e-07, throughput=2248 tok/s
2025-11-27 14:11:37,103 - INFO - Epoch 1 Step 4620 (Global: 8620): loss=1.4324, ppl=4.19, grad_norm=2.70, lr=8.79e-07, throughput=2214 tok/s
2025-11-27 14:15:10,056 - INFO - Epoch 1 Step 4630 (Global: 8630): loss=1.7810, ppl=5.94, grad_norm=1.35, lr=8.70e-07, throughput=2254 tok/s
2025-11-27 14:18:41,040 - INFO - Epoch 1 Step 4640 (Global: 8640): loss=1.2881, ppl=3.63, grad_norm=1.95, lr=8.60e-07, throughput=2275 tok/s
2025-11-27 14:22:16,435 - INFO - Epoch 1 Step 4650 (Global: 8650): loss=1.6443, ppl=5.18, grad_norm=1.48, lr=8.51e-07, throughput=2228 tok/s
2025-11-27 14:25:48,432 - INFO - Epoch 1 Step 4660 (Global: 8660): loss=1.4110, ppl=4.10, grad_norm=1.92, lr=8.42e-07, throughput=2264 tok/s
2025-11-27 14:29:23,294 - INFO - Epoch 1 Step 4670 (Global: 8670): loss=1.4625, ppl=4.32, grad_norm=1.20, lr=8.32e-07, throughput=2234 tok/s
2025-11-27 14:32:57,336 - INFO - Epoch 1 Step 4680 (Global: 8680): loss=1.6921, ppl=5.43, grad_norm=1.73, lr=8.23e-07, throughput=2243 tok/s
2025-11-27 14:36:30,703 - INFO - Epoch 1 Step 4690 (Global: 8690): loss=1.4586, ppl=4.30, grad_norm=1.23, lr=8.14e-07, throughput=2250 tok/s
2025-11-27 14:40:02,647 - INFO - Epoch 1 Step 4700 (Global: 8700): loss=1.8357, ppl=6.27, grad_norm=1.52, lr=8.05e-07, throughput=2265 tok/s
2025-11-27 14:43:36,865 - INFO - Epoch 1 Step 4710 (Global: 8710): loss=1.4507, ppl=4.27, grad_norm=2.09, lr=7.96e-07, throughput=2241 tok/s
2025-11-27 14:47:07,839 - INFO - Epoch 1 Step 4720 (Global: 8720): loss=1.6126, ppl=5.02, grad_norm=2.44, lr=7.87e-07, throughput=2275 tok/s
2025-11-27 14:50:42,135 - INFO - Epoch 1 Step 4730 (Global: 8730): loss=1.5844, ppl=4.88, grad_norm=2.08, lr=7.78e-07, throughput=2240 tok/s
2025-11-27 14:54:16,557 - INFO - Epoch 1 Step 4740 (Global: 8740): loss=1.3759, ppl=3.96, grad_norm=1.50, lr=7.69e-07, throughput=2239 tok/s
2025-11-27 14:57:48,939 - INFO - Epoch 1 Step 4750 (Global: 8750): loss=1.6488, ppl=5.20, grad_norm=1.63, lr=7.60e-07, throughput=2260 tok/s
2025-11-27 15:01:24,002 - INFO - Epoch 1 Step 4760 (Global: 8760): loss=1.7100, ppl=5.53, grad_norm=1.40, lr=7.51e-07, throughput=2232 tok/s
2025-11-27 15:04:55,790 - INFO - Epoch 1 Step 4770 (Global: 8770): loss=1.5890, ppl=4.90, grad_norm=1.30, lr=7.42e-07, throughput=2266 tok/s
2025-11-27 15:08:30,219 - INFO - Epoch 1 Step 4780 (Global: 8780): loss=1.6121, ppl=5.01, grad_norm=1.54, lr=7.33e-07, throughput=2239 tok/s
2025-11-27 15:12:02,797 - INFO - Epoch 1 Step 4790 (Global: 8790): loss=1.6103, ppl=5.00, grad_norm=1.26, lr=7.25e-07, throughput=2258 tok/s
2025-11-27 15:15:38,173 - INFO - Epoch 1 Step 4800 (Global: 8800): loss=1.3870, ppl=4.00, grad_norm=1.17, lr=7.16e-07, throughput=2229 tok/s
2025-11-27 15:19:10,323 - INFO - Epoch 1 Step 4810 (Global: 8810): loss=1.8532, ppl=6.38, grad_norm=1.17, lr=7.07e-07, throughput=2263 tok/s
2025-11-27 15:22:46,125 - INFO - Epoch 1 Step 4820 (Global: 8820): loss=1.5743, ppl=4.83, grad_norm=1.46, lr=6.99e-07, throughput=2224 tok/s
2025-11-27 15:26:21,093 - INFO - Epoch 1 Step 4830 (Global: 8830): loss=1.5621, ppl=4.77, grad_norm=1.52, lr=6.90e-07, throughput=2233 tok/s
2025-11-27 15:29:53,346 - INFO - Epoch 1 Step 4840 (Global: 8840): loss=1.5694, ppl=4.80, grad_norm=2.22, lr=6.82e-07, throughput=2261 tok/s
2025-11-27 15:33:29,510 - INFO - Epoch 1 Step 4850 (Global: 8850): loss=1.7044, ppl=5.50, grad_norm=1.42, lr=6.74e-07, throughput=2221 tok/s
2025-11-27 15:37:03,147 - INFO - Epoch 1 Step 4860 (Global: 8860): loss=1.5101, ppl=4.53, grad_norm=1.35, lr=6.65e-07, throughput=2247 tok/s
2025-11-27 15:40:36,832 - INFO - Epoch 1 Step 4870 (Global: 8870): loss=1.8138, ppl=6.13, grad_norm=1.19, lr=6.57e-07, throughput=2246 tok/s
2025-11-27 15:44:09,397 - INFO - Epoch 1 Step 4880 (Global: 8880): loss=1.5810, ppl=4.86, grad_norm=1.39, lr=6.49e-07, throughput=2258 tok/s
2025-11-27 15:47:45,166 - INFO - Epoch 1 Step 4890 (Global: 8890): loss=1.6636, ppl=5.28, grad_norm=1.36, lr=6.40e-07, throughput=2225 tok/s
2025-11-27 15:51:15,830 - INFO - Epoch 1 Step 4900 (Global: 8900): loss=1.6452, ppl=5.18, grad_norm=1.58, lr=6.32e-07, throughput=2279 tok/s
2025-11-27 15:54:49,104 - INFO - Epoch 1 Step 4910 (Global: 8910): loss=1.7205, ppl=5.59, grad_norm=1.78, lr=6.24e-07, throughput=2251 tok/s
2025-11-27 15:58:21,882 - INFO - Epoch 1 Step 4920 (Global: 8920): loss=1.6994, ppl=5.47, grad_norm=1.44, lr=6.16e-07, throughput=2256 tok/s
2025-11-27 16:01:56,198 - INFO - Epoch 1 Step 4930 (Global: 8930): loss=1.5468, ppl=4.70, grad_norm=2.00, lr=6.08e-07, throughput=2240 tok/s
2025-11-27 16:05:30,658 - INFO - Epoch 1 Step 4940 (Global: 8940): loss=1.5956, ppl=4.93, grad_norm=1.38, lr=6.00e-07, throughput=2238 tok/s
2025-11-27 16:09:01,476 - INFO - Epoch 1 Step 4950 (Global: 8950): loss=1.3347, ppl=3.80, grad_norm=1.47, lr=5.92e-07, throughput=2277 tok/s
2025-11-27 16:12:35,661 - INFO - Epoch 1 Step 4960 (Global: 8960): loss=1.5259, ppl=4.60, grad_norm=1.63, lr=5.84e-07, throughput=2241 tok/s
2025-11-27 16:16:07,366 - INFO - Epoch 1 Step 4970 (Global: 8970): loss=1.6130, ppl=5.02, grad_norm=1.55, lr=5.76e-07, throughput=2267 tok/s
2025-11-27 16:19:42,070 - INFO - Epoch 1 Step 4980 (Global: 8980): loss=1.6231, ppl=5.07, grad_norm=1.24, lr=5.68e-07, throughput=2236 tok/s
2025-11-27 16:23:13,213 - INFO - Epoch 1 Step 4990 (Global: 8990): loss=1.3964, ppl=4.04, grad_norm=1.48, lr=5.61e-07, throughput=2273 tok/s
2025-11-27 16:26:46,175 - INFO - Epoch 1 Step 5000 (Global: 9000): loss=1.3289, ppl=3.78, grad_norm=1.29, lr=5.53e-07, throughput=2254 tok/s
2025-11-27 16:30:21,009 - INFO - Epoch 1 Step 5010 (Global: 9010): loss=1.7395, ppl=5.69, grad_norm=1.27, lr=5.45e-07, throughput=2234 tok/s
2025-11-27 16:33:51,669 - INFO - Epoch 1 Step 5020 (Global: 9020): loss=1.5989, ppl=4.95, grad_norm=1.40, lr=5.38e-07, throughput=2279 tok/s
2025-11-27 16:37:24,749 - INFO - Epoch 1 Step 5030 (Global: 9030): loss=1.5521, ppl=4.72, grad_norm=1.20, lr=5.30e-07, throughput=2253 tok/s
2025-11-27 16:40:59,702 - INFO - Epoch 1 Step 5040 (Global: 9040): loss=1.6465, ppl=5.19, grad_norm=2.00, lr=5.23e-07, throughput=2233 tok/s
2025-11-27 16:44:32,529 - INFO - Epoch 1 Step 5050 (Global: 9050): loss=1.8447, ppl=6.33, grad_norm=1.61, lr=5.15e-07, throughput=2255 tok/s
2025-11-27 16:48:04,774 - INFO - Epoch 1 Step 5060 (Global: 9060): loss=1.6904, ppl=5.42, grad_norm=1.17, lr=5.08e-07, throughput=2262 tok/s
2025-11-27 16:51:39,208 - INFO - Epoch 1 Step 5070 (Global: 9070): loss=1.5692, ppl=4.80, grad_norm=1.60, lr=5.01e-07, throughput=2238 tok/s
2025-11-27 16:55:12,500 - INFO - Epoch 1 Step 5080 (Global: 9080): loss=1.4458, ppl=4.25, grad_norm=1.13, lr=4.93e-07, throughput=2250 tok/s
2025-11-27 16:58:44,072 - INFO - Epoch 1 Step 5090 (Global: 9090): loss=1.4829, ppl=4.41, grad_norm=1.19, lr=4.86e-07, throughput=2269 tok/s
2025-11-27 17:02:17,357 - INFO - Epoch 1 Step 5100 (Global: 9100): loss=1.6237, ppl=5.07, grad_norm=1.68, lr=4.79e-07, throughput=2251 tok/s
2025-11-27 17:05:51,243 - INFO - Epoch 1 Step 5110 (Global: 9110): loss=1.6518, ppl=5.22, grad_norm=1.37, lr=4.72e-07, throughput=2244 tok/s
2025-11-27 17:09:24,468 - INFO - Epoch 1 Step 5120 (Global: 9120): loss=1.4125, ppl=4.11, grad_norm=1.54, lr=4.65e-07, throughput=2251 tok/s
2025-11-27 17:12:55,464 - INFO - Epoch 1 Step 5130 (Global: 9130): loss=1.4459, ppl=4.25, grad_norm=1.84, lr=4.58e-07, throughput=2275 tok/s
2025-11-27 17:16:27,875 - INFO - Epoch 1 Step 5140 (Global: 9140): loss=1.5491, ppl=4.71, grad_norm=1.63, lr=4.51e-07, throughput=2260 tok/s
2025-11-27 17:20:00,915 - INFO - Epoch 1 Step 5150 (Global: 9150): loss=1.4207, ppl=4.14, grad_norm=1.54, lr=4.44e-07, throughput=2253 tok/s
2025-11-27 17:23:32,991 - INFO - Epoch 1 Step 5160 (Global: 9160): loss=1.8070, ppl=6.09, grad_norm=1.30, lr=4.37e-07, throughput=2263 tok/s
2025-11-27 17:27:01,914 - INFO - Epoch 1 Step 5170 (Global: 9170): loss=1.5653, ppl=4.78, grad_norm=1.12, lr=4.30e-07, throughput=2298 tok/s
2025-11-27 17:30:33,418 - INFO - Epoch 1 Step 5180 (Global: 9180): loss=1.4778, ppl=4.38, grad_norm=1.46, lr=4.23e-07, throughput=2269 tok/s
2025-11-27 17:34:04,641 - INFO - Epoch 1 Step 5190 (Global: 9190): loss=1.8118, ppl=6.12, grad_norm=1.39, lr=4.17e-07, throughput=2272 tok/s
2025-11-27 17:37:37,933 - INFO - Epoch 1 Step 5200 (Global: 9200): loss=1.7735, ppl=5.89, grad_norm=1.95, lr=4.10e-07, throughput=2250 tok/s
2025-11-27 17:41:09,322 - INFO - Epoch 1 Step 5210 (Global: 9210): loss=1.8240, ppl=6.20, grad_norm=1.12, lr=4.03e-07, throughput=2271 tok/s
2025-11-27 17:44:39,621 - INFO - Epoch 1 Step 5220 (Global: 9220): loss=1.7174, ppl=5.57, grad_norm=1.45, lr=3.97e-07, throughput=2282 tok/s
2025-11-27 17:48:12,323 - INFO - Epoch 1 Step 5230 (Global: 9230): loss=1.3940, ppl=4.03, grad_norm=1.73, lr=3.90e-07, throughput=2257 tok/s
2025-11-27 17:51:45,668 - INFO - Epoch 1 Step 5240 (Global: 9240): loss=1.5710, ppl=4.81, grad_norm=2.22, lr=3.84e-07, throughput=2250 tok/s
2025-11-27 17:55:16,907 - INFO - Epoch 1 Step 5250 (Global: 9250): loss=1.5609, ppl=4.76, grad_norm=1.78, lr=3.77e-07, throughput=2272 tok/s
2025-11-27 17:58:49,802 - INFO - Epoch 1 Step 5260 (Global: 9260): loss=1.5694, ppl=4.80, grad_norm=1.62, lr=3.71e-07, throughput=2255 tok/s
2025-11-27 18:02:24,012 - INFO - Epoch 1 Step 5270 (Global: 9270): loss=1.5508, ppl=4.72, grad_norm=1.15, lr=3.65e-07, throughput=2241 tok/s
2025-11-27 18:05:57,942 - INFO - Epoch 1 Step 5280 (Global: 9280): loss=1.5768, ppl=4.84, grad_norm=1.30, lr=3.58e-07, throughput=2244 tok/s
2025-11-27 18:09:31,469 - INFO - Epoch 1 Step 5290 (Global: 9290): loss=1.4960, ppl=4.46, grad_norm=1.20, lr=3.52e-07, throughput=2248 tok/s
2025-11-27 18:13:03,849 - INFO - Epoch 1 Step 5300 (Global: 9300): loss=1.4750, ppl=4.37, grad_norm=1.53, lr=3.46e-07, throughput=2260 tok/s
2025-11-27 18:16:34,927 - INFO - Epoch 1 Step 5310 (Global: 9310): loss=1.3901, ppl=4.02, grad_norm=1.44, lr=3.40e-07, throughput=2274 tok/s
2025-11-27 18:20:08,431 - INFO - Epoch 1 Step 5320 (Global: 9320): loss=1.5485, ppl=4.70, grad_norm=1.69, lr=3.34e-07, throughput=2248 tok/s
2025-11-27 18:23:41,502 - INFO - Epoch 1 Step 5330 (Global: 9330): loss=1.6942, ppl=5.44, grad_norm=1.37, lr=3.28e-07, throughput=2253 tok/s
2025-11-27 18:27:15,281 - INFO - Epoch 1 Step 5340 (Global: 9340): loss=1.5931, ppl=4.92, grad_norm=1.87, lr=3.22e-07, throughput=2245 tok/s
2025-11-27 18:30:47,817 - INFO - Epoch 1 Step 5350 (Global: 9350): loss=1.5405, ppl=4.67, grad_norm=1.24, lr=3.16e-07, throughput=2258 tok/s
2025-11-27 18:34:18,938 - INFO - Epoch 1 Step 5360 (Global: 9360): loss=1.3512, ppl=3.86, grad_norm=1.48, lr=3.10e-07, throughput=2274 tok/s
2025-11-27 18:37:50,990 - INFO - Epoch 1 Step 5370 (Global: 9370): loss=1.5083, ppl=4.52, grad_norm=1.63, lr=3.05e-07, throughput=2264 tok/s
2025-11-27 18:41:24,277 - INFO - Epoch 1 Step 5380 (Global: 9380): loss=1.7409, ppl=5.70, grad_norm=2.09, lr=2.99e-07, throughput=2251 tok/s
2025-11-27 18:44:57,980 - INFO - Epoch 1 Step 5390 (Global: 9390): loss=1.3469, ppl=3.85, grad_norm=1.70, lr=2.93e-07, throughput=2246 tok/s
2025-11-27 18:48:30,379 - INFO - Epoch 1 Step 5400 (Global: 9400): loss=1.5145, ppl=4.55, grad_norm=1.27, lr=2.88e-07, throughput=2260 tok/s
2025-11-27 18:52:03,238 - INFO - Epoch 1 Step 5410 (Global: 9410): loss=1.6546, ppl=5.23, grad_norm=1.77, lr=2.82e-07, throughput=2255 tok/s
2025-11-27 18:55:34,767 - INFO - Epoch 1 Step 5420 (Global: 9420): loss=1.5002, ppl=4.48, grad_norm=1.41, lr=2.76e-07, throughput=2269 tok/s
2025-11-27 18:59:05,365 - INFO - Epoch 1 Step 5430 (Global: 9430): loss=1.7195, ppl=5.58, grad_norm=2.06, lr=2.71e-07, throughput=2279 tok/s
2025-11-27 19:02:32,445 - INFO - Epoch 1 Step 5440 (Global: 9440): loss=1.5165, ppl=4.56, grad_norm=1.55, lr=2.66e-07, throughput=2318 tok/s
2025-11-27 19:05:57,046 - INFO - Epoch 1 Step 5450 (Global: 9450): loss=1.5921, ppl=4.91, grad_norm=1.29, lr=2.60e-07, throughput=2346 tok/s
2025-11-27 19:09:21,278 - INFO - Epoch 1 Step 5460 (Global: 9460): loss=1.8472, ppl=6.34, grad_norm=1.35, lr=2.55e-07, throughput=2350 tok/s
2025-11-27 19:12:46,679 - INFO - Epoch 1 Step 5470 (Global: 9470): loss=1.6808, ppl=5.37, grad_norm=1.72, lr=2.50e-07, throughput=2337 tok/s
2025-11-27 19:16:10,957 - INFO - Epoch 1 Step 5480 (Global: 9480): loss=1.6737, ppl=5.33, grad_norm=1.27, lr=2.44e-07, throughput=2350 tok/s
2025-11-27 19:19:35,185 - INFO - Epoch 1 Step 5490 (Global: 9490): loss=1.8276, ppl=6.22, grad_norm=1.32, lr=2.39e-07, throughput=2350 tok/s
2025-11-27 19:22:59,284 - INFO - Epoch 1 Step 5500 (Global: 9500): loss=1.5507, ppl=4.71, grad_norm=1.49, lr=2.34e-07, throughput=2352 tok/s
2025-11-27 19:26:23,626 - INFO - Epoch 1 Step 5510 (Global: 9510): loss=1.6971, ppl=5.46, grad_norm=1.20, lr=2.29e-07, throughput=2349 tok/s
2025-11-27 19:29:51,020 - INFO - Epoch 1 Step 5520 (Global: 9520): loss=1.8513, ppl=6.37, grad_norm=1.38, lr=2.24e-07, throughput=2314 tok/s
2025-11-27 19:33:26,812 - INFO - Epoch 1 Step 5530 (Global: 9530): loss=1.6369, ppl=5.14, grad_norm=1.30, lr=2.19e-07, throughput=2224 tok/s
2025-11-27 19:37:01,127 - INFO - Epoch 1 Step 5540 (Global: 9540): loss=1.6972, ppl=5.46, grad_norm=1.37, lr=2.14e-07, throughput=2240 tok/s
2025-11-27 19:40:33,251 - INFO - Epoch 1 Step 5550 (Global: 9550): loss=1.6032, ppl=4.97, grad_norm=1.09, lr=2.10e-07, throughput=2263 tok/s
2025-11-27 19:44:05,945 - INFO - Epoch 1 Step 5560 (Global: 9560): loss=1.6616, ppl=5.27, grad_norm=1.48, lr=2.05e-07, throughput=2257 tok/s
2025-11-27 19:47:37,637 - INFO - Epoch 1 Step 5570 (Global: 9570): loss=1.9210, ppl=6.83, grad_norm=1.33, lr=2.00e-07, throughput=2267 tok/s
2025-11-27 19:51:10,833 - INFO - Epoch 1 Step 5580 (Global: 9580): loss=1.5385, ppl=4.66, grad_norm=1.73, lr=1.95e-07, throughput=2251 tok/s
2025-11-27 19:54:44,814 - INFO - Epoch 1 Step 5590 (Global: 9590): loss=1.6455, ppl=5.18, grad_norm=2.11, lr=1.91e-07, throughput=2243 tok/s
2025-11-27 19:58:17,420 - INFO - Epoch 1 Step 5600 (Global: 9600): loss=1.5719, ppl=4.82, grad_norm=1.27, lr=1.86e-07, throughput=2258 tok/s
2025-11-27 20:01:49,899 - INFO - Epoch 1 Step 5610 (Global: 9610): loss=1.6549, ppl=5.23, grad_norm=2.44, lr=1.82e-07, throughput=2259 tok/s
2025-11-27 20:05:23,285 - INFO - Epoch 1 Step 5620 (Global: 9620): loss=1.8481, ppl=6.35, grad_norm=1.28, lr=1.77e-07, throughput=2249 tok/s
2025-11-27 20:09:00,581 - INFO - Epoch 1 Step 5630 (Global: 9630): loss=1.8733, ppl=6.51, grad_norm=1.73, lr=1.73e-07, throughput=2209 tok/s
2025-11-27 20:12:29,542 - INFO - Epoch 1 Step 5640 (Global: 9640): loss=1.7226, ppl=5.60, grad_norm=1.57, lr=1.68e-07, throughput=2297 tok/s
2025-11-27 20:15:59,749 - INFO - Epoch 1 Step 5650 (Global: 9650): loss=1.5968, ppl=4.94, grad_norm=1.52, lr=1.64e-07, throughput=2283 tok/s
2025-11-27 20:19:27,760 - INFO - Epoch 1 Step 5660 (Global: 9660): loss=1.7941, ppl=6.01, grad_norm=1.27, lr=1.60e-07, throughput=2308 tok/s
2025-11-27 20:22:56,487 - INFO - Epoch 1 Step 5670 (Global: 9670): loss=1.7409, ppl=5.70, grad_norm=1.79, lr=1.56e-07, throughput=2300 tok/s
2025-11-27 20:26:23,063 - INFO - Epoch 1 Step 5680 (Global: 9680): loss=1.4597, ppl=4.30, grad_norm=1.11, lr=1.52e-07, throughput=2324 tok/s
2025-11-27 20:29:58,805 - INFO - Epoch 1 Step 5690 (Global: 9690): loss=1.5771, ppl=4.84, grad_norm=1.23, lr=1.48e-07, throughput=2225 tok/s
2025-11-27 20:33:25,268 - INFO - Epoch 1 Step 5700 (Global: 9700): loss=1.7536, ppl=5.78, grad_norm=1.37, lr=1.44e-07, throughput=2325 tok/s
2025-11-27 20:36:51,193 - INFO - Epoch 1 Step 5710 (Global: 9710): loss=1.7731, ppl=5.89, grad_norm=1.96, lr=1.40e-07, throughput=2331 tok/s
2025-11-27 20:40:18,492 - INFO - Epoch 1 Step 5720 (Global: 9720): loss=1.5060, ppl=4.51, grad_norm=1.91, lr=1.36e-07, throughput=2316 tok/s
2025-11-27 20:43:45,709 - INFO - Epoch 1 Step 5730 (Global: 9730): loss=1.7986, ppl=6.04, grad_norm=1.66, lr=1.32e-07, throughput=2316 tok/s
2025-11-27 20:47:12,315 - INFO - Epoch 1 Step 5740 (Global: 9740): loss=1.5726, ppl=4.82, grad_norm=1.30, lr=1.28e-07, throughput=2323 tok/s
2025-11-27 20:50:39,756 - INFO - Epoch 1 Step 5750 (Global: 9750): loss=1.5318, ppl=4.63, grad_norm=2.17, lr=1.24e-07, throughput=2314 tok/s
2025-11-27 20:54:05,181 - INFO - Epoch 1 Step 5760 (Global: 9760): loss=1.6843, ppl=5.39, grad_norm=1.51, lr=1.21e-07, throughput=2337 tok/s
2025-11-27 20:57:31,667 - INFO - Epoch 1 Step 5770 (Global: 9770): loss=1.5998, ppl=4.95, grad_norm=1.59, lr=1.17e-07, throughput=2325 tok/s
2025-11-27 21:00:59,084 - INFO - Epoch 1 Step 5780 (Global: 9780): loss=1.7273, ppl=5.63, grad_norm=1.91, lr=1.13e-07, throughput=2314 tok/s
2025-11-27 21:04:25,574 - INFO - Epoch 1 Step 5790 (Global: 9790): loss=1.7644, ppl=5.84, grad_norm=1.48, lr=1.10e-07, throughput=2325 tok/s
2025-11-27 21:07:51,160 - INFO - Epoch 1 Step 5800 (Global: 9800): loss=1.7346, ppl=5.67, grad_norm=1.80, lr=1.06e-07, throughput=2335 tok/s
2025-11-27 21:11:17,308 - INFO - Epoch 1 Step 5810 (Global: 9810): loss=1.5369, ppl=4.65, grad_norm=1.71, lr=1.03e-07, throughput=2328 tok/s
2025-11-27 21:14:42,597 - INFO - Epoch 1 Step 5820 (Global: 9820): loss=1.7478, ppl=5.74, grad_norm=1.62, lr=9.97e-08, throughput=2338 tok/s
2025-11-27 21:18:08,923 - INFO - Epoch 1 Step 5830 (Global: 9830): loss=1.5048, ppl=4.50, grad_norm=1.41, lr=9.64e-08, throughput=2326 tok/s
2025-11-27 21:21:34,472 - INFO - Epoch 1 Step 5840 (Global: 9840): loss=1.6680, ppl=5.30, grad_norm=1.82, lr=9.32e-08, throughput=2335 tok/s
2025-11-27 21:24:56,657 - INFO - Epoch 1 Step 5850 (Global: 9850): loss=1.6059, ppl=4.98, grad_norm=1.17, lr=9.00e-08, throughput=2374 tok/s
2025-11-27 21:28:20,460 - INFO - Epoch 1 Step 5860 (Global: 9860): loss=1.6501, ppl=5.21, grad_norm=1.49, lr=8.68e-08, throughput=2355 tok/s
2025-11-27 21:31:45,765 - INFO - Epoch 1 Step 5870 (Global: 9870): loss=1.3523, ppl=3.87, grad_norm=1.23, lr=8.37e-08, throughput=2338 tok/s
2025-11-27 21:35:18,286 - INFO - Epoch 1 Step 5880 (Global: 9880): loss=1.5160, ppl=4.55, grad_norm=1.43, lr=8.07e-08, throughput=2259 tok/s
2025-11-27 21:38:51,300 - INFO - Epoch 1 Step 5890 (Global: 9890): loss=1.5679, ppl=4.80, grad_norm=1.58, lr=7.77e-08, throughput=2253 tok/s
2025-11-27 21:42:28,975 - INFO - Epoch 1 Step 5900 (Global: 9900): loss=1.3739, ppl=3.95, grad_norm=1.41, lr=7.48e-08, throughput=2205 tok/s
2025-11-27 21:46:03,161 - INFO - Epoch 1 Step 5910 (Global: 9910): loss=1.5115, ppl=4.53, grad_norm=1.31, lr=7.20e-08, throughput=2241 tok/s
2025-11-27 21:49:39,458 - INFO - Epoch 1 Step 5920 (Global: 9920): loss=1.4874, ppl=4.43, grad_norm=1.63, lr=6.92e-08, throughput=2219 tok/s
2025-11-27 21:53:14,104 - INFO - Epoch 1 Step 5930 (Global: 9930): loss=1.7029, ppl=5.49, grad_norm=2.80, lr=6.64e-08, throughput=2236 tok/s
2025-11-27 21:56:46,292 - INFO - Epoch 1 Step 5940 (Global: 9940): loss=1.4983, ppl=4.47, grad_norm=1.58, lr=6.37e-08, throughput=2262 tok/s
2025-11-27 22:00:21,581 - INFO - Epoch 1 Step 5950 (Global: 9950): loss=1.6700, ppl=5.31, grad_norm=1.46, lr=6.11e-08, throughput=2230 tok/s
2025-11-27 22:03:56,980 - INFO - Epoch 1 Step 5960 (Global: 9960): loss=1.7459, ppl=5.73, grad_norm=1.41, lr=5.85e-08, throughput=2228 tok/s
2025-11-27 22:07:31,743 - INFO - Epoch 1 Step 5970 (Global: 9970): loss=1.7514, ppl=5.76, grad_norm=1.21, lr=5.60e-08, throughput=2235 tok/s
2025-11-27 22:11:09,650 - INFO - Epoch 1 Step 5980 (Global: 9980): loss=1.6806, ppl=5.37, grad_norm=1.30, lr=5.35e-08, throughput=2203 tok/s
2025-11-27 22:14:43,468 - INFO - Epoch 1 Step 5990 (Global: 9990): loss=1.5732, ppl=4.82, grad_norm=1.39, lr=5.11e-08, throughput=2245 tok/s
2025-11-27 22:18:19,058 - INFO - Epoch 1 Step 6000 (Global: 10000): loss=1.7657, ppl=5.85, grad_norm=1.66, lr=4.87e-08, throughput=2226 tok/s
2025-11-27 22:18:19,059 - INFO -
Running validation at step 10000...
2025-11-27 22:30:37,913 - INFO - Validation loss: 1.6258, perplexity: 5.08
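The earlier "Saved checkpoint ... / New best validation loss" lines imply a keep-the-best checkpoint policy: save only when validation loss improves. A minimal sketch of that bookkeeping, with illustrative names (the actual trainer's internals are not shown in this log):

```python
# Hypothetical reconstruction of the best-checkpoint logic implied by this log.
best_val_loss = float("inf")
saved_at = []  # global steps at which a "best" checkpoint would be written

for step, val_loss in [(8000, 1.6272), (10000, 1.6258)]:
    if val_loss < best_val_loss:  # only persist on improvement
        best_val_loss = val_loss
        saved_at.append(step)

# 1.6258 < 1.6272, so under this criterion the step-10000 evaluation
# would also overwrite best_checkpoint.pt.
```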
2025-11-27 22:30:37,914 - INFO -
======================================================================
2025-11-27 22:30:37,914 - INFO - Qualitative Evaluation Samples:
2025-11-27 22:30:37,914 - INFO - ======================================================================
2025-11-27 22:30:37,915 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-27 22:30:37,915 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-27 22:30:37,916 - INFO - Generated: ' to the band\'s previous work, stating that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Key...'
2025-11-27 22:30:37,916 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-27 22:30:37,916 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,917 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-27 22:30:37,917 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-27 22:30:37,917 - INFO - Generated: 'aternalistic fraternal organizations, which were often seen as "white" and "Anglo" in nature. The Order of the Arrow was founded in 1920 by the Grand Lodge of the United States of America, and the fir...'
2025-11-27 22:30:37,918 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-27 22:30:37,918 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,919 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-27 22:30:37,919 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-27 22:30:37,920 - INFO - Generated: " be defeated by Oga. Teimou's shadow group is then defeated by Oga and Miki, and the four fighters are taken to the Shingetsu Temple to be killed by the Shingetsu's own shadow group. Teimou's shadow g..."
2025-11-27 22:30:37,920 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-27 22:30:37,920 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,921 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-27 22:30:37,921 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-27 22:30:37,921 - INFO - Generated: '-01-01 | 1 | 1 | 1 | 1 | 1 |\n| 1.0.0 | U+0B01..0B03, 0B05..0B0C, 0B0F..0B10, 0B13..0B28, 0B2A..0B30, 0B32..0B33, 0B36..0B39, 0B3C..0B43, 0...'
2025-11-27 22:30:37,922 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-27 22:30:37,922 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,923 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-27 22:30:37,923 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-27 22:30:37,923 - INFO - Generated: '1 | BlackBerry PlayBook | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | PlayStation 3 | EA Tibur...'
2025-11-27 22:30:37,924 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-27 22:30:37,924 - INFO - ----------------------------------------------------------------------
2025-11-27 22:30:37,925 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_10000.jsonl
2025-11-27 22:34:00,082 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-27 22:34:00,099 - INFO - New best validation loss: 1.6258, perplexity: 5.08
2025-11-27 22:37:34,452 - INFO - Epoch 1 Step 6010 (Global: 10010): loss=1.7488, ppl=5.75, grad_norm=1.53, lr=4.64e-08, throughput=2240 tok/s
2025-11-27 22:41:12,002 - INFO - Epoch 1 Step 6020 (Global: 10020): loss=1.6746, ppl=5.34, grad_norm=1.66, lr=4.42e-08, throughput=2206 tok/s
2025-11-27 22:44:46,127 - INFO - Epoch 1 Step 6030 (Global: 10030): loss=1.6560, ppl=5.24, grad_norm=2.31, lr=4.20e-08, throughput=2242 tok/s
2025-11-27 22:48:22,442 - INFO - Epoch 1 Step 6040 (Global: 10040): loss=1.8665, ppl=6.47, grad_norm=1.38, lr=3.98e-08, throughput=2219 tok/s
2025-11-27 22:51:57,206 - INFO - Epoch 1 Step 6050 (Global: 10050): loss=1.7092, ppl=5.52, grad_norm=1.77, lr=3.78e-08, throughput=2235 tok/s
2025-11-27 22:55:31,548 - INFO - Epoch 1 Step 6060 (Global: 10060): loss=1.4928, ppl=4.45, grad_norm=1.16, lr=3.57e-08, throughput=2239 tok/s
2025-11-27 22:59:08,999 - INFO - Epoch 1 Step 6070 (Global: 10070): loss=1.6630, ppl=5.28, grad_norm=1.54, lr=3.38e-08, throughput=2207 tok/s
2025-11-27 23:02:43,245 - INFO - Epoch 1 Step 6080 (Global: 10080): loss=1.5916, ppl=4.91, grad_norm=1.35, lr=3.18e-08, throughput=2240 tok/s
2025-11-27 23:06:19,421 - INFO - Epoch 1 Step 6090 (Global: 10090): loss=1.6389, ppl=5.15, grad_norm=1.30, lr=3.00e-08, throughput=2220 tok/s
2025-11-27 23:09:54,624 - INFO - Epoch 1 Step 6100 (Global: 10100): loss=1.6017, ppl=4.96, grad_norm=1.44, lr=2.82e-08, throughput=2230 tok/s
2025-11-27 23:13:30,259 - INFO - Epoch 1 Step 6110 (Global: 10110): loss=1.6520, ppl=5.22, grad_norm=1.27, lr=2.64e-08, throughput=2226 tok/s
2025-11-27 23:17:04,305 - INFO - Epoch 1 Step 6120 (Global: 10120): loss=1.5504, ppl=4.71, grad_norm=1.36, lr=2.47e-08, throughput=2243 tok/s
2025-11-27 23:20:36,135 - INFO - Epoch 1 Step 6130 (Global: 10130): loss=1.9117, ppl=6.76, grad_norm=2.11, lr=2.31e-08, throughput=2266 tok/s
2025-11-27 23:24:14,085 - INFO - Epoch 1 Step 6140 (Global: 10140): loss=1.5247, ppl=4.59, grad_norm=1.74, lr=2.15e-08, throughput=2202 tok/s
2025-11-27 23:27:53,121 - INFO - Epoch 1 Step 6150 (Global: 10150): loss=1.7456, ppl=5.73, grad_norm=1.70, lr=2.00e-08, throughput=2191 tok/s
2025-11-27 23:31:32,824 - INFO - Epoch 1 Step 6160 (Global: 10160): loss=1.7847, ppl=5.96, grad_norm=1.80, lr=1.85e-08, throughput=2185 tok/s
2025-11-27 23:34:59,952 - INFO - Epoch 1 Step 6170 (Global: 10170): loss=1.7517, ppl=5.76, grad_norm=1.42, lr=1.71e-08, throughput=2317 tok/s
2025-11-27 23:38:27,852 - INFO - Epoch 1 Step 6180 (Global: 10180): loss=1.5839, ppl=4.87, grad_norm=2.78, lr=1.58e-08, throughput=2309 tok/s
2025-11-27 23:41:55,534 - INFO - Epoch 1 Step 6190 (Global: 10190): loss=1.7643, ppl=5.84, grad_norm=1.53, lr=1.45e-08, throughput=2311 tok/s
2025-11-27 23:45:22,593 - INFO - Epoch 1 Step 6200 (Global: 10200): loss=1.7653, ppl=5.84, grad_norm=2.02, lr=1.32e-08, throughput=2318 tok/s
2025-11-27 23:48:49,265 - INFO - Epoch 1 Step 6210 (Global: 10210): loss=1.4867, ppl=4.42, grad_norm=1.44, lr=1.20e-08, throughput=2323 tok/s
2025-11-27 23:52:15,136 - INFO - Epoch 1 Step 6220 (Global: 10220): loss=1.5656, ppl=4.79, grad_norm=1.30, lr=1.09e-08, throughput=2332 tok/s
2025-11-27 23:55:42,499 - INFO - Epoch 1 Step 6230 (Global: 10230): loss=1.6320, ppl=5.11, grad_norm=1.91, lr=9.81e-09, throughput=2315 tok/s
2025-11-27 23:59:10,145 - INFO - Epoch 1 Step 6240 (Global: 10240): loss=1.6798, ppl=5.36, grad_norm=1.40, lr=8.79e-09, throughput=2312 tok/s
2025-11-28 00:02:37,306 - INFO - Epoch 1 Step 6250 (Global: 10250): loss=1.6438, ppl=5.17, grad_norm=1.42, lr=7.83e-09, throughput=2317 tok/s
2025-11-28 00:06:05,018 - INFO - Epoch 1 Step 6260 (Global: 10260): loss=1.8199, ppl=6.17, grad_norm=1.22, lr=6.92e-09, throughput=2311 tok/s
2025-11-28 00:09:31,536 - INFO - Epoch 1 Step 6270 (Global: 10270): loss=1.7332, ppl=5.66, grad_norm=1.90, lr=6.06e-09, throughput=2324 tok/s
2025-11-28 00:12:58,319 - INFO - Epoch 1 Step 6280 (Global: 10280): loss=1.7339, ppl=5.66, grad_norm=1.45, lr=5.27e-09, throughput=2321 tok/s
2025-11-28 00:16:25,951 - INFO - Epoch 1 Step 6290 (Global: 10290): loss=1.7052, ppl=5.50, grad_norm=1.55, lr=4.53e-09, throughput=2312 tok/s
2025-11-28 00:19:54,748 - INFO - Epoch 1 Step 6300 (Global: 10300): loss=1.7043, ppl=5.50, grad_norm=2.03, lr=3.84e-09, throughput=2299 tok/s
2025-11-28 00:23:22,238 - INFO - Epoch 1 Step 6310 (Global: 10310): loss=1.7098, ppl=5.53, grad_norm=2.17, lr=3.21e-09, throughput=2313 tok/s
2025-11-28 00:26:48,410 - INFO - Epoch 1 Step 6320 (Global: 10320): loss=1.6309, ppl=5.11, grad_norm=1.38, lr=2.64e-09, throughput=2328 tok/s
2025-11-28 00:30:15,949 - INFO - Epoch 1 Step 6330 (Global: 10330): loss=1.7720, ppl=5.88, grad_norm=1.57, lr=2.12e-09, throughput=2313 tok/s
2025-11-28 00:33:44,204 - INFO - Epoch 1 Step 6340 (Global: 10340): loss=1.6188, ppl=5.05, grad_norm=1.33, lr=1.66e-09, throughput=2305 tok/s
2025-11-28 00:37:12,443 - INFO - Epoch 1 Step 6350 (Global: 10350): loss=1.6697, ppl=5.31, grad_norm=1.48, lr=1.26e-09, throughput=2305 tok/s
2025-11-28 00:40:39,111 - INFO - Epoch 1 Step 6360 (Global: 10360): loss=1.6755, ppl=5.34, grad_norm=1.23, lr=9.12e-10, throughput=2323 tok/s
2025-11-28 00:44:06,723 - INFO - Epoch 1 Step 6370 (Global: 10370): loss=1.5299, ppl=4.62, grad_norm=1.95, lr=6.20e-10, throughput=2312 tok/s
2025-11-28 00:47:30,860 - INFO - Epoch 1 Step 6380 (Global: 10380): loss=1.5091, ppl=4.52, grad_norm=1.87, lr=3.84e-10, throughput=2351 tok/s
2025-11-28 00:50:56,526 - INFO - Epoch 1 Step 6390 (Global: 10390): loss=1.3304, ppl=3.78, grad_norm=1.04, lr=2.05e-10, throughput=2334 tok/s
2025-11-28 00:54:20,885 - INFO - Epoch 1 Step 6400 (Global: 10400): loss=1.5872, ppl=4.89, grad_norm=4.62, lr=8.11e-11, throughput=2349 tok/s
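(Annotation, not part of the log: the run was configured with `max_grad_norm=1.0`, yet logged `grad_norm` values such as 4.62 above exceed it. In typical setups the logged figure is the *pre-clipping* global norm returned by the clipper, which is an assumption here, not something the log states. A pure-Python sketch of global-norm clipping:)

```python
import math

def clip_global_norm(grads, max_norm):
    """Scale grads in place if their global L2 norm exceeds max_norm;
    return the pre-clip norm (the value a logger would typically report)."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        for i in range(len(grads)):
            grads[i] *= scale
    return total

grads = [3.0, 4.0]                      # hypothetical flattened gradient
pre = clip_global_norm(grads, max_norm=1.0)
print(pre)                              # 5.0, the pre-clip norm
# After clipping, the global norm of `grads` is ~1.0 (the configured cap).
```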
2025-11-28 00:57:45,721 - INFO - Epoch 1 Step 6410 (Global: 10410): loss=1.6529, ppl=5.22, grad_norm=1.38, lr=1.38e-11, throughput=2343 tok/s
2025-11-28 01:00:00,874 - INFO - Flushing 4 remainder batches from gradient accumulation
2025-11-28 01:00:00,877 - INFO - Rescaling gradients by 1.50x (compensating for 4/6 batches)
2025-11-28 01:00:01,218 - INFO - Remainder batch: loss=1.6686, ppl=5.30, grad_norm=1.38
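(Annotation, not part of the log: with `gradient_accumulation_steps=6`, the epoch ends with only 4 micro-batches left, so the final partial step is flushed with its gradients rescaled by 6/4 = 1.5×, matching the "Rescaling gradients by 1.50x" message. The sketch below assumes the common implementation where each micro-batch loss is divided by the accumulation count before backward; the actual training code may differ.)

```python
accum_steps = 6
remainder = 4

# Simulated accumulation: each micro-batch contributes grad_i / accum_steps,
# as if its loss had been divided by 6 before backward().
micro_grads = [1.2, 0.8, 1.0, 1.0]   # hypothetical per-batch gradients
accumulated = sum(g / accum_steps for g in micro_grads)

# Rescale the partial step by 6/4 so it averages over the 4 real batches.
rescale = accum_steps / remainder     # 1.5
effective = accumulated * rescale

# Equivalent to a true mean over just the remainder batches:
assert abs(effective - sum(micro_grads) / remainder) < 1e-12
print(rescale)  # 1.5
```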
2025-11-28 01:00:01,242 - INFO - Epoch 1 training: loss=1.6270, ppl=5.09, grad_norm=1.65, throughput=2247 tok/s (137071.0s total)
2025-11-28 01:00:01,250 - INFO -
Running final validation...
2025-11-28 01:11:53,249 - INFO - Validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:11:53,250 - INFO -
======================================================================
2025-11-28 01:11:53,250 - INFO - Qualitative Evaluation Samples:
2025-11-28 01:11:53,250 - INFO - ======================================================================
2025-11-28 01:11:53,250 - INFO -
Sample 1 (ID: sample_141920_chunk_1):
2025-11-28 01:11:53,250 - INFO - Context: [Image: sample_141920_chunk_1] + "
Free OCR."
2025-11-28 01:11:53,251 - INFO - Generated: ' to the band\'s previous work, stating that "it\'s a little more experimental, a little more experimental than the last two, but it\'s still the same Codex and Keys, and it\'s still the same Codex and Key...'
2025-11-28 01:11:53,251 - INFO - Ground Truth: ' negatively to Death Cab for Cutie\'s earlier work, writing "...even when the band revisits past glories on Codes and Keys\' few highlights, Death Cab weirdly sound like they are imitating themselves." ...'
2025-11-28 01:11:53,251 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,251 - INFO -
Sample 2 (ID: sample_170543_chunk_2):
2025-11-28 01:11:53,251 - INFO - Context: [Image: sample_170543_chunk_2] + "
Free OCR."
2025-11-28 01:11:53,252 - INFO - Generated: "aternalistic fraternities, and the Order of the Arrow was no exception. The Order's Native American-themed chapters were founded in the 1930s, and the Order's first chapter was established in 1934. Th..."
2025-11-28 01:11:53,252 - INFO - Ground Truth: 'aternal organizations in drawing motifs from an idealized past but it was unusual among college honorary societies in its use of Native American themes. White Americans had been masquerading as Indian...'
2025-11-28 01:11:53,252 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,253 - INFO -
Sample 3 (ID: sample_107152_chunk_9):
2025-11-28 01:11:53,253 - INFO - Context: [Image: sample_107152_chunk_9] + "
Free OCR."
2025-11-28 01:11:53,254 - INFO - Generated: " be defeated by Oga and Mikii. Teimou's shadow group then defeated the Red Tails, and Oga and Mikii were able to get back at Teimou. Teimou's shadow group then defeated the Red Tails, and Oga and Miki..."
2025-11-28 01:11:53,254 - INFO - Ground Truth: " find Oga already there and be badly beaten again. They are nevertheless taken in by Kunieda's grandfather and begin their training alongside Oga, in what seems to be a temporary truce.\nKotaro Mikagam..."
2025-11-28 01:11:53,255 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,255 - INFO -
Sample 4 (ID: sample_069148_chunk_0):
2025-11-28 01:11:53,256 - INFO - Context: [Image: sample_069148_chunk_0] + "
Free OCR."
2025-11-28 01:11:53,256 - INFO - Generated: '-01-01 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, ...'
2025-11-28 01:11:53,257 - INFO - Ground Truth: '-056 | | | Whistler, Ken, Indic Charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam ...'
2025-11-28 01:11:53,257 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,257 - INFO -
Sample 5 (ID: sample_103176_chunk_4):
2025-11-28 01:11:53,258 - INFO - Context: [Image: sample_103176_chunk_4] + "
Free OCR."
2025-11-28 01:11:53,258 - INFO - Generated: '1 | PlayStation 3 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 | August 30, 2011 | PlayStation 3 | EA Tiburon ...'
2025-11-28 01:11:53,259 - INFO - Ground Truth: '1 | PlayStation 2 | EA Tiburon | [ 150 ] |\n| Madden NFL 12 ...'
2025-11-28 01:11:53,259 - INFO - ----------------------------------------------------------------------
2025-11-28 01:11:53,260 - INFO -
Qualitative samples saved to: outputs/production_vision_base_lm_20251123_003859/qualitative_step_10417.jsonl
2025-11-28 01:14:49,678 - INFO - Saved checkpoint to outputs/production_vision_base_lm_20251123_003859/best_checkpoint.pt
2025-11-28 01:14:49,689 - INFO - New best validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:14:49,691 - INFO -
Training complete!
2025-11-28 01:14:49,691 - INFO - Final checkpoint is best, created symlink to save space (~2GB saved)
2025-11-28 01:14:49,691 - INFO - Best validation loss: 1.6258, perplexity: 5.08
2025-11-28 01:14:49,692 - INFO - Checkpoints saved to outputs/production_vision_base_lm_20251123_003859
2025-11-28 01:14:50,370 - INFO - W&B run finished