2025-11-16 23:50:41,595 - INFO - Starting training with args: Namespace(regime='conv1d_residual', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='small', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt=None, train_encoder=False, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=250, conv_kernel=5, timestamp='20251116_235035', batch_size=4, gradient_accumulation_steps=12, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=500, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name=None, resume_from_checkpoint=None, init_from_checkpoint=None, aux_loss_weight=0.5, num_workers=8, prefetch_factor=32, seed=None, eval_seed=42, device='cuda', compile=True)
2025-11-16 23:50:41,595 - INFO - Auto-generated W&B run name: production_conv1d_residual_r250_k5_reconstruction_20251116_235035
2025-11-16 23:50:42,718 - INFO - Initialized W&B run: vision-compression-2/production_conv1d_residual_r250_k5_reconstruction_20251116_235035 (ID: 52qk5aob)
2025-11-16 23:50:42,718 - INFO - Loading model and tokenizer...
2025-11-16 23:50:52,195 - INFO - Compiling model with torch.compile...
2025-11-16 23:50:52,195 - INFO - Note: First forward pass will compile (may take several minutes)
2025-11-16 23:50:53,039 - INFO - Created Conv1D Residual Pyramid Compression trainer
2025-11-16 23:50:53,040 - INFO - Architecture: Residual blocks with skip connections
2025-11-16 23:50:53,040 - INFO - Kernel size: 5
2025-11-16 23:50:53,040 - INFO - Compression: 1000 → 251 tokens (4.00x)
2025-11-16 23:50:53,040 - INFO - Training objective: reconstruction
2025-11-16 23:50:53,070 - INFO - Logged parameter counts to W&B: total=3,362,332,160, trainable=2,960,962,560, encoder=26,225,920, decoder=2,934,736,640
2025-11-16 23:50:53,070 - INFO - Loading training data from data/training/splits_510k/train.jsonl
2025-11-16 23:54:00,323 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl
2025-11-16 23:54:00,326 - INFO - Conv1d_residual regime: using full 1000-token context
2025-11-16 23:54:00,327 - INFO - Loading validation data from data/training/splits_510k/val.jsonl
2025-11-16 23:54:03,100 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl
2025-11-16 23:54:03,101 - INFO - Validation conv1d_residual regime: using full 1000-token context
2025-11-16 23:54:03,137 - INFO - Created AdamW optimizer with lr=0.0001, fused=True
2025-11-16 23:54:03,138 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417
2025-11-16 23:54:03,138 - INFO - Starting training loop...
2025-11-16 23:54:03,138 - INFO - ======================================================================
2025-11-16 23:54:03,138 - INFO - Running initial validation (before any training)...
2025-11-16 23:54:03,138 - INFO - ======================================================================
2025-11-17 00:02:03,324 - INFO - Validation loss: 10.6572, perplexity: 42499.17
2025-11-17 00:02:03,325 - INFO - Qualitative metrics (n=5):
2025-11-17 00:02:03,325 - INFO - BLEU: 0.0000
2025-11-17 00:02:03,325 - INFO - METEOR: 0.0000
2025-11-17 00:02:03,325 - INFO - Edit Distance: 1.0000
2025-11-17 00:02:03,325 - INFO - F-measure: 0.0000
2025-11-17 00:02:03,325 - INFO - ======================================================================
2025-11-17 00:02:03,325 - INFO - Qualitative Evaluation Samples:
2025-11-17 00:02:03,325 - INFO - ======================================================================
2025-11-17 00:02:03,326 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 00:02:03,326 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 00:02:03,326 - INFO - Generated: '!'
2025-11-17 00:02:03,326 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 00:02:03,326 - INFO - ----------------------------------------------------------------------
2025-11-17 00:02:03,326 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 00:02:03,326 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 00:02:03,327 - INFO - Generated: '!'
2025-11-17 00:02:03,327 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 00:02:03,327 - INFO - ----------------------------------------------------------------------
2025-11-17 00:02:03,327 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 00:02:03,327 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 00:02:03,327 - INFO - Generated: '!'
2025-11-17 00:02:03,327 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 00:02:03,327 - INFO - ----------------------------------------------------------------------
2025-11-17 00:02:03,327 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 00:02:03,327 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 00:02:03,328 - INFO - Generated: '!'
2025-11-17 00:02:03,328 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 00:02:03,328 - INFO - ----------------------------------------------------------------------
2025-11-17 00:02:03,328 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 00:02:03,328 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 00:02:03,328 - INFO - Generated: '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
2025-11-17 00:02:03,328 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 00:02:03,328 - INFO - ----------------------------------------------------------------------
2025-11-17 00:02:03,329 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_0.jsonl
2025-11-17 00:02:04,447 - INFO - Initial validation - Loss: 10.6572, Perplexity: 42499.17
2025-11-17 00:02:04,447 - INFO - ======================================================================
2025-11-17 00:02:04,448 - INFO - ======================================================================
2025-11-17 00:02:04,448 - INFO - Epoch 1/1
2025-11-17 00:02:04,448 - INFO - ======================================================================
2025-11-17 00:02:37,066 - INFO - Effective context tokens (per-sample): 252 | Compression ratio: 3.97x
2025-11-17 00:02:37,066 - INFO - Target tokens per sample: 1000
2025-11-17 00:05:00,262 - INFO - Epoch 1 Step 10 (Global: 10): loss=2.4309, ppl=11.37, grad_norm=25.88, lr=1.09e-05, throughput=2730 tok/s
2025-11-17 00:07:22,258 - INFO - Epoch 1 Step 20 (Global: 20): loss=1.9917, ppl=7.33, grad_norm=1.49, lr=1.17e-05, throughput=3380 tok/s
2025-11-17 00:09:54,150 - INFO - Epoch 1 Step 30 (Global: 30): loss=1.9489, ppl=7.02, grad_norm=1.15, lr=1.26e-05, throughput=3160 tok/s
2025-11-17 00:12:16,750 - INFO - Epoch 1 Step 40 (Global: 40): loss=1.8774, ppl=6.54, grad_norm=1.13, lr=1.35e-05, throughput=3366 tok/s
2025-11-17 00:14:48,000 - INFO - Epoch 1 Step 50 (Global: 50): loss=1.8697, ppl=6.49, grad_norm=1.22, lr=1.43e-05, throughput=3174 tok/s
2025-11-17 00:17:10,415 - INFO - Epoch 1 Step 60 (Global: 60): loss=1.9046, ppl=6.72, grad_norm=1.20, lr=1.52e-05, throughput=3370 tok/s
2025-11-17 00:19:32,339 - INFO - Epoch 1 Step 70 (Global: 70): loss=1.8731, ppl=6.51, grad_norm=1.20, lr=1.61e-05, throughput=3382 tok/s
2025-11-17 00:21:55,943 - INFO - Epoch 1 Step 80 (Global: 80): loss=1.9056, ppl=6.72, grad_norm=1.16, lr=1.69e-05, throughput=3343 tok/s
2025-11-17 00:24:27,980 - INFO - Epoch 1 Step 90 (Global: 90): loss=1.6964, ppl=5.45, grad_norm=1.18, lr=1.78e-05, throughput=3157 tok/s
2025-11-17 00:26:50,043 - INFO - Epoch 1 Step 100 (Global: 100): loss=1.8775, ppl=6.54, grad_norm=1.20, lr=1.86e-05, throughput=3379 tok/s
2025-11-17 00:29:14,019 - INFO - Epoch 1 Step 110 (Global: 110): loss=2.0547, ppl=7.80, grad_norm=1.14, lr=1.95e-05, throughput=3334 tok/s
2025-11-17 00:31:36,791 - INFO - Epoch 1 Step 120 (Global: 120): loss=1.8170, ppl=6.15, grad_norm=1.17, lr=2.04e-05, throughput=3362 tok/s
2025-11-17 00:34:09,638 - INFO - Epoch 1 Step 130 (Global: 130): loss=1.8966, ppl=6.66, grad_norm=1.10, lr=2.12e-05, throughput=3140 tok/s
2025-11-17 00:36:31,824 - INFO - Epoch 1 Step 140 (Global: 140): loss=2.0792, ppl=8.00, grad_norm=1.30, lr=2.21e-05, throughput=3376 tok/s
2025-11-17 00:39:03,749 - INFO - Epoch 1 Step 150 (Global: 150): loss=1.8420, ppl=6.31, grad_norm=1.20, lr=2.30e-05, throughput=3160 tok/s
2025-11-17 00:41:27,005 - INFO - Epoch 1 Step 160 (Global: 160): loss=1.8060, ppl=6.09, grad_norm=1.19, lr=2.38e-05, throughput=3351 tok/s
2025-11-17 00:43:59,339 - INFO - Epoch 1 Step 170 (Global: 170): loss=1.9059, ppl=6.73, grad_norm=1.19, lr=2.47e-05, throughput=3151 tok/s
2025-11-17 00:46:19,961 - INFO - Epoch 1 Step 180 (Global: 180): loss=1.8541, ppl=6.39, grad_norm=1.29, lr=2.56e-05, throughput=3413 tok/s
2025-11-17 00:48:40,856 - INFO - Epoch 1 Step 190 (Global: 190): loss=1.9251, ppl=6.86, grad_norm=1.20, lr=2.64e-05, throughput=3407 tok/s
2025-11-17 00:51:02,074 - INFO - Epoch 1 Step 200 (Global: 200): loss=1.9112, ppl=6.76, grad_norm=1.23, lr=2.73e-05, throughput=3399 tok/s
2025-11-17 00:53:23,400 - INFO - Epoch 1 Step 210 (Global: 210): loss=1.7017, ppl=5.48, grad_norm=1.20, lr=2.82e-05, throughput=3396 tok/s
2025-11-17 00:55:53,660 - INFO - Epoch 1 Step 220 (Global: 220): loss=1.9477, ppl=7.01, grad_norm=1.18, lr=2.90e-05, throughput=3195 tok/s
2025-11-17 00:58:14,611 - INFO - Epoch 1 Step 230 (Global: 230): loss=1.8638, ppl=6.45, grad_norm=1.20, lr=2.99e-05, throughput=3406 tok/s
2025-11-17 01:00:35,979 - INFO - Epoch 1 Step 240 (Global: 240): loss=1.9968, ppl=7.37, grad_norm=1.21, lr=3.07e-05, throughput=3395 tok/s
2025-11-17 01:03:06,091 - INFO - Epoch 1 Step 250 (Global: 250): loss=1.9771, ppl=7.22, grad_norm=1.24, lr=3.16e-05, throughput=3198 tok/s
2025-11-17 01:05:28,683 - INFO - Epoch 1 Step 260 (Global: 260): loss=1.8688, ppl=6.48, grad_norm=1.19, lr=3.25e-05, throughput=3366 tok/s
2025-11-17 01:07:50,111 - INFO - Epoch 1 Step 270 (Global: 270): loss=1.8527, ppl=6.38, grad_norm=1.20, lr=3.33e-05, throughput=3394 tok/s
2025-11-17 01:10:21,582 - INFO - Epoch 1 Step 280 (Global: 280): loss=1.9209, ppl=6.83, grad_norm=1.16, lr=3.42e-05, throughput=3169 tok/s
2025-11-17 01:12:43,663 - INFO - Epoch 1 Step 290 (Global: 290): loss=1.8424, ppl=6.31, grad_norm=1.29, lr=3.51e-05, throughput=3378 tok/s
2025-11-17 01:15:14,477 - INFO - Epoch 1 Step 300 (Global: 300): loss=1.9188, ppl=6.81, grad_norm=1.30, lr=3.59e-05, throughput=3183 tok/s
2025-11-17 01:17:35,820 - INFO - Epoch 1 Step 310 (Global: 310): loss=1.9178, ppl=6.81, grad_norm=1.26, lr=3.68e-05, throughput=3396 tok/s
2025-11-17 01:19:57,050 - INFO - Epoch 1 Step 320 (Global: 320): loss=1.7867, ppl=5.97, grad_norm=1.40, lr=3.77e-05, throughput=3399 tok/s
2025-11-17 01:22:17,486 - INFO - Epoch 1 Step 330 (Global: 330): loss=1.8215, ppl=6.18, grad_norm=1.23, lr=3.85e-05, throughput=3418 tok/s
2025-11-17 01:24:48,690 - INFO - Epoch 1 Step 340 (Global: 340): loss=1.7872, ppl=5.97, grad_norm=1.20, lr=3.94e-05, throughput=3175 tok/s
2025-11-17 01:27:09,921 - INFO - Epoch 1 Step 350 (Global: 350): loss=2.0035, ppl=7.41, grad_norm=1.20, lr=4.03e-05, throughput=3399 tok/s
2025-11-17 01:29:31,213 - INFO - Epoch 1 Step 360 (Global: 360): loss=1.8507, ppl=6.36, grad_norm=1.38, lr=4.11e-05, throughput=3397 tok/s
2025-11-17 01:32:02,873 - INFO - Epoch 1 Step 370 (Global: 370): loss=1.7528, ppl=5.77, grad_norm=1.23, lr=4.20e-05, throughput=3165 tok/s
2025-11-17 01:34:24,618 - INFO - Epoch 1 Step 380 (Global: 380): loss=1.8836, ppl=6.58, grad_norm=1.26, lr=4.29e-05, throughput=3386 tok/s
2025-11-17 01:36:46,459 - INFO - Epoch 1 Step 390 (Global: 390): loss=1.8352, ppl=6.27, grad_norm=1.22, lr=4.37e-05, throughput=3384 tok/s
2025-11-17 01:39:17,845 - INFO - Epoch 1 Step 400 (Global: 400): loss=1.9997, ppl=7.39, grad_norm=1.19, lr=4.46e-05, throughput=3171 tok/s
2025-11-17 01:41:38,990 - INFO - Epoch 1 Step 410 (Global: 410): loss=1.9730, ppl=7.19, grad_norm=1.34, lr=4.54e-05, throughput=3401 tok/s
2025-11-17 01:44:09,491 - INFO - Epoch 1 Step 420 (Global: 420): loss=1.8929, ppl=6.64, grad_norm=1.17, lr=4.63e-05, throughput=3189 tok/s
2025-11-17 01:46:30,977 - INFO - Epoch 1 Step 430 (Global: 430): loss=2.0080, ppl=7.45, grad_norm=1.37, lr=4.72e-05, throughput=3393 tok/s
2025-11-17 01:48:51,584 - INFO - Epoch 1 Step 440 (Global: 440): loss=1.9850, ppl=7.28, grad_norm=1.34, lr=4.80e-05, throughput=3414 tok/s
2025-11-17 01:51:12,367 - INFO - Epoch 1 Step 450 (Global: 450): loss=1.9028, ppl=6.70, grad_norm=1.16, lr=4.89e-05, throughput=3410 tok/s
2025-11-17 01:53:43,253 - INFO - Epoch 1 Step 460 (Global: 460): loss=2.1278, ppl=8.40, grad_norm=1.23, lr=4.98e-05, throughput=3181 tok/s
2025-11-17 01:56:04,348 - INFO - Epoch 1 Step 470 (Global: 470): loss=1.8422, ppl=6.31, grad_norm=1.16, lr=5.06e-05, throughput=3402 tok/s
2025-11-17 01:58:24,835 - INFO - Epoch 1 Step 480 (Global: 480): loss=1.9039, ppl=6.71, grad_norm=1.14, lr=5.15e-05, throughput=3417 tok/s
2025-11-17 02:00:55,198 - INFO - Epoch 1 Step 490 (Global: 490): loss=1.9041, ppl=6.71, grad_norm=1.20, lr=5.24e-05, throughput=3192 tok/s
2025-11-17 02:03:16,388 - INFO - Epoch 1 Step 500 (Global: 500): loss=1.7743, ppl=5.90, grad_norm=1.17, lr=5.32e-05, throughput=3400 tok/s
2025-11-17 02:03:16,390 - INFO - Running validation at step 500...
2025-11-17 02:11:00,329 - INFO - Validation loss: 1.8838, perplexity: 6.58
2025-11-17 02:11:00,330 - INFO - Qualitative metrics (n=5):
2025-11-17 02:11:00,330 - INFO - BLEU: 0.0000
2025-11-17 02:11:00,330 - INFO - METEOR: 0.1318
2025-11-17 02:11:00,330 - INFO - Edit Distance: 0.7718
2025-11-17 02:11:00,330 - INFO - F-measure: 0.0785
2025-11-17 02:11:00,330 - INFO - ======================================================================
2025-11-17 02:11:00,330 - INFO - Qualitative Evaluation Samples:
2025-11-17 02:11:00,330 - INFO - ======================================================================
2025-11-17 02:11:00,330 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 02:11:00,331 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 02:11:00,331 - INFO - Generated: '# The Last of Us (2013 film)\nThe Last of Us is a 2013 American post-apocalyptic action film directed by Neil Druckmann and written by Christopher McQuarrie. It stars Pedro Pascal, Bella Heathcote, and...'
2025-11-17 02:11:00,331 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 02:11:00,331 - INFO - ----------------------------------------------------------------------
2025-11-17 02:11:00,331 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 02:11:00,331 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 02:11:00,331 - INFO - Generated: '# 2010–11 in the United Kingdom\nEvents from the year 2010 in the United Kingdom.\n\n## Incumbents\n- Monarch: Queen Elizabeth II (since 1952)\n- Prime Minister: David Cameron (since 2010)\n- Lord Chancello...'
2025-11-17 02:11:00,331 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 02:11:00,331 - INFO - ----------------------------------------------------------------------
2025-11-17 02:11:00,331 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 02:11:00,331 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 02:11:00,332 - INFO - Generated: '# 2016 in Japan\nEvents in the year 2016 in Japan.\n\n## Incumbents\n- Emperor Akihito (Emperor Emeritus) (Emperor of Japan)\n- Prime Minister of Japan (since 2012)\n- Chief Justice of Japan (since 2012)\n- ...'
2025-11-17 02:11:00,332 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 02:11:00,332 - INFO - ----------------------------------------------------------------------
2025-11-17 02:11:00,332 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 02:11:00,332 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 02:11:00,332 - INFO - Generated: "# 2016–17 FC Kaiserslautern season\nThe 2016–17 season was FC Kaiserslautern's 100th season in existence. The club was promoted to the Bundesliga for the first time in its history after finishing in 3r..."
2025-11-17 02:11:00,332 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 02:11:00,332 - INFO - ----------------------------------------------------------------------
2025-11-17 02:11:00,332 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 02:11:00,332 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 02:11:00,333 - INFO - Generated: ' | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |...'
2025-11-17 02:11:00,333 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 02:11:00,333 - INFO - ----------------------------------------------------------------------
2025-11-17 02:11:00,334 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_500.jsonl
2025-11-17 02:11:35,072 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-17 02:11:35,077 - INFO - New best validation loss: 1.8838, perplexity: 6.58
2025-11-17 02:13:57,072 - INFO - Epoch 1 Step 510 (Global: 510): loss=1.7318, ppl=5.65, grad_norm=1.37, lr=5.41e-05, throughput=3381 tok/s
2025-11-17 02:16:18,369 - INFO - Epoch 1 Step 520 (Global: 520): loss=1.9084, ppl=6.74, grad_norm=1.18, lr=5.50e-05, throughput=3397 tok/s
2025-11-17 02:18:49,212 - INFO - Epoch 1 Step 530 (Global: 530): loss=1.8740, ppl=6.51, grad_norm=1.12, lr=5.58e-05, throughput=3182 tok/s
2025-11-17 02:21:10,916 - INFO - Epoch 1 Step 540 (Global: 540): loss=1.7769, ppl=5.91, grad_norm=1.53, lr=5.67e-05, throughput=3387 tok/s
2025-11-17 02:23:32,840 - INFO - Epoch 1 Step 550 (Global: 550): loss=1.9729, ppl=7.19, grad_norm=1.27, lr=5.76e-05, throughput=3382 tok/s
2025-11-17 02:26:04,100 - INFO - Epoch 1 Step 560 (Global: 560): loss=1.8192, ppl=6.17, grad_norm=1.20, lr=5.84e-05, throughput=3173 tok/s
2025-11-17 02:28:25,731 - INFO - Epoch 1 Step 570 (Global: 570): loss=1.9026, ppl=6.70, grad_norm=1.24, lr=5.93e-05, throughput=3389 tok/s
2025-11-17 02:30:47,697 - INFO - Epoch 1 Step 580 (Global: 580): loss=1.8377, ppl=6.28, grad_norm=1.20, lr=6.01e-05, throughput=3381 tok/s
2025-11-17 02:33:18,849 - INFO - Epoch 1 Step 590 (Global: 590): loss=1.8460, ppl=6.33, grad_norm=1.22, lr=6.10e-05, throughput=3176 tok/s
2025-11-17 02:35:40,169 - INFO - Epoch 1 Step 600 (Global: 600): loss=1.7976, ppl=6.03, grad_norm=1.34, lr=6.19e-05, throughput=3397 tok/s
2025-11-17 02:38:11,322 - INFO - Epoch 1 Step 610 (Global: 610): loss=2.0045, ppl=7.42, grad_norm=1.35, lr=6.27e-05, throughput=3176 tok/s
2025-11-17 02:40:32,613 - INFO - Epoch 1 Step 620 (Global: 620): loss=1.6846, ppl=5.39, grad_norm=1.48, lr=6.36e-05, throughput=3397 tok/s
2025-11-17 02:42:53,468 - INFO - Epoch 1 Step 630 (Global: 630): loss=1.9332, ppl=6.91, grad_norm=1.28, lr=6.45e-05, throughput=3408 tok/s
2025-11-17 02:45:14,903 - INFO - Epoch 1 Step 640 (Global: 640): loss=1.9750, ppl=7.21, grad_norm=1.21, lr=6.53e-05, throughput=3394 tok/s
2025-11-17 02:47:46,956 - INFO - Epoch 1 Step 650 (Global: 650): loss=1.8472, ppl=6.34, grad_norm=1.23, lr=6.62e-05, throughput=3157 tok/s
2025-11-17 02:50:08,681 - INFO - Epoch 1 Step 660 (Global: 660): loss=1.9914, ppl=7.33, grad_norm=1.21, lr=6.71e-05, throughput=3387 tok/s
2025-11-17 02:52:31,096 - INFO - Epoch 1 Step 670 (Global: 670): loss=1.8841, ppl=6.58, grad_norm=1.24, lr=6.79e-05, throughput=3371 tok/s
2025-11-17 02:55:03,094 - INFO - Epoch 1 Step 680 (Global: 680): loss=2.0017, ppl=7.40, grad_norm=1.21, lr=6.88e-05, throughput=3158 tok/s
2025-11-17 02:57:24,400 - INFO - Epoch 1 Step 690 (Global: 690): loss=1.7651, ppl=5.84, grad_norm=1.20, lr=6.97e-05, throughput=3397 tok/s
2025-11-17 02:59:46,291 - INFO - Epoch 1 Step 700 (Global: 700): loss=1.9649, ppl=7.13, grad_norm=1.13, lr=7.05e-05, throughput=3383 tok/s
2025-11-17 03:02:16,137 - INFO - Epoch 1 Step 710 (Global: 710): loss=1.7933, ppl=6.01, grad_norm=1.10, lr=7.14e-05, throughput=3203 tok/s
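The ppl and throughput fields in these step lines can be recomputed from the other values as a sanity check. The following is a minimal sketch, not the trainer's own code. It uses a regular expression written by hand for this log's line format; STEP_RE, parse_step, and recomputed_throughput are illustrative names. It treats ppl as exp(loss). For throughput, it assumes each optimizer step consumes batch_size × gradient_accumulation_steps = 4 × 12 = 48 samples of 1000 target tokens, so one 10-step logging interval covers 480,000 tokens.

```python
import math
import re
from datetime import datetime

# Hand-written pattern for the step lines in this log (an assumption, not a trainer API).
STEP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - "
    r"Epoch \d+ Step (?P<step>\d+) \(Global: \d+\): "
    r"loss=(?P<loss>[\d.]+), ppl=(?P<ppl>[\d.]+), "
    r"grad_norm=[\d.]+, lr=[\d.e+-]+, "
    r"throughput=(?P<tput>\d+) tok/s$"
)

def parse_step(line):
    """Extract timestamp, step, loss, ppl, and throughput from one step line."""
    m = STEP_RE.match(line)
    if m is None:
        raise ValueError(f"not a step line: {line!r}")
    return {
        "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
        "step": int(m["step"]),
        "loss": float(m["loss"]),
        "ppl": float(m["ppl"]),
        "tput": int(m["tput"]),
    }

# Assumption: tokens per optimizer step = batch_size * grad_accum * target tokens.
TOKENS_PER_STEP = 4 * 12 * 1000

def recomputed_throughput(prev, cur):
    """Tokens per second over the interval between two consecutive step lines."""
    seconds = (cur["time"] - prev["time"]).total_seconds()
    return TOKENS_PER_STEP * (cur["step"] - prev["step"]) / seconds

prev = parse_step("2025-11-17 00:48:40,856 - INFO - Epoch 1 Step 190 (Global: 190): "
                  "loss=1.9251, ppl=6.86, grad_norm=1.20, lr=2.64e-05, throughput=3407 tok/s")
cur = parse_step("2025-11-17 00:51:02,074 - INFO - Epoch 1 Step 200 (Global: 200): "
                 "loss=1.9112, ppl=6.76, grad_norm=1.23, lr=2.73e-05, throughput=3399 tok/s")

print(f"{math.exp(cur['loss']):.2f}", cur["ppl"])
print(round(recomputed_throughput(prev, cur)), cur["tput"])
```

For the step 190 and step 200 lines, both recomputed values agree with the logged ones: ppl 6.76 and 3399 tok/s. At large losses the 4-decimal rounding of the logged loss becomes visible. exp(10.6572) is about 42497.5, slightly below the 42499.17 printed at initial validation, which is consistent with perplexity being computed from the unrounded loss.

The same arithmetic applies to the two compression ratios. 1000 / 252 ≈ 3.97 matches the "Effective context tokens" line. The "1000 → 251 tokens (4.00x)" line, however, equals 1000 / 250 (compression_target) rather than 1000 / 251 ≈ 3.98.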
2025-11-17 03:04:37,002 - INFO - Epoch 1 Step 720 (Global: 720): loss=1.8414, ppl=6.31, grad_norm=1.24, lr=7.22e-05, throughput=3408 tok/s
2025-11-17 03:07:08,252 - INFO - Epoch 1 Step 730 (Global: 730): loss=1.9155, ppl=6.79, grad_norm=1.20, lr=7.31e-05, throughput=3174 tok/s
2025-11-17 03:09:29,269 - INFO - Epoch 1 Step 740 (Global: 740): loss=1.8867, ppl=6.60, grad_norm=1.19, lr=7.40e-05, throughput=3404 tok/s
2025-11-17 03:11:50,614 - INFO - Epoch 1 Step 750 (Global: 750): loss=1.7265, ppl=5.62, grad_norm=1.11, lr=7.48e-05, throughput=3396 tok/s
2025-11-17 03:14:11,543 - INFO - Epoch 1 Step 760 (Global: 760): loss=1.8790, ppl=6.55, grad_norm=1.13, lr=7.57e-05, throughput=3406 tok/s
2025-11-17 03:16:42,635 - INFO - Epoch 1 Step 770 (Global: 770): loss=2.0240, ppl=7.57, grad_norm=1.21, lr=7.66e-05, throughput=3177 tok/s
2025-11-17 03:19:03,302 - INFO - Epoch 1 Step 780 (Global: 780): loss=1.9837, ppl=7.27, grad_norm=1.16, lr=7.74e-05, throughput=3412 tok/s
2025-11-17 03:21:25,021 - INFO - Epoch 1 Step 790 (Global: 790): loss=1.7853, ppl=5.96, grad_norm=1.12, lr=7.83e-05, throughput=3387 tok/s
2025-11-17 03:23:45,965 - INFO - Epoch 1 Step 800 (Global: 800): loss=1.9914, ppl=7.33, grad_norm=1.29, lr=7.92e-05, throughput=3406 tok/s
2025-11-17 03:26:16,701 - INFO - Epoch 1 Step 810 (Global: 810): loss=1.9804, ppl=7.25, grad_norm=1.19, lr=8.00e-05, throughput=3184 tok/s
2025-11-17 03:28:38,331 - INFO - Epoch 1 Step 820 (Global: 820): loss=1.6523, ppl=5.22, grad_norm=1.27, lr=8.09e-05, throughput=3389 tok/s
2025-11-17 03:31:09,140 - INFO - Epoch 1 Step 830 (Global: 830): loss=2.0088, ppl=7.45, grad_norm=5.19, lr=8.18e-05, throughput=3183 tok/s
2025-11-17 03:33:30,397 - INFO - Epoch 1 Step 840 (Global: 840): loss=1.7244, ppl=5.61, grad_norm=21.38, lr=8.26e-05, throughput=3398 tok/s
2025-11-17 03:36:00,418 - INFO - Epoch 1 Step 850 (Global: 850): loss=0.7787, ppl=2.18, grad_norm=9.38, lr=8.35e-05, throughput=3200 tok/s
2025-11-17 03:38:21,233 - INFO - Epoch 1 Step 860 (Global: 860): loss=0.3081, ppl=1.36, grad_norm=2.50, lr=8.44e-05, throughput=3409 tok/s
2025-11-17 03:40:42,217 - INFO - Epoch 1 Step 870 (Global: 870): loss=0.1614, ppl=1.18, grad_norm=1.48, lr=8.52e-05, throughput=3405 tok/s
2025-11-17 03:43:03,313 - INFO - Epoch 1 Step 880 (Global: 880): loss=0.1052, ppl=1.11, grad_norm=1.34, lr=8.61e-05, throughput=3402 tok/s
2025-11-17 03:45:25,208 - INFO - Epoch 1 Step 890 (Global: 890): loss=0.0798, ppl=1.08, grad_norm=1.05, lr=8.69e-05, throughput=3383 tok/s
2025-11-17 03:47:56,443 - INFO - Epoch 1 Step 900 (Global: 900): loss=0.0535, ppl=1.05, grad_norm=0.79, lr=8.78e-05, throughput=3174 tok/s
2025-11-17 03:50:17,971 - INFO - Epoch 1 Step 910 (Global: 910): loss=0.0462, ppl=1.05, grad_norm=0.73, lr=8.87e-05, throughput=3392 tok/s
2025-11-17 03:52:39,161 - INFO - Epoch 1 Step 920 (Global: 920): loss=0.0381, ppl=1.04, grad_norm=0.58, lr=8.95e-05, throughput=3400 tok/s
2025-11-17 03:55:10,092 - INFO - Epoch 1 Step 930 (Global: 930): loss=0.0339, ppl=1.03, grad_norm=0.58, lr=9.04e-05, throughput=3180 tok/s
2025-11-17 03:57:31,153 - INFO - Epoch 1 Step 940 (Global: 940): loss=0.0272, ppl=1.03, grad_norm=0.48, lr=9.13e-05, throughput=3403 tok/s
2025-11-17 03:59:52,254 - INFO - Epoch 1 Step 950 (Global: 950): loss=0.0237, ppl=1.02, grad_norm=0.48, lr=9.21e-05, throughput=3402 tok/s
2025-11-17 04:02:22,919 - INFO - Epoch 1 Step 960 (Global: 960): loss=0.0227, ppl=1.02, grad_norm=0.39, lr=9.30e-05, throughput=3186 tok/s
2025-11-17 04:04:43,576 - INFO - Epoch 1 Step 970 (Global: 970): loss=0.0209, ppl=1.02, grad_norm=0.37, lr=9.39e-05, throughput=3413 tok/s
2025-11-17 04:07:14,196 - INFO - Epoch 1 Step 980 (Global: 980): loss=0.0171, ppl=1.02, grad_norm=0.36, lr=9.47e-05, throughput=3187 tok/s
2025-11-17 04:09:35,226 - INFO - Epoch 1 Step 990 (Global: 990): loss=0.0205, ppl=1.02, grad_norm=0.39, lr=9.56e-05, throughput=3404 tok/s
2025-11-17 04:11:55,970 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=0.0160, ppl=1.02, grad_norm=0.39, lr=9.65e-05, throughput=3411 tok/s
2025-11-17 04:11:55,972 - INFO - Running validation at step 1000...
2025-11-17 04:19:42,141 - INFO - Validation loss: 0.0150, perplexity: 1.02
2025-11-17 04:19:42,142 - INFO - Qualitative metrics (n=5):
2025-11-17 04:19:42,142 - INFO - BLEU: 0.9654
2025-11-17 04:19:42,142 - INFO - METEOR: 0.9887
2025-11-17 04:19:42,142 - INFO - Edit Distance: 0.0089
2025-11-17 04:19:42,143 - INFO - F-measure: 0.9807
2025-11-17 04:19:42,143 - INFO - ======================================================================
2025-11-17 04:19:42,143 - INFO - Qualitative Evaluation Samples:
2025-11-17 04:19:42,143 - INFO - ======================================================================
2025-11-17 04:19:42,143 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 04:19:42,143 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 04:19:42,143 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as you-were. But it\'s n...'
2025-11-17 04:19:42,143 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 04:19:42,143 - INFO - ----------------------------------------------------------------------
2025-11-17 04:19:42,143 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 04:19:42,143 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 04:19:42,144 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 04:19:42,144 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 04:19:42,144 - INFO - ----------------------------------------------------------------------
2025-11-17 04:19:42,144 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 04:19:42,144 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 04:19:42,144 - INFO - Generated: ' at the meeting Layheaded meth. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and ...'
2025-11-17 04:19:42,144 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 04:19:42,144 - INFO - ----------------------------------------------------------------------
2025-11-17 04:19:42,144 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 04:19:42,144 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 04:19:42,144 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 04:19:42,145 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 04:19:42,145 - INFO - ----------------------------------------------------------------------
2025-11-17 04:19:42,145 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 04:19:42,145 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 04:19:42,145 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 04:19:42,145 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 04:19:42,145 - INFO - ----------------------------------------------------------------------
2025-11-17 04:19:42,147 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_1000.jsonl
2025-11-17 04:20:26,008 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-17 04:20:26,021 - INFO - New best validation loss: 0.0150, perplexity: 1.02
2025-11-17 04:23:02,976 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=0.0121, ppl=1.01, grad_norm=0.31, lr=9.73e-05, throughput=3059 tok/s
2025-11-17 04:25:25,341 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=0.0120, ppl=1.01, grad_norm=0.31, lr=9.82e-05, throughput=3372 tok/s
2025-11-17 04:27:46,529 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=0.0096, ppl=1.01, grad_norm=0.29, lr=9.90e-05, throughput=3400 tok/s
2025-11-17 04:30:07,791 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=0.0122, ppl=1.01, grad_norm=0.30, lr=9.99e-05, throughput=3398 tok/s
2025-11-17 04:32:38,773 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=0.0140, ppl=1.01, grad_norm=0.47, lr=1.00e-04, throughput=3179 tok/s
2025-11-17 04:35:00,673 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=0.0170, ppl=1.02, grad_norm=0.40, lr=1.00e-04, throughput=3383 tok/s
2025-11-17 04:37:20,925 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=0.0128, ppl=1.01, grad_norm=0.32, lr=1.00e-04, throughput=3422 tok/s
2025-11-17 04:39:41,010 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=0.0122, ppl=1.01, grad_norm=0.34, lr=1.00e-04, throughput=3427 tok/s
2025-11-17 04:42:11,194 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=0.0085, ppl=1.01, grad_norm=0.27, lr=1.00e-04, throughput=3196 tok/s
2025-11-17 04:44:31,926 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=0.0072, ppl=1.01, grad_norm=0.25, lr=1.00e-04, throughput=3411 tok/s
2025-11-17 04:47:01,042 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=0.0076, ppl=1.01, grad_norm=0.23, lr=1.00e-04, throughput=3219 tok/s
2025-11-17 04:49:20,966 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=0.0076, ppl=1.01, grad_norm=0.24, lr=1.00e-04, throughput=3430 tok/s
2025-11-17 04:51:51,363 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=0.0068, ppl=1.01, grad_norm=0.22, lr=1.00e-04, throughput=3192 tok/s
2025-11-17 04:54:11,471 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=0.0069, ppl=1.01, grad_norm=0.24, lr=1.00e-04, throughput=3426 tok/s
2025-11-17 04:56:31,597 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=0.0086, ppl=1.01, grad_norm=0.26, lr=1.00e-04, throughput=3426 tok/s
2025-11-17 04:58:51,723 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=0.0073, ppl=1.01, grad_norm=0.25, lr=1.00e-04, throughput=3426 tok/s
2025-11-17 05:01:12,575 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=0.0061, ppl=1.01, grad_norm=0.26, lr=1.00e-04, throughput=3408 tok/s
2025-11-17 05:03:43,151 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=0.0062, ppl=1.01, grad_norm=0.29, lr=9.99e-05, throughput=3188 tok/s
2025-11-17 05:06:03,952 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=0.0062, ppl=1.01, grad_norm=0.27, lr=9.99e-05, throughput=3409 tok/s
2025-11-17 05:08:25,008 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=0.0055, ppl=1.01, grad_norm=0.22, lr=9.99e-05, throughput=3403 tok/s
2025-11-17 05:10:55,655 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=0.0096, ppl=1.01, grad_norm=0.59, lr=9.99e-05, throughput=3186 tok/s
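The scheduler line near the top reports warmup_steps=1041 and total_steps=10417. The lr column follows a recognizable shape: it rises linearly from about 1e-05 at the start to the 1e-04 peak by step 1041, then begins a slow decline (9.99e-05 at step 1180). The sketch below reconstructs both the step counts and the lr curve from the run's arguments. It is an inferred model that reproduces the logged values, not the trainer's actual scheduler. Three details are assumptions fitted to the numbers in this log:

- the step count rounds a partial final step up (ceil);
- warmup starts at 10% of the peak learning rate;
- the decay after warmup is cosine down to zero.

schedule_lengths and lr_at are hypothetical names.

```python
import math

def schedule_lengths(n_samples, batch_size, grad_accum, epochs, warmup_ratio):
    """Return (total_steps, warmup_steps); assumes a partial last step counts (ceil)."""
    total = math.ceil(n_samples / (batch_size * grad_accum)) * epochs
    return total, int(warmup_ratio * total)

# 500000 training samples, batch_size=4, gradient_accumulation_steps=12,
# num_epochs=1, warmup_ratio=0.1 (all taken from this log).
TOTAL_STEPS, WARMUP_STEPS = schedule_lengths(500_000, 4, 12, 1, 0.1)
PEAK_LR = 1e-4            # learning_rate=0.0001
WARMUP_START_FRAC = 0.1   # assumption: fitted to lr=1.09e-05 at step 10

def lr_at(step):
    """Inferred lr after `step` optimizer steps: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        frac = WARMUP_START_FRAC + (1 - WARMUP_START_FRAC) * step / WARMUP_STEPS
        return PEAK_LR * frac
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * progress))

print(TOTAL_STEPS, WARMUP_STEPS)
for step in (10, 500, 1000, 1170, 1180, 1330):
    print(step, f"{lr_at(step):.2e}")
```

Run as-is, this prints 10417 1041. It then prints lr values matching the log's lr column at those steps: 1.09e-05, 5.32e-05, 9.65e-05, 1.00e-04, 9.99e-05, and 9.98e-05. A plain linear warmup from zero would give about 9.6e-07 at step 10, so the warmup here evidently does not start at zero. Whether the trainer uses exactly this formula cannot be confirmed from the log alone.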
2025-11-17 05:13:16,665 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=0.0067, ppl=1.01, grad_norm=0.24, lr=9.99e-05, throughput=3404 tok/s 2025-11-17 05:15:37,322 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=0.0052, ppl=1.01, grad_norm=0.23, lr=9.99e-05, throughput=3413 tok/s 2025-11-17 05:18:07,284 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=0.0073, ppl=1.01, grad_norm=0.38, lr=9.99e-05, throughput=3201 tok/s 2025-11-17 05:20:28,167 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=0.0065, ppl=1.01, grad_norm=0.24, lr=9.99e-05, throughput=3407 tok/s 2025-11-17 05:22:58,175 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=0.0052, ppl=1.01, grad_norm=0.18, lr=9.99e-05, throughput=3200 tok/s 2025-11-17 05:25:18,891 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=0.0038, ppl=1.00, grad_norm=0.17, lr=9.99e-05, throughput=3411 tok/s 2025-11-17 05:27:39,111 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=0.0056, ppl=1.01, grad_norm=0.21, lr=9.98e-05, throughput=3423 tok/s 2025-11-17 05:29:59,580 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=0.0037, ppl=1.00, grad_norm=0.17, lr=9.98e-05, throughput=3417 tok/s 2025-11-17 05:32:30,026 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=0.0037, ppl=1.00, grad_norm=0.18, lr=9.98e-05, throughput=3191 tok/s 2025-11-17 05:34:50,777 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=0.0033, ppl=1.00, grad_norm=0.16, lr=9.98e-05, throughput=3410 tok/s 2025-11-17 05:37:11,497 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=0.0034, ppl=1.00, grad_norm=0.16, lr=9.98e-05, throughput=3411 tok/s 2025-11-17 05:39:42,088 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=0.0029, ppl=1.00, grad_norm=0.15, lr=9.98e-05, throughput=3188 tok/s 2025-11-17 05:42:02,615 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=0.0060, ppl=1.01, grad_norm=0.23, lr=9.97e-05, throughput=3416 tok/s 2025-11-17 05:44:23,185 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=0.0040, ppl=1.00, grad_norm=0.17, lr=9.97e-05, throughput=3415 tok/s 2025-11-17 05:46:53,217 - 
INFO - Epoch 1 Step 1360 (Global: 1360): loss=0.0037, ppl=1.00, grad_norm=0.18, lr=9.97e-05, throughput=3199 tok/s 2025-11-17 05:49:13,547 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=0.0030, ppl=1.00, grad_norm=0.15, lr=9.97e-05, throughput=3421 tok/s 2025-11-17 05:51:43,477 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=0.0045, ppl=1.00, grad_norm=0.21, lr=9.97e-05, throughput=3202 tok/s 2025-11-17 05:54:03,562 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=0.0031, ppl=1.00, grad_norm=0.16, lr=9.97e-05, throughput=3427 tok/s 2025-11-17 05:56:23,668 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=0.0037, ppl=1.00, grad_norm=0.20, lr=9.96e-05, throughput=3426 tok/s 2025-11-17 05:58:44,033 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=0.0022, ppl=1.00, grad_norm=0.15, lr=9.96e-05, throughput=3420 tok/s 2025-11-17 06:01:13,644 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=0.0036, ppl=1.00, grad_norm=0.16, lr=9.96e-05, throughput=3208 tok/s 2025-11-17 06:03:34,041 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=0.0028, ppl=1.00, grad_norm=0.14, lr=9.96e-05, throughput=3419 tok/s 2025-11-17 06:05:54,674 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=0.0029, ppl=1.00, grad_norm=0.20, lr=9.96e-05, throughput=3413 tok/s 2025-11-17 06:08:15,533 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=0.0037, ppl=1.00, grad_norm=0.17, lr=9.95e-05, throughput=3408 tok/s 2025-11-17 06:10:46,600 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=0.0037, ppl=1.00, grad_norm=0.26, lr=9.95e-05, throughput=3177 tok/s 2025-11-17 06:13:07,685 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=0.0040, ppl=1.00, grad_norm=0.19, lr=9.95e-05, throughput=3402 tok/s 2025-11-17 06:15:38,095 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=0.0029, ppl=1.00, grad_norm=0.17, lr=9.95e-05, throughput=3191 tok/s 2025-11-17 06:17:58,713 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=0.0043, ppl=1.00, grad_norm=0.27, lr=9.94e-05, throughput=3414 tok/s 2025-11-17 06:20:29,603 - INFO - Epoch 1 Step 1500 
(Global: 1500): loss=0.0053, ppl=1.01, grad_norm=0.24, lr=9.94e-05, throughput=3181 tok/s 2025-11-17 06:20:29,605 - INFO - Running validation at step 1500... 2025-11-17 06:28:08,250 - INFO - Validation loss: 0.0032, perplexity: 1.00 2025-11-17 06:28:08,251 - INFO - Qualitative metrics (n=5): 2025-11-17 06:28:08,251 - INFO - BLEU: 0.9923 2025-11-17 06:28:08,251 - INFO - METEOR: 0.9972 2025-11-17 06:28:08,251 - INFO - Edit Distance: 0.0036 2025-11-17 06:28:08,251 - INFO - F-measure: 0.9950 2025-11-17 06:28:08,252 - INFO - ====================================================================== 2025-11-17 06:28:08,252 - INFO - Qualitative Evaluation Samples: 2025-11-17 06:28:08,252 - INFO - ====================================================================== 2025-11-17 06:28:08,252 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 06:28:08,252 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 06:28:08,252 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 06:28:08,252 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 06:28:08,252 - INFO - ---------------------------------------------------------------------- 2025-11-17 06:28:08,252 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 06:28:08,252 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 06:28:08,253 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 06:28:08,253 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 06:28:08,253 - INFO - ---------------------------------------------------------------------- 2025-11-17 06:28:08,253 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 06:28:08,253 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 06:28:08,253 - INFO - Generated: ' at the meeting Layheaded mia. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 06:28:08,253 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 06:28:08,253 - INFO - ---------------------------------------------------------------------- 2025-11-17 06:28:08,254 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 06:28:08,254 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 06:28:08,254 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 06:28:08,254 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 06:28:08,254 - INFO - ---------------------------------------------------------------------- 2025-11-17 06:28:08,254 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-17 06:28:08,255 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 06:28:08,255 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 06:28:08,255 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 06:28:08,255 - INFO - ---------------------------------------------------------------------- 2025-11-17 06:28:08,256 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_1500.jsonl 2025-11-17 06:28:52,451 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-17 06:28:52,466 - INFO - New best validation loss: 0.0032, perplexity: 1.00 2025-11-17 06:31:12,798 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=0.0038, ppl=1.00, grad_norm=0.18, lr=9.94e-05, throughput=3421 tok/s 2025-11-17 06:33:33,914 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=0.0024, ppl=1.00, grad_norm=0.13, lr=9.94e-05, throughput=3402 tok/s 2025-11-17 06:36:04,678 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=0.0043, ppl=1.00, grad_norm=0.23, lr=9.93e-05, throughput=3184 tok/s 2025-11-17 06:38:25,782 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=0.0044, ppl=1.00, grad_norm=0.20, lr=9.93e-05, throughput=3402 tok/s 2025-11-17 06:40:46,627 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=0.0041, ppl=1.00, grad_norm=0.16, lr=9.93e-05, throughput=3408 tok/s 2025-11-17 06:43:07,190 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=0.0021, ppl=1.00, grad_norm=0.14, lr=9.92e-05, throughput=3415 tok/s 2025-11-17 06:45:37,931 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=0.0020, ppl=1.00, grad_norm=0.11, lr=9.92e-05, 
throughput=3184 tok/s 2025-11-17 06:47:58,432 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=0.0031, ppl=1.00, grad_norm=0.14, lr=9.92e-05, throughput=3416 tok/s 2025-11-17 06:50:18,884 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=0.0043, ppl=1.00, grad_norm=0.24, lr=9.92e-05, throughput=3418 tok/s 2025-11-17 06:52:49,837 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=0.0023, ppl=1.00, grad_norm=0.15, lr=9.91e-05, throughput=3180 tok/s 2025-11-17 06:55:10,397 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=0.0022, ppl=1.00, grad_norm=0.15, lr=9.91e-05, throughput=3415 tok/s 2025-11-17 06:57:31,127 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=0.0028, ppl=1.00, grad_norm=0.16, lr=9.91e-05, throughput=3411 tok/s 2025-11-17 07:00:01,981 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=0.0025, ppl=1.00, grad_norm=0.14, lr=9.90e-05, throughput=3182 tok/s 2025-11-17 07:02:22,990 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=0.0034, ppl=1.00, grad_norm=0.19, lr=9.90e-05, throughput=3404 tok/s 2025-11-17 07:04:53,220 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=0.0018, ppl=1.00, grad_norm=0.12, lr=9.90e-05, throughput=3195 tok/s 2025-11-17 07:07:28,019 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=0.0023, ppl=1.00, grad_norm=0.15, lr=9.89e-05, throughput=3101 tok/s 2025-11-17 07:11:15,144 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=0.0032, ppl=1.00, grad_norm=0.14, lr=9.89e-05, throughput=2113 tok/s 2025-11-17 07:14:06,319 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=0.0022, ppl=1.00, grad_norm=0.14, lr=9.89e-05, throughput=2804 tok/s 2025-11-17 07:17:36,401 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=0.0024, ppl=1.00, grad_norm=0.12, lr=9.88e-05, throughput=2285 tok/s 2025-11-17 07:20:15,181 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=0.0028, ppl=1.00, grad_norm=0.15, lr=9.88e-05, throughput=3023 tok/s 2025-11-17 07:22:53,908 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=0.0023, ppl=1.00, grad_norm=0.13, lr=9.87e-05, throughput=3024 tok/s 
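At step 1500, validation reports Edit Distance 0.0036. In the same pass, sample 3 reconstructs "Layheaded mia." instead of "Laymia headed.", a local word-order slip that is gone by step 2000. The log does not say how this metric is normalized.

The sketch below assumes character-level Levenshtein distance, divided by the longer string's length and averaged over samples. Treat it as an illustration of the idea, not the trainer's scoring code. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute; each costs 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(generated: str, reference: str) -> float:
    """Edit distance divided by the longer string's length; 0.0 means identical."""
    longest = max(len(generated), len(reference))
    return levenshtein(generated, reference) / longest if longest else 0.0


def mean_edit_distance(pairs):
    """Average normalized distance over (generated, reference) pairs."""
    return sum(normalized_edit_distance(g, r) for g, r in pairs) / len(pairs)


# Sample 3 at step 1500 (truncated prefix) versus its ground truth.
print(normalized_edit_distance(" at the meeting Layheaded mia.", " at the meeting Laymia headed."))
```

Averaging over five samples dilutes a single local error, which fits a small non-zero mean when the other four samples match exactly.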
2025-11-17 07:25:46,113 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=0.0038, ppl=1.00, grad_norm=0.21, lr=9.87e-05, throughput=2787 tok/s 2025-11-17 07:28:07,224 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=0.0020, ppl=1.00, grad_norm=0.13, lr=9.87e-05, throughput=3402 tok/s 2025-11-17 07:30:28,084 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=0.0017, ppl=1.00, grad_norm=0.11, lr=9.86e-05, throughput=3408 tok/s 2025-11-17 07:32:59,092 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=9.86e-05, throughput=3179 tok/s 2025-11-17 07:35:33,228 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=0.0024, ppl=1.00, grad_norm=0.14, lr=9.86e-05, throughput=3114 tok/s 2025-11-17 07:38:04,297 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=0.0024, ppl=1.00, grad_norm=0.13, lr=9.85e-05, throughput=3177 tok/s 2025-11-17 07:40:25,384 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=0.0021, ppl=1.00, grad_norm=0.12, lr=9.85e-05, throughput=3402 tok/s 2025-11-17 07:42:46,197 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=0.0029, ppl=1.00, grad_norm=0.17, lr=9.84e-05, throughput=3409 tok/s 2025-11-17 07:45:06,824 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=0.0018, ppl=1.00, grad_norm=0.16, lr=9.84e-05, throughput=3413 tok/s 2025-11-17 07:47:36,595 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=0.0018, ppl=1.00, grad_norm=0.13, lr=9.83e-05, throughput=3205 tok/s 2025-11-17 07:49:58,104 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=0.0025, ppl=1.00, grad_norm=0.17, lr=9.83e-05, throughput=3392 tok/s 2025-11-17 07:52:19,254 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=0.0030, ppl=1.00, grad_norm=0.13, lr=9.83e-05, throughput=3401 tok/s 2025-11-17 07:54:40,311 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=9.82e-05, throughput=3403 tok/s 2025-11-17 07:57:11,148 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=0.0020, ppl=1.00, grad_norm=0.13, lr=9.82e-05, throughput=3182 tok/s 2025-11-17 07:59:31,906 - 
INFO - Epoch 1 Step 1860 (Global: 1860): loss=0.0018, ppl=1.00, grad_norm=0.14, lr=9.81e-05, throughput=3410 tok/s 2025-11-17 08:02:02,505 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=0.0016, ppl=1.00, grad_norm=0.13, lr=9.81e-05, throughput=3187 tok/s 2025-11-17 08:04:23,194 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=0.0022, ppl=1.00, grad_norm=0.13, lr=9.80e-05, throughput=3412 tok/s 2025-11-17 08:06:43,940 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=0.0019, ppl=1.00, grad_norm=0.11, lr=9.80e-05, throughput=3410 tok/s 2025-11-17 08:09:14,648 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=0.0032, ppl=1.00, grad_norm=0.17, lr=9.79e-05, throughput=3185 tok/s 2025-11-17 08:11:35,804 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=0.0018, ppl=1.00, grad_norm=0.12, lr=9.79e-05, throughput=3401 tok/s 2025-11-17 08:13:57,407 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=0.0022, ppl=1.00, grad_norm=0.46, lr=9.78e-05, throughput=3390 tok/s 2025-11-17 08:16:18,691 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=0.0028, ppl=1.00, grad_norm=0.15, lr=9.78e-05, throughput=3398 tok/s 2025-11-17 08:18:48,683 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=0.0018, ppl=1.00, grad_norm=0.11, lr=9.77e-05, throughput=3200 tok/s 2025-11-17 08:21:09,600 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=0.0020, ppl=1.00, grad_norm=0.14, lr=9.77e-05, throughput=3406 tok/s 2025-11-17 08:23:31,323 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=9.76e-05, throughput=3387 tok/s 2025-11-17 08:26:01,759 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=0.0016, ppl=1.00, grad_norm=0.15, lr=9.76e-05, throughput=3191 tok/s 2025-11-17 08:28:22,688 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=0.0020, ppl=1.00, grad_norm=0.14, lr=9.75e-05, throughput=3406 tok/s 2025-11-17 08:30:43,514 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=0.0017, ppl=1.00, grad_norm=0.12, lr=9.75e-05, throughput=3409 tok/s 2025-11-17 08:33:13,526 - INFO - Epoch 1 Step 2000 
(Global: 2000): loss=0.0020, ppl=1.00, grad_norm=0.15, lr=9.74e-05, throughput=3200 tok/s 2025-11-17 08:33:13,528 - INFO - Running validation at step 2000... 2025-11-17 08:40:54,469 - INFO - Validation loss: 0.0019, perplexity: 1.00 2025-11-17 08:40:54,470 - INFO - Qualitative metrics (n=5): 2025-11-17 08:40:54,470 - INFO - BLEU: 1.0000 2025-11-17 08:40:54,470 - INFO - METEOR: 1.0000 2025-11-17 08:40:54,470 - INFO - Edit Distance: 0.0000 2025-11-17 08:40:54,470 - INFO - F-measure: 1.0000 2025-11-17 08:40:54,470 - INFO - ====================================================================== 2025-11-17 08:40:54,470 - INFO - Qualitative Evaluation Samples: 2025-11-17 08:40:54,471 - INFO - ====================================================================== 2025-11-17 08:40:54,471 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 08:40:54,471 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 08:40:54,471 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 08:40:54,471 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 08:40:54,471 - INFO - ---------------------------------------------------------------------- 2025-11-17 08:40:54,471 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 08:40:54,471 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 08:40:54,471 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 08:40:54,471 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 08:40:54,471 - INFO - ---------------------------------------------------------------------- 2025-11-17 08:40:54,472 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 08:40:54,472 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 08:40:54,472 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 08:40:54,472 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 08:40:54,472 - INFO - ---------------------------------------------------------------------- 2025-11-17 08:40:54,472 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 08:40:54,472 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 08:40:54,472 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 08:40:54,472 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 08:40:54,472 - INFO - ---------------------------------------------------------------------- 2025-11-17 08:40:54,473 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-17 08:40:54,473 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 08:40:54,473 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 08:40:54,473 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 08:40:54,473 - INFO - ---------------------------------------------------------------------- 2025-11-17 08:40:54,474 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_2000.jsonl 2025-11-17 08:41:36,100 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-17 08:41:36,116 - INFO - New best validation loss: 0.0019, perplexity: 1.00 2025-11-17 08:43:56,714 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=0.0020, ppl=1.00, grad_norm=0.14, lr=9.74e-05, throughput=3414 tok/s 2025-11-17 08:46:27,193 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=0.0021, ppl=1.00, grad_norm=0.13, lr=9.73e-05, throughput=3190 tok/s 2025-11-17 08:48:47,885 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=0.0017, ppl=1.00, grad_norm=0.12, lr=9.73e-05, throughput=3412 tok/s 2025-11-17 08:51:18,121 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=0.0015, ppl=1.00, grad_norm=0.11, lr=9.72e-05, throughput=3195 tok/s 2025-11-17 08:53:38,916 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=0.0012, ppl=1.00, grad_norm=0.11, lr=9.72e-05, throughput=3409 tok/s 2025-11-17 08:55:59,922 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=0.0026, ppl=1.00, grad_norm=0.15, lr=9.71e-05, throughput=3404 tok/s 2025-11-17 08:58:22,037 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=0.0031, ppl=1.00, grad_norm=0.16, lr=9.71e-05, 
throughput=3378 tok/s 2025-11-17 09:00:53,133 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=9.70e-05, throughput=3177 tok/s 2025-11-17 09:03:15,109 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=9.69e-05, throughput=3381 tok/s 2025-11-17 09:05:36,472 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=9.69e-05, throughput=3396 tok/s 2025-11-17 09:07:58,089 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=0.0015, ppl=1.00, grad_norm=0.11, lr=9.68e-05, throughput=3389 tok/s 2025-11-17 09:10:29,260 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=0.0031, ppl=1.00, grad_norm=0.19, lr=9.68e-05, throughput=3175 tok/s 2025-11-17 09:12:50,985 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=0.0018, ppl=1.00, grad_norm=0.10, lr=9.67e-05, throughput=3387 tok/s 2025-11-17 09:15:21,887 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=0.0021, ppl=1.00, grad_norm=0.13, lr=9.66e-05, throughput=3181 tok/s 2025-11-17 09:17:43,369 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=0.0017, ppl=1.00, grad_norm=0.15, lr=9.66e-05, throughput=3393 tok/s 2025-11-17 09:20:06,282 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=9.65e-05, throughput=3359 tok/s 2025-11-17 09:22:36,343 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=9.65e-05, throughput=3199 tok/s 2025-11-17 09:24:57,138 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=9.64e-05, throughput=3409 tok/s 2025-11-17 09:27:18,384 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=0.0015, ppl=1.00, grad_norm=0.13, lr=9.63e-05, throughput=3398 tok/s 2025-11-17 09:29:40,138 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=0.0012, ppl=1.00, grad_norm=0.11, lr=9.63e-05, throughput=3386 tok/s 2025-11-17 09:32:11,149 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=0.0021, ppl=1.00, grad_norm=0.13, lr=9.62e-05, throughput=3179 tok/s 
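Each validation pass in this log ends by saving best_checkpoint.pt and logging "New best validation loss". Those losses are 0.0150 at step 1000, 0.0032 at 1500, 0.0019 at 2000 and 0.0014 at 2500. The reported perplexity is consistent with exp(validation loss), the usual definition for a mean per-token cross-entropy.

The following is a hypothetical, framework-free sketch of that bookkeeping. `BestTracker` and `perplexity` are illustrative names, and the checkpoint write is left as a comment.

```python
import math
from dataclasses import dataclass, field


def perplexity(mean_token_loss: float) -> float:
    """Perplexity of a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_token_loss)


@dataclass
class BestTracker:
    """Hypothetical helper: remembers the lowest validation loss seen so far."""
    best_loss: float = math.inf
    best_step: int = -1
    history: list = field(default_factory=list)

    def update(self, step: int, val_loss: float) -> bool:
        """Record one validation result and return True if it is a new best."""
        self.history.append((step, val_loss))
        if val_loss < self.best_loss:
            self.best_loss, self.best_step = val_loss, step
            return True
        return False


tracker = BestTracker()
# Validation losses reported in this log at steps 1000, 1500, 2000 and 2500.
for step, val_loss in [(1000, 0.0150), (1500, 0.0032), (2000, 0.0019), (2500, 0.0014)]:
    if tracker.update(step, val_loss):
        # The real trainer would overwrite best_checkpoint.pt at this point.
        print(f"New best validation loss: {val_loss:.4f}, perplexity: {perplexity(val_loss):.2f}")
```

Once losses are this small, perplexity rounds to 1.00 and stops discriminating between checkpoints. The raw validation loss (0.0032, then 0.0019, then 0.0014) is the more informative signal.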
2025-11-17 09:34:32,293 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=0.0017, ppl=1.00, grad_norm=0.12, lr=9.61e-05, throughput=3401 tok/s 2025-11-17 09:36:53,330 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=0.0018, ppl=1.00, grad_norm=0.12, lr=9.61e-05, throughput=3403 tok/s 2025-11-17 09:39:23,575 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=9.60e-05, throughput=3195 tok/s 2025-11-17 09:41:44,067 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=9.60e-05, throughput=3417 tok/s 2025-11-17 09:44:05,164 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=9.59e-05, throughput=3402 tok/s 2025-11-17 09:46:36,386 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=0.0014, ppl=1.00, grad_norm=0.11, lr=9.58e-05, throughput=3174 tok/s 2025-11-17 09:48:58,064 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=9.58e-05, throughput=3388 tok/s 2025-11-17 09:51:29,028 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=9.57e-05, throughput=3180 tok/s 2025-11-17 09:53:49,827 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=9.56e-05, throughput=3409 tok/s 2025-11-17 09:56:10,336 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=9.55e-05, throughput=3416 tok/s 2025-11-17 09:58:31,862 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=0.0013, ppl=1.00, grad_norm=0.14, lr=9.55e-05, throughput=3392 tok/s 2025-11-17 10:01:02,657 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=9.54e-05, throughput=3183 tok/s 2025-11-17 10:03:24,354 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=9.53e-05, throughput=3388 tok/s 2025-11-17 10:05:45,215 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=0.0015, ppl=1.00, grad_norm=0.15, lr=9.53e-05, throughput=3408 tok/s 2025-11-17 10:08:15,497 - 
INFO - Epoch 1 Step 2360 (Global: 2360): loss=0.0012, ppl=1.00, grad_norm=0.09, lr=9.52e-05, throughput=3194 tok/s 2025-11-17 10:10:36,010 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=9.51e-05, throughput=3416 tok/s 2025-11-17 10:12:57,096 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=0.0017, ppl=1.00, grad_norm=0.11, lr=9.51e-05, throughput=3402 tok/s 2025-11-17 10:15:29,825 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=0.0016, ppl=1.00, grad_norm=0.15, lr=9.50e-05, throughput=3143 tok/s 2025-11-17 10:17:51,521 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=0.0022, ppl=1.00, grad_norm=0.12, lr=9.49e-05, throughput=3388 tok/s 2025-11-17 10:20:22,367 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=9.48e-05, throughput=3182 tok/s 2025-11-17 10:22:43,664 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=9.48e-05, throughput=3397 tok/s 2025-11-17 10:25:04,120 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=0.0014, ppl=1.00, grad_norm=0.14, lr=9.47e-05, throughput=3417 tok/s 2025-11-17 10:27:25,273 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=9.46e-05, throughput=3401 tok/s 2025-11-17 10:29:56,999 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=9.45e-05, throughput=3164 tok/s 2025-11-17 10:32:18,005 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=0.0018, ppl=1.00, grad_norm=0.20, lr=9.45e-05, throughput=3404 tok/s 2025-11-17 10:34:39,563 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=9.44e-05, throughput=3391 tok/s 2025-11-17 10:37:00,434 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=0.0016, ppl=1.00, grad_norm=0.10, lr=9.43e-05, throughput=3407 tok/s 2025-11-17 10:39:31,384 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=0.0015, ppl=1.00, grad_norm=0.09, lr=9.42e-05, throughput=3180 tok/s 2025-11-17 10:41:53,377 - INFO - Epoch 1 Step 2500 
(Global: 2500): loss=0.0016, ppl=1.00, grad_norm=0.15, lr=9.41e-05, throughput=3381 tok/s 2025-11-17 10:41:53,379 - INFO - Running validation at step 2500... 2025-11-17 10:49:42,539 - INFO - Validation loss: 0.0014, perplexity: 1.00 2025-11-17 10:49:42,540 - INFO - Qualitative metrics (n=5): 2025-11-17 10:49:42,540 - INFO - BLEU: 1.0000 2025-11-17 10:49:42,540 - INFO - METEOR: 1.0000 2025-11-17 10:49:42,540 - INFO - Edit Distance: 0.0000 2025-11-17 10:49:42,540 - INFO - F-measure: 1.0000 2025-11-17 10:49:42,540 - INFO - ====================================================================== 2025-11-17 10:49:42,540 - INFO - Qualitative Evaluation Samples: 2025-11-17 10:49:42,540 - INFO - ====================================================================== 2025-11-17 10:49:42,541 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 10:49:42,541 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 10:49:42,541 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 10:49:42,541 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 10:49:42,541 - INFO - ---------------------------------------------------------------------- 2025-11-17 10:49:42,541 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 10:49:42,541 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 10:49:42,541 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 10:49:42,541 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 10:49:42,541 - INFO - ---------------------------------------------------------------------- 2025-11-17 10:49:42,542 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 10:49:42,542 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 10:49:42,542 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 10:49:42,542 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 10:49:42,542 - INFO - ---------------------------------------------------------------------- 2025-11-17 10:49:42,542 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 10:49:42,542 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 10:49:42,542 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 10:49:42,542 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 10:49:42,543 - INFO - ---------------------------------------------------------------------- 2025-11-17 10:49:42,543 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-17 10:49:42,543 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 10:49:42,543 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 10:49:42,543 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 10:49:42,543 - INFO - ---------------------------------------------------------------------- 2025-11-17 10:49:42,544 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_2500.jsonl 2025-11-17 10:50:28,498 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-17 10:50:28,514 - INFO - New best validation loss: 0.0014, perplexity: 1.00 2025-11-17 10:52:59,839 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=9.41e-05, throughput=3172 tok/s 2025-11-17 10:55:21,043 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=9.40e-05, throughput=3399 tok/s 2025-11-17 10:57:41,994 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=9.39e-05, throughput=3406 tok/s 2025-11-17 11:00:03,707 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=9.38e-05, throughput=3387 tok/s 2025-11-17 11:02:34,900 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=0.0013, ppl=1.00, grad_norm=0.09, lr=9.37e-05, throughput=3175 tok/s 2025-11-17 11:04:57,572 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=9.37e-05, throughput=3364 tok/s 2025-11-17 11:07:20,121 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=9.36e-05, 
throughput=3367 tok/s 2025-11-17 11:09:41,653 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=9.35e-05, throughput=3392 tok/s 2025-11-17 11:12:13,151 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=0.0015, ppl=1.00, grad_norm=0.13, lr=9.34e-05, throughput=3168 tok/s 2025-11-17 11:14:35,085 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=9.33e-05, throughput=3382 tok/s 2025-11-17 11:17:07,145 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=9.32e-05, throughput=3157 tok/s 2025-11-17 11:19:28,928 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=9.32e-05, throughput=3386 tok/s 2025-11-17 11:22:00,202 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=0.0023, ppl=1.00, grad_norm=0.17, lr=9.31e-05, throughput=3173 tok/s 2025-11-17 11:24:21,705 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=0.0023, ppl=1.00, grad_norm=0.16, lr=9.30e-05, throughput=3392 tok/s 2025-11-17 11:26:44,180 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=9.29e-05, throughput=3369 tok/s 2025-11-17 11:29:07,438 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=0.0015, ppl=1.00, grad_norm=0.14, lr=9.28e-05, throughput=3351 tok/s 2025-11-17 11:31:29,199 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=9.27e-05, throughput=3386 tok/s 2025-11-17 11:34:01,119 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=0.0017, ppl=1.00, grad_norm=0.09, lr=9.26e-05, throughput=3160 tok/s 2025-11-17 11:36:23,897 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=0.0025, ppl=1.00, grad_norm=0.15, lr=9.26e-05, throughput=3362 tok/s 2025-11-17 11:38:46,672 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=0.0011, ppl=1.00, grad_norm=0.17, lr=9.25e-05, throughput=3362 tok/s 2025-11-17 11:41:18,483 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=9.24e-05, throughput=3162 tok/s 
2025-11-17 11:43:41,751 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=9.23e-05, throughput=3350 tok/s 2025-11-17 11:46:04,829 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=9.22e-05, throughput=3355 tok/s 2025-11-17 11:48:35,026 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=0.0027, ppl=1.00, grad_norm=0.25, lr=9.21e-05, throughput=3196 tok/s 2025-11-17 11:50:57,020 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=0.0031, ppl=1.00, grad_norm=0.25, lr=9.20e-05, throughput=3381 tok/s 2025-11-17 11:53:27,903 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=0.0037, ppl=1.00, grad_norm=0.30, lr=9.19e-05, throughput=3181 tok/s 2025-11-17 11:55:48,582 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=0.0025, ppl=1.00, grad_norm=0.16, lr=9.18e-05, throughput=3412 tok/s 2025-11-17 11:58:09,702 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=0.0031, ppl=1.00, grad_norm=0.16, lr=9.17e-05, throughput=3401 tok/s 2025-11-17 12:00:31,942 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=0.0017, ppl=1.00, grad_norm=0.11, lr=9.17e-05, throughput=3375 tok/s 2025-11-17 12:03:04,037 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=0.0016, ppl=1.00, grad_norm=0.09, lr=9.16e-05, throughput=3156 tok/s 2025-11-17 12:05:27,275 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=9.15e-05, throughput=3351 tok/s 2025-11-17 12:07:48,806 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=0.0021, ppl=1.00, grad_norm=0.19, lr=9.14e-05, throughput=3392 tok/s 2025-11-17 12:10:19,878 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=9.13e-05, throughput=3177 tok/s 2025-11-17 12:12:40,622 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=0.0020, ppl=1.00, grad_norm=0.15, lr=9.12e-05, throughput=3410 tok/s 2025-11-17 12:15:01,808 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=0.0014, ppl=1.00, grad_norm=0.13, lr=9.11e-05, throughput=3400 tok/s 2025-11-17 12:17:34,711 - 
INFO - Epoch 1 Step 2860 (Global: 2860): loss=0.0010, ppl=1.00, grad_norm=0.12, lr=9.10e-05, throughput=3139 tok/s 2025-11-17 12:19:57,061 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=9.09e-05, throughput=3372 tok/s 2025-11-17 12:22:28,827 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=0.0018, ppl=1.00, grad_norm=0.16, lr=9.08e-05, throughput=3163 tok/s 2025-11-17 12:24:50,935 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=0.0010, ppl=1.00, grad_norm=0.13, lr=9.07e-05, throughput=3378 tok/s 2025-11-17 12:27:13,940 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=9.06e-05, throughput=3357 tok/s 2025-11-17 12:29:37,515 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=0.0010, ppl=1.00, grad_norm=0.12, lr=9.05e-05, throughput=3343 tok/s 2025-11-17 12:32:10,443 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=0.0022, ppl=1.00, grad_norm=0.14, lr=9.04e-05, throughput=3139 tok/s 2025-11-17 12:34:34,167 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=9.03e-05, throughput=3340 tok/s 2025-11-17 12:36:55,710 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=9.02e-05, throughput=3391 tok/s 2025-11-17 12:39:19,064 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=9.01e-05, throughput=3348 tok/s 2025-11-17 12:41:59,433 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=9.00e-05, throughput=2993 tok/s 2025-11-17 12:44:26,924 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=0.0020, ppl=1.00, grad_norm=0.12, lr=8.99e-05, throughput=3255 tok/s 2025-11-17 12:47:12,061 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=0.0016, ppl=1.00, grad_norm=0.13, lr=8.98e-05, throughput=2907 tok/s 2025-11-17 12:49:40,034 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=8.97e-05, throughput=3244 tok/s 2025-11-17 12:52:13,066 - INFO - Epoch 1 Step 3000 
(Global: 3000): loss=0.0017, ppl=1.00, grad_norm=0.14, lr=8.96e-05, throughput=3137 tok/s 2025-11-17 12:52:13,070 - INFO - Running validation at step 3000... 2025-11-17 13:00:33,174 - INFO - Validation loss: 0.0012, perplexity: 1.00 2025-11-17 13:00:33,175 - INFO - Qualitative metrics (n=5): 2025-11-17 13:00:33,175 - INFO - BLEU: 1.0000 2025-11-17 13:00:33,175 - INFO - METEOR: 1.0000 2025-11-17 13:00:33,175 - INFO - Edit Distance: 0.0000 2025-11-17 13:00:33,176 - INFO - F-measure: 1.0000 2025-11-17 13:00:33,176 - INFO - ====================================================================== 2025-11-17 13:00:33,176 - INFO - Qualitative Evaluation Samples: 2025-11-17 13:00:33,176 - INFO - ====================================================================== 2025-11-17 13:00:33,176 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 13:00:33,177 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 13:00:33,177 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 13:00:33,177 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 13:00:33,177 - INFO - ---------------------------------------------------------------------- 2025-11-17 13:00:33,178 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 13:00:33,178 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 13:00:33,180 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 13:00:33,180 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 13:00:33,181 - INFO - ---------------------------------------------------------------------- 2025-11-17 13:00:33,181 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 13:00:33,181 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 13:00:33,181 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 13:00:33,181 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 13:00:33,182 - INFO - ---------------------------------------------------------------------- 2025-11-17 13:00:33,182 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 13:00:33,182 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 13:00:33,182 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 13:00:33,183 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 13:00:33,183 - INFO - ---------------------------------------------------------------------- 2025-11-17 13:00:33,183 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-17 13:00:33,183 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 13:00:33,184 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 13:00:33,184 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 13:00:33,184 - INFO - ---------------------------------------------------------------------- 2025-11-17 13:00:33,186 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_3000.jsonl 2025-11-17 13:01:15,443 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-17 13:01:15,454 - INFO - New best validation loss: 0.0012, perplexity: 1.00 2025-11-17 13:03:46,697 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.95e-05, throughput=3174 tok/s 2025-11-17 13:06:17,242 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=0.0014, ppl=1.00, grad_norm=0.13, lr=8.94e-05, throughput=3189 tok/s 2025-11-17 13:09:01,481 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=8.93e-05, throughput=2923 tok/s 2025-11-17 13:11:31,646 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=8.92e-05, throughput=3197 tok/s 2025-11-17 13:14:15,870 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=8.91e-05, throughput=2923 tok/s 2025-11-17 13:16:49,071 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=0.0015, ppl=1.00, grad_norm=0.10, lr=8.90e-05, throughput=3133 tok/s 2025-11-17 13:19:21,015 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=8.89e-05, 
throughput=3159 tok/s 2025-11-17 13:21:48,539 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=8.88e-05, throughput=3254 tok/s 2025-11-17 13:24:25,284 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=0.0013, ppl=1.00, grad_norm=0.12, lr=8.87e-05, throughput=3062 tok/s 2025-11-17 13:26:49,815 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=0.0013, ppl=1.00, grad_norm=0.15, lr=8.86e-05, throughput=3321 tok/s 2025-11-17 13:29:17,449 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=8.85e-05, throughput=3251 tok/s 2025-11-17 13:31:45,381 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=8.84e-05, throughput=3245 tok/s 2025-11-17 13:34:25,249 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=0.0007, ppl=1.00, grad_norm=0.10, lr=8.82e-05, throughput=3003 tok/s 2025-11-17 13:36:56,114 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=0.0018, ppl=1.00, grad_norm=0.20, lr=8.81e-05, throughput=3182 tok/s 2025-11-17 13:39:30,070 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=8.80e-05, throughput=3118 tok/s 2025-11-17 13:41:50,572 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=0.0027, ppl=1.00, grad_norm=0.17, lr=8.79e-05, throughput=3416 tok/s 2025-11-17 13:44:10,166 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=8.78e-05, throughput=3439 tok/s 2025-11-17 13:46:42,594 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=0.0008, ppl=1.00, grad_norm=0.09, lr=8.77e-05, throughput=3149 tok/s 2025-11-17 13:49:03,243 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=8.76e-05, throughput=3413 tok/s 2025-11-17 13:51:22,720 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=8.75e-05, throughput=3441 tok/s 2025-11-17 13:53:41,257 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=8.74e-05, throughput=3465 tok/s 
2025-11-17 13:56:12,019 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=8.73e-05, throughput=3184 tok/s 2025-11-17 13:58:30,656 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=8.71e-05, throughput=3462 tok/s 2025-11-17 14:00:54,342 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=0.0012, ppl=1.00, grad_norm=0.09, lr=8.70e-05, throughput=3341 tok/s 2025-11-17 14:03:24,888 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=0.0011, ppl=1.00, grad_norm=0.19, lr=8.69e-05, throughput=3188 tok/s 2025-11-17 14:05:43,126 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=0.0021, ppl=1.00, grad_norm=0.16, lr=8.68e-05, throughput=3472 tok/s 2025-11-17 14:08:01,797 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=8.67e-05, throughput=3461 tok/s 2025-11-17 14:10:32,907 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=8.66e-05, throughput=3177 tok/s 2025-11-17 14:12:51,976 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.65e-05, throughput=3452 tok/s 2025-11-17 14:15:22,897 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=8.63e-05, throughput=3181 tok/s 2025-11-17 14:17:41,789 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=8.62e-05, throughput=3456 tok/s 2025-11-17 14:20:00,792 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=8.61e-05, throughput=3453 tok/s 2025-11-17 14:22:20,632 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=8.60e-05, throughput=3433 tok/s 2025-11-17 14:24:50,763 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=8.59e-05, throughput=3197 tok/s 2025-11-17 14:27:09,087 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=8.58e-05, throughput=3470 tok/s 2025-11-17 14:29:28,015 - 
INFO - Epoch 1 Step 3360 (Global: 3360): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=8.57e-05, throughput=3455 tok/s 2025-11-17 14:31:59,867 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=8.55e-05, throughput=3161 tok/s 2025-11-17 14:34:19,540 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=8.54e-05, throughput=3437 tok/s 2025-11-17 14:36:39,424 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=8.53e-05, throughput=3431 tok/s 2025-11-17 14:39:09,695 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=0.0010, ppl=1.00, grad_norm=0.20, lr=8.52e-05, throughput=3194 tok/s 2025-11-17 14:41:28,900 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=0.0014, ppl=1.00, grad_norm=0.12, lr=8.51e-05, throughput=3448 tok/s 2025-11-17 14:43:59,373 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=8.49e-05, throughput=3190 tok/s 2025-11-17 14:46:17,834 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=8.48e-05, throughput=3467 tok/s 2025-11-17 14:48:36,928 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=8.47e-05, throughput=3451 tok/s 2025-11-17 14:50:58,159 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=0.0012, ppl=1.00, grad_norm=0.12, lr=8.46e-05, throughput=3399 tok/s 2025-11-17 14:53:29,445 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.45e-05, throughput=3173 tok/s 2025-11-17 14:55:49,768 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=8.43e-05, throughput=3421 tok/s 2025-11-17 14:58:09,252 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.42e-05, throughput=3441 tok/s 2025-11-17 15:00:30,345 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=8.41e-05, throughput=3402 tok/s 2025-11-17 15:03:01,859 - INFO - Epoch 1 Step 3500 
(Global: 3500): loss=0.0014, ppl=1.00, grad_norm=0.11, lr=8.40e-05, throughput=3168 tok/s 2025-11-17 15:03:01,862 - INFO - Running validation at step 3500... 2025-11-17 15:10:55,041 - INFO - Validation loss: 0.0010, perplexity: 1.00 2025-11-17 15:10:55,042 - INFO - Qualitative metrics (n=5): 2025-11-17 15:10:55,042 - INFO - BLEU: 1.0000 2025-11-17 15:10:55,042 - INFO - METEOR: 1.0000 2025-11-17 15:10:55,042 - INFO - Edit Distance: 0.0000 2025-11-17 15:10:55,042 - INFO - F-measure: 1.0000 2025-11-17 15:10:55,042 - INFO - ====================================================================== 2025-11-17 15:10:55,042 - INFO - Qualitative Evaluation Samples: 2025-11-17 15:10:55,043 - INFO - ====================================================================== 2025-11-17 15:10:55,043 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 15:10:55,043 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 15:10:55,043 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 15:10:55,043 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 15:10:55,043 - INFO - ---------------------------------------------------------------------- 2025-11-17 15:10:55,043 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 15:10:55,043 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 15:10:55,043 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 15:10:55,043 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 15:10:55,043 - INFO - ---------------------------------------------------------------------- 2025-11-17 15:10:55,044 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 15:10:55,044 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 15:10:55,044 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 15:10:55,044 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 15:10:55,044 - INFO - ---------------------------------------------------------------------- 2025-11-17 15:10:55,044 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 15:10:55,044 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 15:10:55,044 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 15:10:55,045 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 15:10:55,045 - INFO - ---------------------------------------------------------------------- 2025-11-17 15:10:55,045 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-17 15:10:55,045 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 15:10:55,045 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 15:10:55,045 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-17 15:10:55,045 - INFO - ---------------------------------------------------------------------- 2025-11-17 15:10:55,046 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_3500.jsonl 2025-11-17 15:11:41,777 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-17 15:11:41,788 - INFO - New best validation loss: 0.0010, perplexity: 1.00 2025-11-17 15:14:07,679 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=0.0016, ppl=1.00, grad_norm=0.10, lr=8.38e-05, throughput=3290 tok/s 2025-11-17 15:16:46,348 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=8.37e-05, throughput=3025 tok/s 2025-11-17 15:19:13,123 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=0.0014, ppl=1.00, grad_norm=0.11, lr=8.36e-05, throughput=3270 tok/s 2025-11-17 15:21:38,355 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=8.35e-05, throughput=3305 tok/s 2025-11-17 15:24:02,721 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=8.33e-05, throughput=3325 tok/s 2025-11-17 15:26:37,262 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=8.32e-05, throughput=3106 tok/s 2025-11-17 15:29:03,954 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=8.31e-05, 
throughput=3272 tok/s 2025-11-17 15:31:31,996 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=8.30e-05, throughput=3242 tok/s 2025-11-17 15:34:11,381 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=8.28e-05, throughput=3012 tok/s 2025-11-17 15:36:37,451 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=8.27e-05, throughput=3286 tok/s 2025-11-17 15:39:02,789 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=8.26e-05, throughput=3303 tok/s 2025-11-17 15:41:40,978 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=8.25e-05, throughput=3034 tok/s 2025-11-17 15:44:03,587 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=0.0013, ppl=1.00, grad_norm=0.09, lr=8.23e-05, throughput=3366 tok/s 2025-11-17 15:46:35,118 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.22e-05, throughput=3168 tok/s 2025-11-17 15:48:57,139 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=0.0049, ppl=1.00, grad_norm=0.51, lr=8.21e-05, throughput=3380 tok/s 2025-11-17 15:51:19,201 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=0.0051, ppl=1.01, grad_norm=0.20, lr=8.20e-05, throughput=3379 tok/s 2025-11-17 15:53:40,779 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=8.18e-05, throughput=3390 tok/s 2025-11-17 15:56:12,139 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=0.0015, ppl=1.00, grad_norm=0.22, lr=8.17e-05, throughput=3171 tok/s 2025-11-17 15:58:34,343 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=0.0017, ppl=1.00, grad_norm=0.13, lr=8.16e-05, throughput=3375 tok/s 2025-11-17 16:00:55,842 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=0.0020, ppl=1.00, grad_norm=0.10, lr=8.14e-05, throughput=3392 tok/s 2025-11-17 16:03:18,415 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=0.0013, ppl=1.00, grad_norm=0.09, lr=8.13e-05, throughput=3367 tok/s 
2025-11-17 16:05:50,570 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=8.12e-05, throughput=3155 tok/s 2025-11-17 16:08:15,809 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=0.0017, ppl=1.00, grad_norm=0.17, lr=8.10e-05, throughput=3305 tok/s 2025-11-17 16:10:49,017 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=0.0120, ppl=1.01, grad_norm=1.02, lr=8.09e-05, throughput=3133 tok/s 2025-11-17 16:13:12,658 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=0.0031, ppl=1.00, grad_norm=0.16, lr=8.08e-05, throughput=3342 tok/s 2025-11-17 16:15:34,818 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=8.06e-05, throughput=3377 tok/s 2025-11-17 16:18:07,741 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=8.05e-05, throughput=3139 tok/s 2025-11-17 16:20:31,467 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=8.04e-05, throughput=3340 tok/s 2025-11-17 16:22:53,674 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=8.02e-05, throughput=3375 tok/s 2025-11-17 16:25:16,542 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=8.01e-05, throughput=3360 tok/s 2025-11-17 16:27:48,797 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=0.0015, ppl=1.00, grad_norm=0.10, lr=8.00e-05, throughput=3153 tok/s 2025-11-17 16:30:13,490 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=0.0030, ppl=1.00, grad_norm=0.24, lr=7.98e-05, throughput=3317 tok/s 2025-11-17 16:32:35,471 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=7.97e-05, throughput=3381 tok/s 2025-11-17 16:35:07,663 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=7.96e-05, throughput=3154 tok/s 2025-11-17 16:37:29,948 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=0.0012, ppl=1.00, grad_norm=0.07, lr=7.94e-05, throughput=3374 tok/s 2025-11-17 16:39:52,315 - 
INFO - Epoch 1 Step 3860 (Global: 3860): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=7.93e-05, throughput=3372 tok/s 2025-11-17 16:42:23,605 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=7.92e-05, throughput=3173 tok/s 2025-11-17 16:44:45,654 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=7.90e-05, throughput=3379 tok/s 2025-11-17 16:47:18,056 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=7.89e-05, throughput=3150 tok/s 2025-11-17 16:49:40,662 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=7.88e-05, throughput=3366 tok/s 2025-11-17 16:52:02,352 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=7.86e-05, throughput=3388 tok/s 2025-11-17 16:54:24,116 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=7.85e-05, throughput=3386 tok/s 2025-11-17 16:56:57,009 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=7.83e-05, throughput=3139 tok/s 2025-11-17 16:59:18,561 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=7.82e-05, throughput=3391 tok/s 2025-11-17 17:01:41,164 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=7.81e-05, throughput=3366 tok/s 2025-11-17 17:04:13,461 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=7.79e-05, throughput=3152 tok/s 2025-11-17 17:06:36,342 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=0.0019, ppl=1.00, grad_norm=0.14, lr=7.78e-05, throughput=3360 tok/s 2025-11-17 17:08:58,578 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=0.0011, ppl=1.00, grad_norm=0.13, lr=7.77e-05, throughput=3375 tok/s 2025-11-17 17:11:30,545 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=7.75e-05, throughput=3159 tok/s 2025-11-17 17:13:53,343 - INFO - Epoch 1 Step 4000 
(Global: 4000): loss=0.0021, ppl=1.00, grad_norm=0.19, lr=7.74e-05, throughput=3361 tok/s 2025-11-17 17:13:53,345 - INFO - Running validation at step 4000... 2025-11-17 17:21:35,214 - INFO - Validation loss: 0.0014, perplexity: 1.00 2025-11-17 17:21:35,214 - INFO - Qualitative metrics (n=5): 2025-11-17 17:21:35,214 - INFO - BLEU: 0.9923 2025-11-17 17:21:35,214 - INFO - METEOR: 0.9972 2025-11-17 17:21:35,214 - INFO - Edit Distance: 0.0036 2025-11-17 17:21:35,214 - INFO - F-measure: 0.9950 2025-11-17 17:21:35,215 - INFO - ====================================================================== 2025-11-17 17:21:35,215 - INFO - Qualitative Evaluation Samples: 2025-11-17 17:21:35,215 - INFO - ====================================================================== 2025-11-17 17:21:35,215 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-17 17:21:35,215 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 17:21:35,215 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 17:21:35,215 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-17 17:21:35,215 - INFO - ---------------------------------------------------------------------- 2025-11-17 17:21:35,216 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-17 17:21:35,216 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 17:21:35,216 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-17 17:21:35,216 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-17 17:21:35,216 - INFO - ---------------------------------------------------------------------- 2025-11-17 17:21:35,216 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-17 17:21:35,216 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 17:21:35,216 - INFO - Generated: ' at the meeting Layheaded mia. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 17:21:35,216 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-17 17:21:35,216 - INFO - ---------------------------------------------------------------------- 2025-11-17 17:21:35,216 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-17 17:21:35,217 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-17 17:21:35,217 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-17 17:21:35,217 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-17 17:21:35,217 - INFO - ----------------------------------------------------------------------
2025-11-17 17:21:35,217 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 17:21:35,217 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 17:21:35,217 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 17:21:35,217 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 17:21:35,217 - INFO - ----------------------------------------------------------------------
2025-11-17 17:21:35,221 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_4000.jsonl
2025-11-17 17:24:07,532 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=7.72e-05, throughput=3164 tok/s
2025-11-17 17:26:28,870 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=0.0015, ppl=1.00, grad_norm=0.10, lr=7.71e-05, throughput=3396 tok/s
2025-11-17 17:28:50,305 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=7.70e-05, throughput=3394 tok/s
2025-11-17 17:31:20,500 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=0.0014, ppl=1.00, grad_norm=0.09, lr=7.68e-05, throughput=3196 tok/s
2025-11-17 17:33:41,274 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=0.0013, ppl=1.00, grad_norm=0.09, lr=7.67e-05, throughput=3410 tok/s
2025-11-17 17:36:02,644 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=7.65e-05, throughput=3395 tok/s
2025-11-17 17:38:24,944 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=0.0019, ppl=1.00, grad_norm=0.13, lr=7.64e-05, throughput=3373 tok/s
2025-11-17 17:40:57,506 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=0.0016, ppl=1.00, grad_norm=0.12, lr=7.62e-05, throughput=3146 tok/s
2025-11-17 17:43:20,130 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=7.61e-05, throughput=3366 tok/s
2025-11-17 17:45:41,718 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=7.60e-05, throughput=3390 tok/s
2025-11-17 17:48:12,711 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=0.0014, ppl=1.00, grad_norm=0.10, lr=7.58e-05, throughput=3179 tok/s
2025-11-17 17:50:33,687 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=7.57e-05, throughput=3405 tok/s
2025-11-17 17:52:54,376 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=7.55e-05, throughput=3412 tok/s
2025-11-17 17:55:25,182 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=7.54e-05, throughput=3183 tok/s
2025-11-17 17:57:46,567 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=7.52e-05, throughput=3395 tok/s
2025-11-17 18:00:16,856 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=0.0024, ppl=1.00, grad_norm=0.40, lr=7.51e-05, throughput=3194 tok/s
2025-11-17 18:02:38,354 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=7.49e-05, throughput=3392 tok/s
2025-11-17 18:04:59,282 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=0.0022, ppl=1.00, grad_norm=0.22, lr=7.48e-05, throughput=3406 tok/s
2025-11-17 18:07:20,980 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=0.0113, ppl=1.01, grad_norm=1.23, lr=7.47e-05, throughput=3388 tok/s
2025-11-17 18:09:51,573 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=0.0042, ppl=1.00, grad_norm=0.25, lr=7.45e-05, throughput=3187 tok/s
2025-11-17 18:12:13,049 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=0.0021, ppl=1.00, grad_norm=0.12, lr=7.44e-05, throughput=3393 tok/s
2025-11-17 18:14:34,357 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=0.0015, ppl=1.00, grad_norm=0.12, lr=7.42e-05, throughput=3397 tok/s
2025-11-17 18:17:05,890 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=0.0013, ppl=1.00, grad_norm=0.08, lr=7.41e-05, throughput=3168 tok/s
2025-11-17 18:19:27,110 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=0.0015, ppl=1.00, grad_norm=0.08, lr=7.39e-05, throughput=3399 tok/s
2025-11-17 18:21:49,154 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=0.0015, ppl=1.00, grad_norm=0.11, lr=7.38e-05, throughput=3379 tok/s
2025-11-17 18:24:18,948 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=0.0014, ppl=1.00, grad_norm=0.14, lr=7.36e-05, throughput=3204 tok/s
2025-11-17 18:26:39,579 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=0.0011, ppl=1.00, grad_norm=0.07, lr=7.35e-05, throughput=3413 tok/s
2025-11-17 18:29:09,453 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=0.0044, ppl=1.00, grad_norm=0.43, lr=7.33e-05, throughput=3203 tok/s
2025-11-17 18:31:30,781 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=0.0017, ppl=1.00, grad_norm=0.12, lr=7.32e-05, throughput=3396 tok/s
2025-11-17 18:33:53,032 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=0.0028, ppl=1.00, grad_norm=0.16, lr=7.30e-05, throughput=3374 tok/s
2025-11-17 18:36:18,826 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=0.0021, ppl=1.00, grad_norm=0.10, lr=7.29e-05, throughput=3292 tok/s
2025-11-17 18:38:46,873 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=0.0024, ppl=1.00, grad_norm=0.11, lr=7.27e-05, throughput=3242 tok/s
2025-11-17 18:41:24,118 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=0.0013, ppl=1.00, grad_norm=0.12, lr=7.26e-05, throughput=3053 tok/s
2025-11-17 18:43:48,284 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=7.24e-05, throughput=3330 tok/s
2025-11-17 18:46:16,564 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=7.23e-05, throughput=3237 tok/s
2025-11-17 18:48:49,970 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=7.21e-05, throughput=3129 tok/s
2025-11-17 18:51:10,668 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=0.0015, ppl=1.00, grad_norm=0.61, lr=7.20e-05, throughput=3412 tok/s
2025-11-17 18:53:41,330 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=7.18e-05, throughput=3186 tok/s
2025-11-17 18:56:02,924 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=7.17e-05, throughput=3390 tok/s
2025-11-17 18:58:25,906 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=7.15e-05, throughput=3357 tok/s
2025-11-17 19:00:58,391 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=7.14e-05, throughput=3148 tok/s
2025-11-17 19:03:19,550 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=0.0010, ppl=1.00, grad_norm=0.07, lr=7.12e-05, throughput=3400 tok/s
2025-11-17 19:05:41,127 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=7.11e-05, throughput=3390 tok/s
2025-11-17 19:08:02,352 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=0.0009, ppl=1.00, grad_norm=0.06, lr=7.09e-05, throughput=3399 tok/s
2025-11-17 19:10:32,868 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=7.08e-05, throughput=3189 tok/s
2025-11-17 19:12:53,668 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=7.06e-05, throughput=3409 tok/s
2025-11-17 19:15:14,625 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=7.05e-05, throughput=3405 tok/s
2025-11-17 19:17:46,219 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=0.0023, ppl=1.00, grad_norm=0.17, lr=7.03e-05, throughput=3166 tok/s
2025-11-17 19:20:07,199 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=7.02e-05, throughput=3405 tok/s
2025-11-17 19:22:27,826 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=7.00e-05, throughput=3413 tok/s
2025-11-17 19:22:27,828 - INFO - Running validation at step 4500...
2025-11-17 19:30:08,691 - INFO - Validation loss: 0.0010, perplexity: 1.00
2025-11-17 19:30:08,692 - INFO - Qualitative metrics (n=5):
2025-11-17 19:30:08,692 - INFO - BLEU: 1.0000
2025-11-17 19:30:08,692 - INFO - METEOR: 1.0000
2025-11-17 19:30:08,692 - INFO - Edit Distance: 0.0000
2025-11-17 19:30:08,692 - INFO - F-measure: 1.0000
2025-11-17 19:30:08,692 - INFO - ======================================================================
2025-11-17 19:30:08,692 - INFO - Qualitative Evaluation Samples:
2025-11-17 19:30:08,692 - INFO - ======================================================================
2025-11-17 19:30:08,693 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 19:30:08,693 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 19:30:08,857 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 19:30:08,858 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 19:30:08,858 - INFO - ----------------------------------------------------------------------
2025-11-17 19:30:08,858 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 19:30:08,858 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 19:30:08,858 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 19:30:08,858 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 19:30:08,859 - INFO - ----------------------------------------------------------------------
2025-11-17 19:30:08,859 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 19:30:08,859 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 19:30:08,859 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 19:30:08,859 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 19:30:08,859 - INFO - ----------------------------------------------------------------------
2025-11-17 19:30:08,859 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 19:30:08,859 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 19:30:08,859 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 19:30:08,859 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 19:30:08,860 - INFO - ----------------------------------------------------------------------
2025-11-17 19:30:08,860 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 19:30:08,860 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 19:30:08,860 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 19:30:08,860 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 19:30:08,861 - INFO - ----------------------------------------------------------------------
2025-11-17 19:30:08,863 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_4500.jsonl
2025-11-17 19:32:40,822 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=6.99e-05, throughput=3172 tok/s
2025-11-17 19:35:02,096 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=0.0011, ppl=1.00, grad_norm=0.17, lr=6.97e-05, throughput=3398 tok/s
2025-11-17 19:37:22,730 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=0.0012, ppl=1.00, grad_norm=0.11, lr=6.96e-05, throughput=3413 tok/s
2025-11-17 19:39:53,403 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=6.94e-05, throughput=3186 tok/s
2025-11-17 19:42:13,583 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=6.92e-05, throughput=3424 tok/s
2025-11-17 19:44:33,868 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=0.0013, ppl=1.00, grad_norm=0.11, lr=6.91e-05, throughput=3422 tok/s
2025-11-17 19:47:02,921 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=6.89e-05, throughput=3220 tok/s
2025-11-17 19:49:23,370 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=0.0017, ppl=1.00, grad_norm=0.10, lr=6.88e-05, throughput=3418 tok/s
2025-11-17 19:51:53,429 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.86e-05, throughput=3199 tok/s
2025-11-17 19:54:14,223 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=6.85e-05, throughput=3409 tok/s
2025-11-17 19:56:34,496 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.83e-05, throughput=3422 tok/s
2025-11-17 19:58:54,880 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.82e-05, throughput=3419 tok/s
2025-11-17 20:01:25,205 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=6.80e-05, throughput=3193 tok/s
2025-11-17 20:03:45,627 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=0.0012, ppl=1.00, grad_norm=0.08, lr=6.78e-05, throughput=3418 tok/s
2025-11-17 20:06:06,128 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=6.77e-05, throughput=3416 tok/s
2025-11-17 20:08:26,804 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=6.75e-05, throughput=3412 tok/s
2025-11-17 20:10:56,919 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=0.0013, ppl=1.00, grad_norm=0.08, lr=6.74e-05, throughput=3198 tok/s
2025-11-17 20:13:17,659 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=0.0014, ppl=1.00, grad_norm=0.13, lr=6.72e-05, throughput=3411 tok/s
2025-11-17 20:15:48,516 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=6.71e-05, throughput=3182 tok/s
2025-11-17 20:18:10,173 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=0.0014, ppl=1.00, grad_norm=0.09, lr=6.69e-05, throughput=3389 tok/s
2025-11-17 20:20:40,358 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=6.67e-05, throughput=3196 tok/s
2025-11-17 20:23:01,589 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=0.0010, ppl=1.00, grad_norm=0.13, lr=6.66e-05, throughput=3399 tok/s
2025-11-17 20:25:22,530 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=0.0005, ppl=1.00, grad_norm=0.09, lr=6.64e-05, throughput=3406 tok/s
2025-11-17 20:27:43,468 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.63e-05, throughput=3406 tok/s
2025-11-17 20:30:04,495 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.61e-05, throughput=3404 tok/s
2025-11-17 20:32:35,527 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.60e-05, throughput=3178 tok/s
2025-11-17 20:34:55,987 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=6.58e-05, throughput=3417 tok/s
2025-11-17 20:37:16,526 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=6.56e-05, throughput=3415 tok/s
2025-11-17 20:39:47,204 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=0.0012, ppl=1.00, grad_norm=0.12, lr=6.55e-05, throughput=3186 tok/s
2025-11-17 20:42:08,478 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=0.0017, ppl=1.00, grad_norm=0.11, lr=6.53e-05, throughput=3398 tok/s
2025-11-17 20:44:29,853 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=6.52e-05, throughput=3395 tok/s
2025-11-17 20:47:00,262 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=6.50e-05, throughput=3191 tok/s
2025-11-17 20:49:20,724 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.48e-05, throughput=3417 tok/s
2025-11-17 20:51:50,989 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=6.47e-05, throughput=3194 tok/s
2025-11-17 20:54:11,989 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=6.45e-05, throughput=3404 tok/s
2025-11-17 20:56:32,850 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=6.44e-05, throughput=3408 tok/s
2025-11-17 20:58:53,291 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=0.0008, ppl=1.00, grad_norm=0.13, lr=6.42e-05, throughput=3418 tok/s
2025-11-17 21:01:24,054 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.40e-05, throughput=3184 tok/s
2025-11-17 21:03:45,243 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=0.0006, ppl=1.00, grad_norm=0.05, lr=6.39e-05, throughput=3400 tok/s
2025-11-17 21:06:06,671 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=0.0026, ppl=1.00, grad_norm=0.21, lr=6.37e-05, throughput=3394 tok/s
2025-11-17 21:08:37,068 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=0.0015, ppl=1.00, grad_norm=0.12, lr=6.35e-05, throughput=3192 tok/s
2025-11-17 21:10:58,423 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=6.34e-05, throughput=3396 tok/s
2025-11-17 21:13:21,610 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=6.32e-05, throughput=3352 tok/s
2025-11-17 21:15:55,169 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=6.31e-05, throughput=3126 tok/s
2025-11-17 21:18:17,471 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=6.29e-05, throughput=3373 tok/s
2025-11-17 21:20:49,134 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=6.27e-05, throughput=3165 tok/s
2025-11-17 21:23:15,120 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=6.26e-05, throughput=3288 tok/s
2025-11-17 21:25:39,349 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=6.24e-05, throughput=3328 tok/s
2025-11-17 21:28:01,920 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=0.0006, ppl=1.00, grad_norm=0.11, lr=6.23e-05, throughput=3367 tok/s
2025-11-17 21:30:33,641 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=6.21e-05, throughput=3164 tok/s
2025-11-17 21:30:33,643 - INFO - Running validation at step 5000...
2025-11-17 21:38:20,993 - INFO - Validation loss: 0.0008, perplexity: 1.00
2025-11-17 21:38:20,993 - INFO - Qualitative metrics (n=5):
2025-11-17 21:38:20,993 - INFO - BLEU: 1.0000
2025-11-17 21:38:20,993 - INFO - METEOR: 1.0000
2025-11-17 21:38:20,993 - INFO - Edit Distance: 0.0000
2025-11-17 21:38:20,994 - INFO - F-measure: 1.0000
2025-11-17 21:38:20,994 - INFO - ======================================================================
2025-11-17 21:38:20,994 - INFO - Qualitative Evaluation Samples:
2025-11-17 21:38:20,994 - INFO - ======================================================================
2025-11-17 21:38:20,994 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 21:38:20,994 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 21:38:20,994 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 21:38:20,994 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 21:38:20,994 - INFO - ----------------------------------------------------------------------
2025-11-17 21:38:20,994 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 21:38:20,994 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 21:38:20,995 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 21:38:20,995 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 21:38:20,995 - INFO - ----------------------------------------------------------------------
2025-11-17 21:38:20,995 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 21:38:20,995 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 21:38:20,995 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 21:38:20,995 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 21:38:20,995 - INFO - ----------------------------------------------------------------------
2025-11-17 21:38:20,995 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 21:38:20,995 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 21:38:20,995 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 21:38:20,996 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 21:38:20,996 - INFO - ----------------------------------------------------------------------
2025-11-17 21:38:20,996 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 21:38:20,996 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 21:38:20,996 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 21:38:20,996 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 21:38:20,996 - INFO - ----------------------------------------------------------------------
2025-11-17 21:38:20,997 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_5000.jsonl
2025-11-17 21:39:01,869 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-17 21:39:01,886 - INFO - New best validation loss: 0.0008, perplexity: 1.00
2025-11-17 21:41:23,812 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.19e-05, throughput=3382 tok/s
2025-11-17 21:43:44,924 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=6.18e-05, throughput=3402 tok/s
2025-11-17 21:46:16,289 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=6.16e-05, throughput=3171 tok/s
2025-11-17 21:48:37,991 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=6.14e-05, throughput=3387 tok/s
2025-11-17 21:50:59,604 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=6.13e-05, throughput=3390 tok/s
2025-11-17 21:53:31,335 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=6.11e-05, throughput=3164 tok/s
2025-11-17 21:55:52,801 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=6.10e-05, throughput=3393 tok/s
2025-11-17 21:58:23,512 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.08e-05, throughput=3185 tok/s
2025-11-17 22:00:45,428 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=0.0011, ppl=1.00, grad_norm=0.10, lr=6.06e-05, throughput=3382 tok/s
2025-11-17 22:03:09,578 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=6.05e-05, throughput=3330 tok/s
2025-11-17 22:05:34,512 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=0.0010, ppl=1.00, grad_norm=0.17, lr=6.03e-05, throughput=3312 tok/s
2025-11-17 22:07:57,403 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.01e-05, throughput=3359 tok/s
2025-11-17 22:10:30,253 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=6.00e-05, throughput=3140 tok/s
2025-11-17 22:12:53,816 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=0.0013, ppl=1.00, grad_norm=0.19, lr=5.98e-05, throughput=3344 tok/s
2025-11-17 22:15:19,178 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=5.96e-05, throughput=3302 tok/s
2025-11-17 22:17:53,726 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=5.95e-05, throughput=3106 tok/s
2025-11-17 22:20:17,951 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=0.0013, ppl=1.00, grad_norm=0.10, lr=5.93e-05, throughput=3328 tok/s
2025-11-17 22:22:53,444 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=0.0007, ppl=1.00, grad_norm=0.10, lr=5.91e-05, throughput=3087 tok/s
2025-11-17 22:25:18,433 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=5.90e-05, throughput=3311 tok/s
2025-11-17 22:27:42,552 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=5.88e-05, throughput=3331 tok/s
2025-11-17 22:30:15,757 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=0.0027, ppl=1.00, grad_norm=0.24, lr=5.87e-05, throughput=3133 tok/s
2025-11-17 22:32:38,913 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=5.85e-05, throughput=3353 tok/s
2025-11-17 22:35:01,250 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=5.83e-05, throughput=3372 tok/s
2025-11-17 22:37:23,477 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=5.82e-05, throughput=3375 tok/s
2025-11-17 22:39:53,932 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=0.0003, ppl=1.00, grad_norm=0.03, lr=5.80e-05, throughput=3194 tok/s
2025-11-17 22:42:15,316 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=5.78e-05, throughput=3395 tok/s
2025-11-17 22:44:36,512 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=5.77e-05, throughput=3400 tok/s
2025-11-17 22:47:07,759 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=5.75e-05, throughput=3174 tok/s
2025-11-17 22:49:31,572 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=5.73e-05, throughput=3338 tok/s
2025-11-17 22:52:21,567 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=5.72e-05, throughput=2824 tok/s
2025-11-17 22:54:53,908 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=0.0008, ppl=1.00, grad_norm=0.06, lr=5.70e-05, throughput=3151 tok/s
2025-11-17 22:57:15,315 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=0.0017, ppl=1.00, grad_norm=0.09, lr=5.68e-05, throughput=3394 tok/s
2025-11-17 22:59:47,289 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=5.67e-05, throughput=3158 tok/s
2025-11-17 23:02:10,427 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.65e-05, throughput=3353 tok/s
2025-11-17 23:04:33,179 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=5.63e-05, throughput=3363 tok/s
2025-11-17 23:06:57,850 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=5.62e-05, throughput=3318 tok/s
2025-11-17 23:09:32,210 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=5.60e-05, throughput=3110 tok/s
2025-11-17 23:11:55,383 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.58e-05, throughput=3353 tok/s
2025-11-17 23:14:17,232 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=5.57e-05, throughput=3384 tok/s
2025-11-17 23:16:41,257 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=5.55e-05, throughput=3333 tok/s
2025-11-17 23:19:13,433 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=5.53e-05, throughput=3154 tok/s
2025-11-17 23:21:37,793 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=5.52e-05, throughput=3325 tok/s
2025-11-17 23:24:10,260 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.50e-05, throughput=3148 tok/s
2025-11-17 23:26:34,813 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.48e-05, throughput=3321 tok/s
2025-11-17 23:29:07,820 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.47e-05, throughput=3137 tok/s
2025-11-17 23:31:32,508 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=5.45e-05, throughput=3318 tok/s
2025-11-17 23:33:56,455 - INFO - Epoch 1 Step 5470 (Global: 5470): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=5.43e-05, throughput=3335 tok/s
2025-11-17 23:36:21,866 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.42e-05, throughput=3301 tok/s
2025-11-17 23:38:46,738 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=5.40e-05, throughput=3313 tok/s
2025-11-17 23:41:20,060 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=5.38e-05, throughput=3131 tok/s
2025-11-17 23:41:20,062 - INFO - Running validation at step 5500...
2025-11-17 23:49:02,523 - INFO - Validation loss: 0.0007, perplexity: 1.00
2025-11-17 23:49:02,523 - INFO - Qualitative metrics (n=5):
2025-11-17 23:49:02,524 - INFO - BLEU: 1.0000
2025-11-17 23:49:02,524 - INFO - METEOR: 1.0000
2025-11-17 23:49:02,524 - INFO - Edit Distance: 0.0000
2025-11-17 23:49:02,524 - INFO - F-measure: 1.0000
2025-11-17 23:49:02,524 - INFO - ======================================================================
2025-11-17 23:49:02,525 - INFO - Qualitative Evaluation Samples:
2025-11-17 23:49:02,525 - INFO - ======================================================================
2025-11-17 23:49:02,525 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-17 23:49:02,525 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 23:49:02,525 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 23:49:02,526 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-17 23:49:02,526 - INFO - ----------------------------------------------------------------------
2025-11-17 23:49:02,526 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-17 23:49:02,526 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 23:49:02,526 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 23:49:02,526 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-17 23:49:02,527 - INFO - ----------------------------------------------------------------------
2025-11-17 23:49:02,527 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-17 23:49:02,527 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 23:49:02,527 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 23:49:02,527 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-17 23:49:02,528 - INFO - ----------------------------------------------------------------------
2025-11-17 23:49:02,528 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-17 23:49:02,528 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 23:49:02,528 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 23:49:02,528 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-17 23:49:02,528 - INFO - ----------------------------------------------------------------------
2025-11-17 23:49:02,529 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-17 23:49:02,529 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-17 23:49:02,529 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 23:49:02,529 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-17 23:49:02,529 - INFO - ----------------------------------------------------------------------
2025-11-17 23:49:02,530 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_5500.jsonl
2025-11-17 23:49:47,885 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-17 23:49:47,901 - INFO - New best validation loss: 0.0007, perplexity: 1.00
2025-11-17 23:52:10,140 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.37e-05, throughput=3375 tok/s
2025-11-17 23:54:32,076 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.35e-05, throughput=3382 tok/s
2025-11-17 23:57:03,131 - INFO - Epoch 1 Step 5530 (Global: 5530): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=5.33e-05, throughput=3178 tok/s
2025-11-17 23:59:25,832 - INFO - Epoch 1 Step 5540 (Global: 5540): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.32e-05, throughput=3364 tok/s
2025-11-18 00:01:48,308 - INFO - Epoch 1 Step 5550 (Global: 5550): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.30e-05, throughput=3369 tok/s
2025-11-18 00:04:10,950 - INFO - Epoch 1 Step 5560 (Global: 5560): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=5.28e-05, throughput=3365 tok/s
2025-11-18 00:06:46,264 - INFO - Epoch 1 Step 5570 (Global: 5570): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=5.27e-05, throughput=3091 tok/s
2025-11-18 00:09:10,595 - INFO - Epoch 1 Step 5580 (Global: 5580): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=5.25e-05, throughput=3326 tok/s
2025-11-18 00:11:45,552 - INFO - Epoch 1 Step 5590 (Global: 5590): loss=0.0010, ppl=1.00, grad_norm=0.13, lr=5.23e-05, throughput=3098 tok/s
2025-11-18 00:14:09,300 - INFO - Epoch 1 Step 5600 (Global: 5600): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.22e-05, throughput=3339 tok/s
2025-11-18 00:16:44,664 - INFO - Epoch 1 Step 5610 (Global: 5610): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.20e-05, throughput=3090 tok/s
2025-11-18 00:19:09,412 - INFO - Epoch 1 Step 5620 (Global: 5620): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=5.18e-05, throughput=3316 tok/s
2025-11-18 00:21:32,856 - INFO - Epoch 1 Step 5630 (Global: 5630): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=5.17e-05, throughput=3346 tok/s
2025-11-18 00:23:56,890 - INFO - Epoch 1 Step 5640 (Global: 5640): loss=0.0008, ppl=1.00, grad_norm=0.06, lr=5.15e-05, throughput=3333 tok/s
2025-11-18 00:26:21,566 - INFO - Epoch 1 Step 5650 (Global: 5650): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=5.13e-05, throughput=3318 tok/s
2025-11-18 00:28:53,767 - INFO - Epoch 1 Step 5660 (Global: 5660): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.12e-05, throughput=3154 tok/s
2025-11-18 00:31:17,013 - INFO - Epoch 1 Step 5670 (Global: 5670): loss=0.0016, ppl=1.00, grad_norm=0.13, lr=5.10e-05, throughput=3351 tok/s
2025-11-18 00:33:43,210 - INFO - Epoch 1 Step 5680 (Global: 5680): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=5.08e-05, throughput=3283 tok/s
2025-11-18 00:36:19,653 - INFO - Epoch 1 Step 5690 (Global: 5690): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=5.07e-05, throughput=3068 tok/s
2025-11-18 00:38:45,299 - INFO - Epoch 1 Step 5700 (Global: 5700): loss=0.0008, ppl=1.00, grad_norm=0.14, lr=5.05e-05, throughput=3296 tok/s
2025-11-18 00:41:08,061 - INFO - Epoch 1 Step 5710 (Global: 5710): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=5.03e-05, throughput=3362 tok/s
2025-11-18 00:43:39,628 - INFO - Epoch 1 Step 5720 (Global: 5720): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.02e-05, throughput=3167 tok/s
2025-11-18 00:46:01,907 - INFO - Epoch 1 Step 5730 (Global: 5730): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=5.00e-05, throughput=3374 tok/s
2025-11-18 00:48:34,043 - INFO - Epoch 1 Step 5740 (Global: 5740): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.98e-05, throughput=3155 tok/s
2025-11-18 00:50:58,347 - INFO - Epoch 1 Step 5750 (Global: 5750): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=4.96e-05, throughput=3326 tok/s
2025-11-18 00:53:21,014 - INFO - Epoch 1 Step 5760 (Global: 5760): loss=0.0006, ppl=1.00, grad_norm=0.05, lr=4.95e-05, throughput=3365 tok/s
2025-11-18 00:55:43,131 - INFO - Epoch 1 Step 5770 (Global: 5770): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.93e-05, throughput=3378 tok/s
2025-11-18 00:58:13,997 - INFO - Epoch 1 Step 5780 (Global: 5780): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=4.91e-05, throughput=3182 tok/s
2025-11-18 01:00:35,135 - INFO - Epoch 1 Step 5790 (Global: 5790): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=4.90e-05, throughput=3401 tok/s
2025-11-18 01:03:01,483 - INFO - Epoch 1 Step 5800 (Global: 5800): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=4.88e-05, throughput=3280 tok/s
2025-11-18 01:05:41,588 - INFO - Epoch 1 Step 5810 (Global: 5810): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=4.86e-05, throughput=2998 tok/s
2025-11-18 01:08:05,665 - INFO - Epoch 1 Step 5820 (Global: 5820): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.85e-05, throughput=3332 tok/s
2025-11-18 01:10:30,548 - INFO - Epoch 1 Step 5830 (Global: 5830): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.83e-05, throughput=3313 tok/s
2025-11-18 01:13:04,258 - INFO - Epoch 1 Step 5840 (Global: 5840): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=4.81e-05, throughput=3123 tok/s
2025-11-18 01:15:27,770 - INFO - Epoch 1 Step 5850 (Global: 5850): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=4.80e-05, throughput=3345 tok/s
2025-11-18 01:18:01,708 - INFO - Epoch 1 Step 5860 (Global: 5860): loss=0.0007, ppl=1.00, grad_norm=0.16, lr=4.78e-05, throughput=3118 tok/s
2025-11-18 01:20:27,126 - INFO - Epoch 1 Step 5870 (Global: 5870): loss=0.0005, ppl=1.00, grad_norm=0.11, lr=4.76e-05, throughput=3301 tok/s
2025-11-18 01:22:52,960 - INFO - Epoch 1 Step 5880 (Global: 5880): loss=0.0012, ppl=1.00, grad_norm=0.15, lr=4.75e-05, throughput=3291 tok/s
2025-11-18 01:25:21,470 - INFO - Epoch 1 Step 5890 (Global: 5890): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=4.73e-05, throughput=3232 tok/s
2025-11-18 01:27:58,542 - INFO - Epoch 1 Step 5900 (Global: 5900): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.71e-05, throughput=3056 tok/s
2025-11-18 01:30:23,313 - INFO - Epoch 1 Step 5910 (Global: 5910): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.70e-05, throughput=3316 tok/s
2025-11-18 01:32:57,769 - INFO - Epoch 1 Step 5920 (Global: 5920): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.68e-05, throughput=3108 tok/s
2025-11-18 01:35:22,052 - INFO - Epoch 1 Step 5930 (Global: 5930): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.66e-05, throughput=3327 tok/s
2025-11-18 01:37:56,357 - INFO - Epoch 1 Step 5940 (Global: 5940): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.65e-05, throughput=3111 tok/s
2025-11-18 01:40:20,314 - INFO - Epoch 1 Step 5950 (Global: 5950): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.63e-05, throughput=3334 tok/s
2025-11-18 01:42:53,454 - INFO - Epoch 1 Step 5960 (Global: 5960): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.61e-05, throughput=3134 tok/s
2025-11-18 01:45:18,679 - INFO - Epoch 1 Step 5970 (Global: 5970): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=4.60e-05, throughput=3305 tok/s
2025-11-18 01:47:53,159 - INFO - Epoch 1 Step 5980 (Global: 5980): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=4.58e-05, throughput=3107 tok/s
2025-11-18 01:50:18,291 - INFO - Epoch 1 Step 5990 (Global: 5990): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=4.56e-05, throughput=3307 tok/s
2025-11-18 01:52:43,418 - INFO - Epoch 1 Step 6000 (Global: 6000): loss=0.0004, ppl=1.00, grad_norm=0.09, lr=4.55e-05, throughput=3307 tok/s
2025-11-18 01:52:43,422 - INFO - Running validation at step 6000...
2025-11-18 02:00:35,526 - INFO - Validation loss: 0.0006, perplexity: 1.00
2025-11-18 02:00:35,527 - INFO - Qualitative metrics (n=5):
2025-11-18 02:00:35,527 - INFO - BLEU: 1.0000
2025-11-18 02:00:35,527 - INFO - METEOR: 1.0000
2025-11-18 02:00:35,527 - INFO - Edit Distance: 0.0000
2025-11-18 02:00:35,527 - INFO - F-measure: 1.0000
2025-11-18 02:00:35,527 - INFO - ======================================================================
2025-11-18 02:00:35,528 - INFO - Qualitative Evaluation Samples:
2025-11-18 02:00:35,528 - INFO - ======================================================================
2025-11-18 02:00:35,528 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 02:00:35,528 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 02:00:35,528 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 02:00:35,528 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 02:00:35,528 - INFO - ----------------------------------------------------------------------
2025-11-18 02:00:35,529 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 02:00:35,529 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 02:00:35,529 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 02:00:35,529 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 02:00:35,529 - INFO - ----------------------------------------------------------------------
2025-11-18 02:00:35,529 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 02:00:35,529 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 02:00:35,530 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 02:00:35,530 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 02:00:35,530 - INFO - ----------------------------------------------------------------------
2025-11-18 02:00:35,530 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 02:00:35,530 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 02:00:35,530 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 02:00:35,531 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 02:00:35,531 - INFO - ----------------------------------------------------------------------
2025-11-18 02:00:35,531 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 02:00:35,531 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 02:00:35,531 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 02:00:35,531 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 02:00:35,531 - INFO - ----------------------------------------------------------------------
2025-11-18 02:00:35,533 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_6000.jsonl
2025-11-18 02:01:17,511 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-18 02:01:17,528 - INFO - New best validation loss: 0.0006, perplexity: 1.00
2025-11-18 02:03:42,245 - INFO - Epoch 1 Step 6010 (Global: 6010): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=4.53e-05, throughput=3317 tok/s
2025-11-18 02:06:17,150 - INFO - Epoch 1 Step 6020 (Global: 6020): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.51e-05, throughput=3099 tok/s
2025-11-18 02:08:42,948 - INFO - Epoch 1 Step 6030 (Global: 6030): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=4.50e-05, throughput=3292 tok/s
2025-11-18 02:11:08,427 - INFO - Epoch 1 Step 6040 (Global: 6040): loss=0.0003, ppl=1.00, grad_norm=0.03, lr=4.48e-05, throughput=3300 tok/s
2025-11-18 02:13:43,724 - INFO - Epoch 1 Step 6050 (Global: 6050): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=4.46e-05, throughput=3091 tok/s
2025-11-18 02:16:08,807 - INFO - Epoch 1 Step 6060 (Global: 6060): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=4.45e-05, throughput=3309 tok/s
2025-11-18 02:18:42,565 - INFO - Epoch 1 Step 6070 (Global: 6070): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=4.43e-05, throughput=3122 tok/s
2025-11-18 02:21:07,557 - INFO - Epoch 1 Step 6080 (Global: 6080): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.41e-05, throughput=3311 tok/s
2025-11-18 02:23:32,254 - INFO - Epoch 1 Step 6090 (Global: 6090): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=4.40e-05, throughput=3317 tok/s
2025-11-18 02:25:57,042 - INFO - Epoch 1 Step 6100 (Global: 6100): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=4.38e-05, throughput=3315 tok/s
2025-11-18 02:28:30,818 - INFO - Epoch 1 Step 6110 (Global: 6110): loss=0.0013, ppl=1.00, grad_norm=0.09, lr=4.36e-05, throughput=3122 tok/s
2025-11-18 02:30:55,732 - INFO - Epoch 1 Step 6120 (Global: 6120): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=4.35e-05, throughput=3312 tok/s
2025-11-18 02:33:20,121 - INFO - Epoch 1 Step 6130 (Global: 6130): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=4.33e-05, throughput=3324 tok/s
2025-11-18 02:35:54,561 - INFO - Epoch 1 Step 6140 (Global: 6140): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.31e-05, throughput=3108 tok/s
2025-11-18 02:38:19,806 - INFO - Epoch 1 Step 6150 (Global: 6150): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=4.30e-05, throughput=3305 tok/s
2025-11-18 02:40:43,715 - INFO - Epoch 1 Step 6160 (Global: 6160): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=4.28e-05, throughput=3336 tok/s
2025-11-18 02:43:16,262 - INFO - Epoch 1 Step 6170 (Global: 6170): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.26e-05, throughput=3147 tok/s
2025-11-18 02:45:40,173 - INFO - Epoch 1 Step 6180 (Global: 6180): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=4.25e-05, throughput=3335 tok/s
2025-11-18 02:48:13,569 - INFO - Epoch 1 Step 6190 (Global: 6190): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=4.23e-05, throughput=3129 tok/s
2025-11-18 02:50:37,338 - INFO - Epoch 1 Step 6200 (Global: 6200): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=4.21e-05, throughput=3339 tok/s
2025-11-18 02:53:01,155 - INFO - Epoch 1 Step 6210 (Global: 6210): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.20e-05, throughput=3338 tok/s
2025-11-18 02:55:24,480 - INFO - Epoch 1 Step 6220 (Global: 6220): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=4.18e-05, throughput=3349 tok/s
2025-11-18 02:57:56,635 - INFO - Epoch 1 Step 6230 (Global: 6230): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.16e-05, throughput=3155 tok/s
2025-11-18 03:00:20,440 - INFO - Epoch 1 Step 6240 (Global: 6240): loss=0.0016, ppl=1.00, grad_norm=0.32, lr=4.15e-05, throughput=3338 tok/s
2025-11-18 03:02:44,734 - INFO - Epoch 1 Step 6250 (Global: 6250): loss=0.0003, ppl=1.00, grad_norm=0.06, lr=4.13e-05, throughput=3327 tok/s
2025-11-18 03:05:08,836 - INFO - Epoch 1 Step 6260 (Global: 6260): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=4.12e-05, throughput=3331 tok/s
2025-11-18 03:07:42,982 - INFO - Epoch 1 Step 6270 (Global: 6270): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=4.10e-05, throughput=3114 tok/s
2025-11-18 03:10:07,490 - INFO - Epoch 1 Step 6280 (Global: 6280): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=4.08e-05, throughput=3322 tok/s
2025-11-18 03:12:42,207 - INFO - Epoch 1 Step 6290 (Global: 6290): loss=0.0013, ppl=1.00, grad_norm=0.12, lr=4.07e-05, throughput=3103 tok/s
2025-11-18 03:15:07,240 - INFO - Epoch 1 Step 6300 (Global: 6300): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.05e-05, throughput=3310 tok/s
2025-11-18 03:17:32,204 - INFO - Epoch 1 Step 6310 (Global: 6310): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=4.03e-05, throughput=3311 tok/s
2025-11-18 03:20:05,757 - INFO - Epoch 1 Step 6320 (Global: 6320): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=4.02e-05, throughput=3126 tok/s
2025-11-18 03:22:30,016 - INFO - Epoch 1 Step 6330 (Global: 6330): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=4.00e-05, throughput=3327 tok/s
2025-11-18 03:24:54,723 - INFO - Epoch 1 Step 6340 (Global: 6340): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.98e-05, throughput=3317 tok/s
2025-11-18 03:27:18,640 - INFO - Epoch 1 Step 6350 (Global: 6350): loss=0.0002, ppl=1.00, grad_norm=0.02, lr=3.97e-05, throughput=3335 tok/s
2025-11-18 03:29:51,526 - INFO - Epoch 1 Step 6360 (Global: 6360): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=3.95e-05, throughput=3140 tok/s
2025-11-18 03:32:16,972 - INFO - Epoch 1 Step 6370 (Global: 6370): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=3.93e-05, throughput=3300 tok/s
2025-11-18 03:34:41,498 - INFO - Epoch 1 Step 6380 (Global: 6380): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=3.92e-05, throughput=3321 tok/s
2025-11-18 03:37:15,408 - INFO - Epoch 1 Step 6390 (Global: 6390): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.90e-05, throughput=3119 tok/s
2025-11-18 03:39:39,793 - INFO - Epoch 1 Step 6400 (Global: 6400): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=3.89e-05, throughput=3325 tok/s
2025-11-18 03:42:04,363 - INFO - Epoch 1 Step 6410 (Global: 6410): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=3.87e-05, throughput=3320 tok/s
2025-11-18 03:44:38,322 - INFO - Epoch 1 Step 6420 (Global: 6420): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=3.85e-05, throughput=3118 tok/s
2025-11-18 03:47:02,861 - INFO - Epoch 1 Step 6430 (Global: 6430): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=3.84e-05, throughput=3321 tok/s
2025-11-18 03:49:36,658 - INFO - Epoch 1 Step 6440 (Global: 6440): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=3.82e-05, throughput=3121 tok/s
2025-11-18 03:52:00,992 - INFO - Epoch 1 Step 6450 (Global: 6450): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=3.80e-05, throughput=3326 tok/s
2025-11-18 03:54:24,747 - INFO - Epoch 1 Step 6460 (Global: 6460): loss=0.0003, ppl=1.00, grad_norm=0.06, lr=3.79e-05, throughput=3339 tok/s
2025-11-18 03:56:49,352 - INFO - Epoch 1 Step 6470 (Global: 6470): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=3.77e-05, throughput=3319 tok/s
2025-11-18 03:59:23,972 - INFO - Epoch 1 Step 6480 (Global: 6480): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.76e-05, throughput=3104 tok/s
2025-11-18 04:01:48,803 - INFO - Epoch 1 Step 6490 (Global: 6490): loss=0.0016, ppl=1.00, grad_norm=0.11, lr=3.74e-05, throughput=3314 tok/s
2025-11-18 04:04:13,499 - INFO - Epoch 1 Step 6500 (Global: 6500): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=3.72e-05, throughput=3317 tok/s
2025-11-18 04:04:13,502 - INFO - Running validation at step 6500...
2025-11-18 04:12:09,523 - INFO - Validation loss: 0.0006, perplexity: 1.00
2025-11-18 04:12:09,523 - INFO - Qualitative metrics (n=5):
2025-11-18 04:12:09,524 - INFO - BLEU: 1.0000
2025-11-18 04:12:09,524 - INFO - METEOR: 1.0000
2025-11-18 04:12:09,524 - INFO - Edit Distance: 0.0000
2025-11-18 04:12:09,524 - INFO - F-measure: 1.0000
2025-11-18 04:12:09,524 - INFO - ======================================================================
2025-11-18 04:12:09,524 - INFO - Qualitative Evaluation Samples:
2025-11-18 04:12:09,524 - INFO - ======================================================================
2025-11-18 04:12:09,524 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 04:12:09,524 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 04:12:09,525 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 04:12:09,525 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 04:12:09,525 - INFO - ----------------------------------------------------------------------
2025-11-18 04:12:09,525 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 04:12:09,525 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 04:12:09,525 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 04:12:09,525 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 04:12:09,525 - INFO - ----------------------------------------------------------------------
2025-11-18 04:12:09,526 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 04:12:09,526 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 04:12:09,526 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 04:12:09,526 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 04:12:09,526 - INFO - ----------------------------------------------------------------------
2025-11-18 04:12:09,526 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 04:12:09,526 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 04:12:09,526 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 04:12:09,527 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 04:12:09,527 - INFO - ----------------------------------------------------------------------
2025-11-18 04:12:09,527 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 04:12:09,527 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 04:12:09,527 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 04:12:09,527 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 04:12:09,527 - INFO - ----------------------------------------------------------------------
2025-11-18 04:12:09,528 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_6500.jsonl
2025-11-18 04:13:00,266 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-18 04:13:00,280 - INFO - New best validation loss: 0.0006, perplexity: 1.00
2025-11-18 04:15:36,793 - INFO - Epoch 1 Step 6510 (Global: 6510): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=3.71e-05, throughput=3067 tok/s
2025-11-18 04:18:03,372 - INFO - Epoch 1 Step 6520 (Global: 6520): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=3.69e-05, throughput=3275 tok/s
2025-11-18 04:20:38,030 - INFO - Epoch 1 Step 6530 (Global: 6530): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=3.67e-05, throughput=3104 tok/s
2025-11-18 04:23:03,213 - INFO - Epoch 1 Step 6540 (Global: 6540): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=3.66e-05, throughput=3306 tok/s
2025-11-18 04:25:28,298 - INFO - Epoch 1 Step 6550 (Global: 6550): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=3.64e-05, throughput=3308 tok/s
2025-11-18 04:27:54,249 - INFO - Epoch 1 Step 6560 (Global: 6560): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=3.63e-05, throughput=3289 tok/s
2025-11-18 04:30:19,819 - INFO - Epoch 1 Step 6570 (Global: 6570): loss=0.0010, ppl=1.00, grad_norm=0.11, lr=3.61e-05, throughput=3297 tok/s
2025-11-18 04:32:54,307 - INFO - Epoch 1 Step 6580 (Global: 6580): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=3.59e-05, throughput=3107 tok/s
2025-11-18 04:35:19,847 - INFO - Epoch 1 Step 6590 (Global: 6590): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.58e-05, throughput=3298 tok/s
2025-11-18 04:37:44,836 - INFO - Epoch 1 Step 6600 (Global: 6600): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.56e-05, throughput=3311 tok/s
2025-11-18 04:40:19,010 - INFO - Epoch 1 Step 6610 (Global: 6610): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=3.55e-05, throughput=3113 tok/s
2025-11-18 04:42:43,794 - INFO - Epoch 1 Step 6620 (Global: 6620): loss=0.0008, ppl=1.00, grad_norm=0.15, lr=3.53e-05, throughput=3315 tok/s
2025-11-18 04:45:18,522 - INFO - Epoch 1 Step 6630 (Global: 6630): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=3.51e-05, throughput=3102 tok/s
2025-11-18 04:47:42,638 - INFO - Epoch 1 Step 6640 (Global: 6640): loss=0.0006, ppl=1.00, grad_norm=0.05, lr=3.50e-05, throughput=3331 tok/s
2025-11-18 04:50:08,224 - INFO - Epoch 1 Step 6650 (Global: 6650): loss=0.0002, ppl=1.00, grad_norm=0.04, lr=3.48e-05, throughput=3297 tok/s
2025-11-18 04:52:42,393 - INFO - Epoch 1 Step 6660 (Global: 6660): loss=0.0005, ppl=1.00, grad_norm=0.16, lr=3.47e-05, throughput=3114 tok/s
2025-11-18 04:55:06,575 - INFO - Epoch 1 Step 6670 (Global: 6670): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=3.45e-05, throughput=3329 tok/s
2025-11-18 04:57:29,433 - INFO - Epoch 1 Step 6680 (Global: 6680): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=3.43e-05, throughput=3360 tok/s
2025-11-18 04:59:59,560 - INFO - Epoch 1 Step 6690 (Global: 6690): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=3.42e-05, throughput=3197 tok/s
2025-11-18 05:02:33,370 - INFO - Epoch 1 Step 6700 (Global: 6700): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.40e-05, throughput=3121 tok/s
2025-11-18 05:04:57,625 - INFO - Epoch 1 Step 6710 (Global: 6710): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=3.39e-05, throughput=3327 tok/s
2025-11-18 05:07:21,879 - INFO - Epoch 1 Step 6720 (Global: 6720): loss=0.0005, ppl=1.00, grad_norm=0.09, lr=3.37e-05, throughput=3328 tok/s
2025-11-18 05:09:55,946 - INFO - Epoch 1 Step 6730 (Global: 6730): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=3.35e-05, throughput=3116 tok/s
2025-11-18 05:12:20,334 - INFO - Epoch 1 Step 6740 (Global: 6740): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=3.34e-05, throughput=3324 tok/s
2025-11-18 05:14:44,360 - INFO - Epoch 1 Step 6750 (Global: 6750): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=3.32e-05, throughput=3333 tok/s
2025-11-18 05:17:18,798 - INFO - Epoch 1 Step 6760 (Global: 6760): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.31e-05, throughput=3108 tok/s
2025-11-18 05:19:42,797 - INFO - Epoch 1 Step 6770 (Global: 6770): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=3.29e-05, throughput=3333 tok/s
2025-11-18 05:22:16,875 - INFO - Epoch 1 Step 6780 (Global: 6780): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=3.28e-05, throughput=3115 tok/s
2025-11-18 05:24:41,640 - INFO - Epoch 1 Step 6790 (Global: 6790): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.26e-05, throughput=3316 tok/s
2025-11-18 05:27:06,345 - INFO - Epoch 1 Step 6800 (Global: 6800): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=3.24e-05, throughput=3317 tok/s
2025-11-18 05:29:30,175 - INFO - Epoch 1 Step 6810 (Global: 6810): loss=0.0012, ppl=1.00, grad_norm=0.07, lr=3.23e-05, throughput=3337 tok/s
2025-11-18 05:32:04,369 - INFO - Epoch 1 Step 6820 (Global: 6820): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=3.21e-05, throughput=3113 tok/s
2025-11-18 05:34:28,470 - INFO - Epoch 1 Step 6830 (Global: 6830): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=3.20e-05, throughput=3331 tok/s
2025-11-18 05:36:53,118 - INFO - Epoch 1 Step 6840 (Global: 6840): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=3.18e-05, throughput=3318 tok/s
2025-11-18 05:39:26,902 - INFO - Epoch 1 Step 6850 (Global: 6850): loss=0.0008, ppl=1.00, grad_norm=0.09, lr=3.17e-05, throughput=3121 tok/s
2025-11-18 05:41:50,833 - INFO - Epoch 1 Step 6860 (Global: 6860): loss=0.0007, ppl=1.00, grad_norm=0.10, lr=3.15e-05, throughput=3335 tok/s
2025-11-18 05:44:15,071 - INFO - Epoch 1 Step 6870 (Global: 6870): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.13e-05, throughput=3328 tok/s
2025-11-18 05:46:49,310 - INFO - Epoch 1 Step 6880 (Global: 6880): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=3.12e-05, throughput=3112 tok/s
2025-11-18 05:49:14,285 - INFO - Epoch 1 Step 6890 (Global: 6890): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=3.10e-05, throughput=3311 tok/s
2025-11-18 05:51:48,893 - INFO - Epoch 1 Step 6900 (Global: 6900): loss=0.0013, ppl=1.00, grad_norm=0.19, lr=3.09e-05, throughput=3105 tok/s
2025-11-18 05:54:13,200 - INFO - Epoch 1 Step 6910 (Global: 6910): loss=0.0035, ppl=1.00, grad_norm=0.43, lr=3.07e-05, throughput=3326 tok/s
2025-11-18 05:56:37,697 - INFO - Epoch 1 Step 6920 (Global: 6920): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=3.06e-05, throughput=3322 tok/s
2025-11-18 05:59:02,145 - INFO - Epoch 1 Step 6930 (Global: 6930): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=3.04e-05, throughput=3323 tok/s
2025-11-18 06:01:27,536 - INFO - Epoch 1 Step 6940 (Global: 6940): loss=0.0010, ppl=1.00, grad_norm=0.09, lr=3.03e-05, throughput=3302 tok/s
2025-11-18 06:04:02,095 - INFO - Epoch 1 Step 6950 (Global: 6950): loss=0.0002, ppl=1.00, grad_norm=0.04, lr=3.01e-05, throughput=3106 tok/s
2025-11-18 06:06:26,979 - INFO - Epoch 1 Step 6960 (Global: 6960): loss=0.0013, ppl=1.00, grad_norm=0.13, lr=3.00e-05, throughput=3313 tok/s
2025-11-18 06:08:51,381 - INFO - Epoch 1 Step 6970 (Global: 6970): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=2.98e-05, throughput=3324 tok/s
2025-11-18 06:11:24,530 - INFO - Epoch 1 Step 6980 (Global: 6980): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.96e-05, throughput=3134 tok/s
2025-11-18 06:13:48,654 - INFO - Epoch 1 Step 6990 (Global: 6990): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=2.95e-05, throughput=3331 tok/s
2025-11-18 06:16:12,199 - INFO - Epoch 1 Step 7000 (Global: 7000): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=2.93e-05, throughput=3344 tok/s
2025-11-18 06:16:12,203 - INFO - Running validation at step 7000...
2025-11-18 06:24:15,073 - INFO - Validation loss: 0.0006, perplexity: 1.00
2025-11-18 06:24:15,074 - INFO - Qualitative metrics (n=5):
2025-11-18 06:24:15,074 - INFO - BLEU: 1.0000
2025-11-18 06:24:15,074 - INFO - METEOR: 1.0000
2025-11-18 06:24:15,074 - INFO - Edit Distance: 0.0000
2025-11-18 06:24:15,074 - INFO - F-measure: 1.0000
2025-11-18 06:24:15,074 - INFO - ======================================================================
2025-11-18 06:24:15,074 - INFO - Qualitative Evaluation Samples:
2025-11-18 06:24:15,075 - INFO - ======================================================================
2025-11-18 06:24:15,075 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 06:24:15,075 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 06:24:15,075 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 06:24:15,075 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 06:24:15,075 - INFO - ----------------------------------------------------------------------
2025-11-18 06:24:15,075 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 06:24:15,075 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 06:24:15,075 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 06:24:15,076 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 06:24:15,076 - INFO - ----------------------------------------------------------------------
2025-11-18 06:24:15,076 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 06:24:15,076 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 06:24:15,076 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 06:24:15,076 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 06:24:15,076 - INFO - ----------------------------------------------------------------------
2025-11-18 06:24:15,076 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 06:24:15,076 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 06:24:15,076 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 06:24:15,077 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 06:24:15,077 - INFO - ----------------------------------------------------------------------
2025-11-18 06:24:15,077 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 06:24:15,077 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 06:24:15,077 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 06:24:15,077 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 06:24:15,077 - INFO - ----------------------------------------------------------------------
2025-11-18 06:24:15,078 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_7000.jsonl
2025-11-18 06:25:01,107 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-18 06:25:01,123 - INFO - New best validation loss: 0.0006, perplexity: 1.00
2025-11-18 06:27:25,449 - INFO - Epoch 1 Step 7010 (Global: 7010): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=2.92e-05, throughput=3326 tok/s
2025-11-18 06:29:50,098 - INFO - Epoch 1 Step 7020 (Global: 7020): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.90e-05, throughput=3318 tok/s
2025-11-18 06:32:14,588 - INFO - Epoch 1 Step 7030 (Global: 7030): loss=0.0006, ppl=1.00, grad_norm=0.10, lr=2.89e-05, throughput=3322 tok/s
2025-11-18 06:34:47,930 - INFO - Epoch 1 Step 7040 (Global: 7040): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.87e-05, throughput=3130 tok/s
2025-11-18 06:37:11,585 - INFO - Epoch 1 Step 7050 (Global: 7050): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=2.86e-05, throughput=3341 tok/s
2025-11-18 06:39:45,135 - INFO - Epoch 1 Step 7060 (Global: 7060): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=2.84e-05, throughput=3126 tok/s
2025-11-18 06:42:09,648 - INFO - Epoch 1 Step 7070 (Global: 7070): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=2.83e-05, throughput=3322 tok/s
2025-11-18 06:44:42,561 - INFO - Epoch 1 Step 7080 (Global: 7080): loss=0.0012, ppl=1.00, grad_norm=0.19, lr=2.81e-05, throughput=3139 tok/s
2025-11-18 06:47:05,933 - INFO - Epoch 1 Step 7090 (Global: 7090): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=2.80e-05, throughput=3348 tok/s
2025-11-18 06:49:30,652 - INFO - Epoch 1 Step 7100 (Global: 7100): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.78e-05, throughput=3317 tok/s
2025-11-18 06:51:54,843 - INFO - Epoch 1 Step 7110 (Global: 7110): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.77e-05, throughput=3329 tok/s
2025-11-18 06:54:19,521 - INFO - Epoch 1 Step 7120 (Global: 7120): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=2.75e-05, throughput=3318 tok/s
2025-11-18 06:56:53,819 - INFO - Epoch 1 Step 7130 (Global: 7130): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.74e-05, throughput=3111 tok/s
2025-11-18 06:59:17,541 - INFO - Epoch 1 Step 7140 (Global: 7140): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.72e-05, throughput=3340 tok/s
2025-11-18 07:01:41,909 - INFO - Epoch 1 Step 7150 (Global: 7150): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=2.71e-05, throughput=3325 tok/s
2025-11-18 07:04:15,294 - INFO - Epoch 1 Step 7160 (Global: 7160): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=2.69e-05, throughput=3129 tok/s
2025-11-18 07:06:41,032 - INFO - Epoch 1 Step 7170 (Global: 7170): loss=0.0008, ppl=1.00, grad_norm=0.12, lr=2.68e-05, throughput=3294 tok/s
2025-11-18 07:09:05,795 - INFO - Epoch 1 Step 7180 (Global: 7180): loss=0.0008, ppl=1.00, grad_norm=0.15, lr=2.66e-05, throughput=3316 tok/s
2025-11-18 07:11:40,217 - INFO - Epoch 1 Step 7190 (Global: 7190): loss=0.0006, ppl=1.00, grad_norm=0.10, lr=2.65e-05, throughput=3108 tok/s
2025-11-18 07:14:05,373 - INFO - Epoch 1 Step 7200 (Global: 7200): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=2.63e-05, throughput=3307 tok/s
2025-11-18 07:16:38,983 - INFO - Epoch 1 Step 7210 (Global: 7210): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.62e-05, throughput=3125 tok/s
2025-11-18 07:19:03,482 - INFO - Epoch 1 Step 7220 (Global: 7220): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.60e-05, throughput=3322 tok/s
2025-11-18 07:21:28,117 - INFO - Epoch 1 Step 7230 (Global: 7230): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=2.59e-05, throughput=3319 tok/s
2025-11-18 07:23:51,978 - INFO - Epoch 1 Step 7240 (Global: 7240): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.58e-05, throughput=3337 tok/s
2025-11-18 07:26:26,568 - INFO - Epoch 1 Step 7250 (Global: 7250): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=2.56e-05, throughput=3105 tok/s
2025-11-18 07:28:51,197 - INFO - Epoch 1 Step 7260 (Global: 7260): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=2.55e-05, throughput=3319 tok/s
2025-11-18 07:31:15,169 - INFO - Epoch 1 Step 7270 (Global: 7270): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.53e-05, throughput=3334 tok/s
2025-11-18 07:33:48,976 - INFO - Epoch 1 Step 7280 (Global: 7280): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.52e-05, throughput=3121 tok/s
2025-11-18 07:36:12,391 - INFO - Epoch 1 Step 7290 (Global: 7290): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=2.50e-05, throughput=3347 tok/s
2025-11-18 07:38:38,810 - INFO - Epoch 1 Step 7300 (Global: 7300): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=2.49e-05, throughput=3278 tok/s
2025-11-18 07:41:19,408 - INFO - Epoch 1 Step 7310 (Global: 7310): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=2.47e-05, throughput=2989 tok/s
2025-11-18 07:43:52,690 - INFO - Epoch 1 Step 7320 (Global: 7320): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.46e-05, throughput=3132 tok/s
2025-11-18 07:46:37,616 - INFO - Epoch 1 Step 7330 (Global: 7330): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.44e-05, throughput=2910 tok/s
2025-11-18 07:49:05,818 - INFO - Epoch 1 Step 7340 (Global: 7340): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=2.43e-05, throughput=3239 tok/s
2025-11-18 07:51:38,220 - INFO - Epoch 1 Step 7350 (Global: 7350): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.42e-05, throughput=3150 tok/s
2025-11-18 07:54:07,413 - INFO - Epoch 1 Step 7360 (Global: 7360): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=2.40e-05, throughput=3217 tok/s
2025-11-18 07:56:41,856 - INFO - Epoch 1 Step 7370 (Global: 7370): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.39e-05, throughput=3108 tok/s
2025-11-18 07:59:06,082 - INFO - Epoch 1 Step 7380 (Global: 7380): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.37e-05, throughput=3328 tok/s
2025-11-18 08:01:30,046 - INFO - Epoch 1 Step 7390 (Global: 7390): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=2.36e-05, throughput=3334 tok/s
2025-11-18 08:03:53,680 - INFO - Epoch 1 Step 7400 (Global: 7400): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.34e-05, throughput=3342 tok/s
2025-11-18 08:06:28,530 - INFO - Epoch 1 Step 7410 (Global: 7410): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.33e-05, throughput=3100 tok/s
2025-11-18 08:08:53,273 - INFO - Epoch 1 Step 7420 (Global: 7420): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=2.32e-05, throughput=3316 tok/s
2025-11-18 08:11:27,393 - INFO - Epoch 1 Step 7430 (Global: 7430): loss=0.0006, ppl=1.00, grad_norm=0.10, lr=2.30e-05, throughput=3115 tok/s
2025-11-18 08:13:51,826 - INFO - Epoch 1 Step 7440 (Global: 7440): loss=0.0002, ppl=1.00, grad_norm=0.05, lr=2.29e-05, throughput=3323 tok/s
2025-11-18 08:16:16,360 - INFO - Epoch 1 Step 7450 (Global: 7450): loss=0.0011, ppl=1.00, grad_norm=0.08, lr=2.27e-05, throughput=3321 tok/s
2025-11-18 08:18:50,597 - INFO - Epoch 1 Step 7460 (Global: 7460): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=2.26e-05, throughput=3112 tok/s
2025-11-18 08:21:16,449 - INFO - Epoch 1 Step 7470 (Global: 7470): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=2.25e-05, throughput=3291 tok/s
2025-11-18 08:23:41,774 - INFO - Epoch 1 Step 7480 (Global: 7480): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=2.23e-05, throughput=3303 tok/s
2025-11-18 08:26:05,686 - INFO - Epoch 1 Step 7490 (Global: 7490): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.22e-05, throughput=3335 tok/s
2025-11-18 08:28:48,530 - INFO - Epoch 1 Step 7500 (Global: 7500): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=2.20e-05, throughput=2948 tok/s
2025-11-18 08:28:48,533 - INFO - Running validation at step 7500...
2025-11-18 08:36:40,875 - INFO - Validation loss: 0.0006, perplexity: 1.00
2025-11-18 08:36:40,876 - INFO - Qualitative metrics (n=5):
2025-11-18 08:36:40,876 - INFO - BLEU: 1.0000
2025-11-18 08:36:40,876 - INFO - METEOR: 1.0000
2025-11-18 08:36:40,877 - INFO - Edit Distance: 0.0000
2025-11-18 08:36:40,877 - INFO - F-measure: 1.0000
2025-11-18 08:36:40,877 - INFO - ======================================================================
2025-11-18 08:36:40,877 - INFO - Qualitative Evaluation Samples:
2025-11-18 08:36:40,877 - INFO - ======================================================================
2025-11-18 08:36:40,877 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 08:36:40,877 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 08:36:40,877 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 08:36:40,877 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 08:36:40,877 - INFO - ----------------------------------------------------------------------
2025-11-18 08:36:40,877 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 08:36:40,878 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 08:36:40,878 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 08:36:40,878 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 08:36:40,878 - INFO - ----------------------------------------------------------------------
2025-11-18 08:36:40,878 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 08:36:40,878 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 08:36:40,878 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 08:36:40,878 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 08:36:40,878 - INFO - ----------------------------------------------------------------------
2025-11-18 08:36:40,878 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 08:36:40,878 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 08:36:40,879 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 08:36:40,879 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 08:36:40,879 - INFO - ----------------------------------------------------------------------
2025-11-18 08:36:40,879 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 08:36:40,879 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 08:36:40,879 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 08:36:40,879 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 08:36:40,879 - INFO - ----------------------------------------------------------------------
2025-11-18 08:36:40,881 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_7500.jsonl
2025-11-18 08:37:22,490 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-18 08:37:22,506 - INFO - New best validation loss: 0.0006, perplexity: 1.00
2025-11-18 08:39:46,620 - INFO - Epoch 1 Step 7510 (Global: 7510): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=2.19e-05, throughput=3331 tok/s
2025-11-18 08:42:19,437 - INFO - Epoch 1 Step 7520 (Global: 7520): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=2.18e-05, throughput=3141 tok/s
2025-11-18 08:44:43,424 - INFO - Epoch 1 Step 7530 (Global: 7530): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.16e-05, throughput=3334 tok/s
2025-11-18 08:47:07,895 - INFO - Epoch 1 Step 7540 (Global: 7540): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=2.15e-05, throughput=3323 tok/s
2025-11-18 08:49:31,345 - INFO - Epoch 1 Step 7550 (Global: 7550): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.14e-05, throughput=3346 tok/s
2025-11-18 08:52:03,826 - INFO - Epoch 1 Step 7560 (Global: 7560): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=2.12e-05, throughput=3148 tok/s
2025-11-18 08:54:27,608 - INFO - Epoch 1 Step 7570 (Global: 7570): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=2.11e-05, throughput=3338 tok/s
2025-11-18 08:56:52,086 - INFO - Epoch 1 Step 7580 (Global: 7580): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=2.09e-05, throughput=3322 tok/s
2025-11-18 08:59:17,821 - INFO - Epoch 1 Step 7590 (Global: 7590): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.08e-05, throughput=3294 tok/s
2025-11-18 09:01:53,052 - INFO - Epoch 1 Step 7600 (Global: 7600): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.07e-05, throughput=3092 tok/s
2025-11-18 09:04:16,665 - INFO - Epoch 1 Step 7610 (Global: 7610): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=2.05e-05, throughput=3342 tok/s
2025-11-18 09:06:50,805 - INFO - Epoch 1 Step 7620 (Global: 7620): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=2.04e-05, throughput=3114 tok/s
2025-11-18 09:09:15,269 - INFO - Epoch 1 Step 7630 (Global: 7630): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.03e-05, throughput=3323 tok/s
2025-11-18 09:11:48,471 - INFO - Epoch 1 Step 7640 (Global: 7640): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=2.01e-05, throughput=3133 tok/s
2025-11-18 09:14:12,530 - INFO - Epoch 1 Step 7650 (Global: 7650): loss=0.0002, ppl=1.00, grad_norm=0.06, lr=2.00e-05, throughput=3332 tok/s
2025-11-18 09:16:36,430 - INFO - Epoch 1 Step 7660 (Global: 7660): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.99e-05, throughput=3336 tok/s
2025-11-18 09:19:00,010 - INFO - Epoch 1 Step 7670 (Global: 7670): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=1.97e-05, throughput=3343 tok/s
2025-11-18 09:21:23,912 - INFO - Epoch 1 Step 7680 (Global: 7680): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.96e-05, throughput=3336 tok/s
2025-11-18 09:23:57,745 - INFO - Epoch 1 Step 7690 (Global: 7690): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.95e-05, throughput=3120 tok/s
2025-11-18 09:26:20,985 - INFO - Epoch 1 Step 7700 (Global: 7700): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.93e-05, throughput=3351 tok/s
2025-11-18 09:28:44,930 - INFO - Epoch 1 Step 7710 (Global: 7710): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.92e-05, throughput=3335 tok/s
2025-11-18 09:31:18,964 - INFO - Epoch 1 Step 7720 (Global: 7720): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.91e-05, throughput=3116 tok/s
2025-11-18 09:33:43,625 - INFO - Epoch 1 Step 7730 (Global: 7730): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.89e-05, throughput=3318 tok/s
2025-11-18 09:36:08,220 - INFO - Epoch 1 Step 7740 (Global: 7740): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=1.88e-05, throughput=3320 tok/s
2025-11-18 09:38:43,057 - INFO - Epoch 1 Step 7750 (Global: 7750): loss=0.0009, ppl=1.00, grad_norm=0.07, lr=1.87e-05, throughput=3100 tok/s
2025-11-18 09:41:08,235 - INFO - Epoch 1 Step 7760 (Global: 7760): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=1.85e-05, throughput=3306 tok/s
2025-11-18 09:43:42,729 - INFO - Epoch 1 Step 7770 (Global: 7770): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.84e-05, throughput=3107 tok/s
2025-11-18 09:46:08,059 - INFO - Epoch 1 Step 7780 (Global: 7780): loss=0.0007, ppl=1.00, grad_norm=0.12, lr=1.83e-05, throughput=3303 tok/s
2025-11-18 09:48:33,879 - INFO - Epoch 1 Step 7790 (Global: 7790): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.82e-05, throughput=3292 tok/s
2025-11-18 09:51:00,134 - INFO - Epoch 1 Step 7800 (Global: 7800): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.80e-05, throughput=3282 tok/s
2025-11-18 09:53:35,210 - INFO - Epoch 1 Step 7810 (Global: 7810): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.79e-05, throughput=3095 tok/s
2025-11-18 09:56:01,019 - INFO - Epoch 1 Step 7820 (Global: 7820): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.78e-05, throughput=3292 tok/s
2025-11-18 09:58:28,363 - INFO - Epoch 1 Step 7830 (Global: 7830): loss=0.0006, ppl=1.00, grad_norm=0.11, lr=1.76e-05, throughput=3258 tok/s
2025-11-18 10:01:03,134 - INFO - Epoch 1 Step 7840 (Global: 7840): loss=0.0003, ppl=1.00, grad_norm=0.03, lr=1.75e-05, throughput=3101 tok/s
2025-11-18 10:03:27,645 - INFO - Epoch 1 Step 7850 (Global: 7850): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.74e-05, throughput=3322 tok/s
2025-11-18 10:05:53,816 - INFO - Epoch 1 Step 7860 (Global: 7860): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.73e-05, throughput=3284 tok/s
2025-11-18 10:08:29,299 - INFO - Epoch 1 Step 7870 (Global: 7870): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.71e-05, throughput=3087 tok/s
2025-11-18 10:10:55,345 - INFO - Epoch 1 Step 7880 (Global: 7880): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.70e-05, throughput=3287 tok/s
2025-11-18 10:13:30,610 - INFO - Epoch 1 Step 7890 (Global: 7890): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=1.69e-05, throughput=3092 tok/s
2025-11-18 10:15:54,584 - INFO - Epoch 1 Step 7900 (Global: 7900): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=1.68e-05, throughput=3334 tok/s
2025-11-18 10:18:19,010 - INFO - Epoch 1 Step 7910 (Global: 7910): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=1.66e-05, throughput=3324 tok/s
2025-11-18 10:20:43,946 - INFO - Epoch 1 Step 7920 (Global: 7920): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=1.65e-05, throughput=3312 tok/s
2025-11-18 10:23:19,626 - INFO - Epoch 1 Step 7930 (Global: 7930): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.64e-05, throughput=3083 tok/s
2025-11-18 10:25:45,569 - INFO - Epoch 1 Step 7940 (Global: 7940): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.63e-05, throughput=3289 tok/s
2025-11-18 10:28:12,826 - INFO - Epoch 1 Step 7950 (Global: 7950): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.61e-05, throughput=3260 tok/s
2025-11-18 10:30:40,497 - INFO - Epoch 1 Step 7960 (Global: 7960): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.60e-05, throughput=3251 tok/s
2025-11-18 10:33:15,005 - INFO - Epoch 1 Step 7970 (Global: 7970): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.59e-05, throughput=3107 tok/s
2025-11-18 10:35:40,197 - INFO - Epoch 1 Step 7980 (Global: 7980): loss=0.0006, ppl=1.00, grad_norm=0.05, lr=1.58e-05, throughput=3306 tok/s
2025-11-18 10:38:14,379 - INFO - Epoch 1 Step 7990 (Global: 7990): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=1.56e-05, throughput=3113 tok/s
2025-11-18 10:40:40,371 - INFO - Epoch 1 Step 8000 (Global: 8000): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.55e-05, throughput=3288 tok/s
2025-11-18 10:40:40,372 - INFO - Running validation at step 8000...
2025-11-18 10:48:43,065 - INFO - Validation loss: 0.0005, perplexity: 1.00
2025-11-18 10:48:43,066 - INFO - Qualitative metrics (n=5):
2025-11-18 10:48:43,066 - INFO - BLEU: 1.0000
2025-11-18 10:48:43,066 - INFO - METEOR: 1.0000
2025-11-18 10:48:43,066 - INFO - Edit Distance: 0.0000
2025-11-18 10:48:43,066 - INFO - F-measure: 1.0000
2025-11-18 10:48:43,066 - INFO - ======================================================================
2025-11-18 10:48:43,066 - INFO - Qualitative Evaluation Samples:
2025-11-18 10:48:43,066 - INFO - ======================================================================
2025-11-18 10:48:43,066 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 10:48:43,066 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 10:48:43,067 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 10:48:43,067 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 10:48:43,067 - INFO - ----------------------------------------------------------------------
2025-11-18 10:48:43,067 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 10:48:43,067 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 10:48:43,067 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 10:48:43,067 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 10:48:43,067 - INFO - ----------------------------------------------------------------------
2025-11-18 10:48:43,068 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 10:48:43,068 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 10:48:43,068 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 10:48:43,068 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 10:48:43,068 - INFO - ----------------------------------------------------------------------
2025-11-18 10:48:43,068 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 10:48:43,068 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 10:48:43,068 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 10:48:43,068 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 10:48:43,068 - INFO - ----------------------------------------------------------------------
2025-11-18 10:48:43,068 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 10:48:43,069 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 10:48:43,069 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 10:48:43,069 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 10:48:43,069 - INFO - ----------------------------------------------------------------------
2025-11-18 10:48:43,071 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_8000.jsonl
2025-11-18 10:49:30,950 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt
2025-11-18 10:49:30,968 - INFO - New best validation loss: 0.0005, perplexity: 1.00
2025-11-18 10:51:59,250 - INFO - Epoch 1 Step 8010 (Global: 8010): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=1.54e-05, throughput=3237 tok/s
2025-11-18 10:54:37,882 - INFO - Epoch 1 Step 8020 (Global: 8020): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=1.53e-05, throughput=3026 tok/s
2025-11-18 10:57:04,041 - INFO - Epoch 1 Step 8030 (Global: 8030): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=1.52e-05, throughput=3284 tok/s
2025-11-18 10:59:32,033 - INFO - Epoch 1 Step 8040 (Global: 8040): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.50e-05, throughput=3244 tok/s
2025-11-18 11:02:06,663 - INFO - Epoch 1 Step 8050 (Global: 8050): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=1.49e-05, throughput=3104 tok/s
2025-11-18 11:04:32,153 - INFO - Epoch 1 Step 8060 (Global: 8060): loss=0.0012, ppl=1.00, grad_norm=0.11, lr=1.48e-05, throughput=3299 tok/s
2025-11-18 11:06:57,430 - INFO - Epoch 1 Step 8070 (Global: 8070): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=1.47e-05, throughput=3304 tok/s
2025-11-18 11:09:33,744 - INFO - Epoch 1 Step 8080 (Global: 8080): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.46e-05, throughput=3071 tok/s
2025-11-18 11:12:02,982 - INFO - Epoch 1 Step 8090 (Global: 8090): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.44e-05, throughput=3216 tok/s
2025-11-18 11:14:38,256 - INFO - Epoch 1 Step 8100 (Global: 8100): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=1.43e-05, throughput=3091 tok/s
2025-11-18 11:17:04,143 - INFO - Epoch 1 Step 8110 (Global: 8110): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.42e-05, throughput=3290 tok/s
2025-11-18 11:19:30,203 - INFO - Epoch 1 Step 8120 (Global: 8120): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=1.41e-05, throughput=3286 tok/s
2025-11-18 11:21:56,356 - INFO - Epoch 1 Step 8130 (Global: 8130): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.40e-05, throughput=3284 tok/s
2025-11-18 11:24:32,535 - INFO - Epoch 1 Step 8140 (Global: 8140): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=1.39e-05, throughput=3073 tok/s
2025-11-18 11:26:57,915 - INFO - Epoch 1 Step 8150 (Global: 8150): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=1.37e-05, throughput=3302 tok/s
2025-11-18 11:29:22,420 - INFO - Epoch 1 Step 8160 (Global: 8160): loss=0.0006, ppl=1.00, grad_norm=0.09, lr=1.36e-05, throughput=3322 tok/s
2025-11-18 11:31:49,280 - INFO - Epoch 1 Step 8170 (Global: 8170): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=1.35e-05, throughput=3268 tok/s
2025-11-18 11:34:26,473 - INFO - Epoch 1 Step 8180 (Global: 8180): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.34e-05, throughput=3054 tok/s
2025-11-18 11:36:54,170 - INFO - Epoch 1 Step 8190 (Global: 8190): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.33e-05, throughput=3250 tok/s
2025-11-18 11:39:29,094 - INFO - Epoch 1 Step 8200 (Global: 8200): loss=0.0008, ppl=1.00, grad_norm=0.12, lr=1.32e-05, throughput=3098 tok/s
2025-11-18 11:41:55,492 - INFO - Epoch 1 Step 8210 (Global: 8210): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=1.31e-05, throughput=3279 tok/s
2025-11-18 11:44:21,452 - INFO - Epoch 1 Step 8220 (Global: 8220): loss=0.0010, ppl=1.00, grad_norm=0.10, lr=1.29e-05, throughput=3289 tok/s
2025-11-18 11:46:57,583 - INFO - Epoch 1 Step 8230 (Global: 8230): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.28e-05, throughput=3074 tok/s
2025-11-18 11:49:24,052 - INFO - Epoch 1 Step 8240 (Global: 8240): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.27e-05, throughput=3277 tok/s
2025-11-18 11:51:49,744 - INFO - Epoch 1 Step 8250 (Global: 8250): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.26e-05, throughput=3295 tok/s
2025-11-18 11:54:15,168 - INFO - Epoch 1 Step 8260 (Global: 8260): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=1.25e-05, throughput=3301 tok/s
2025-11-18 11:56:49,390 - INFO - Epoch 1 Step 8270 (Global: 8270): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=1.24e-05, throughput=3112 tok/s
2025-11-18 11:59:21,020 - INFO - Epoch 1 Step 8280 (Global: 8280): loss=0.0002, ppl=1.00, grad_norm=0.04, lr=1.23e-05, throughput=3166 tok/s
2025-11-18 12:01:47,565 - INFO - Epoch 1 Step 8290 (Global: 8290): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=1.22e-05, throughput=3276 tok/s
2025-11-18 12:04:25,550 - INFO - Epoch 1 Step 8300 (Global: 8300): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.21e-05, throughput=3038 tok/s
2025-11-18 12:06:52,301 - INFO - Epoch 1 Step 8310 (Global: 8310): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.20e-05, throughput=3271 tok/s
2025-11-18 12:09:24,581 - INFO - Epoch 1 Step 8320 (Global: 8320): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.18e-05, throughput=3152 tok/s
2025-11-18 12:12:04,676 - INFO - Epoch 1 Step 8330 (Global: 8330): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.17e-05, throughput=2998 tok/s
2025-11-18 12:14:33,176 - INFO - Epoch 1 Step 8340 (Global: 8340): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.16e-05, throughput=3232 tok/s
2025-11-18 12:17:13,223 - INFO - Epoch 1 Step 8350 (Global: 8350): loss=0.0007, ppl=1.00, grad_norm=0.12, lr=1.15e-05, throughput=2999 tok/s
2025-11-18 12:19:44,894 - INFO - Epoch 1 Step 8360 (Global: 8360): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=1.14e-05, throughput=3165 tok/s
2025-11-18 12:22:23,205 - INFO - Epoch 1 Step 8370 (Global: 8370): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=1.13e-05, throughput=3032 tok/s
2025-11-18 12:24:53,811 - INFO - Epoch 1 Step 8380 (Global: 8380): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=1.12e-05, throughput=3187 tok/s
2025-11-18 12:27:22,237 - INFO - Epoch 1 Step 8390 (Global: 8390): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=1.11e-05, throughput=3234 tok/s
2025-11-18 12:29:50,698 - INFO - Epoch 1 Step 8400 (Global: 8400): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.10e-05, throughput=3233 tok/s
2025-11-18 12:32:27,549 - INFO - Epoch 1 Step 8410 (Global: 8410): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=1.09e-05, throughput=3060 tok/s
2025-11-18 12:34:55,136 - INFO - Epoch 1 Step 8420 (Global: 8420): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.08e-05, throughput=3252 tok/s
2025-11-18 12:37:25,468 - INFO - Epoch 1 Step 8430 (Global: 8430): loss=0.0005, ppl=1.00, grad_norm=0.12, lr=1.07e-05, throughput=3193 tok/s
2025-11-18 12:39:53,942 - INFO - Epoch 1 Step 8440 (Global: 8440): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.06e-05, throughput=3233 tok/s
2025-11-18 12:42:31,259 - INFO - Epoch 1 Step 8450 (Global: 8450): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=1.05e-05, throughput=3051 tok/s
2025-11-18 12:44:57,906 - INFO - Epoch 1 Step 8460 (Global: 8460): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=1.04e-05, throughput=3273 tok/s
2025-11-18 12:47:36,239 - INFO - Epoch 1 Step 8470 (Global: 8470): loss=0.0014, ppl=1.00, grad_norm=0.12, lr=1.03e-05, throughput=3032 tok/s
2025-11-18 12:50:07,149 - INFO - Epoch 1 Step 8480 (Global: 8480): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=1.02e-05, throughput=3181 tok/s
2025-11-18 12:52:44,378 - INFO - Epoch 1 Step 8490 (Global: 8490): loss=0.0010, ppl=1.00, grad_norm=0.15, lr=1.01e-05, throughput=3053 tok/s
2025-11-18 12:55:13,431 - INFO - Epoch 1 Step 8500
(Global: 8500): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=9.96e-06, throughput=3220 tok/s 2025-11-18 12:55:13,434 - INFO - Running validation at step 8500... 2025-11-18 13:03:18,674 - INFO - Validation loss: 0.0005, perplexity: 1.00 2025-11-18 13:03:18,675 - INFO - Qualitative metrics (n=5): 2025-11-18 13:03:18,675 - INFO - BLEU: 1.0000 2025-11-18 13:03:18,675 - INFO - METEOR: 1.0000 2025-11-18 13:03:18,675 - INFO - Edit Distance: 0.0000 2025-11-18 13:03:18,675 - INFO - F-measure: 1.0000 2025-11-18 13:03:18,675 - INFO - ====================================================================== 2025-11-18 13:03:18,675 - INFO - Qualitative Evaluation Samples: 2025-11-18 13:03:18,676 - INFO - ====================================================================== 2025-11-18 13:03:18,676 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-18 13:03:18,676 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 13:03:18,676 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 13:03:18,676 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 13:03:18,676 - INFO - ---------------------------------------------------------------------- 2025-11-18 13:03:18,676 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-18 13:03:18,676 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 13:03:18,676 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-18 13:03:18,677 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-18 13:03:18,677 - INFO - ---------------------------------------------------------------------- 2025-11-18 13:03:18,677 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-18 13:03:18,677 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 13:03:18,677 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 13:03:18,677 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 13:03:18,677 - INFO - ---------------------------------------------------------------------- 2025-11-18 13:03:18,677 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-18 13:03:18,677 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 13:03:18,677 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-18 13:03:18,677 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-18 13:03:18,678 - INFO - ---------------------------------------------------------------------- 2025-11-18 13:03:18,678 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-18 13:03:18,678 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 13:03:18,678 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 13:03:18,678 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 13:03:18,678 - INFO - ---------------------------------------------------------------------- 2025-11-18 13:03:18,680 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_8500.jsonl 2025-11-18 13:04:11,102 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-18 13:04:11,116 - INFO - New best validation loss: 0.0005, perplexity: 1.00 2025-11-18 13:06:35,434 - INFO - Epoch 1 Step 8510 (Global: 8510): loss=0.0005, ppl=1.00, grad_norm=0.10, lr=9.86e-06, throughput=3326 tok/s 2025-11-18 13:09:11,193 - INFO - Epoch 1 Step 8520 (Global: 8520): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=9.76e-06, throughput=3082 tok/s 2025-11-18 13:11:36,580 - INFO - Epoch 1 Step 8530 (Global: 8530): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=9.67e-06, throughput=3302 tok/s 2025-11-18 13:14:00,622 - INFO - Epoch 1 Step 8540 (Global: 8540): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=9.57e-06, throughput=3332 tok/s 2025-11-18 13:16:33,214 - INFO - Epoch 1 Step 8550 (Global: 8550): loss=0.0005, ppl=1.00, grad_norm=0.09, lr=9.47e-06, throughput=3146 tok/s 2025-11-18 13:18:57,373 - INFO - Epoch 1 Step 8560 (Global: 8560): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=9.37e-06, throughput=3330 tok/s 2025-11-18 13:21:21,599 - INFO - Epoch 1 Step 8570 (Global: 8570): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=9.27e-06, 
throughput=3328 tok/s 2025-11-18 13:23:45,639 - INFO - Epoch 1 Step 8580 (Global: 8580): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=9.18e-06, throughput=3333 tok/s 2025-11-18 13:26:19,526 - INFO - Epoch 1 Step 8590 (Global: 8590): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=9.08e-06, throughput=3119 tok/s 2025-11-18 13:28:43,892 - INFO - Epoch 1 Step 8600 (Global: 8600): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=8.98e-06, throughput=3325 tok/s 2025-11-18 13:31:08,617 - INFO - Epoch 1 Step 8610 (Global: 8610): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=8.89e-06, throughput=3317 tok/s 2025-11-18 13:33:43,461 - INFO - Epoch 1 Step 8620 (Global: 8620): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=8.79e-06, throughput=3100 tok/s 2025-11-18 13:36:10,494 - INFO - Epoch 1 Step 8630 (Global: 8630): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=8.70e-06, throughput=3265 tok/s 2025-11-18 13:38:35,955 - INFO - Epoch 1 Step 8640 (Global: 8640): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=8.60e-06, throughput=3300 tok/s 2025-11-18 13:41:09,700 - INFO - Epoch 1 Step 8650 (Global: 8650): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=8.51e-06, throughput=3122 tok/s 2025-11-18 13:43:34,398 - INFO - Epoch 1 Step 8660 (Global: 8660): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=8.42e-06, throughput=3317 tok/s 2025-11-18 13:46:08,161 - INFO - Epoch 1 Step 8670 (Global: 8670): loss=0.0016, ppl=1.00, grad_norm=0.10, lr=8.32e-06, throughput=3122 tok/s 2025-11-18 13:48:32,514 - INFO - Epoch 1 Step 8680 (Global: 8680): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=8.23e-06, throughput=3325 tok/s 2025-11-18 13:50:57,820 - INFO - Epoch 1 Step 8690 (Global: 8690): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=8.14e-06, throughput=3303 tok/s 2025-11-18 13:53:21,964 - INFO - Epoch 1 Step 8700 (Global: 8700): loss=0.0003, ppl=1.00, grad_norm=0.03, lr=8.05e-06, throughput=3330 tok/s 2025-11-18 13:55:56,335 - INFO - Epoch 1 Step 8710 (Global: 8710): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=7.96e-06, throughput=3109 tok/s 
2025-11-18 13:58:20,885 - INFO - Epoch 1 Step 8720 (Global: 8720): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=7.87e-06, throughput=3321 tok/s 2025-11-18 14:00:46,677 - INFO - Epoch 1 Step 8730 (Global: 8730): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=7.78e-06, throughput=3292 tok/s 2025-11-18 14:03:23,739 - INFO - Epoch 1 Step 8740 (Global: 8740): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=7.69e-06, throughput=3056 tok/s 2025-11-18 14:05:50,498 - INFO - Epoch 1 Step 8750 (Global: 8750): loss=0.0007, ppl=1.00, grad_norm=0.10, lr=7.60e-06, throughput=3271 tok/s 2025-11-18 14:08:16,034 - INFO - Epoch 1 Step 8760 (Global: 8760): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=7.51e-06, throughput=3298 tok/s 2025-11-18 14:10:50,500 - INFO - Epoch 1 Step 8770 (Global: 8770): loss=0.0005, ppl=1.00, grad_norm=0.11, lr=7.42e-06, throughput=3108 tok/s 2025-11-18 14:13:18,796 - INFO - Epoch 1 Step 8780 (Global: 8780): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=7.33e-06, throughput=3237 tok/s 2025-11-18 14:15:52,818 - INFO - Epoch 1 Step 8790 (Global: 8790): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=7.25e-06, throughput=3116 tok/s 2025-11-18 14:18:19,242 - INFO - Epoch 1 Step 8800 (Global: 8800): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=7.16e-06, throughput=3278 tok/s 2025-11-18 14:20:46,033 - INFO - Epoch 1 Step 8810 (Global: 8810): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=7.07e-06, throughput=3270 tok/s 2025-11-18 14:23:13,576 - INFO - Epoch 1 Step 8820 (Global: 8820): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=6.99e-06, throughput=3253 tok/s 2025-11-18 14:25:51,286 - INFO - Epoch 1 Step 8830 (Global: 8830): loss=0.0005, ppl=1.00, grad_norm=0.04, lr=6.90e-06, throughput=3044 tok/s 2025-11-18 14:28:19,793 - INFO - Epoch 1 Step 8840 (Global: 8840): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=6.82e-06, throughput=3232 tok/s 2025-11-18 14:30:48,072 - INFO - Epoch 1 Step 8850 (Global: 8850): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=6.74e-06, throughput=3237 tok/s 2025-11-18 14:33:19,368 - 
INFO - Epoch 1 Step 8860 (Global: 8860): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=6.65e-06, throughput=3173 tok/s 2025-11-18 14:35:56,678 - INFO - Epoch 1 Step 8870 (Global: 8870): loss=0.0008, ppl=1.00, grad_norm=0.24, lr=6.57e-06, throughput=3051 tok/s 2025-11-18 14:38:25,352 - INFO - Epoch 1 Step 8880 (Global: 8880): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=6.49e-06, throughput=3229 tok/s 2025-11-18 14:41:01,015 - INFO - Epoch 1 Step 8890 (Global: 8890): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=6.40e-06, throughput=3084 tok/s 2025-11-18 14:43:24,960 - INFO - Epoch 1 Step 8900 (Global: 8900): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=6.32e-06, throughput=3335 tok/s 2025-11-18 14:45:47,929 - INFO - Epoch 1 Step 8910 (Global: 8910): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=6.24e-06, throughput=3357 tok/s 2025-11-18 14:48:32,487 - INFO - Epoch 1 Step 8920 (Global: 8920): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=6.16e-06, throughput=2917 tok/s 2025-11-18 14:51:12,079 - INFO - Epoch 1 Step 8930 (Global: 8930): loss=0.0009, ppl=1.00, grad_norm=0.11, lr=6.08e-06, throughput=3008 tok/s 2025-11-18 14:53:53,552 - INFO - Epoch 1 Step 8940 (Global: 8940): loss=0.0006, ppl=1.00, grad_norm=0.11, lr=6.00e-06, throughput=2973 tok/s 2025-11-18 14:56:32,041 - INFO - Epoch 1 Step 8950 (Global: 8950): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.92e-06, throughput=3029 tok/s 2025-11-18 14:59:17,651 - INFO - Epoch 1 Step 8960 (Global: 8960): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=5.84e-06, throughput=2898 tok/s 2025-11-18 15:01:52,542 - INFO - Epoch 1 Step 8970 (Global: 8970): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=5.76e-06, throughput=3099 tok/s 2025-11-18 15:04:23,892 - INFO - Epoch 1 Step 8980 (Global: 8980): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=5.68e-06, throughput=3172 tok/s 2025-11-18 15:07:00,433 - INFO - Epoch 1 Step 8990 (Global: 8990): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=5.61e-06, throughput=3066 tok/s 2025-11-18 15:09:26,404 - INFO - Epoch 1 Step 9000 
(Global: 9000): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=5.53e-06, throughput=3288 tok/s 2025-11-18 15:09:26,407 - INFO - Running validation at step 9000... 2025-11-18 15:17:56,933 - INFO - Validation loss: 0.0005, perplexity: 1.00 2025-11-18 15:17:56,934 - INFO - Qualitative metrics (n=5): 2025-11-18 15:17:56,934 - INFO - BLEU: 1.0000 2025-11-18 15:17:56,934 - INFO - METEOR: 1.0000 2025-11-18 15:17:56,934 - INFO - Edit Distance: 0.0000 2025-11-18 15:17:56,935 - INFO - F-measure: 1.0000 2025-11-18 15:17:56,935 - INFO - ====================================================================== 2025-11-18 15:17:56,935 - INFO - Qualitative Evaluation Samples: 2025-11-18 15:17:56,935 - INFO - ====================================================================== 2025-11-18 15:17:56,935 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-18 15:17:56,935 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 15:17:56,935 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 15:17:56,935 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 15:17:56,935 - INFO - ---------------------------------------------------------------------- 2025-11-18 15:17:56,936 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-18 15:17:56,936 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 15:17:56,936 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-18 15:17:56,936 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-18 15:17:56,936 - INFO - ---------------------------------------------------------------------- 2025-11-18 15:17:56,936 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-18 15:17:56,936 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 15:17:56,936 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 15:17:56,936 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 15:17:56,937 - INFO - ---------------------------------------------------------------------- 2025-11-18 15:17:56,937 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-18 15:17:56,937 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 15:17:56,937 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-18 15:17:56,937 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-18 15:17:56,937 - INFO - ---------------------------------------------------------------------- 2025-11-18 15:17:56,937 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-18 15:17:56,938 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 15:17:56,938 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 15:17:56,938 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 15:17:56,938 - INFO - ---------------------------------------------------------------------- 2025-11-18 15:17:56,939 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_9000.jsonl 2025-11-18 15:18:48,597 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/best_checkpoint.pt 2025-11-18 15:18:48,613 - INFO - New best validation loss: 0.0005, perplexity: 1.00 2025-11-18 15:21:20,650 - INFO - Epoch 1 Step 9010 (Global: 9010): loss=0.0003, ppl=1.00, grad_norm=0.03, lr=5.45e-06, throughput=3157 tok/s 2025-11-18 15:24:01,834 - INFO - Epoch 1 Step 9020 (Global: 9020): loss=0.0002, ppl=1.00, grad_norm=0.03, lr=5.38e-06, throughput=2978 tok/s 2025-11-18 15:26:37,818 - INFO - Epoch 1 Step 9030 (Global: 9030): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.30e-06, throughput=3077 tok/s 2025-11-18 15:29:18,632 - INFO - Epoch 1 Step 9040 (Global: 9040): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=5.23e-06, throughput=2985 tok/s 2025-11-18 15:31:47,199 - INFO - Epoch 1 Step 9050 (Global: 9050): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=5.15e-06, throughput=3231 tok/s 2025-11-18 15:34:35,160 - INFO - Epoch 1 Step 9060 (Global: 9060): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=5.08e-06, throughput=2858 tok/s 2025-11-18 15:37:11,315 - INFO - Epoch 1 Step 9070 (Global: 9070): loss=0.0015, ppl=1.00, grad_norm=0.14, lr=5.01e-06, 
throughput=3074 tok/s 2025-11-18 15:39:50,132 - INFO - Epoch 1 Step 9080 (Global: 9080): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=4.93e-06, throughput=3022 tok/s 2025-11-18 15:42:14,041 - INFO - Epoch 1 Step 9090 (Global: 9090): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=4.86e-06, throughput=3336 tok/s 2025-11-18 15:44:39,063 - INFO - Epoch 1 Step 9100 (Global: 9100): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=4.79e-06, throughput=3310 tok/s 2025-11-18 15:47:13,698 - INFO - Epoch 1 Step 9110 (Global: 9110): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=4.72e-06, throughput=3104 tok/s 2025-11-18 15:49:41,812 - INFO - Epoch 1 Step 9120 (Global: 9120): loss=0.0011, ppl=1.00, grad_norm=0.11, lr=4.65e-06, throughput=3241 tok/s 2025-11-18 15:52:15,666 - INFO - Epoch 1 Step 9130 (Global: 9130): loss=0.0008, ppl=1.00, grad_norm=0.09, lr=4.58e-06, throughput=3120 tok/s 2025-11-18 15:54:59,513 - INFO - Epoch 1 Step 9140 (Global: 9140): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=4.51e-06, throughput=2930 tok/s 2025-11-18 15:57:29,676 - INFO - Epoch 1 Step 9150 (Global: 9150): loss=0.0007, ppl=1.00, grad_norm=0.09, lr=4.44e-06, throughput=3197 tok/s 2025-11-18 16:00:10,488 - INFO - Epoch 1 Step 9160 (Global: 9160): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.37e-06, throughput=2985 tok/s 2025-11-18 16:02:43,331 - INFO - Epoch 1 Step 9170 (Global: 9170): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=4.30e-06, throughput=3141 tok/s 2025-11-18 16:05:13,839 - INFO - Epoch 1 Step 9180 (Global: 9180): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.23e-06, throughput=3189 tok/s 2025-11-18 16:07:44,555 - INFO - Epoch 1 Step 9190 (Global: 9190): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=4.17e-06, throughput=3185 tok/s 2025-11-18 16:10:25,021 - INFO - Epoch 1 Step 9200 (Global: 9200): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=4.10e-06, throughput=2991 tok/s 2025-11-18 16:12:58,713 - INFO - Epoch 1 Step 9210 (Global: 9210): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.03e-06, throughput=3123 tok/s 
2025-11-18 16:15:28,730 - INFO - Epoch 1 Step 9220 (Global: 9220): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.97e-06, throughput=3200 tok/s 2025-11-18 16:17:53,043 - INFO - Epoch 1 Step 9230 (Global: 9230): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=3.90e-06, throughput=3326 tok/s 2025-11-18 16:20:28,410 - INFO - Epoch 1 Step 9240 (Global: 9240): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=3.84e-06, throughput=3090 tok/s 2025-11-18 16:22:53,727 - INFO - Epoch 1 Step 9250 (Global: 9250): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.77e-06, throughput=3303 tok/s 2025-11-18 16:25:26,824 - INFO - Epoch 1 Step 9260 (Global: 9260): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=3.71e-06, throughput=3135 tok/s 2025-11-18 16:27:50,037 - INFO - Epoch 1 Step 9270 (Global: 9270): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=3.65e-06, throughput=3352 tok/s 2025-11-18 16:30:23,364 - INFO - Epoch 1 Step 9280 (Global: 9280): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=3.58e-06, throughput=3131 tok/s 2025-11-18 16:32:47,418 - INFO - Epoch 1 Step 9290 (Global: 9290): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=3.52e-06, throughput=3332 tok/s 2025-11-18 16:35:19,549 - INFO - Epoch 1 Step 9300 (Global: 9300): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=3.46e-06, throughput=3155 tok/s 2025-11-18 16:38:04,197 - INFO - Epoch 1 Step 9310 (Global: 9310): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=3.40e-06, throughput=2915 tok/s 2025-11-18 16:40:34,395 - INFO - Epoch 1 Step 9320 (Global: 9320): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=3.34e-06, throughput=3196 tok/s 2025-11-18 16:43:07,868 - INFO - Epoch 1 Step 9330 (Global: 9330): loss=0.0005, ppl=1.00, grad_norm=0.04, lr=3.28e-06, throughput=3128 tok/s 2025-11-18 16:45:31,821 - INFO - Epoch 1 Step 9340 (Global: 9340): loss=0.0007, ppl=1.00, grad_norm=0.10, lr=3.22e-06, throughput=3335 tok/s 2025-11-18 16:47:56,587 - INFO - Epoch 1 Step 9350 (Global: 9350): loss=0.0005, ppl=1.00, grad_norm=0.09, lr=3.16e-06, throughput=3316 tok/s 2025-11-18 16:50:33,843 - 
INFO - Epoch 1 Step 9360 (Global: 9360): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=3.10e-06, throughput=3052 tok/s 2025-11-18 16:53:01,456 - INFO - Epoch 1 Step 9370 (Global: 9370): loss=0.0013, ppl=1.00, grad_norm=0.21, lr=3.05e-06, throughput=3252 tok/s 2025-11-18 16:55:28,151 - INFO - Epoch 1 Step 9380 (Global: 9380): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.99e-06, throughput=3272 tok/s 2025-11-18 16:58:04,027 - INFO - Epoch 1 Step 9390 (Global: 9390): loss=0.0008, ppl=1.00, grad_norm=0.09, lr=2.93e-06, throughput=3079 tok/s 2025-11-18 17:00:28,842 - INFO - Epoch 1 Step 9400 (Global: 9400): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=2.88e-06, throughput=3315 tok/s 2025-11-18 17:03:03,862 - INFO - Epoch 1 Step 9410 (Global: 9410): loss=0.0009, ppl=1.00, grad_norm=0.09, lr=2.82e-06, throughput=3096 tok/s 2025-11-18 17:05:29,228 - INFO - Epoch 1 Step 9420 (Global: 9420): loss=0.0005, ppl=1.00, grad_norm=0.09, lr=2.76e-06, throughput=3302 tok/s 2025-11-18 17:07:54,630 - INFO - Epoch 1 Step 9430 (Global: 9430): loss=0.0005, ppl=1.00, grad_norm=0.08, lr=2.71e-06, throughput=3301 tok/s 2025-11-18 17:10:20,710 - INFO - Epoch 1 Step 9440 (Global: 9440): loss=0.0002, ppl=1.00, grad_norm=0.05, lr=2.66e-06, throughput=3286 tok/s 2025-11-18 17:12:56,386 - INFO - Epoch 1 Step 9450 (Global: 9450): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.60e-06, throughput=3083 tok/s 2025-11-18 17:15:24,226 - INFO - Epoch 1 Step 9460 (Global: 9460): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=2.55e-06, throughput=3247 tok/s 2025-11-18 17:17:50,903 - INFO - Epoch 1 Step 9470 (Global: 9470): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.50e-06, throughput=3273 tok/s 2025-11-18 17:20:25,426 - INFO - Epoch 1 Step 9480 (Global: 9480): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=2.44e-06, throughput=3106 tok/s 2025-11-18 17:22:49,520 - INFO - Epoch 1 Step 9490 (Global: 9490): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=2.39e-06, throughput=3331 tok/s 2025-11-18 17:25:14,006 - INFO - Epoch 1 Step 9500 
(Global: 9500): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=2.34e-06, throughput=3322 tok/s 2025-11-18 17:25:14,009 - INFO - Running validation at step 9500... 2025-11-18 17:33:03,744 - INFO - Validation loss: 0.0005, perplexity: 1.00 2025-11-18 17:33:03,745 - INFO - Qualitative metrics (n=5): 2025-11-18 17:33:03,745 - INFO - BLEU: 1.0000 2025-11-18 17:33:03,745 - INFO - METEOR: 1.0000 2025-11-18 17:33:03,745 - INFO - Edit Distance: 0.0000 2025-11-18 17:33:03,745 - INFO - F-measure: 1.0000 2025-11-18 17:33:03,745 - INFO - ====================================================================== 2025-11-18 17:33:03,745 - INFO - Qualitative Evaluation Samples: 2025-11-18 17:33:03,746 - INFO - ====================================================================== 2025-11-18 17:33:03,746 - INFO - Sample 1 (ID: sample_141920_chunk_1): 2025-11-18 17:33:03,746 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 17:33:03,746 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 17:33:03,746 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' 2025-11-18 17:33:03,747 - INFO - ---------------------------------------------------------------------- 2025-11-18 17:33:03,747 - INFO - Sample 2 (ID: sample_170543_chunk_2): 2025-11-18 17:33:03,747 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 17:33:03,747 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 
2025-11-18 17:33:03,747 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' 2025-11-18 17:33:03,747 - INFO - ---------------------------------------------------------------------- 2025-11-18 17:33:03,747 - INFO - Sample 3 (ID: sample_107152_chunk_9): 2025-11-18 17:33:03,748 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 17:33:03,748 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 17:33:03,748 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' 2025-11-18 17:33:03,748 - INFO - ---------------------------------------------------------------------- 2025-11-18 17:33:03,748 - INFO - Sample 4 (ID: sample_069148_chunk_0): 2025-11-18 17:33:03,748 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 17:33:03,749 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 2025-11-18 17:33:03,749 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' 
2025-11-18 17:33:03,749 - INFO - ---------------------------------------------------------------------- 2025-11-18 17:33:03,749 - INFO - Sample 5 (ID: sample_103176_chunk_4): 2025-11-18 17:33:03,749 - INFO - Context: [Conv1D Residual compressed from 1000 tokens] 2025-11-18 17:33:03,749 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 17:33:03,749 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' 2025-11-18 17:33:03,750 - INFO - ---------------------------------------------------------------------- 2025-11-18 17:33:03,751 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_9500.jsonl 2025-11-18 17:35:39,960 - INFO - Epoch 1 Step 9510 (Global: 9510): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=2.29e-06, throughput=3086 tok/s 2025-11-18 17:38:05,465 - INFO - Epoch 1 Step 9520 (Global: 9520): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=2.24e-06, throughput=3299 tok/s 2025-11-18 17:40:39,304 - INFO - Epoch 1 Step 9530 (Global: 9530): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.19e-06, throughput=3120 tok/s 2025-11-18 17:43:03,640 - INFO - Epoch 1 Step 9540 (Global: 9540): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=2.14e-06, throughput=3326 tok/s 2025-11-18 17:45:28,267 - INFO - Epoch 1 Step 9550 (Global: 9550): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.10e-06, throughput=3319 tok/s 2025-11-18 17:47:53,130 - INFO - Epoch 1 Step 9560 (Global: 9560): loss=0.0008, ppl=1.00, grad_norm=0.09, lr=2.05e-06, throughput=3314 tok/s 2025-11-18 17:50:27,758 - INFO - Epoch 1 Step 9570 (Global: 9570): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.00e-06, throughput=3104 tok/s 2025-11-18 17:52:52,301 - INFO - Epoch 1 Step 9580 (Global: 9580): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.95e-06, throughput=3321 tok/s 2025-11-18 17:55:16,466 - INFO - Epoch 1 Step 9590 (Global: 9590): loss=0.0005, 
ppl=1.00, grad_norm=0.07, lr=1.91e-06, throughput=3330 tok/s
2025-11-18 17:57:42,001 - INFO - Epoch 1 Step 9600 (Global: 9600): loss=0.0008, ppl=1.00, grad_norm=0.06, lr=1.86e-06, throughput=3298 tok/s
2025-11-18 18:00:17,375 - INFO - Epoch 1 Step 9610 (Global: 9610): loss=0.0006, ppl=1.00, grad_norm=0.08, lr=1.82e-06, throughput=3089 tok/s
2025-11-18 18:02:43,643 - INFO - Epoch 1 Step 9620 (Global: 9620): loss=0.0005, ppl=1.00, grad_norm=0.05, lr=1.77e-06, throughput=3282 tok/s
2025-11-18 18:05:18,468 - INFO - Epoch 1 Step 9630 (Global: 9630): loss=0.0008, ppl=1.00, grad_norm=0.06, lr=1.73e-06, throughput=3100 tok/s
2025-11-18 18:07:43,515 - INFO - Epoch 1 Step 9640 (Global: 9640): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=1.68e-06, throughput=3309 tok/s
2025-11-18 18:10:09,192 - INFO - Epoch 1 Step 9650 (Global: 9650): loss=0.0002, ppl=1.00, grad_norm=0.04, lr=1.64e-06, throughput=3295 tok/s
2025-11-18 18:12:43,524 - INFO - Epoch 1 Step 9660 (Global: 9660): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.60e-06, throughput=3110 tok/s
2025-11-18 18:15:08,076 - INFO - Epoch 1 Step 9670 (Global: 9670): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=1.56e-06, throughput=3321 tok/s
2025-11-18 18:17:33,635 - INFO - Epoch 1 Step 9680 (Global: 9680): loss=0.0008, ppl=1.00, grad_norm=0.08, lr=1.52e-06, throughput=3298 tok/s
2025-11-18 18:19:58,748 - INFO - Epoch 1 Step 9690 (Global: 9690): loss=0.0002, ppl=1.00, grad_norm=0.04, lr=1.48e-06, throughput=3308 tok/s
2025-11-18 18:22:33,809 - INFO - Epoch 1 Step 9700 (Global: 9700): loss=0.0010, ppl=1.00, grad_norm=0.13, lr=1.44e-06, throughput=3096 tok/s
2025-11-18 18:24:57,906 - INFO - Epoch 1 Step 9710 (Global: 9710): loss=0.0007, ppl=1.00, grad_norm=0.08, lr=1.40e-06, throughput=3331 tok/s
2025-11-18 18:27:22,822 - INFO - Epoch 1 Step 9720 (Global: 9720): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=1.36e-06, throughput=3312 tok/s
2025-11-18 18:29:57,867 - INFO - Epoch 1 Step 9730 (Global: 9730): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=1.32e-06, throughput=3096 tok/s
2025-11-18 18:32:22,357 - INFO - Epoch 1 Step 9740 (Global: 9740): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=1.28e-06, throughput=3322 tok/s
2025-11-18 18:34:47,056 - INFO - Epoch 1 Step 9750 (Global: 9750): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=1.24e-06, throughput=3317 tok/s
2025-11-18 18:37:20,805 - INFO - Epoch 1 Step 9760 (Global: 9760): loss=0.0006, ppl=1.00, grad_norm=0.05, lr=1.21e-06, throughput=3122 tok/s
2025-11-18 18:39:45,397 - INFO - Epoch 1 Step 9770 (Global: 9770): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.17e-06, throughput=3320 tok/s
2025-11-18 18:42:19,394 - INFO - Epoch 1 Step 9780 (Global: 9780): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=1.13e-06, throughput=3117 tok/s
2025-11-18 18:44:44,449 - INFO - Epoch 1 Step 9790 (Global: 9790): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.10e-06, throughput=3309 tok/s
2025-11-18 18:47:09,163 - INFO - Epoch 1 Step 9800 (Global: 9800): loss=0.0009, ppl=1.00, grad_norm=0.08, lr=1.06e-06, throughput=3317 tok/s
2025-11-18 18:49:33,430 - INFO - Epoch 1 Step 9810 (Global: 9810): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.03e-06, throughput=3327 tok/s
2025-11-18 18:52:07,761 - INFO - Epoch 1 Step 9820 (Global: 9820): loss=0.0005, ppl=1.00, grad_norm=0.10, lr=9.97e-07, throughput=3110 tok/s
2025-11-18 18:54:32,130 - INFO - Epoch 1 Step 9830 (Global: 9830): loss=0.0011, ppl=1.00, grad_norm=0.09, lr=9.64e-07, throughput=3325 tok/s
2025-11-18 18:56:54,034 - INFO - Epoch 1 Step 9840 (Global: 9840): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=9.32e-07, throughput=3383 tok/s
2025-11-18 18:59:26,009 - INFO - Epoch 1 Step 9850 (Global: 9850): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=9.00e-07, throughput=3158 tok/s
2025-11-18 19:01:48,617 - INFO - Epoch 1 Step 9860 (Global: 9860): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=8.68e-07, throughput=3366 tok/s
2025-11-18 19:04:12,453 - INFO - Epoch 1 Step 9870 (Global: 9870): loss=0.0009, ppl=1.00, grad_norm=0.10, lr=8.37e-07, throughput=3337 tok/s
2025-11-18 19:06:44,472 - INFO - Epoch 1 Step 9880 (Global: 9880): loss=0.0002, ppl=1.00, grad_norm=0.05, lr=8.07e-07, throughput=3158 tok/s
2025-11-18 19:09:06,968 - INFO - Epoch 1 Step 9890 (Global: 9890): loss=0.0003, ppl=1.00, grad_norm=0.08, lr=7.77e-07, throughput=3369 tok/s
2025-11-18 19:11:38,888 - INFO - Epoch 1 Step 9900 (Global: 9900): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=7.48e-07, throughput=3160 tok/s
2025-11-18 19:14:01,241 - INFO - Epoch 1 Step 9910 (Global: 9910): loss=0.0007, ppl=1.00, grad_norm=0.07, lr=7.20e-07, throughput=3372 tok/s
2025-11-18 19:16:23,821 - INFO - Epoch 1 Step 9920 (Global: 9920): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=6.92e-07, throughput=3367 tok/s
2025-11-18 19:18:47,173 - INFO - Epoch 1 Step 9930 (Global: 9930): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=6.64e-07, throughput=3348 tok/s
2025-11-18 19:21:20,310 - INFO - Epoch 1 Step 9940 (Global: 9940): loss=0.0006, ppl=1.00, grad_norm=0.10, lr=6.37e-07, throughput=3135 tok/s
2025-11-18 19:23:43,046 - INFO - Epoch 1 Step 9950 (Global: 9950): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=6.11e-07, throughput=3363 tok/s
2025-11-18 19:26:05,750 - INFO - Epoch 1 Step 9960 (Global: 9960): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=5.85e-07, throughput=3364 tok/s
2025-11-18 19:28:28,714 - INFO - Epoch 1 Step 9970 (Global: 9970): loss=0.0008, ppl=1.00, grad_norm=0.06, lr=5.60e-07, throughput=3358 tok/s
2025-11-18 19:31:01,504 - INFO - Epoch 1 Step 9980 (Global: 9980): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=5.35e-07, throughput=3142 tok/s
2025-11-18 19:33:24,211 - INFO - Epoch 1 Step 9990 (Global: 9990): loss=0.0002, ppl=1.00, grad_norm=0.05, lr=5.11e-07, throughput=3364 tok/s
2025-11-18 19:35:56,815 - INFO - Epoch 1 Step 10000 (Global: 10000): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=4.87e-07, throughput=3145 tok/s
2025-11-18 19:35:56,818 - INFO - Running validation at step 10000...
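The `lr` values logged above appear consistent with linear warmup followed by cosine decay to zero, using peak lr=0.0001, warmup_steps=1041 and total_steps=10417 from the run configuration. The log does not name the schedule, so the function below is a minimal sketch under that assumption (`cosine_with_warmup_lr` is a hypothetical name), not the trainer's actual scheduler code.

```python
import math


def cosine_with_warmup_lr(step: int, peak_lr: float = 1e-4,
                          warmup_steps: int = 1041, total_steps: int = 10417) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup period.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: the factor falls from 1 at progress=0 to 0 at progress=1.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (9600, 10000, 10410):
        print(step, f"{cosine_with_warmup_lr(step):.2e}")
```

Run as a script, this prints 1.86e-06, 4.87e-07 and 1.38e-10, the values logged at steps 9600, 10000 and 10410. That agreement supports the assumed schedule but does not prove the trainer implements it this way.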
2025-11-18 19:43:39,750 - INFO - Validation loss: 0.0005, perplexity: 1.00
2025-11-18 19:43:39,751 - INFO - Qualitative metrics (n=5):
2025-11-18 19:43:39,751 - INFO - BLEU: 1.0000
2025-11-18 19:43:39,751 - INFO - METEOR: 1.0000
2025-11-18 19:43:39,751 - INFO - Edit Distance: 0.0000
2025-11-18 19:43:39,751 - INFO - F-measure: 1.0000
2025-11-18 19:43:39,751 - INFO - ======================================================================
2025-11-18 19:43:39,751 - INFO - Qualitative Evaluation Samples:
2025-11-18 19:43:39,751 - INFO - ======================================================================
2025-11-18 19:43:39,752 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 19:43:39,752 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 19:43:39,752 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 19:43:39,752 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 19:43:39,752 - INFO - ----------------------------------------------------------------------
2025-11-18 19:43:39,752 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 19:43:39,752 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 19:43:39,752 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 19:43:39,752 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 19:43:39,752 - INFO - ----------------------------------------------------------------------
2025-11-18 19:43:39,753 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 19:43:39,753 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 19:43:39,753 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 19:43:39,753 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 19:43:39,753 - INFO - ----------------------------------------------------------------------
2025-11-18 19:43:39,753 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 19:43:39,753 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 19:43:39,753 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 19:43:39,753 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 19:43:39,754 - INFO - ----------------------------------------------------------------------
2025-11-18 19:43:39,754 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 19:43:39,754 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 19:43:39,754 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 19:43:39,754 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 19:43:39,754 - INFO - ----------------------------------------------------------------------
2025-11-18 19:43:39,756 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_10000.jsonl
2025-11-18 19:46:00,830 - INFO - Epoch 1 Step 10010 (Global: 10010): loss=0.0010, ppl=1.00, grad_norm=0.12, lr=4.64e-07, throughput=3419 tok/s
2025-11-18 19:48:19,959 - INFO - Epoch 1 Step 10020 (Global: 10020): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=4.42e-07, throughput=3450 tok/s
2025-11-18 19:50:50,983 - INFO - Epoch 1 Step 10030 (Global: 10030): loss=0.0008, ppl=1.00, grad_norm=0.12, lr=4.20e-07, throughput=3178 tok/s
2025-11-18 19:53:09,742 - INFO - Epoch 1 Step 10040 (Global: 10040): loss=0.0005, ppl=1.00, grad_norm=0.12, lr=3.98e-07, throughput=3459 tok/s
2025-11-18 19:55:41,397 - INFO - Epoch 1 Step 10050 (Global: 10050): loss=0.0004, ppl=1.00, grad_norm=0.08, lr=3.78e-07, throughput=3165 tok/s
2025-11-18 19:58:00,632 - INFO - Epoch 1 Step 10060 (Global: 10060): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.57e-07, throughput=3447 tok/s
2025-11-18 20:00:19,266 - INFO - Epoch 1 Step 10070 (Global: 10070): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=3.38e-07, throughput=3462 tok/s
2025-11-18 20:02:38,552 - INFO - Epoch 1 Step 10080 (Global: 10080): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=3.18e-07, throughput=3446 tok/s
2025-11-18 20:05:07,416 - INFO - Epoch 1 Step 10090 (Global: 10090): loss=0.0004, ppl=1.00, grad_norm=0.04, lr=3.00e-07, throughput=3224 tok/s
2025-11-18 20:07:26,373 - INFO - Epoch 1 Step 10100 (Global: 10100): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=2.82e-07, throughput=3454 tok/s
2025-11-18 20:09:46,640 - INFO - Epoch 1 Step 10110 (Global: 10110): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=2.64e-07, throughput=3422 tok/s
2025-11-18 20:12:07,845 - INFO - Epoch 1 Step 10120 (Global: 10120): loss=0.0006, ppl=1.00, grad_norm=0.14, lr=2.47e-07, throughput=3399 tok/s
2025-11-18 20:14:40,159 - INFO - Epoch 1 Step 10130 (Global: 10130): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=2.31e-07, throughput=3151 tok/s
2025-11-18 20:17:03,353 - INFO - Epoch 1 Step 10140 (Global: 10140): loss=0.0003, ppl=1.00, grad_norm=0.04, lr=2.15e-07, throughput=3352 tok/s
2025-11-18 20:19:33,336 - INFO - Epoch 1 Step 10150 (Global: 10150): loss=0.0008, ppl=1.00, grad_norm=0.07, lr=2.00e-07, throughput=3200 tok/s
2025-11-18 20:21:52,330 - INFO - Epoch 1 Step 10160 (Global: 10160): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.85e-07, throughput=3453 tok/s
2025-11-18 20:24:13,437 - INFO - Epoch 1 Step 10170 (Global: 10170): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=1.71e-07, throughput=3402 tok/s
2025-11-18 20:26:46,379 - INFO - Epoch 1 Step 10180 (Global: 10180): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.58e-07, throughput=3139 tok/s
2025-11-18 20:29:05,309 - INFO - Epoch 1 Step 10190 (Global: 10190): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.45e-07, throughput=3455 tok/s
2025-11-18 20:31:25,517 - INFO - Epoch 1 Step 10200 (Global: 10200): loss=0.0005, ppl=1.00, grad_norm=0.13, lr=1.32e-07, throughput=3424 tok/s
2025-11-18 20:33:45,911 - INFO - Epoch 1 Step 10210 (Global: 10210): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=1.20e-07, throughput=3419 tok/s
2025-11-18 20:36:16,157 - INFO - Epoch 1 Step 10220 (Global: 10220): loss=0.0005, ppl=1.00, grad_norm=0.06, lr=1.09e-07, throughput=3195 tok/s
2025-11-18 20:38:34,641 - INFO - Epoch 1 Step 10230 (Global: 10230): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=9.81e-08, throughput=3466 tok/s
2025-11-18 20:40:55,154 - INFO - Epoch 1 Step 10240 (Global: 10240): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=8.79e-08, throughput=3416 tok/s
2025-11-18 20:43:27,189 - INFO - Epoch 1 Step 10250 (Global: 10250): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=7.83e-08, throughput=3157 tok/s
2025-11-18 20:45:49,260 - INFO - Epoch 1 Step 10260 (Global: 10260): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=6.92e-08, throughput=3379 tok/s
2025-11-18 20:48:12,846 - INFO - Epoch 1 Step 10270 (Global: 10270): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=6.06e-08, throughput=3343 tok/s
2025-11-18 20:50:46,935 - INFO - Epoch 1 Step 10280 (Global: 10280): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=5.27e-08, throughput=3115 tok/s
2025-11-18 20:53:09,284 - INFO - Epoch 1 Step 10290 (Global: 10290): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=4.53e-08, throughput=3372 tok/s
2025-11-18 20:55:41,660 - INFO - Epoch 1 Step 10300 (Global: 10300): loss=0.0004, ppl=1.00, grad_norm=0.06, lr=3.84e-08, throughput=3150 tok/s
2025-11-18 20:58:03,366 - INFO - Epoch 1 Step 10310 (Global: 10310): loss=0.0004, ppl=1.00, grad_norm=0.05, lr=3.21e-08, throughput=3387 tok/s
2025-11-18 21:00:25,208 - INFO - Epoch 1 Step 10320 (Global: 10320): loss=0.0008, ppl=1.00, grad_norm=0.10, lr=2.64e-08, throughput=3384 tok/s
2025-11-18 21:02:51,243 - INFO - Epoch 1 Step 10330 (Global: 10330): loss=0.0007, ppl=1.00, grad_norm=0.06, lr=2.12e-08, throughput=3287 tok/s
2025-11-18 21:05:29,100 - INFO - Epoch 1 Step 10340 (Global: 10340): loss=0.0012, ppl=1.00, grad_norm=0.10, lr=1.66e-08, throughput=3041 tok/s
2025-11-18 21:07:55,126 - INFO - Epoch 1 Step 10350 (Global: 10350): loss=0.0006, ppl=1.00, grad_norm=0.07, lr=1.26e-08, throughput=3287 tok/s
2025-11-18 21:10:19,091 - INFO - Epoch 1 Step 10360 (Global: 10360): loss=0.0006, ppl=1.00, grad_norm=0.06, lr=9.12e-09, throughput=3334 tok/s
2025-11-18 21:12:53,316 - INFO - Epoch 1 Step 10370 (Global: 10370): loss=0.0003, ppl=1.00, grad_norm=0.05, lr=6.20e-09, throughput=3112 tok/s
2025-11-18 21:15:18,795 - INFO - Epoch 1 Step 10380 (Global: 10380): loss=0.0010, ppl=1.00, grad_norm=0.08, lr=3.84e-09, throughput=3300 tok/s
2025-11-18 21:17:43,644 - INFO - Epoch 1 Step 10390 (Global: 10390): loss=0.0004, ppl=1.00, grad_norm=0.07, lr=2.05e-09, throughput=3314 tok/s
2025-11-18 21:20:35,379 - INFO - Epoch 1 Step 10400 (Global: 10400): loss=0.0005, ppl=1.00, grad_norm=0.07, lr=8.11e-10, throughput=2795 tok/s
2025-11-18 21:23:31,415 - INFO - Epoch 1 Step 10410 (Global: 10410): loss=0.0004, ppl=1.00, grad_norm=0.10, lr=1.38e-10, throughput=2727 tok/s
2025-11-18 21:25:11,141 - INFO - Flushing 8 remainder batches from gradient accumulation
2025-11-18 21:25:11,146 - INFO - Rescaling gradients by 1.50x (compensating for 8/12 batches)
2025-11-18 21:25:11,362 - INFO - Remainder batch: loss=0.0004, ppl=1.00, grad_norm=0.11
2025-11-18 21:25:11,377 - INFO - Epoch 1 training: loss=0.1588, ppl=1.17, grad_norm=0.30, throughput=3060 tok/s (163386.9s total)
2025-11-18 21:25:11,385 - INFO - Running final validation...
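The `Flushing 8 remainder batches` and `Rescaling gradients by 1.50x (compensating for 8/12 batches)` lines follow from the run configuration: 500,000 training samples at batch_size=4 give 125,000 micro-batches, which is 10,416 full windows of gradient_accumulation_steps=12 plus 8 left over. Counting that partial window as one more optimizer step gives total_steps=10417, and warmup_ratio=0.1 then gives warmup_steps=1041, both as logged at startup. The sketch below reproduces that bookkeeping. It assumes each micro-batch loss is divided by the accumulation count before backward (the log implies this but does not state it), and both helper names are hypothetical.

```python
import math


def accumulation_plan(num_samples: int, batch_size: int, accum_steps: int,
                      warmup_ratio: float):
    """Hypothetical helper: optimizer-step bookkeeping for gradient accumulation."""
    micro_batches = math.ceil(num_samples / batch_size)
    full_steps, remainder = divmod(micro_batches, accum_steps)
    # A trailing partial window still triggers one extra optimizer step.
    total_steps = full_steps + (1 if remainder else 0)
    warmup_steps = int(warmup_ratio * total_steps)
    # Assumes each micro-batch loss was divided by accum_steps before backward,
    # so a partial window of `remainder` batches must be scaled back up.
    rescale = accum_steps / remainder if remainder else 1.0
    return total_steps, warmup_steps, remainder, rescale


def rescaled_partial_mean(losses, accum_steps: int) -> float:
    """Sum of loss / accum_steps over a partial window, rescaled to a true mean."""
    accumulated = sum(loss / accum_steps for loss in losses)
    return accumulated * (accum_steps / len(losses))


if __name__ == "__main__":
    print(accumulation_plan(500_000, 4, 12, 0.1))
```

Run as a script, it prints `(10417, 1041, 8, 1.5)`. The second helper shows why the factor is 12/8: without it, the final update would average over 12 slots while only 8 contained gradients, shrinking that step by a third.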
2025-11-18 21:34:01,419 - INFO - Validation loss: 0.0005, perplexity: 1.00
2025-11-18 21:34:01,420 - INFO - Qualitative metrics (n=5):
2025-11-18 21:34:01,420 - INFO - BLEU: 1.0000
2025-11-18 21:34:01,420 - INFO - METEOR: 1.0000
2025-11-18 21:34:01,420 - INFO - Edit Distance: 0.0000
2025-11-18 21:34:01,421 - INFO - F-measure: 1.0000
2025-11-18 21:34:01,421 - INFO - ======================================================================
2025-11-18 21:34:01,421 - INFO - Qualitative Evaluation Samples:
2025-11-18 21:34:01,421 - INFO - ======================================================================
2025-11-18 21:34:01,421 - INFO - Sample 1 (ID: sample_141920_chunk_1):
2025-11-18 21:34:01,422 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 21:34:01,422 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 21:34:01,422 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...'
2025-11-18 21:34:01,422 - INFO - ----------------------------------------------------------------------
2025-11-18 21:34:01,423 - INFO - Sample 2 (ID: sample_170543_chunk_2):
2025-11-18 21:34:01,423 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 21:34:01,423 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 21:34:01,423 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...'
2025-11-18 21:34:01,423 - INFO - ----------------------------------------------------------------------
2025-11-18 21:34:01,424 - INFO - Sample 3 (ID: sample_107152_chunk_9):
2025-11-18 21:34:01,424 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 21:34:01,424 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 21:34:01,424 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...'
2025-11-18 21:34:01,424 - INFO - ----------------------------------------------------------------------
2025-11-18 21:34:01,425 - INFO - Sample 4 (ID: sample_069148_chunk_0):
2025-11-18 21:34:01,425 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 21:34:01,425 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 21:34:01,425 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....'
2025-11-18 21:34:01,425 - INFO - ----------------------------------------------------------------------
2025-11-18 21:34:01,426 - INFO - Sample 5 (ID: sample_103176_chunk_4):
2025-11-18 21:34:01,426 - INFO - Context: [Conv1D Residual compressed from 1000 tokens]
2025-11-18 21:34:01,426 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 21:34:01,426 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...'
2025-11-18 21:34:01,426 - INFO - ----------------------------------------------------------------------
2025-11-18 21:34:01,428 - INFO - Qualitative samples saved to: outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/qualitative_step_10417.jsonl
2025-11-18 21:34:02,064 - INFO - Training complete!
2025-11-18 21:34:47,946 - INFO - Saved checkpoint to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/final_checkpoint.pt
2025-11-18 21:34:47,953 - INFO - Final checkpoint saved to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035/final_checkpoint.pt
2025-11-18 21:34:47,954 - INFO - Best validation loss: 0.0005, perplexity: 1.00
2025-11-18 21:34:47,955 - INFO - Checkpoints saved to outputs/production_conv1d_residual_t250_k5_reconstruction_20251116_235035
2025-11-18 21:34:48,622 - INFO - W&B run finished
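Two relationships in the reported metrics can be checked by hand. The logged perplexity is consistent with exp(loss): exp(0.0005) is about 1.0005 and exp(0.1588) is about 1.172, printed as 1.00 and 1.17. The log does not say how `Edit Distance` is computed. The sketch below assumes character-level Levenshtein distance divided by the longer string's length, under which an exact match scores 0.0, as every generated sample above does against its ground truth. The function names are hypothetical, and the trainer's metric may normalize differently or work on tokens.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)


def normalized_edit_distance(hyp: str, ref: str) -> float:
    """Character-level Levenshtein distance divided by the longer length (assumed)."""
    m, n = len(hyp), len(ref)
    if max(m, n) == 0:
        return 0.0
    prev = list(range(n + 1))  # distances from the empty prefix of hyp
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[n] / max(m, n)


if __name__ == "__main__":
    print(f"{perplexity(0.0005):.2f}", f"{perplexity(0.1588):.2f}")
    print(normalized_edit_distance("Maxis Redwood Shores", "Maxis Redwood Shores"))
```

Run as a script, this prints `1.00 1.17` and then `0.0`. At a validation loss this close to zero, perplexity rounds to 1.00 and stops discriminating between checkpoints, which is why the run also reports string-level metrics.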