IMU-1: Sample-Efficient Pre-training of Small Language Models Paper • 2602.02522 • Published 16 days ago • 5
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29, 2025 • 94
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20, 2025 • 174
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2, 2024 • 55