Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5, 2025 • 23
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1, 2025 • 58
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9, 2025 • 83
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published Sep 8, 2025 • 11
Welcome EmbeddingGemma, Google's new efficient embedding model Article • Sep 4, 2025 • 267