Spectral Condition for μP under Width-Depth Scaling Paper • 2603.00541 • Published 10 days ago
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing Paper • 2311.01410 • Published Nov 2, 2023
Revisiting Discriminative vs. Generative Classifiers: Theory and Implications Paper • 2302.02334 • Published Feb 5, 2023
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability Paper • 2405.16845 • Published May 27, 2024