Beyond Language Modeling: An Exploration of Multimodal Pretraining • Paper • 2603.03276 • Published 8 days ago • 85
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published Jan 22 • 54
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published Oct 13, 2025 • 168
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning • Paper • 2409.03025 • Published Sep 4, 2024 • 1