Submitted by akhaliq · 27 · LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model · 5 authors
Submitted by akhaliq · 24 · CameraCtrl: Enabling Camera Control for Text-to-Video Generation · 7 authors
Submitted by akhaliq · 22 · Bigger is not Always Better: Scaling Properties of Latent Diffusion Models · 6 authors
Submitted by akhaliq · 8 · LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models · 7 authors