Multimodal LLM
• LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing (arXiv:2311.00571)
• LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437)
• Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning (arXiv:2310.08166)
• Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants (arXiv:2310.00653)
• RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (arXiv:2312.00849)
• Merlin: Empowering Multimodal LLMs with Foresight Minds (arXiv:2312.00589)
• COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training (arXiv:2401.00849)
• Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models (arXiv:2312.17661)
• TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862)
• Long Context Transfer from Language to Vision (arXiv:2406.16852)
• Building and better understanding vision-language models: insights and future directions (arXiv:2408.12637)