Multimodal LLM
• LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing (arXiv:2311.00571)
• LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437)
• Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning (arXiv:2310.08166)
• Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants (arXiv:2310.00653)
• RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (arXiv:2312.00849)
• Merlin: Empowering Multimodal LLMs with Foresight Minds (arXiv:2312.00589)
• COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training (arXiv:2401.00849)
• Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models (arXiv:2312.17661)
• TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones (arXiv:2312.16862)
• Long Context Transfer from Language to Vision (arXiv:2406.16852)
• Building and better understanding vision-language models: insights and future directions (arXiv:2408.12637)