UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 12 days ago • 19
Video Instruction Tuning With Synthetic Data Paper • 2410.02713 • Published Oct 3, 2024 • 41 • 3
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper • 2410.02712 • Published Oct 3, 2024 • 37
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 38
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29, 2024 • 28
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024 • 42
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 13, 2024 • 19
TrustLLM: Trustworthiness in Large Language Models Paper • 2401.05561 • Published Jan 10, 2024 • 69
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 14
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 51
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 43
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023 • 29