A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos • Paper • 2512.16978 • Published 13 days ago • 4
Robust and Calibrated Detection of Authentic Multimedia Content • Paper • 2512.15182 • Published 14 days ago • 15
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos • Paper • 2506.05349 • Published Jun 5 • 24
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM • Paper • 2503.04724 • Published Mar 6 • 72
UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities • Paper • 2412.10372 • Published Dec 13, 2024 • 3
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • Paper • 2412.07769 • Published Dec 10, 2024 • 30