Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks Paper • 2505.11881 • Published May 17 • 4
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7 • 141
ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models Paper • 2403.16167 • Published Mar 24, 2024 • 1
Exploring Fine-Tuning of Large Audio Language Models for Spoken Language Understanding under Limited Speech Data Paper • 2509.15389 • Published Sep 18 • 3
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language Paper • 2503.23730 • Published Mar 31 • 3