Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 114
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? Paper • 2506.05287 • Published Jun 5 • 15
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 32
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10 • 47
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention Paper • 2504.16083 • Published Apr 22 • 9
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators Paper • 2505.09558 • Published May 14 • 11
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 73
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1 • 54
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation Paper • 2406.07867 • Published Jun 12, 2024 • 1
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 85
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 134