MMMU

non-profit

https://mmmu-benchmark.github.io/

AI & ML interests

Multimodal Model Evaluation

Recent Activity

zhangysk

authored 3 papers 2 days ago

BABE: Biology Arena BEnchmark

Paper • 2602.05857 • Published 3 days ago • 8

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published 3 days ago • 29

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Paper • 2601.21937 • Published 10 days ago • 17

zhangysk

authored a paper 9 days ago

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published 10 days ago • 42

drogozhang

authored a paper 12 days ago

Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Paper • 2601.18217 • Published 13 days ago • 11

wren93

authored 3 papers 19 days ago

Scaling Zero-Shot Reference-to-Video Generation

Paper • 2512.06905 • Published Dec 7, 2025 • 29

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Paper • 2512.07802 • Published Dec 8, 2025 • 45

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Paper • 2512.21338 • Published Dec 24, 2025 • 22

zhangysk

authored 6 papers 27 days ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 45

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Paper • 2512.12196 • Published Dec 13, 2025 • 6

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Paper • 2512.24867 • Published Dec 31, 2025 • 1

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 64

AInsteinBench: Benchmarking Coding Agents on Scientific Repositories

Paper • 2512.21373 • Published Dec 24, 2025

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Paper • 2601.06002 • Published 30 days ago • 52

yuanshengni

in MMMU/MMMU 27 days ago

wrong_use，need deleted

#6 opened about 1 month ago by

yuexiang96

authored 4 papers about 2 months ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 29

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 46

Simulating Environments with Reasoning Models for Agent Training

Paper • 2511.01824 • Published Nov 3, 2025 • 2

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published Dec 8, 2025 • 38

zhangysk

submitted a paper to Daily Papers about 2 months ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 45