
Zesen Cheng

ClownRat

https://clownrat6.github.io/

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

authored a paper 2 days ago

Qwen3-VL Technical Report

upvoted a paper 2 months ago

Reinforcement Learning on Pre-Training Data

updated a model 3 months ago

DAMO-NLP-SG/VideoLLaMA2.1-7B-16F


Organizations

upvoted a paper 2 months ago

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published Sep 23 • 68

upvoted 8 papers 6 months ago

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

Paper • 2506.05287 • Published Jun 5 • 15

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29 • 32

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10 • 47

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention

Paper • 2504.16083 • Published Apr 22 • 9

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Paper • 2505.09558 • Published May 14 • 11

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14 • 73

Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 38

upvoted 11 papers 7 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 54

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Paper • 2406.07867 • Published Jun 12, 2024 • 1

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

Llama-Nemotron: Efficient Reasoning Models

Paper • 2505.00949 • Published May 2 • 42

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Paper • 2505.02625 • Published May 5 • 22

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Paper • 2505.02735 • Published May 5 • 34

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5 • 29

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 80

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Paper • 2505.03739 • Published May 6 • 10

Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

Paper • 2505.03821 • Published May 3 • 25
