Zesen Cheng

ClownRat

https://clownrat6.github.io/

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

authored a paper 2 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 10 days ago • 106

upvoted a paper 2 months ago

Reinforcement Learning on Pre-Training Data

Paper • 2509.19249 • Published Sep 23 • 68

updated a model 3 months ago

DAMO-NLP-SG/VideoLLaMA2.1-7B-16F

Video-Text-to-Text • 8B • Updated Sep 4 • 1.11k • 10

liked a model 4 months ago

rednote-hilab/dots.vlm1.inst

Image-Text-to-Text • 672B • Updated Aug 21 • 9.48k • 80

liked a dataset 5 months ago

OpenGVLab/VideoChat-Flash-Training-Data

Viewer • Updated Jun 24 • 87k • 41.1k • 13

liked a Space 5 months ago

VideoRefer x VideoLLaMA3

upvoted 2 papers 6 months ago

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

Paper • 2506.05287 • Published Jun 5 • 15

liked a dataset 6 months ago

fesvhtr/FunQA

Updated Sep 9, 2024 • 142 • 1

upvoted 6 papers 6 months ago

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29 • 32

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10 • 47

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention

Paper • 2504.16083 • Published Apr 22 • 9

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Paper • 2505.09558 • Published May 14 • 11

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14 • 73

Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 38

upvoted 5 papers 7 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1 • 54

Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Paper • 2406.07867 • Published Jun 12, 2024 • 1

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 85

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

Llama-Nemotron: Efficient Reasoning Models

Paper • 2505.00949 • Published May 2 • 42
