
Rui Sun PRO

ThreeSR

https://threesr.github.io/

AI & ML interests

Vision and Language Multimodal Learning, CV, NLP, LLM

Recent Activity

updated a collection 2 days ago

New Papers



Organizations

upvoted a paper 12 days ago

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Paper • 2601.18491 • Published 17 days ago • 123

upvoted a paper 22 days ago

Aligning Agentic World Models via Knowledgeable Experience Learning

Paper • 2601.13247 • Published 24 days ago • 15

upvoted a paper 26 days ago

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Paper • 2511.15705 • Published Nov 19, 2025 • 97

upvoted an article about 1 month ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • Jun 3, 2025 • 320

upvoted a paper 2 months ago

RELIC: Interactive Video World Model with Long-Horizon Memory

Paper • 2512.04040 • Published Dec 3, 2025 • 24

upvoted 13 papers 3 months ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published Nov 24, 2025 • 61

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 93

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published Oct 30, 2025 • 34

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 123

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30, 2025 • 110

upvoted a paper 4 months ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 119

upvoted a paper 5 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 100

