Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19, 2025 • 45
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published Apr 14, 2025 • 30
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21, 2025 • 66
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training Paper • 2501.08197 • Published Jan 14, 2025 • 9
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 58
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2, 2025 • 52
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 30
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation Paper • 2405.20092 • Published May 30, 2024 • 1
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 72
Distilled Dual-Encoder Model for Vision-Language Understanding Paper • 2112.08723 • Published Dec 16, 2021
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models Paper • 2305.15033 • Published May 24, 2023
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information Paper • 2409.13199 • Published Sep 20, 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension Paper • 2406.18227 • Published Jun 26, 2024
MTGER: Multi-view Temporal Graph Enhanced Temporal Reasoning over Time-Involved Document Paper • 2311.04816 • Published Nov 8, 2023