Collections
Discover the best community collections!
Collections including paper arxiv:2503.22268
-
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Paper • 2503.10437 • Published • 33 -
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Paper • 2503.09642 • Published • 19 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 34 -
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Paper • 2503.16422 • Published • 14
-
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 47 -
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 52 -
An Empirical Study of Autoregressive Pre-training from Videos
Paper • 2501.05453 • Published • 41 -
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Paper • 2501.07556 • Published • 7
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 20 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 10 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 29 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 121
-
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Paper • 2502.08590 • Published • 42 -
Distillation Scaling Laws
Paper • 2502.08606 • Published • 47 -
Soundwave: Less is More for Speech-Text Alignment in LLMs
Paper • 2502.12900 • Published • 86 -
Alias-Free Latent Diffusion Models:Improving Fractional Shift Equivariance of Diffusion Latent Space
Paper • 2503.09419 • Published • 6
-
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Paper • 2411.05738 • Published • 15 -
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
Paper • 2410.22476 • Published • 27 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 25
-
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 119 -
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Paper • 2408.07931 • Published • 22 -
DELTA: Dense Efficient Long-range 3D Tracking for any video
Paper • 2410.24211 • Published • 9 -
Segment Any Motion in Videos
Paper • 2503.22268 • Published • 19
-
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Paper • 2503.10437 • Published • 33 -
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Paper • 2503.09642 • Published • 19 -
VGGT: Visual Geometry Grounded Transformer
Paper • 2503.11651 • Published • 34 -
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Paper • 2503.16422 • Published • 14
-
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Paper • 2502.08590 • Published • 42 -
Distillation Scaling Laws
Paper • 2502.08606 • Published • 47 -
Soundwave: Less is More for Speech-Text Alignment in LLMs
Paper • 2502.12900 • Published • 86 -
Alias-Free Latent Diffusion Models:Improving Fractional Shift Equivariance of Diffusion Latent Space
Paper • 2503.09419 • Published • 6
-
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 47 -
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 52 -
An Empirical Study of Autoregressive Pre-training from Videos
Paper • 2501.05453 • Published • 41 -
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Paper • 2501.07556 • Published • 7
-
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Paper • 2411.05738 • Published • 15 -
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
Paper • 2410.22476 • Published • 27 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 25
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 119 -
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Paper • 2408.07931 • Published • 22 -
DELTA: Dense Efficient Long-range 3D Tracking for any video
Paper • 2410.24211 • Published • 9 -
Segment Any Motion in Videos
Paper • 2503.22268 • Published • 19
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 20 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 10 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 29 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 121