MeshSplatting: Differentiable Rendering with Opaque Meshes • Paper • 2512.06818 • Published 21 days ago • 10
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion • Paper • 2510.20766 • Published Oct 23 • 34
Revisiting Multimodal Positional Encoding in Vision-Language Models • Paper • 2510.23095 • Published Oct 27 • 20
Seedream 4.0: Toward Next-generation Multimodal Image Generation • Paper • 2509.20427 • Published Sep 24 • 81
Does FLUX Already Know How to Perform Physically Plausible Image Composition? • Paper • 2509.21278 • Published Sep 25 • 16
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries • Paper • 2508.15752 • Published Aug 21 • 7
Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds • Paper • 2508.14892 • Published Aug 20 • 9
FoNE: Precise Single-Token Number Embeddings via Fourier Features • Paper • 2502.09741 • Published Feb 13 • 15
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer • Paper • 2505.04622 • Published May 7 • 27
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning • Paper • 2505.04601 • Published May 7 • 29
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning • Paper • 2504.07128 • Published Apr 2 • 87
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation • Paper • 2501.09284 • Published Jan 16 • 10
GameFactory: Creating New Games with Generative Interactive Videos • Paper • 2501.08325 • Published Jan 14 • 67
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking • Paper • 2501.02690 • Published Jan 5 • 16