Idea
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper
• 2402.14083
• Published • 47
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published • 626
Genie: Generative Interactive Environments
Paper
• 2402.15391
• Published • 72
Humanoid Locomotion as Next Token Prediction
Paper
• 2402.19469
• Published • 29
ViTAR: Vision Transformer with Any Resolution
Paper
• 2403.18361
• Published • 55
Simulating Classroom Education with LLM-Empowered Agents
Paper
• 2406.19226
• Published • 32
MIRAI: Evaluating LLM Agents for Event Forecasting
Paper
• 2407.01231
• Published • 18
Prithvi WxC: Foundation Model for Weather and Climate
Paper
• 2409.13598
• Published • 45
Selective Attention Improves Transformer
Paper
• 2410.02703
• Published • 25
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
• 2411.17465
• Published • 90
Chimera: Improving Generalist Model with Domain-Specific Experts
Paper
• 2412.05983
• Published • 9
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
• 2412.08635
• Published • 49
Large Action Models: From Inception to Implementation
Paper
• 2412.10047
• Published • 36
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
• 2412.09871
• Published • 108
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Paper
• 2412.14123
• Published • 11
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575
• Published • 82
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
• 2501.04682
• Published • 99
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
Paper
• 2411.04983
• Published • 13
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
• 2502.05171
• Published • 154
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Paper
• 2502.05173
• Published • 64
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper
• 2502.11089
• Published • 169
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Paper
• 2502.15007
• Published • 175
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Paper
• 2502.20395
• Published • 45
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
• 2503.14456
• Published • 154
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper
• 2503.15558
• Published • 50
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
• 2504.01990
• Published • 305
Paper
• 2504.00927
• Published • 56
One-Minute Video Generation with Test-Time Training
Paper
• 2504.05298
• Published • 110
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Paper
• 2504.08388
• Published • 42
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Paper
• 2504.10157
• Published • 17
Adaptive Computation Pruning for the Forgetting Transformer
Paper
• 2504.06949
• Published • 3
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Paper
• 2505.02707
• Published • 85
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
• 2506.06962
• Published • 28
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper
• 2508.05004
• Published • 131
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper
• 2508.11987
• Published • 72
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
Paper
• 2511.07885
• Published • 10