Collections
Discover the best community collections!
Collections including paper arxiv:2510.13786

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 117

- Demystifying Reinforcement Learning in Agentic Reasoning
  Paper • 2510.11701 • Published • 31
- Self-Improving LLM Agents at Test-Time
  Paper • 2510.07841 • Published • 9
- Making Mathematical Reasoning Adaptive
  Paper • 2510.04617 • Published • 22
- DocReward: A Document Reward Model for Structuring and Stylizing
  Paper • 2510.11391 • Published • 27

- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 138
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 88

- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 30
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 39
- BitNet Distillation
  Paper • 2510.13998 • Published • 54
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 47

- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 492
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
  Paper • 2509.25541 • Published • 140
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 139

- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 492
- SpikingBrain Technical Report: Spiking Brain-inspired Large Models
  Paper • 2509.05276 • Published • 3
- Self-Adapting Language Models
  Paper • 2506.10943 • Published • 6
- The Art of Scaling Reinforcement Learning Compute for LLMs
  Paper • 2510.13786 • Published • 30