Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL • Jun 3 • 96
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 36
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18, 2024 • 22