We just released a big blog post surveying 16 OSS frameworks for async RL training of LLMs!
We're building a new async GRPO trainer for TRL, and as a first step, we needed to understand how the ecosystem solves this problem today.
The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.
The ecosystem has converged on the same fix: separate inference and training onto different GPU pools, connect them with a rollout buffer, and sync weights asynchronously.
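To make the pattern concrete, here is a minimal stdlib-only sketch under toy assumptions. It is not taken from any of the 16 frameworks. "Weights" are just a version counter, a background thread plays the inference pool, the main thread plays the trainer, a bounded queue is the rollout buffer, and a hypothetical `MAX_STALENESS` threshold stands in for staleness management.

```python
import queue
import random
import threading

# Toy assumptions: "weights" are an integer version, and each rollout is
# tagged with the policy version that generated it.
MAX_STALENESS = 1      # hypothetical: drop rollouts more than 1 version behind
NUM_TRAIN_STEPS = 5
BATCH_SIZE = 4

buffer = queue.Queue(maxsize=16)   # rollout buffer between the two GPU pools
weights_lock = threading.Lock()
published_version = 0              # latest weights visible to the inference pool
stop = threading.Event()


def generator():
    """Inference pool: keeps generating with whatever weights were last synced."""
    while not stop.is_set():
        with weights_lock:
            version = published_version
        rollout = {"version": version, "reward": random.random()}
        try:
            buffer.put(rollout, timeout=0.1)
        except queue.Full:
            continue  # buffer full: re-check the stop flag, then retry


def trainer(log):
    """Training pool: consumes batches, updates weights, publishes them asynchronously."""
    global published_version
    version = 0
    while version < NUM_TRAIN_STEPS:
        batch = []
        while len(batch) < BATCH_SIZE:
            rollout = buffer.get()
            # Staleness management: skip rollouts produced by too-old policies.
            if version - rollout["version"] <= MAX_STALENESS:
                batch.append(rollout)
        version += 1  # stand-in for an optimizer step
        log.append([r["version"] for r in batch])
        with weights_lock:
            published_version = version  # async weight sync to the inference pool
    stop.set()


log = []
gen_thread = threading.Thread(target=generator, daemon=True)
gen_thread.start()
trainer(log)
gen_thread.join()
print(log)  # per step: the policy versions of the rollouts that were trained on
```

Generation never waits for training here; the trainer only pays for filtering out samples that are too stale. Dropping stale samples is just one possible policy, which is why staleness management is one of the axes the survey compares.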
We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.
This survey is step one. The async GRPO trainer for TRL is next!
Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed).
Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
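The interaction loop can be pictured with a toy, pure-Python stand-in. This is NOT the OpenEnv or CARLA API: `ToyDrivingEnv`, its 1-D physics, and the +1/-1 reward are hypothetical, and only the three tool names come from the post. Each model turn emits a JSON tool call, the environment executes it, advances the simulation (so the action can't be undone), and returns an observation plus a reward.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyDrivingEnv:
    """Hypothetical 1-D stand-in for an emergency scenario (not the OpenEnv API).

    A pedestrian stands in lane 0, `distance` metres ahead; the agent can only
    act through the three tools named in the post.
    """
    distance: float = 20.0  # metres to the pedestrian
    speed: float = 10.0     # metres travelled per step
    lane: int = 0
    done: bool = False

    def observe(self) -> dict:
        return {"distance": self.distance, "speed": self.speed, "lane": self.lane}

    def brake(self) -> None:
        self.speed = max(0.0, self.speed - 6.0)

    def change_lane(self) -> None:
        self.lane = 1 - self.lane

    def step(self, tool_call: str) -> tuple:
        """Run one JSON tool call from the model, advance physics, return (obs, reward)."""
        name = json.loads(tool_call)["name"]
        tools = {"observe": self.observe, "brake": self.brake, "change_lane": self.change_lane}
        tools[name]()                    # an unknown tool name raises KeyError
        self.distance -= self.speed      # time moves on: actions can't be undone
        obs = self.observe()
        if self.lane == 0 and self.distance <= 0:
            self.done = True
            return obs, -1.0             # hit the pedestrian
        if self.speed == 0 or self.lane != 0:
            self.done = True
            return obs, 1.0              # stopped or swerved in time
        return obs, 0.0


def rollout(tool_calls: list) -> float:
    """Total reward for a sequence of tool calls, as a policy model would emit them."""
    env = ToyDrivingEnv()
    total = 0.0
    for call in tool_calls:
        _, reward = env.step(call)
        total += reward
        if env.done:
            break
    return total


print(rollout(['{"name": "observe"}', '{"name": "observe"}']))  # -1.0
print(rollout(['{"name": "brake"}', '{"name": "brake"}']))      # 1.0
print(rollout(['{"name": "change_lane"}']))                      # 1.0
```

In RL training, the model generates the tool calls, and rewards like these are what push it from "keep observing" toward braking or swerving.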
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.
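A rubric-based reward can be sketched as a weighted checklist over an episode summary. The criteria, weights, and episode fields below are hypothetical illustrations, not the ones used in the project.

```python
from typing import Callable, Dict, List, Tuple

# (criterion name, weight, scoring function: episode summary -> score in [0, 1])
Criterion = Tuple[str, float, Callable[[Dict], float]]

# Hypothetical rubric for an emergency-braking scenario.
RUBRIC: List[Criterion] = [
    ("no_collision", 0.6, lambda ep: 0.0 if ep["collided"] else 1.0),
    ("stayed_on_road", 0.2, lambda ep: 0.0 if ep["left_road"] else 1.0),
    ("reacted_in_time", 0.2, lambda ep: 1.0 if ep["reaction_steps"] <= 2 else 0.0),
]


def rubric_reward(episode: Dict, rubric: List[Criterion] = RUBRIC) -> float:
    """Weighted average of the rubric scores, so the reward stays in [0, 1]."""
    total_weight = sum(weight for _, weight, _ in rubric)
    score = sum(weight * check(episode) for _, weight, check in rubric)
    return score / total_weight


safe = {"collided": False, "left_road": False, "reaction_steps": 1}
late = {"collided": True, "left_road": False, "reaction_steps": 5}
print(rubric_reward(safe), rubric_reward(late))
```

One appeal of this design is partial credit: an episode that ends in a collision but stays on the road still scores above zero, which gives the policy a denser signal than a single pass/fail check.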
@CohereLabs just released Tiny Aya: a fully open-source 3B-parameter model that speaks 70+ languages! But there's a catch:
Tiny Aya is just a language model. It doesn't support tool calling, the key capability that turns frontier models into powerful *agents*. So the real question is:
How hard is it to turn Tiny Aya into an agent?
Turns out… it's simple, thanks to Hugging Face TRL. We're sharing a hands-on example showing how to use TRL to train Tiny Aya into a tool-calling agent, unlocking what could become the first *massively multilingual open agent*.
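One ingredient of such training is checking whether a completion is a well-formed tool call. The sketch below is a format reward in the `reward_func(completions, **kwargs) -> list[float]` shape that TRL's `GRPOTrainer` accepts for plain-text completions. It assumes an RL setup, a `<tool_call>{json}</tool_call>` output convention, and a hypothetical tool schema (`get_weather`, `translate`); the actual Tiny Aya example may use a different format or SFT instead.

```python
import json
import re

# Assumption: calls are wrapped as <tool_call>{json}</tool_call>; Tiny Aya's
# actual chat template may use a different convention.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

# Hypothetical tool schema: tool name -> required argument names.
TOOLS = {"get_weather": {"city"}, "translate": {"text", "target_lang"}}


def parse_tool_call(text: str):
    """Return (name, arguments) if the text contains a well-formed call, else None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, args = call.get("name"), call.get("arguments")
    if name not in TOOLS or not isinstance(args, dict):
        return None
    return name, args


def tool_call_reward(completions, **kwargs):
    """1.0 for a known tool with all required arguments, 0.5 if arguments are
    missing, 0.0 for no (or malformed) tool call."""
    rewards = []
    for text in completions:
        parsed = parse_tool_call(text)
        if parsed is None:
            rewards.append(0.0)
        else:
            name, args = parsed
            rewards.append(1.0 if TOOLS[name].issubset(args) else 0.5)
    return rewards


good = '<tool_call>{"name": "get_weather", "arguments": {"city": "Lagos"}}</tool_call>'
partial = '<tool_call>{"name": "translate", "arguments": {"text": "hola"}}</tool_call>'
print(tool_call_reward([good, partial, "It is sunny in Lagos."]))  # [1.0, 0.5, 0.0]
```

Because the reward only inspects the structure of the call, it works the same whatever language the surrounding text is in, which matters for a multilingual model.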
If you're looking for a good first issue to start your open-source journey, you could contribute to this TRL issue by documenting one impactful paper in the docs.
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.
It includes GDPO, the latest variant of GRPO for multi-reward RL. GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau @SimonX et al.
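A rough way to see what decoupled normalization changes, assuming a simplified reading (z-score each reward inside the group, then sum) rather than the paper's full recipe, which may add further steps such as batch-level normalization. With GRPO-style normalization of the summed reward, a group whose best completion satisfies one reward and a group whose best completion satisfies both get identical advantages, because z-scoring is scale-invariant. Normalizing each reward separately keeps them apart.

```python
from statistics import mean, pstdev


def normalize(values):
    """Group-wise z-score; a group with zero spread gets zero advantage."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def grpo_advantages(reward_lists):
    """GRPO-style: sum the rewards per completion, then normalize the sum in the group."""
    summed = [sum(rs) for rs in zip(*reward_lists)]
    return normalize(summed)


def gdpo_advantages(reward_lists):
    """Decoupled (simplified): normalize each reward within the group, then sum."""
    per_reward = [normalize(rs) for rs in reward_lists]
    return [sum(advs) for advs in zip(*per_reward)]


# Two groups of 4 completions, each scored by two binary rewards
# (outer list = reward, inner list = completion).
group_a = [[1, 0, 0, 0], [0, 0, 0, 0]]  # best completion satisfies one reward
group_b = [[1, 0, 0, 0], [1, 0, 0, 0]]  # best completion satisfies both

adv_grpo_a, adv_grpo_b = grpo_advantages(group_a), grpo_advantages(group_b)
adv_gdpo_a, adv_gdpo_b = gdpo_advantages(group_a), gdpo_advantages(group_b)
print([round(x, 3) for x in adv_grpo_a])  # [1.732, -0.577, -0.577, -0.577]
print([round(x, 3) for x in adv_grpo_b])  # [1.732, -0.577, -0.577, -0.577]
print([round(x, 3) for x in adv_gdpo_a])  # [1.732, -0.577, -0.577, -0.577]
print([round(x, 3) for x in adv_gdpo_b])  # [3.464, -1.155, -1.155, -1.155]
```

Under GRPO the two groups collapse onto the same advantages, so satisfying the second reward earns nothing extra; the decoupled version doubles the signal when both rewards are met.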
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!
LLMs struggle with long prompts → attention overload & lost info
RLMs inspect, split & call themselves on chunks, then aggregate results → handles millions of tokens, reduces noise, improves reasoning
A system prompt guides the recursion
RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
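The recursion can be sketched in a few lines. This is a toy under strong assumptions: the `llm(query, context)` callable is hypothetical, splitting is a fixed line-based fan-out (in RLMs the model itself decides how to inspect and split the context), and a `max_depth` guard stops runaway recursion.

```python
from typing import Callable

# Hypothetical LLM interface: (query, context) -> answer text.
LLM = Callable[[str, str], str]


def recursive_lm(query: str, context: str, llm: LLM,
                 max_chars: int = 2000, fan_out: int = 4,
                 depth: int = 0, max_depth: int = 8) -> str:
    """Toy RLM loop: if the context is too long, split it, call ourselves on
    each chunk, then aggregate the partial answers with one more call."""
    lines = context.splitlines()
    # Base case: context fits, can't be split further, or recursion is too deep.
    if len(context) <= max_chars or len(lines) <= 1 or depth >= max_depth:
        return llm(query, context)
    size = -(-len(lines) // fan_out)  # ceil division: lines per chunk
    chunks = ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    partials = [recursive_lm(query, chunk, llm, max_chars, fan_out, depth + 1, max_depth)
                for chunk in chunks]
    # Aggregate: the partial answers become the (much shorter) context of a new call.
    return recursive_lm(query, "\n".join(partials), llm, max_chars, fan_out,
                        depth + 1, max_depth)


# Usage with a stand-in "LLM" that returns the largest number it sees.
calls = []


def fake_llm(query: str, context: str) -> str:
    calls.append(len(context))
    return str(max(int(tok) for tok in context.split()))


numbers = "\n".join(str(i) for i in range(1, 501))  # ~1.9K characters
answer = recursive_lm("What is the largest number?", numbers, fake_llm, max_chars=200)
print(answer)  # 500
```

No single call ever sees more than `max_chars` characters, yet the final answer covers the whole input; each of those sub-calls is also a step that could be logged as a trajectory.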