• RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922)
• B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (arXiv:2412.17256)
• Deliberation in Latent Space via Differentiable Cache Augmentation (arXiv:2412.17747)
• Outcome-Refining Process Supervision for Code Generation (arXiv:2412.15118)
• REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models (arXiv:2501.03262)
• Evolving Deeper LLM Thinking (arXiv:2501.09891)
• DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
• Kimi k1.5: Scaling Reinforcement Learning with LLMs (arXiv:2501.12599)
• Towards General-Purpose Model-Free Reinforcement Learning (arXiv:2501.16142)
• Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703)