Geometrically-Constrained Agent for Spatial Reasoning Paper • 2511.22659 • Published 13 days ago • 38
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20 • 67
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law Paper • 2507.18576 • Published Jul 24 • 8
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28 • 8
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published Sep 29 • 4
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published Sep 29 • 4
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28 • 5 • 2
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published Oct 9 • 22
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards Paper • 2510.08529 • Published Oct 9 • 18
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28 • 5
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published Sep 28 • 5
Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring Paper • 2502.05242 • Published Feb 7
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28 • 8
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published Sep 18 • 53
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45^{circ} Law Paper • 2507.18576 • Published Jul 24 • 8