AGT^{AO}: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality (Paper • 2602.01703 • Published Feb 2)
WMDP Benchmark (Collection): The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning • 9 items • Updated 3 days ago