Efficient Adversarial Training in LLMs with Continuous Attacks Paper • 2405.15589 • Published May 24, 2024
Contrastive Language-Image Pretrained Models are Zero-Shot Human Scanpath Predictors Paper • 2305.12380 • Published May 21, 2023
Closing the Distribution Gap in Adversarial Training for LLMs Paper • 2602.15238 • Published about 1 month ago
A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness Paper • 2603.06594 • Published Feb 4 • 1
CoinflipForSafety Collection Datasets from the paper: A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness (arxiv: https://arxiv.org/abs/2603.06594) • 4 items • Updated 3 days ago • 1