Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory Paper • 2310.17884 • Published Oct 27, 2023 • 1
Do Membership Inference Attacks Work on Large Language Models? Paper • 2402.07841 • Published Feb 12, 2024
Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs Paper • 2403.04801 • Published Mar 5, 2024
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text Paper • 2410.04265 • Published Oct 5, 2024
Breaking News: Case Studies of Generative AI's Use in Journalism Paper • 2406.13706 • Published Jun 19, 2024
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions Paper • 2409.16427 • Published Sep 24, 2024 • 1
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models Paper • 2503.12072 • Published Mar 15
Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models Paper • 2505.18773 • Published May 24 • 7
Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation Paper • 2507.17937 • Published Jul 23 • 1
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage Paper • 2508.09603 • Published Aug 13 • 2
Harnessing Optimization Dynamics for Curvature-Informed Model Merging Paper • 2509.11167 • Published Sep 14 • 1
Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases Paper • 2506.17336 • Published Jun 19 • 1
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability Paper • 2510.06084 • Published Oct 7
Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs Paper • 2511.05933 • Published 28 days ago • 7
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published Jun 26, 2024 • 10