Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs Paper • 2511.12710 • Published Nov 2025 • 36
Flames: Benchmarking Value Alignment of LLMs in Chinese Paper • 2311.06899 • Published Nov 12, 2023 • 2
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 37
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models Paper • 2406.07594 • Published Jun 11, 2024 • 1
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models Paper • 2406.14952 • Published Jun 21, 2024
Reflection-Bench: probing AI intelligence with reflection Paper • 2410.16270 • Published Oct 21, 2024 • 6
Safety at Scale: A Comprehensive Survey of Large Model Safety Paper • 2502.05206 • Published Feb 2, 2025 • 3
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law Paper • 2507.18576 • Published Jul 24, 2025 • 8
A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published Oct 2, 2025 • 18
The Other Mind: How Language Models Exhibit Human Temporal Cognition Paper • 2507.15851 • Published Jul 21, 2025
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos Paper • 2502.15806 • Published Feb 19, 2025 • 2
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3, 2025 • 3