← Back to search
paper reviewed open access llmsec-2024-00061

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety

Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi

2024 — ACL 2024 85 citations

Abstract

Applies social science persuasion techniques to jailbreak LLMs, showing high attack success rates using persuasion taxonomy.

Categories

Tags

persuasionsocial-sciencehuman-like

Framework Mappings

OWASP LLM: LLM01 MITRE ATLAS: AML.T0054

Cite This Resource

@article{llmsec202400061,
  title = {How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety},
  author = {Yi Zeng and Hongpeng Lin and Jingwen Zhang and Diyi Yang and Ruoxi Jia and Weiyan Shi},
  year = {2024},
  journal = {ACL 2024},
  url = {https://arxiv.org/abs/2401.06373},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2401.06373