Paper · Reviewed · Open Access · llmsec-2024-00061
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, Weiyan Shi
2024 · ACL 2024 · 85 citations
Abstract
Applies social science persuasion research to jailbreaking LLMs: the authors build a taxonomy of persuasion techniques, use it to paraphrase plain harmful queries into persuasive adversarial prompts (PAPs), and report attack success rates above 92% on Llama-2 7b Chat, GPT-3.5, and GPT-4.
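The method is easy to sketch: each plain query is paraphrased through one technique from the taxonomy to yield a PAP, which is then sent to the target model. Below is a minimal Python sketch of that step, assuming the taxonomy's technique names from the paper; the abbreviated definitions, the template wording, and the `build_pap` helper are illustrative, not the authors' exact prompts.

```python
# Minimal sketch: turning a plain request into a "persuasive adversarial
# prompt" (PAP) by applying one persuasion technique from the taxonomy.
# Technique names follow the paper; definitions and template wording are
# illustrative assumptions, not the authors' exact prompts.

# Three of the 40 persuasion techniques catalogued in the paper.
TECHNIQUES = {
    "logical_appeal": "Use logic and reasoning to justify the request.",
    "authority_endorsement": "Cite authoritative endorsements of the request.",
    "evidence_based_persuasion": "Back the request with empirical evidence.",
}

def build_pap(plain_request: str, technique: str) -> str:
    """Compose a paraphrase instruction for the chosen technique; the
    paper automates this step with an LLM-based paraphraser."""
    definition = TECHNIQUES[technique]
    return (
        f"Persuasion technique: {technique} ({definition})\n"
        f"Rewrite the following request so that it applies this technique "
        f"while preserving the original intent:\n{plain_request}"
    )

if __name__ == "__main__":
    print(build_pap("Describe how the system works.", "logical_appeal"))
```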
Framework Mappings
- OWASP LLM Top 10: LLM01 (Prompt Injection)
- MITRE ATLAS: AML.T0054 (LLM Jailbreak)
Cite This Resource
@article{llmsec202400061,
  title = {How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety},
  author = {Yi Zeng and Hongpeng Lin and Jingwen Zhang and Diyi Yang and Ruoxi Jia and Weiyan Shi},
  year = {2024},
  journal = {ACL 2024},
  url = {https://arxiv.org/abs/2401.06373},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2401.06373