paper reviewed open access llmsec-2024-00048

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao

Year: 2024
Venue: ICLR 2024
Citations: 230

Abstract

Proposes AutoDAN, a method for automatically generating stealthy jailbreak prompts that are semantically meaningful and can bypass perplexity-based defenses.

Framework Mappings

OWASP LLM: LLM01
MITRE ATLAS: AML.T0054

Cite This Resource

@inproceedings{llmsec202400048,
  title = {AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models},
  author = {Xiaogeng Liu and Nan Xu and Muhao Chen and Chaowei Xiao},
  year = {2024},
  booktitle = {ICLR 2024},
  url = {https://arxiv.org/abs/2310.04451},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2310.04451
