paper, reviewed, open access, llmsec-2024-00054

PAIR: Prompt Automatic Iterative Refinement for Jailbreaking LLMs

Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

2024, NeurIPS 2024, 220 citations

Abstract

Uses an attacker LLM to automatically generate jailbreak prompts for a target LLM through iterative refinement, achieving high attack success rates with only black-box access, often in fewer than twenty queries.

Categories

jailbreaking, red teaming

Tags

automated, iterative, black-box

Framework Mappings

OWASP LLM: LLM01
MITRE ATLAS: AML.T0054

Cite This Resource

@inproceedings{llmsec202400054,
  title = {PAIR: Prompt Automatic Iterative Refinement for Jailbreaking LLMs},
  author = {Patrick Chao and Alexander Robey and Edgar Dobriban and Hamed Hassani and George J. Pappas and Eric Wong},
  year = {2024},
  booktitle = {NeurIPS 2024},
  url = {https://arxiv.org/abs/2310.08419},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2310.08419