Paper · Reviewed · Open access · llmsec-2024-00002

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson

2023-07 · arXiv preprint · 890 citations

Abstract

Proposes Greedy Coordinate Gradient (GCG), an automated method for generating adversarial suffixes that cause aligned LLMs to produce harmful content; the resulting attacks transfer across models, including ChatGPT and Claude.
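To illustrate the structure of the attack, here is a minimal, model-free sketch of a greedy coordinate-descent suffix search. The real GCG algorithm uses token-embedding gradients to shortlist candidate swaps and evaluates them against an LLM's loss on a target completion; this toy version brute-forces every candidate token against a stand-in loss function, and all names (`gcg_sketch`, the Hamming-distance loss, the tiny vocabulary) are hypothetical.

```python
def gcg_sketch(loss_fn, vocab, suffix, sweeps=5):
    """Toy greedy coordinate search over a token suffix.

    For each position, try every candidate token and keep the swap
    that most lowers the loss. The real GCG instead samples positions
    and uses gradients w.r.t. one-hot token embeddings to pick a small
    candidate set before evaluating batches against the model.
    """
    suffix = list(suffix)
    best = loss_fn(suffix)
    for _ in range(sweeps):
        for pos in range(len(suffix)):
            for tok in vocab:
                cand = suffix.copy()
                cand[pos] = tok
                cand_loss = loss_fn(cand)
                if cand_loss < best:
                    best, suffix = cand_loss, cand
    return "".join(suffix), best

# Stand-in loss: Hamming distance to a hypothetical "adversarial" target
# string, playing the role of the model's loss on the target completion.
target = "abcd"
loss = lambda s: sum(a != b for a, b in zip(s, target))

suffix, final_loss = gcg_sketch(loss, "abcdxyz", "xxxx")
# suffix converges to "abcd" with final_loss 0
```

The coordinate-wise structure is the point: each step changes exactly one suffix token, which is what makes the candidate scoring in the real attack batchable on a GPU.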

Tags

GCG · adversarial-suffix · transferability

Framework Mappings

OWASP LLM: LLM01
MITRE ATLAS: AML.T0043
MITRE ATLAS: AML.T0054

Cite This Resource

@article{llmsec202400002,
  title = {Universal and Transferable Adversarial Attacks on Aligned Language Models},
  author = {Andy Zou and Zifan Wang and Nicholas Carlini and Milad Nasr and J. Zico Kolter and Matt Fredrikson},
  year = {2023},
  journal = {arXiv preprint},
  url = {https://arxiv.org/abs/2307.15043},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2307.15043