Paper · Reviewed · Open access · llmsec-2024-00002
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
2023-07 · arXiv preprint · 890 citations
Abstract
Proposes an automated method, Greedy Coordinate Gradient (GCG), for generating adversarial suffixes that cause aligned LLMs to produce harmful content; the attacks transfer across models, including ChatGPT and Claude.
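The paper's full GCG procedure uses token-level gradients from the target model to shortlist candidate swaps; as a rough, self-contained illustration of the underlying greedy coordinate search over a discrete suffix (toy scoring function standing in for the model loss, all names hypothetical):

```python
# Toy sketch of greedy coordinate search over a discrete suffix.
# A real GCG attack scores candidates with a gradient-guided LLM loss;
# here a simple string-matching score stands in for that objective.
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz")
TARGET = "sure"  # stand-in for "maximize P(affirmative response)"

def score(suffix):
    # Higher is better: positions matching the toy target.
    return sum(a == b for a, b in zip(suffix, TARGET))

def greedy_coordinate_search(length=4, steps=20, seed=0):
    rng = random.Random(seed)
    suffix = [rng.choice(VOCAB) for _ in range(length)]
    for _ in range(steps):
        best = (score(suffix), None, None)
        for i in range(length):        # each coordinate (token slot)
            for tok in VOCAB:          # each candidate replacement
                cand = suffix[:i] + [tok] + suffix[i + 1:]
                s = score(cand)
                if s > best[0]:
                    best = (s, i, tok)
        if best[1] is None:            # no improving swap remains
            break
        suffix[best[1]] = best[2]
    return "".join(suffix)
```

In this toy setting the search converges to the target string in a handful of swaps; in the paper, the same coordinate-wise structure is applied to suffix tokens scored against the model's probability of an affirmative completion.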
Framework Mappings
- OWASP LLM: LLM01
- MITRE ATLAS: AML.T0043
- MITRE ATLAS: AML.T0054
Cite This Resource
@article{llmsec202400002,
title = {Universal and Transferable Adversarial Attacks on Aligned Language Models},
author = {Andy Zou and Zifan Wang and Nicholas Carlini and Milad Nasr and J. Zico Kolter and Matt Fredrikson},
year = {2023},
journal = {arXiv preprint},
url = {https://arxiv.org/abs/2307.15043},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arXiv ID: 2307.15043