Paper · Reviewed · Open access · llmsec-2025-00019
Are Aligned Neural Networks Adversarially Aligned?
Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt
2023 · NeurIPS 2023 · 180 citations
Abstract
Evaluates whether multimodal LLMs aligned to refuse harmful text requests also refuse harmful image-based requests, finding significant gaps.
Framework Mappings
OWASP LLM Top 10: LLM01 (Prompt Injection)
Cite This Resource
@inproceedings{llmsec202500019,
  title = {Are Aligned Neural Networks Adversarially Aligned?},
  author = {Nicholas Carlini and Milad Nasr and Christopher A. Choquette-Choo and Matthew Jagielski and Irena Gao and Anas Awadalla and Pang Wei Koh and Daphne Ippolito and Katherine Lee and Florian Tramer and Ludwig Schmidt},
  year = {2023},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  url = {https://arxiv.org/abs/2306.15447},
}

Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2306.15447