Paper · Reviewed · Open Access · ID: llmsec-2025-00019

Are Aligned Neural Networks Adversarially Aligned?

Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt

2023 · NeurIPS 2023 · 180 citations

Abstract

Evaluates whether multimodal LLMs aligned to refuse harmful text requests also refuse harmful image-based requests, finding significant gaps.

Categories

Tags

multimodal · alignment-gap · image-attacks

Framework Mappings

OWASP LLM Top 10: LLM01 (Prompt Injection)

Cite This Resource

@inproceedings{llmsec202500019,
  title = {Are Aligned Neural Networks Adversarially Aligned?},
  author = {Nicholas Carlini and Milad Nasr and Christopher A. Choquette-Choo and Matthew Jagielski and Irena Gao and Anas Awadalla and Pang Wei Koh and Daphne Ippolito and Katherine Lee and Florian Tramer and Ludwig Schmidt},
  year = {2023},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  url = {https://arxiv.org/abs/2306.15447},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arXiv ID: 2306.15447