Paper · Reviewed · Open Access · llmsec-2021-00001

Extracting Training Data from Large Language Models

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

2021-06 · USENIX Security 2021 · 1200 citations

Abstract

Demonstrates that large language models memorize portions of their training data and can be prompted to emit them verbatim, including personally identifiable information (PII), revealing significant privacy risks in deployed models.
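The extraction pipeline the paper describes generates many candidate samples from the model, then ranks them to find likely-memorized text. One of the paper's ranking signals compares the model's perplexity on a candidate against its zlib compression entropy: text that looks high-entropy to a compressor but is assigned unusually low perplexity by the model is a memorization suspect. A minimal sketch of that heuristic follows; the `model_perplexity` value is assumed to come from querying the target language model, which is not implemented here.

```python
import math
import zlib

def zlib_entropy_bits(text: str) -> int:
    """Compressed length in bits: a cheap, model-free proxy for how much
    information the string carries (repetitive junk compresses away)."""
    return 8 * len(zlib.compress(text.encode("utf-8")))

def memorization_score(text: str, model_perplexity: float) -> float:
    """Rank extraction candidates. The suspicious combination is text with
    HIGH zlib entropy but LOW model perplexity: the model finds hard-looking
    text easy, which suggests it has memorized it. Higher score = more
    suspicious. `model_perplexity` (> 1) is assumed to be measured on the
    target LM; obtaining it is out of scope for this sketch."""
    return zlib_entropy_bits(text) / math.log(model_perplexity)

# Usage sketch: rank candidates and inspect the top of the list manually,
# as the paper does, since the heuristic only surfaces suspects.
candidates = {
    "aaaa" * 50: 1.5,                      # repetitive filler, low entropy
    "John Doe, 555-0199, 1 Main St": 2.0,  # hypothetical PII-like string
}
ranked = sorted(candidates, key=lambda t: memorization_score(t, candidates[t]),
                reverse=True)
```

This is only one of several ranking metrics in the paper (others compare perplexity across model sizes or under lowercasing); the zlib ratio is the cheapest to reproduce.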

Tags

memorization · training-data-extraction · privacy

Framework Mappings

OWASP LLM: LLM02
MITRE ATLAS: AML.T0024
MITRE ATLAS: AML.T0056

Cite This Resource

@inproceedings{llmsec202100001,
  title = {Extracting Training Data from Large Language Models},
  author = {Nicholas Carlini and Florian Tramer and Eric Wallace and Matthew Jagielski and Ariel Herbert-Voss and Katherine Lee and Adam Roberts and Tom Brown and Dawn Song and Ulfar Erlingsson and Alina Oprea and Colin Raffel},
  year = {2021},
  booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
  doi = {10.5555/3489212.3489351},
  url = {https://arxiv.org/abs/2012.07805},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2012.07805