Paper · Reviewed · Open Access · llmsec-2021-00001
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel
2021-06 · USENIX Security 2021 · 1200 citations
Abstract
Demonstrates that large language models memorize and can be prompted to emit verbatim training data, including PII, revealing significant privacy risks.
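One of the ranking metrics the paper uses to surface likely-memorized generations compares the model's perplexity on a sample against its zlib compression entropy: text the model finds far "easier" than its raw complexity suggests is a memorization signal. A minimal sketch of that ratio, with illustrative (hypothetical) log-perplexity values standing in for real model scores:

```python
import zlib


def zlib_entropy(text: str) -> int:
    """Bits in the zlib-compressed text: a cheap proxy for string complexity."""
    return len(zlib.compress(text.encode("utf-8"))) * 8


def memorization_score(log_perplexity: float, text: str) -> float:
    """Rank extraction candidates by zlib entropy relative to model perplexity.

    A higher ratio means the model assigns the string far lower perplexity
    than its compressed size would suggest -- one of the paper's signals
    that the sample may be memorized training data.
    """
    return zlib_entropy(text) / log_perplexity


# Hypothetical log-perplexities for two generated samples (illustrative only;
# in the paper these come from querying the target model, e.g. GPT-2):
memorized = "John Doe, 555-0147, 123 Main St."  # model scores it unusually low
generic = "The weather today is quite nice."    # model scores it typically
suspicious = memorization_score(2.0, memorized) > memorization_score(8.0, generic)
```

The candidate generations with the highest scores are the ones a human (or a search against the training corpus) would inspect first.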
Tags
memorization · training-data-extraction · privacy
Framework Mappings
OWASP LLM: LLM02 · MITRE ATLAS: AML.T0024, AML.T0056
Cite This Resource
@inproceedings{llmsec202100001,
  title     = {Extracting Training Data from Large Language Models},
  author    = {Nicholas Carlini and Florian Tram{\`e}r and Eric Wallace and Matthew Jagielski and Ariel Herbert-Voss and Katherine Lee and Adam Roberts and Tom Brown and Dawn Song and {\'U}lfar Erlingsson and Alina Oprea and Colin Raffel},
  year      = {2021},
  booktitle = {USENIX Security 2021},
  doi       = {10.5555/3489212.3489351},
  url       = {https://arxiv.org/abs/2012.07805},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2012.07805