← Back to search
paper reviewed open access llmsec-2023-00004

Scalable Extraction of Training Data from (Production) Language Models

Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramer, Katherine Lee

2023-11 — arXiv preprint 320 citations

Abstract

Develops a scalable attack to extract over a gigabyte of training data from semi-open and closed models including ChatGPT, at a cost of roughly $200.

Categories

Tags

training-data-extractionChatGPTproduction-model

Framework Mappings

OWASP LLM: LLM02 MITRE ATLAS: AML.T0024 MITRE ATLAS: AML.T0056

Cite This Resource

@article{llmsec202300004,
  title = {Scalable Extraction of Training Data from (Production) Language Models},
  author = {Milad Nasr and Nicholas Carlini and Jonathan Hayase and Matthew Jagielski and A. Feder Cooper and Daphne Ippolito and Christopher A. Choquette-Choo and Eric Wallace and Florian Tramer and Katherine Lee},
  year = {2023},
  journal = {arXiv preprint},
  url = {https://arxiv.org/abs/2311.17035},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2311.17035