paper reviewed open access llmsec-2023-00004

Scalable Extraction of Training Data from (Production) Language Models

Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramer, Katherine Lee

2023-11 — arXiv preprint 320 citations

View Resource PDF

Abstract

Develops a scalable attack to extract over a gigabyte of training data from semi-open and closed models including ChatGPT, at a cost of roughly $200.

Framework Mappings

OWASP LLM: LLM02 MITRE ATLAS: AML.T0024 MITRE ATLAS: AML.T0056

Cite This Resource

@article{llmsec202300004,
  title = {Scalable Extraction of Training Data from (Production) Language Models},
  author = {Milad Nasr and Nicholas Carlini and Jonathan Hayase and Matthew Jagielski and A. Feder Cooper and Daphne Ippolito and Christopher A. Choquette-Choo and Eric Wallace and Florian Tramer and Katherine Lee},
  year = {2023},
  journal = {arXiv preprint},
  url = {https://arxiv.org/abs/2311.17035},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2311.17035

Scalable Extraction of Training Data from (Production) Language Models

Abstract

Categories

Tags

Framework Mappings

Cite This Resource

Metadata