← Back to search
paper reviewed open access llmsec-2023-00004
Scalable Extraction of Training Data from (Production) Language Models
Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramer, Katherine Lee
2023-11 — arXiv preprint 320 citations
Abstract
Develops a scalable attack to extract over a gigabyte of training data from semi-open and closed models including ChatGPT, at a cost of roughly $200.
Categories
Tags
training-data-extractionChatGPTproduction-model
Framework Mappings
OWASP LLM: LLM02 MITRE ATLAS: AML.T0024 MITRE ATLAS: AML.T0056
Cite This Resource
@article{llmsec202300004,
title = {Scalable Extraction of Training Data from (Production) Language Models},
author = {Milad Nasr and Nicholas Carlini and Jonathan Hayase and Matthew Jagielski and A. Feder Cooper and Daphne Ippolito and Christopher A. Choquette-Choo and Eric Wallace and Florian Tramer and Katherine Lee},
year = {2023},
journal = {arXiv preprint},
url = {https://arxiv.org/abs/2311.17035},
} Metadata
- Added
- 2026-04-14
- Added by
- manual
- Source
- manual
- arxiv_id
- 2311.17035