
Model Extraction

3 resources

Attacks & Threats

Model stealing, distillation attacks, and weight extraction

Paper · Reviewed · Open access · 2024

Prompt Stealing Attacks Against Text-to-Image Generation Models

Xinyue Shen, Yiting Qu, Michael Backes + 1 more — USENIX Security 2024

Demonstrates attacks that recover the prompts used to generate images from text-to-image models, raising intellectual-property and privacy concerns.

Paper · Reviewed · Open access · 2024

Stealing Part of a Production Language Model

Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham + 10 more — ICML 2024

Demonstrates that the embedding projection layer of production LLMs, including OpenAI's models, can be stolen through their public APIs, confirming that model extraction is a practical risk.
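The core observation behind this attack is that logits are a linear function of a low-dimensional hidden state, so logit vectors collected across many prompts span a subspace whose rank equals the model's hidden dimension. A minimal NumPy sketch of that observation, with made-up sizes and a simulated API in place of a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, vocab_size, n_queries = 16, 100, 64  # illustrative sizes, not a real model's

# Hypothetical embedding projection ("unembedding") layer the attacker wants to learn about.
W = rng.normal(size=(vocab_size, hidden_dim))

def query_logits(prompt_seed: int) -> np.ndarray:
    """Stand-in for an API call returning the full logit vector for one prompt."""
    h = rng.normal(size=hidden_dim)  # final hidden state for this prompt
    return W @ h                     # logits are a linear map of the hidden state

# The attacker stacks logit vectors from many different prompts...
L = np.stack([query_logits(i) for i in range(n_queries)])

# ...and counts significant singular values: the rank reveals the hidden dimension.
s = np.linalg.svd(L, compute_uv=False)
recovered_dim = int(np.sum(s > 1e-8 * s[0]))
print(recovered_dim)  # recovers hidden_dim = 16
```

The paper goes further (recovering W itself up to an unknown linear transformation, and handling APIs that expose only top-k log-probabilities), but the rank argument above is the starting point.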

Model extraction · 95 citations · Paper · Reviewed · Open access · 2023

Scalable Extraction of Training Data from (Production) Language Models

Milad Nasr, Nicholas Carlini, Jonathan Hayase + 7 more — arXiv preprint

Develops a scalable attack that extracts over a gigabyte of training data from semi-open and closed models, including ChatGPT, at a cost of roughly $200.
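A key step in attacks of this kind is verification: generated text only counts as extracted training data if it reproduces a sufficiently long verbatim span from a reference corpus. A minimal sketch of that check, where the corpus, the window length, and the whitespace tokenization are illustrative assumptions rather than the paper's exact setup:

```python
def is_memorized(generation: str, corpus: str, k: int = 5) -> bool:
    """Return True if any k-word window of `generation` appears verbatim in `corpus`.

    This is a toy stand-in for the paper's verbatim-match test against a
    large auxiliary snapshot of web data; k and the tokenization are assumptions.
    """
    words = generation.split()
    windows = {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}
    return any(w in corpus for w in windows)

# Toy reference corpus standing in for a snapshot of likely training data.
corpus = "the quick brown fox jumps over the lazy dog every single day"

print(is_memorized("he said the quick brown fox jumps happily", corpus))   # True
print(is_memorized("completely novel text with no overlap at all here", corpus))  # False
```

In practice the paper uses suffix-array indexes over terabytes of web text to make this membership test efficient; the sketch above only shows the logical criterion.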