paper reviewed open access llmsec-2023-00003

Poisoning Language Models During Instruction Tuning

Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein

2023-05 — ICML 2023 210 citations

Abstract

Shows that adversaries can insert poisoned examples into instruction-tuning datasets, causing models to generate targeted outputs for attacker-chosen triggers.

Framework Mappings

OWASP LLM: LLM04 MITRE ATLAS: AML.T0020

Cite This Resource

@article{llmsec202300003,
  title = {Poisoning Language Models During Instruction Tuning},
  author = {Alexander Wan and Eric Wallace and Sheng Shen and Dan Klein},
  year = {2023},
  journal = {ICML 2023},
  url = {https://arxiv.org/abs/2305.00944},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2305.00944

Poisoning Language Models During Instruction Tuning

Abstract

Categories

Tags

Framework Mappings

Cite This Resource

Metadata