← Back to search
paper reviewed open access llmsec-2023-00003
Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
2023-05 — ICML 2023 210 citations
Abstract
Shows that adversaries can insert poisoned examples into instruction-tuning datasets, causing models to generate targeted outputs for attacker-chosen triggers.
Framework Mappings
OWASP LLM: LLM04 MITRE ATLAS: AML.T0020
Cite This Resource
@article{llmsec202300003,
title = {Poisoning Language Models During Instruction Tuning},
author = {Alexander Wan and Eric Wallace and Sheng Shen and Dan Klein},
year = {2023},
journal = {ICML 2023},
url = {https://arxiv.org/abs/2305.00944},
} Metadata
- Added
- 2026-04-14
- Added by
- manual
- Source
- manual
- arxiv_id
- 2305.00944