paper reviewed open access llmsec-2024-00030
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
2024-01, ICLR 2024, 95 citations
Abstract
This paper uses data from Tensor Trust, an online game in which players compete to craft prompt injection attacks and defenses, to build a large dataset of human-generated attacks.
Framework Mappings
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051
Cite This Resource
@inproceedings{llmsec202400030,
title = {Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game},
author = {Sam Toyer and Olivia Watkins and Ethan Adrian Mendes and Justin Svegliato and Luke Bailey and Tiffany Wang and Isaac Ong and Karim Elmaaroufi and Pieter Abbeel and Trevor Darrell and Alan Ritter and Stuart Russell},
year = {2024},
booktitle = {ICLR 2024},
url = {https://arxiv.org/abs/2311.01011},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2311.01011