Paper · Reviewed · Open access · ID: llmsec-2024-00030

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

2024-01 · ICLR 2024 · 95 citations

Abstract

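Builds a large dataset of human-generated prompt injection attacks and defenses from Tensor Trust, an online game in which players compete to craft prompt injections against each other's defenses and to write defenses that resist them. The paper also uses the dataset to derive benchmarks for two attack types, prompt extraction and prompt hijacking.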

Tags

game-based, dataset, interpretable

Framework Mappings

OWASP LLM: LLM01 (Prompt Injection)
MITRE ATLAS: AML.T0051 (LLM Prompt Injection)

Cite This Resource

@inproceedings{llmsec202400030,
  title = {{Tensor Trust}: Interpretable Prompt Injection Attacks from an Online Game},
  author = {Sam Toyer and Olivia Watkins and Ethan Adrian Mendes and Justin Svegliato and Luke Bailey and Tiffany Wang and Isaac Ong and Karim Elmaaroufi and Pieter Abbeel and Trevor Darrell and Alan Ritter and Stuart Russell},
  year = {2024},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR 2024)},
  url = {https://arxiv.org/abs/2311.01011},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2311.01011