paper reviewed open access llmsec-2024-00030

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

Published: 2024-01
Venue: ICLR 2024
Citations: 95

Abstract

This paper draws on data from Tensor Trust, an online game in which players compete to craft prompt injection attacks and defenses against them. The result is a large dataset of human-generated attacks.

Framework Mappings

OWASP LLM: LLM01
MITRE ATLAS: AML.T0051

Cite This Resource

@inproceedings{llmsec202400030,
  title = {Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game},
  author = {Sam Toyer and Olivia Watkins and Ethan Adrian Mendes and Justin Svegliato and Luke Bailey and Tiffany Wang and Isaac Ong and Karim Elmaaroufi and Pieter Abbeel and Trevor Darrell and Alan Ritter and Stuart Russell},
  year = {2024},
  booktitle = {International Conference on Learning Representations (ICLR)},
  url = {https://arxiv.org/abs/2311.01011},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2311.01011
