Prompt Injection

13 resources

Attacks & Threats

Direct, indirect, and multi-turn prompt injection attacks

paper reviewed open access 2024

Prompt Injection Attack Against LLM-Integrated Applications

Yi Liu, Gelei Deng, Yuekang Li + 6 more — ACM Computing Surveys

First comprehensive survey of prompt injection attacks against LLM-integrated applications, categorizing attacks and defenses with a unified framework.

prompt injection survey 280 citations

paper reviewed open access 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Eric Wallace, Kai Xiao, Reimar Leike + 3 more — arXiv preprint

Proposes an instruction hierarchy for training LLMs to prioritize system prompts over user prompts over third-party content, as a defense against prompt injection.

prompt injection input filtering guardrails 150 citations

paper reviewed open access 2024

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes + 9 more — ICLR 2024

Uses data from an online game (Tensor Trust) where players compete to craft prompt injections and defenses, creating a large dataset of human-generated attacks.

prompt injection benchmarks 95 citations

paper reviewed open access 2024

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint

Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.

agentic threats prompt injection benchmarks tool use security 75 citations

paper reviewed open access 2024

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?

Rodrigo Pedro, Daniel Castro, Paolo Molina + 1 more — USENIX Security 2024

Demonstrates how prompt injection can be chained with traditional web attacks (SQL injection, XSS) in LLM-integrated applications.

prompt injection model serving security 70 citations

paper reviewed open access 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying + 1 more — ACL 2024 Findings

Presents InjecAgent, a benchmark for evaluating indirect prompt injection attacks against LLM agents that use tools, showing most agents are highly vulnerable.

prompt injection tool use security benchmarks 65 citations

paper reviewed open access 2024

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

Jingwei Yi, Yueqi Xie, Bin Zhu + 5 more — arXiv preprint

Provides a benchmark for indirect prompt injection attacks and evaluates several defense strategies including perplexity-based detection and sandwich defense.

prompt injection input filtering benchmarks 60 citations

paper reviewed open access 2024

Securing LLM Systems Against Prompt Injection

Yupei Liu, Yuqi Jia, Runpeng Geng + 2 more — arXiv preprint

Proposes defense mechanisms against prompt injection in LLM systems including isolation-based approaches, input/output filtering, and detection methods.

prompt injection input filtering sandboxing isolation 50 citations

paper reviewed open access 2024

GPT in Sheep's Clothing: The Risk of Customized GPTs

Tao Qin, Zhen Li, Wenxin Mao + 1 more — arXiv preprint

Analyzes security risks of custom GPTs in the OpenAI GPT Store including prompt leakage, data exfiltration, and malicious GPTs.

supply chain attacks prompt injection model serving security 45 citations

tool reviewed open access 2024

Vigil: LLM Prompt Injection Detection and Defense Toolkit

DeadBits — GitHub

Open-source scanner for detecting prompt injections using vector similarity, YARA rules, text classifiers, and canary tokens.

input filtering prompt injection monitoring detection

paper reviewed open access 2023

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra + 3 more — AISec 2023

Introduces indirect prompt injection attacks against LLM-integrated applications, demonstrating how adversaries can remotely control LLMs by injecting prompts into data sources the LLM retrieves.

prompt injection agentic threats 450 citations

paper reviewed open access 2023

Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs through a Global Scale Prompt Hacking Competition

Sander Schulhoff, Jeremy Pinto, Anaum Khan + 7 more — EMNLP 2023

Presents results from a global prompt hacking competition with 600K+ adversarial prompts, revealing systemic LLM vulnerabilities across multiple models and defense strategies.

prompt injection jailbreaking benchmarks 180 citations

tool reviewed open access 2023

Rebuff: Self-Hardening Prompt Injection Detector

Protect AI — GitHub

Open-source tool designed to detect and prevent prompt injection attacks using multiple detection methods including heuristics, LLM-based analysis, and canary tokens.

input filtering prompt injection