← Back to all categories

Prompt Injection

13 resources

Attacks & Threats

Direct, indirect, and multi-turn prompt injection attacks

paper reviewed open access 2024

Prompt Injection Attack Against LLM-Integrated Applications

Yi Liu, Gelei Deng, Yuekang Li + 6 more — ACM Computing Surveys

First comprehensive survey of prompt injection attacks against LLM-integrated applications, categorizing attacks and defenses with a unified framework.

paper reviewed open access 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Eric Wallace, Kai Xiao, Reimar Leike + 3 more — arXiv preprint

Proposes an instruction hierarchy for training LLMs to prioritize system prompts over user prompts over third-party content, as a defense against prompt injection.

paper reviewed open access 2024

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes + 9 more — ICLR 2024

Uses data from an online game (Tensor Trust) where players compete to craft prompt injections and defenses, creating a large dataset of human-generated attacks.

paper reviewed open access 2024

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint

Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.

paper reviewed open access 2024

From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?

Rodrigo Pedro, Daniel Castro, Paolo Molina + 1 more — USENIX Security 2024

Demonstrates how prompt injection can be chained with traditional web attacks (SQL injection, XSS) in LLM-integrated applications.

paper reviewed open access 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying + 1 more — ACL 2024 Findings

Presents InjecAgent, a benchmark for evaluating indirect prompt injection attacks against LLM agents that use tools, showing most agents are highly vulnerable.

paper reviewed open access 2024

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

Jingwei Yi, Yueqi Xie, Bin Zhu + 5 more — arXiv preprint

Provides a benchmark for indirect prompt injection attacks and evaluates several defense strategies including perplexity-based detection and sandwich defense.

paper reviewed open access 2024

Securing LLM Systems Against Prompt Injection

Yupei Liu, Yuqi Jia, Runpeng Geng + 2 more — arXiv preprint

Proposes defense mechanisms against prompt injection in LLM systems including isolation-based approaches, input/output filtering, and detection methods.

paper reviewed open access 2024

GPT in Sheep's Clothing: The Risk of Customized GPTs

Tao Qin, Zhen Li, Wenxin Mao + 1 more — arXiv preprint

Analyzes security risks of custom GPTs in the OpenAI GPT Store including prompt leakage, data exfiltration, and malicious GPTs.

tool reviewed open access 2024

Vigil: LLM Prompt Injection Detection and Defense Toolkit

DeadBits — GitHub

Open-source scanner for detecting prompt injections using vector similarity, YARA rules, text classifiers, and canary tokens.

paper reviewed open access 2023

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra + 3 more — AISec 2023

Introduces indirect prompt injection attacks against LLM-integrated applications, demonstrating how adversaries can remotely control LLMs by injecting prompts into data sources the LLM retrieves.

paper reviewed open access 2023

Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs through a Global Scale Prompt Hacking Competition

Sander Schulhoff, Jeremy Pinto, Anaum Khan + 7 more — EMNLP 2023

Presents results from a global prompt hacking competition with 600K+ adversarial prompts, revealing systemic LLM vulnerabilities across multiple models and defense strategies.

tool reviewed open access 2023

Rebuff: Self-Hardening Prompt Injection Detector

Protect AI — GitHub

Open-source tool designed to detect and prevent prompt injection attacks using multiple detection methods including heuristics, LLM-based analysis, and canary tokens.