Prompt Injection
13 resourcesAttacks & Threats
Direct, indirect, and multi-turn prompt injection attacks
Prompt Injection Attack Against LLM-Integrated Applications
Yi Liu, Gelei Deng, Yuekang Li + 6 more — ACM Computing Surveys
First comprehensive survey of prompt injection attacks against LLM-integrated applications, categorizing attacks and defenses with a unified framework.
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Eric Wallace, Kai Xiao, Reimar Leike + 3 more — arXiv preprint
Proposes an instruction hierarchy for training LLMs to prioritize system prompts over user prompts over third-party content, as a defense against prompt injection.
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Adrian Mendes + 9 more — ICLR 2024
Uses data from an online game (Tensor Trust) where players compete to craft prompt injections and defenses, creating a large dataset of human-generated attacks.
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint
Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.
From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?
Rodrigo Pedro, Daniel Castro, Paolo Molina + 1 more — USENIX Security 2024
Demonstrates how prompt injection can be chained with traditional web attacks (SQL injection, XSS) in LLM-integrated applications.
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying + 1 more — ACL 2024 Findings
Presents InjecAgent, a benchmark for evaluating indirect prompt injection attacks against LLM agents that use tools, showing most agents are highly vulnerable.
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu + 5 more — arXiv preprint
Provides a benchmark for indirect prompt injection attacks and evaluates several defense strategies including perplexity-based detection and sandwich defense.
Securing LLM Systems Against Prompt Injection
Yupei Liu, Yuqi Jia, Runpeng Geng + 2 more — arXiv preprint
Proposes defense mechanisms against prompt injection in LLM systems including isolation-based approaches, input/output filtering, and detection methods.
GPT in Sheep's Clothing: The Risk of Customized GPTs
Tao Qin, Zhen Li, Wenxin Mao + 1 more — arXiv preprint
Analyzes security risks of custom GPTs in the OpenAI GPT Store including prompt leakage, data exfiltration, and malicious GPTs.
Vigil: LLM Prompt Injection Detection and Defense Toolkit
DeadBits — GitHub
Open-source scanner for detecting prompt injections using vector similarity, YARA rules, text classifiers, and canary tokens.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra + 3 more — AISec 2023
Introduces indirect prompt injection attacks against LLM-integrated applications, demonstrating how adversaries can remotely control LLMs by injecting prompts into data sources the LLM retrieves.
Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff, Jeremy Pinto, Anaum Khan + 7 more — EMNLP 2023
Presents results from a global prompt hacking competition with 600K+ adversarial prompts, revealing systemic LLM vulnerabilities across multiple models and defense strategies.
Rebuff: Self-Hardening Prompt Injection Detector
Protect AI — GitHub
Open-source tool designed to detect and prevent prompt injection attacks using multiple detection methods including heuristics, LLM-based analysis, and canary tokens.