Agentic Threats
8 resourcesAttacks & Threats
Tool misuse, autonomous harm, and agent-specific attack vectors
LLM Agents Can Autonomously Hack Websites
Richard Fang, Rohan Bindu, Akul Gupta + 2 more — arXiv preprint
Demonstrates that LLM agents can autonomously perform web hacking tasks including SQL injection, XSS, and CSRF attacks without human guidance.
LLM Agents Can Autonomously Exploit One-day Vulnerabilities
Richard Fang, Rohan Bindu, Akul Gupta + 1 more — arXiv preprint
Shows that LLM agents (GPT-4) can autonomously exploit real-world one-day vulnerabilities given CVE descriptions, achieving 87% success rate.
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint
Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan, Zhiwei He, Lingzhong Dong + 9 more — EMNLP 2024
Introduces R-Judge benchmark for evaluating whether LLM agents can identify safety risks in agentic scenarios involving tool use and multi-step reasoning.
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He, Tianqing Zhu, Dayong Ye + 3 more — arXiv preprint
Surveys security and privacy challenges specific to LLM-based agents, covering agent architectures, attack surfaces, and defense mechanisms.
ConfusedPilot: Confused Deputy Attacks Against RAG-based Code Assistants
Andrew Patel, Hossein Aboutorab, Ilia Kolochenko — arXiv preprint
Introduces confused deputy attacks against RAG-based code assistants like GitHub Copilot, where poisoned code repositories manipulate assistant outputs.
Adversarial Attacks on Multimodal Agents
Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov + 2 more — arXiv preprint
Demonstrates adversarial attacks on multimodal agents that take actions in digital environments, showing visual perturbations can hijack agent behavior.
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra + 3 more — AISec 2023
Introduces indirect prompt injection attacks against LLM-integrated applications, demonstrating how adversaries can remotely control LLMs by injecting prompts into data sources the LLM retrieves.