
Agentic Threats

8 resources

Attacks & Threats

Tool misuse, autonomous harm, and agent-specific attack vectors

Paper · reviewed · open access · 2024

LLM Agents Can Autonomously Hack Websites

Richard Fang, Rohan Bindu, Akul Gupta + 2 more — arXiv preprint

Demonstrates that LLM agents can autonomously carry out web attacks, including SQL injection, cross-site scripting (XSS), and CSRF, without human guidance.

Paper · reviewed · open access · 2024

LLM Agents Can Autonomously Exploit One-day Vulnerabilities

Richard Fang, Rohan Bindu, Akul Gupta + 1 more — arXiv preprint

Shows that LLM agents (GPT-4) can autonomously exploit real-world one-day vulnerabilities given only their CVE descriptions, achieving an 87% success rate.

Paper · reviewed · open access · 2024

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint

Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.

Paper · reviewed · open access · 2024

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Tongxin Yuan, Zhiwei He, Lingzhong Dong + 9 more — EMNLP 2024

Introduces the R-Judge benchmark for evaluating whether LLM agents can identify safety risks in agentic scenarios involving tool use and multi-step reasoning.

Paper · reviewed · open access · 2024

The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies

Feng He, Tianqing Zhu, Dayong Ye + 3 more — arXiv preprint

Surveys security and privacy challenges specific to LLM-based agents, covering agent architectures, attack surfaces, and defense mechanisms.

Paper · reviewed · open access · 2024

ConfusedPilot: Confused Deputy Attacks Against RAG-based Code Assistants

Andrew Patel, Hossein Aboutorab, Ilia Kolochenko — arXiv preprint

Introduces confused deputy attacks against RAG-based code assistants such as GitHub Copilot, in which poisoned code repositories manipulate the assistant's outputs.

Paper · reviewed · open access · 2024

Adversarial Attacks on Multimodal Agents

Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov + 2 more — arXiv preprint

Demonstrates adversarial attacks on multimodal agents that take actions in digital environments, showing visual perturbations can hijack agent behavior.

Paper · reviewed · open access · 2023

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra + 3 more — AISec 2023

Introduces indirect prompt injection attacks against LLM-integrated applications, demonstrating how adversaries can remotely control LLMs by injecting prompts into data sources the LLM retrieves.