Tool Use Security
9 resourcesAgentic AI Security
Function calling security, plugin safety, and API tool governance
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang, Carlos E. Jimenez, Alexander Wettig + 4 more — NeurIPS 2024
Demonstrates autonomous coding agents that interact with computer interfaces to solve software engineering tasks, raising questions about agent containment.
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic + 3 more — arXiv preprint
Introduces AgentDojo, a framework for evaluating the security of LLM agents against prompt injection and other attacks in realistic tool-use scenarios.
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying + 1 more — ACL 2024 Findings
Presents InjecAgent, a benchmark for evaluating indirect prompt injection attacks against LLM agents that use tools, showing most agents are highly vulnerable.
Adversarial Attacks on Multimodal Agents
Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov + 2 more — arXiv preprint
Demonstrates adversarial attacks on multimodal agents that take actions in digital environments, showing visual perturbations can hijack agent behavior.
PyRIT: Python Risk Identification Toolkit for Generative AI
Microsoft AI Red Team — GitHub / Microsoft
Microsoft's open-source framework for red teaming generative AI systems, supporting automated prompt generation, attack strategies, and scoring of AI responses.
Model Context Protocol (MCP): Security Considerations and Best Practices
Anthropic — Anthropic Documentation
Documentation and analysis of security considerations for the Model Context Protocol, covering authentication, authorization, and tool sandboxing.
Model Context Protocol (MCP): Specification
Anthropic — Anthropic / GitHub
Open protocol specification for connecting AI models to external data sources and tools, enabling standardized tool use with security considerations.
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu + 4 more — ICLR 2023
Foundational work on the ReAct paradigm for LLM agents that interleave reasoning and tool-use actions, enabling complex task completion with security implications.
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessi + 6 more — NeurIPS 2023
Demonstrates how LLMs can learn to use external tools (APIs, search engines, calculators) through self-supervised learning, foundational for understanding tool-use security.