GenAI Security Literature Review

A comprehensive, community-driven, auto-updating database of GenAI and LLM security research, standards, tools, and resources.

Search Resources Browse Categories

100

Total Resources

100

Reviewed

Recent Additions

standard reviewed open access 2025

OWASP Top 10 for Large Language Model Applications

Steve Wilson, OWASP LLM AI Security Team — OWASP Foundation

The definitive OWASP guide identifying the top 10 most critical security risks in LLM applications, with descriptions, examples, and mitigation strategies.

threat modeling risk frameworks

standard reviewed open access 2025

OWASP Top 10 for Agentic AI Applications

OWASP Foundation — OWASP Foundation

Identifies the top 10 security risks specific to agentic AI applications including excessive agency, unsafe tool execution, and inadequate oversight.

threat modeling risk frameworks agent architecture

paper reviewed open access 2024

Jailbroken: How Does LLM Safety Training Fail?

Alexander Wei, Nika Haghtalab, Jacob Steinhardt — NeurIPS 2023

Analyzes failure modes of LLM safety training, identifying two broad categories: competing objectives and mismatched generalization, demonstrating attacks that exploit each.

jailbreaking guardrails 520 citations

paper reviewed open access 2024

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Boxin Wang, Weixin Chen, Hengzhi Pei + 7 more — NeurIPS 2023

Comprehensive trustworthiness evaluation of GPT models across 8 dimensions including toxicity, bias, robustness, privacy, fairness, and machine ethics.

benchmarks responsible ai survey 520 citations

paper reviewed open access 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Evan Hubinger, Carson Denison, Jesse Mu + 27 more — arXiv preprint

Demonstrates that LLMs can be trained with deceptive behaviors (sleeper agents) that persist through standard safety training including RLHF, posing risks for backdoor attacks.

data poisoning guardrails adversarial examples 420 citations

Most Cited

paper reviewed open access 2023

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu + 4 more — ICLR 2023

Foundational work on the ReAct paradigm for LLM agents that interleave reasoning and tool-use actions, enabling complex task completion with security implications.

agent architecture tool use security 2500 citations

paper reviewed open access 2023

Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessi + 6 more — NeurIPS 2023

Demonstrates how LLMs can learn to use external tools (APIs, search engines, calculators) through self-supervised learning, foundational for understanding tool-use security.

tool use security agent architecture 1400 citations

paper reviewed open access 2021

Extracting Training Data from Large Language Models

Nicholas Carlini, Florian Tramer, Eric Wallace + 9 more — USENIX Security 2021

Demonstrates that large language models memorize and can be prompted to emit verbatim training data, including PII, revealing significant privacy risks.

membership inference differential privacy 1200 citations

paper reviewed open access 2022

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu + 44 more — arXiv preprint

Introduces Constitutional AI (CAI), a method for training AI systems to be harmless using a set of principles (a constitution) and AI-generated feedback, reducing reliance on human red teamers.

guardrails responsible ai 1100 citations

paper reviewed open access 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini + 3 more — arXiv preprint

Proposes an automated method (GCG) to generate adversarial suffixes that cause aligned LLMs to produce harmful content, with attacks transferring across models including ChatGPT and Claude.

jailbreaking adversarial examples 890 citations