Surveys
12 resourcesSurveys & Meta
Literature surveys, systematizations of knowledge, and meta-analyses
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang, Weixin Chen, Hengzhi Pei + 7 more — NeurIPS 2023
Comprehensive trustworthiness evaluation of GPT models across 8 dimensions including toxicity, bias, robustness, privacy, fairness, and machine ethics.
A Survey on Large Language Model (LLM) Security and Privacy: The Good, The Bad, and The Ugly
Yifan Yao, Jinhao Duan, Kaidi Xu + 3 more — High-Confidence Computing
Comprehensive survey covering LLM security and privacy from three perspectives: beneficial applications of LLMs for security, attacks against LLMs, and defensive techniques.
TrustLLM: Trustworthiness in Large Language Models
Lichao Sun, Yue Huang, Haoran Wang + 2 more — ICML 2024
Comprehensive study of LLM trustworthiness across truthfulness, safety, fairness, robustness, privacy, and machine ethics with benchmarks.
Prompt Injection Attack Against LLM-Integrated Applications
Yi Liu, Gelei Deng, Yuekang Li + 6 more — ACM Computing Surveys
First comprehensive survey of prompt injection attacks against LLM-integrated applications, categorizing attacks and defenses with a unified framework.
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Leo Schwinn, David Dobre, Stephan Gunnemann + 1 more — arXiv preprint
Systematizes adversarial attacks and defenses for LLMs, connecting them to the classical adversarial ML literature while identifying LLM-specific threats.
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi, Daniel Wankit Yip, Chun Fai Chan — arXiv preprint
Surveys attack techniques across the LLM lifecycle including training, fine-tuning, and inference, with comprehensive mitigation strategies.
Machine Unlearning for Large Language Models: A Survey
Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan + 2 more — arXiv preprint
Surveys machine unlearning techniques for LLMs including methods for forgetting specific training data, complying with data deletion requests, and maintaining model utility.
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies
Feng He, Tianqing Zhu, Dayong Ye + 3 more — arXiv preprint
Surveys security and privacy challenges specific to LLM-based agents, covering agent architectures, attack surfaces, and defense mechanisms.
OWASP AI Security and Privacy Guide
Rob van der Veer, OWASP AI Exchange Team — OWASP Foundation
Comprehensive guide for AI security and privacy including threat analysis, controls, and regulatory mapping for AI systems.
Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST AI 100-2e2025)
Apostol Vassilev, Alina Oprea, Alie Fordyce + 1 more — NIST
NIST's authoritative taxonomy of adversarial ML attacks and mitigations covering evasion, poisoning, privacy, and abuse attacks against AI systems.
Generative AI Security: Theories and Practices
Ken Huang, Yang Wang, Ben Goertzel + 3 more — Springer
Comprehensive textbook covering generative AI security from foundations to advanced topics including LLM threats, defenses, privacy, and governance.
Identifying and Mitigating the Security Risks of Generative AI
Clark Barrett, Brad Boyd, Elie Burzstein + 20 more — Foundations and Trends in Privacy and Security
Comprehensive treatment of generative AI security risks across the ML lifecycle with a focus on practical mitigations and deployment considerations.