Paper · Reviewed · Open access · llmsec-2024-00033

TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models

Jiaqi Xue, Mengxin Zheng, Ting Hua, Yilin Shen, Yepeng Liu, Ladislau Boloni, Qian Lou

2024-05 · NeurIPS 2023 · 85 citations

Abstract

Proposes TrojLLM, a black-box attack that generates universal trojan prompts to compromise LLMs without access to model internals.

Framework Mappings

OWASP LLM: LLM03, LLM04
MITRE ATLAS: AML.T0018, AML.T0043

Cite This Resource

@inproceedings{llmsec202400033,
  title = {TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models},
  author = {Jiaqi Xue and Mengxin Zheng and Ting Hua and Yilin Shen and Yepeng Liu and Ladislau Boloni and Qian Lou},
  year = {2023},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  url = {https://arxiv.org/abs/2306.06815},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2306.06815