dataset reviewed open access llmsec-2025-00004

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

2024-02, ICML 2024, 180 citations

Abstract

Introduces HarmBench, a standardized framework for evaluating automated red teaming methods and robust refusal in large language models (LLMs), built around a comprehensive taxonomy of harmful behaviors.

Categories

Tags

benchmark, red-teaming, evaluation-framework, dataset

Framework Mappings

OWASP LLM: LLM01
NIST AI RMF: MEASURE

Cite This Resource

@inproceedings{llmsec202500004,
  title = {HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal},
  author = {Mantas Mazeika and Long Phan and Xuwang Yin and Andy Zou and Zifan Wang and Norman Mu and Elham Sakhaee and Nathaniel Li and Steven Basart and Bo Li and David Forsyth and Dan Hendrycks},
  year = {2024},
  booktitle = {International Conference on Machine Learning (ICML)},
  url = {https://arxiv.org/abs/2402.04249},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2402.04249