dataset reviewed open access llmsec-2025-00004
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
2024-02, ICML 2024, 180 citations
Abstract
Introduces HarmBench, a standardized framework for evaluating automated red teaming methods and robust refusal in LLMs, organized around a comprehensive behavior taxonomy.
Framework Mappings
OWASP LLM: LLM01
NIST AI RMF: MEASURE
Cite This Resource
@inproceedings{llmsec202500004,
title = {HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal},
author = {Mantas Mazeika and Long Phan and Xuwang Yin and Andy Zou and Zifan Wang and Norman Mu and Elham Sakhaee and Nathaniel Li and Steven Basart and Bo Li and David Forsyth and Dan Hendrycks},
year = {2024},
booktitle = {ICML 2024},
url = {https://arxiv.org/abs/2402.04249},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2402.04249