← Back to search
paper reviewed open access llmsec-2025-00026
StrongREJECT: A Comprehensive Evaluation of LLM Safety Refusal Behaviors
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
2024 — arXiv preprint 65 citations
Abstract
Introduces StrongREJECT, a high-quality evaluation benchmark for measuring how well LLMs refuse harmful requests.
Framework Mappings
NIST AI RMF: MEASURE
Cite This Resource
@article{llmsec202500026,
title = {StrongREJECT: A Comprehensive Evaluation of LLM Safety Refusal Behaviors},
author = {Alexandra Souly and Qingyuan Lu and Dillon Bowen and Tu Trinh and Elvis Hsieh and Sana Pandey and Pieter Abbeel and Justin Svegliato and Scott Emmons and Olivia Watkins and Sam Toyer},
year = {2024},
journal = {arXiv preprint},
url = {https://arxiv.org/abs/2402.10260},
} Metadata
- Added
- 2026-04-14
- Added by
- manual
- Source
- manual
- arxiv_id
- 2402.10260