Paper | Reviewed | Open access | llmsec-2025-00026

StrongREJECT: A Comprehensive Evaluation of LLM Safety Refusal Behaviors

Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

2024 | arXiv preprint | 65 citations

Abstract

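Introduces StrongREJECT, a benchmark for evaluating how well LLMs resist jailbreak attempts on harmful requests. It combines a curated dataset of forbidden prompts with an automated evaluator that scores responses both on whether the model refuses and on how specific and convincing any harmful content it produces is, so that jailbreaks which elicit empty or unhelpful answers are not counted as successes.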

Framework Mappings

NIST AI RMF: MEASURE

Cite This Resource

@article{llmsec202500026,
  title = {StrongREJECT: A Comprehensive Evaluation of LLM Safety Refusal Behaviors},
  author = {Alexandra Souly and Qingyuan Lu and Dillon Bowen and Tu Trinh and Elvis Hsieh and Sana Pandey and Pieter Abbeel and Justin Svegliato and Scott Emmons and Olivia Watkins and Sam Toyer},
  year = {2024},
  journal = {arXiv preprint arXiv:2402.10260},
  url = {https://arxiv.org/abs/2402.10260},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2402.10260