paper reviewed open access llmsec-2025-00026

A StrongREJECT for Empty Jailbreaks

Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

2024, arXiv preprint, 65 citations

Abstract

Introduces StrongREJECT, a high-quality evaluation benchmark for measuring how well LLMs refuse harmful requests.

Tags

evaluation, refusal, benchmark, reliability

Framework Mappings

NIST AI RMF: MEASURE

Cite This Resource

@article{llmsec202500026,
  title = {A StrongREJECT for Empty Jailbreaks},
  author = {Alexandra Souly and Qingyuan Lu and Dillon Bowen and Tu Trinh and Elvis Hsieh and Sana Pandey and Pieter Abbeel and Justin Svegliato and Scott Emmons and Olivia Watkins and Sam Toyer},
  year = {2024},
  journal = {arXiv preprint},
  url = {https://arxiv.org/abs/2402.10260},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2402.10260