Paper, reviewed, open access, llmsec-2024-00051

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer

2024, NeurIPS 2023, 520 citations

Abstract

This paper presents a comprehensive trustworthiness evaluation of GPT models across eight dimensions, including toxicity, bias, robustness, privacy, fairness, and machine ethics.

Framework Mappings

NIST AI RMF: MEASURE
NIST AI RMF: MAP

Cite This Resource

@article{llmsec202400051,
  title = {DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models},
  author = {Boxin Wang and Weixin Chen and Hengzhi Pei and Chulin Xie and Mintong Kang and Chenhui Zhang and Chejian Xu and Zidi Xiong and Ritik Dutta and Rylan Schaeffer},
  year = {2024},
  journal = {NeurIPS 2023},
  doi = {10.52202/075280-1361},
  url = {https://arxiv.org/abs/2306.11698},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arxiv_id: 2306.11698
doi: 10.52202/075280-1361
