paper reviewed open access llmsec-2024-00051

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer

2024, NeurIPS 2023, 520 citations

Abstract

A comprehensive trustworthiness evaluation of GPT models across eight dimensions, including toxicity, stereotype bias, adversarial and out-of-distribution robustness, privacy, machine ethics, and fairness.

Categories

Tags

trustworthiness GPT comprehensive-evaluation

Framework Mappings

NIST AI RMF: MEASURE NIST AI RMF: MAP

Cite This Resource

@article{llmsec202400051,
  title = {DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models},
  author = {Boxin Wang and Weixin Chen and Hengzhi Pei and Chulin Xie and Mintong Kang and Chenhui Zhang and Chejian Xu and Zidi Xiong and Ritik Dutta and Rylan Schaeffer},
  year = {2024},
  journal = {NeurIPS 2023},
  url = {https://arxiv.org/abs/2306.11698},
}

Metadata

Added
2026-04-14
Added by
manual
Source
manual
arxiv_id
2306.11698