paper reviewed open access llmsec-2024-00051
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer
2024, NeurIPS 2023, 520 citations
Abstract
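A comprehensive trustworthiness evaluation of GPT models (GPT-3.5 and GPT-4) from eight perspectives: toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness to adversarial demonstrations, privacy, machine ethics, and fairness.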
Framework Mappings
NIST AI RMF: MEASURE, NIST AI RMF: MAP
Cite This Resource
@article{llmsec202400051,
title = {DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models},
author = {Boxin Wang and Weixin Chen and Hengzhi Pei and Chulin Xie and Mintong Kang and Chenhui Zhang and Chejian Xu and Zidi Xiong and Ritik Dutta and Rylan Schaeffer},
year = {2024},
journal = {NeurIPS 2023},
url = {https://arxiv.org/abs/2306.11698},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2306.11698