paper reviewed open access llmsec-2024-00035
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
2024-02 · ICML 2024 · 110 citations
Abstract
Demonstrates that safety alignment in LLMs is brittle and can be undermined through simple weight pruning or low-rank modifications without any fine-tuning data.
Framework Mappings
OWASP LLM: LLM04
MITRE ATLAS: AML.T0015
Cite This Resource
@article{llmsec202400035,
title = {Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications},
author = {Boyi Wei and Kaixuan Huang and Yangsibo Huang and Tinghao Xie and Xiangyu Qi and Mengzhou Xia and Prateek Mittal and Mengdi Wang and Peter Henderson},
year = {2024},
journal = {ICML 2024},
url = {https://arxiv.org/abs/2402.05162},
}
Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2402.05162