Paper · Reviewed · Open access · llmsec-2024-00034
LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat
Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
2023-10 · arXiv preprint · 190 citations
Abstract
Demonstrates that LoRA fine-tuning with as few as 100 examples can strip the safety guardrails from Llama 2-Chat, raising concerns about granting fine-tuning access to safety-aligned models.
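For context, a minimal sketch of the kind of LoRA fine-tuning the paper studies, using the Hugging Face transformers/peft stack. The 7B chat checkpoint, the examples.jsonl dataset file, and all hyperparameters are illustrative placeholders, not the paper's configuration; this shows the general mechanism, not the authors' method.

```python
# Hypothetical LoRA fine-tuning sketch (not the paper's exact setup).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "meta-llama/Llama-2-7b-chat-hf"  # gated model; requires accepted license
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA trains only small low-rank matrices injected into the attention
# projections, which is why a small dataset and modest compute suffice
# to shift the model's behavior.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# ~100 instruction/response pairs (hypothetical local JSONL file with a
# "text" field per example).
data = load_dataset("json", data_files="examples.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        logging_steps=10,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # writes only the small adapter weights
```

Because only the adapter weights are trained and saved, the resulting artifact is megabytes rather than gigabytes, which is part of why the paper's finding matters for anyone releasing weights or offering fine-tuning APIs.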
Framework Mappings
- OWASP LLM: LLM04
- MITRE ATLAS: AML.T0018
Cite This Resource
@article{llmsec202400034,
  title   = {LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat},
  author  = {Simon Lermen and Charlie Rogers-Smith and Jeffrey Ladish},
  year    = {2023},
  journal = {arXiv preprint arXiv:2310.20624},
  url     = {https://arxiv.org/abs/2310.20624},
}

Metadata
- Added: 2026-04-14
- Added by: manual
- Source: manual
- arxiv_id: 2310.20624