Paper · Reviewed · Open access · llmsec-2024-00034

LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat

Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish

October 2023 · arXiv preprint · 190 citations

Abstract

Demonstrates that LoRA fine-tuning with as few as 100 examples can remove the safety guardrails from Llama 2-Chat, raising concerns about offering fine-tuning access to aligned models.
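The low-rank adaptation the abstract refers to replaces a full weight update with a small factorized one. A minimal plain-Python sketch of that update rule (illustrative only, not the authors' code; matrix shapes and the alpha/r scaling follow the standard LoRA formulation):

```python
# LoRA adapts a frozen weight matrix W as W' = W + (alpha / r) * B @ A,
# where A (r x k) and B (d x r) are the only trainable matrices and
# r << min(d, k). Fine-tuning touches only A and B, which is why the
# attack in this paper is so cheap.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example: d = k = 2, rank r = 1.
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[1.0, 2.0]]   # r x k
B = [[1.0],
     [0.5]]        # d x r
W_adapted = lora_update(W, A, B, alpha=2.0, r=1)
# W_adapted == [[3.0, 4.0], [1.0, 3.0]]
```

Because only A and B are trained, the adapter holds d*r + r*k parameters instead of d*k, which is what makes fine-tuning (and, per this paper, safety removal) feasible on modest hardware.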

Tags

LoRA · safety-undoing · fine-tuning · alignment

Framework Mappings

OWASP LLM: LLM04
MITRE ATLAS: AML.T0018

Cite This Resource

@article{llmsec202400034,
  title = {LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat},
  author = {Simon Lermen and Charlie Rogers-Smith and Jeffrey Ladish},
  year = {2023},
  journal = {arXiv preprint},
  url = {https://arxiv.org/abs/2310.20624},
}

Metadata

Added: 2026-04-14
Added by: manual
Source: manual
arXiv ID: 2310.20624