PEFT techniques 1: LoRA

Generative AI with Large Language Models

by Taeyoon.Kim.DS 2023. 8. 23. 00:25


https://www.coursera.org/learn/generative-ai-with-llms/lecture/NZOVw/peft-techniques-1-lora

 


 

Low-rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique in the re-parameterization category. LoRA reduces the number of parameters that must be trained during fine-tuning, easing memory constraints. It freezes all of the original model parameters and injects two low-rank decomposition matrices alongside the original weights. Each matrix is small on its own, but their product has the same dimensions as the weight matrix it modifies. During fine-tuning, only these low-rank matrices are trained; the original LLM weights stay frozen. For inference, the two matrices are multiplied together and the result is added to the original weights, updating the model for the specific task. LoRA is typically applied to the self-attention layers, which contain most of an LLM's parameters.
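The freeze-and-inject idea above can be sketched in a few lines of numpy. The dimensions and variable names here are illustrative assumptions, not code from the course:

```python
import numpy as np

# Sketch of LoRA on a single weight matrix (hypothetical dimensions).
d, k, r = 512, 64, 8            # original weight is d x k; LoRA rank r << min(d, k)

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))     # original pretrained weight: frozen during fine-tuning
B = np.zeros((d, r))            # low-rank factor B, commonly initialized to zero
A = rng.normal(size=(r, k))     # low-rank factor A, randomly initialized

# During training, gradient updates touch only A and B.
# For inference, their product is folded back into the frozen weight:
W_adapted = W + B @ A           # B @ A has the same d x k shape as W

x = rng.normal(size=(k,))
y = W_adapted @ x               # forward pass with the task-adapted weight
```

Because B starts at zero, the adapted weight equals the original weight before any training, so fine-tuning begins exactly from the pretrained model's behavior.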

In the lecture's example, applying LoRA with a rank of eight reduces the trainable parameters from 32,768 to 4,608, an 86% reduction. This memory efficiency makes it possible to fine-tune on a single GPU, and switching between tasks is as easy as swapping in a different pair of LoRA matrices. Because the matrices are so small, adapters for many tasks can be trained and stored without excessive storage requirements.
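As a sanity check on those figures, 32,768 is consistent with a 512-by-64 weight matrix (an assumption here, chosen because it factors the lecture's number exactly); with rank 8, the two decomposition matrices together hold 4,608 parameters:

```python
d, k, r = 512, 64, 8                  # assumed weight shape d x k, LoRA rank r

full_params = d * k                   # trainable parameters under full fine-tuning
lora_params = d * r + r * k           # LoRA trains B (d x r) plus A (r x k)
reduction = 1 - lora_params / full_params

print(full_params, lora_params, f"{reduction:.0%}")   # 32768 4608 86%
```

The count scales linearly with the rank, which is why even doubling the rank still leaves the adapter tiny relative to the full weight matrix.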

LoRA fine-tuned models perform competitively with fully fine-tuned models, with only a slight reduction in scores. The choice of rank, typically between 4 and 32, is important: lower ranks mean fewer trainable parameters, but setting the rank too low can degrade performance, so some balance is required.

 

