Transformers: Attention is all you need

Generative AI with Large Language Models

by Taeyoon.Kim.DS 2023. 8. 21. 18:53

https://www.coursera.org/learn/generative-ai-with-llms/supplement/Il7wV/transformers-attention-is-all-you-need

The 2017 paper "Attention Is All You Need," published by Google researchers, introduced the Transformer, an architecture that reshaped natural language processing (NLP) and laid the foundation for modern large language models such as GPT and PaLM. Unlike earlier sequence models built on recurrent neural networks (RNNs) or convolutional neural networks (CNNs), the Transformer relies entirely on attention mechanisms.

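The Transformer uses self-attention to compute representations of its input sequence. Because every position attends directly to every other position, the model can capture long-range dependencies, and because positions are processed together rather than one step at a time, the computation parallelizes well. The paper reported that the Transformer outperformed previous models on machine translation, specifically the WMT 2014 English-to-German and English-to-French tasks.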
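
As a rough illustration of the self-attention step described above, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. It is a toy under stated assumptions, not the paper's implementation: the function names and input numbers are made up, and the queries, keys, and values are simply the raw token vectors, whereas a real model would first pass them through learned linear projections.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Toy input: three token vectors of width 2 (made-up numbers).
# Assumption: Q = K = V = x, i.e. no learned projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(x, x, x)
```

Each row of `attn` is a probability distribution over all input positions, which is what lets any token draw on any other token in a single step, and the rows can be computed independently of each other.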

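The Transformer architecture comprises an encoder and a decoder, each a stack of identical layers (six each in the paper's base model). Each encoder layer has two sub-layers: multi-head self-attention, which lets the model attend to different parts of the input sequence at once, and a position-wise feed-forward network applied to each position independently. Each decoder layer adds a third sub-layer that attends over the encoder's output, and its self-attention is masked so that a position cannot attend to later positions.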
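
The multi-head idea can be sketched in a few lines: each head projects the input into a smaller subspace, runs attention there, and the head outputs are concatenated and mixed by an output projection. The code below is an assumption-laden toy, not the paper's code: the helper names are hypothetical, and the projection matrices are random rather than learned, so only the shapes and data flow are meaningful.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    # Stand-in for learned weights (assumption: random, untrained).
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, num_heads, rng):
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must divide evenly into heads"
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head gets its own query, key, and value projections.
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        heads.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate the head outputs position by position.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    # Output projection mixes information across heads.
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o)

# Toy input: 3 tokens, d_model = 4, split across 2 heads (made-up numbers).
x = [[1.0, 0.0, 0.5, 0.2],
     [0.0, 1.0, 0.3, 0.7],
     [1.0, 1.0, 0.1, 0.9]]
out = multi_head_self_attention(x, num_heads=2, rng=random.Random(0))
```

The output has the same shape as the input (one `d_model`-wide vector per token), which is what allows identical layers to be stacked.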

To ease training, the Transformer wraps each sub-layer in a residual connection followed by layer normalization; to limit overfitting, the paper additionally applies dropout. Because attention by itself is order-agnostic, the authors also add a positional encoding to each token embedding to indicate the token's position, so the model can use sequence order without relying on recurrent or convolutional operations.
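
The two ingredients in this paragraph can be sketched as follows. The positional encoding follows the sinusoidal formulas from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The "add and norm" step computes LayerNorm(x + Sublayer(x)). This is a simplified sketch: the function names are illustrative, and the layer normalization here omits the learned gain and bias that a real implementation would include.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position table: sin on even indices, cos on odd indices."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # i is the even index 2k, so the exponent is 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def layer_norm(vec, eps=1e-6):
    """Normalize one vector to zero mean and unit variance.
    Assumption: no learned gain or bias, to keep the sketch short."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(x, sublayer_out):
    """Residual connection followed by layer normalization, per position."""
    return [layer_norm([a + b for a, b in zip(x_row, s_row)])
            for x_row, s_row in zip(x, sublayer_out)]

# Position table for a 4-token sequence with d_model = 4.
pe = positional_encoding(4, 4)

# Made-up embeddings, with the position table added element-wise.
embeddings = [[0.1, 0.2, 0.3, 0.4] for _ in range(4)]
inputs = [[e + p for e, p in zip(e_row, p_row)]
          for e_row, p_row in zip(embeddings, pe)]

# Stand-in sub-layer output (made-up values) wrapped with add-and-norm.
normed = add_and_norm(inputs, [[0.5, 0.5, 0.5, 0.5] for _ in range(4)])
```

Although the four embeddings above are identical, the rows of `inputs` differ because each position receives a distinct encoding, which is how the model can tell positions apart.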

